Combined linkage and family-based association analysis improves candidate gene detection in Genetic Analysis Workshop 18 simulation data
© Li et al.; licensee BioMed Central Ltd. 2014
Published: 17 June 2014
Because linkage and association analyses investigate the genotype-phenotype correlation differently, various efforts have been made to model linkage and association jointly. However, joint modeling methods are usually computationally intensive; hence they cannot currently accommodate large pedigrees with dense markers. This article proposes a simple method to combine the linkage and association evidence with the aim of improving the detection power of disease susceptibility genes. Our detection power comparisons show that the combined linkage-association p values can remarkably improve the causal gene detection power in Genetic Analysis Workshop 18 simulation data.
Linkage analysis in family data looks for the genomic region where the disease phenotype of interest and a stretch of genetic markers cosegregate. As a result of the strong identity-by-descent (IBD) sharing among family members and the limited number of recombination events present in collected pedigrees, the critical regions detected by linkage analyses rarely pinpoint a single gene. However, linkage analysis is immune to the confounding by population stratification suffered by association analyses. Association analyses regress quantitative phenotypes on a marker's genotypes or compare allele frequencies of a single-nucleotide polymorphism (SNP) between cases and controls, and can narrow down the putative disease regions to small regions of high linkage disequilibrium (LD blocks), which are usually much shorter than linked regions. With the advance of next-generation sequencing technology and highly accurate imputation methods, association analyses with dense marker coverage can potentially even locate candidate causal variants (and thus candidate genes) directly. Because the genotype-phenotype correlation information is investigated differently by linkage and family-based association analyses, various efforts have been made to model linkage and association jointly [1–9]. To name a few among many, Li et al [6] proposed 2 likelihood ratio tests in a joint linkage-association model to characterize whether an associated SNP can partially or completely explain linkage signals; Göring and Terwilliger [4] proposed a joint linkage and LD model through the use of a pseudomarker locus. Joint modeling methods [1, 3–6] are usually computationally intensive; hence they cannot currently accommodate large pedigrees with dense markers. This article proposes a simple method to combine the linkage and association evidence with the aim of improving the detection power of disease susceptibility genes.
Specifically, we convert the linkage LOD scores to p values and adopt the unweighted Liptak [10] method to combine the linkage and association p values. Our detection power comparisons show that the combined linkage-association p values can improve the causal gene detection power remarkably in Genetic Analysis Workshop 18 (GAW18) simulation data.
All the analyses and comparisons in this report are performed with the disease causal variants known.
Long-term mean blood pressure
We adopt the method of Levy et al [11] to adjust for the effects of age, sex, and medication status on blood pressure, and calculate the long-term mean systolic blood pressure (SBP) on the basis of the adjusted SBP measurements at the 3 time points.
Multipoint quantitative trait linkage analysis (SOLAR)
SOLAR [12] is a variance component multipoint linkage analysis software for quantitative traits. In the restricted model, the additive genetic variance attributable to the quantitative trait locus (QTL) of interest equals zero, whereas in the alternative model the additive genetic variance attributable to the QTL of interest is estimated by maximizing the likelihood of the model. The linkage LOD score is the difference in log10 likelihood between the alternative and the restricted models. A total of 3071 genome-wide association study (GWAS) array SNPs were randomly selected so that they were not in high LD in unrelated individuals. Multipoint linkage analysis in SOLAR [12] was applied to the LD-pruned SNPs for the quantitative traits Q1 and mean SBP.
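The LOD score described in this subsection can be written compactly as follows. The symbol $\sigma_q^2$ for the additive genetic variance attributable to the QTL and $L(\cdot)$ for the maximized likelihood are notation assumed here for illustration; they do not appear in the original text.

$$
\mathrm{LOD} \;=\; \log_{10} L\!\left(\hat{\sigma}_q^2\right) \;-\; \log_{10} L\!\left(\sigma_q^2 = 0\right)
$$

Here $\hat{\sigma}_q^2$ is the maximum likelihood estimate under the alternative model, and the second term is the likelihood of the restricted model in which the QTL variance is fixed at zero.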
Family-based association test using multiple markers
The multimarker version of the family-based association test (FBAT) statistic is a linear combination of single-marker FBAT statistics with data-driven combination weights [13]. We adopt the option -e in the FBAT package, which forces it to estimate the association signal in the presence of linkage. The analysis unit is a gene whose starting and ending physical positions are obtained from the UCSC refgene database. The imputed genotypes of all the nonsynonymous SNPs in a gene were analyzed together to obtain gene-based association p values.
Combining linkage and association evidence
In the output from SOLAR, LOD scores were given with respect to genetic distances; the physical boundaries of each gene were mapped to genetic distances, and a gene was assigned the average LOD score of the genetic region to which it mapped. Next, the linkage LOD score was converted to a p value by observing that $2\log_e(10^{\mathrm{LOD}}) = 2\ln(10)\cdot\mathrm{LOD}$ is asymptotically distributed as a 0.5:0.5 mixture of a $\chi^2_1$ variable and a point mass at zero [12]. The linkage and association p values for a gene were inverse-normal transformed to $Z_1$ and $Z_2$, respectively. We then adopted the unweighted Liptak method [10] to combine the linkage and association evidence and obtain a combined p value from the statistic

$$
Z = \frac{(Z_1, Z_2)\,\mathbf{1}_2}{\sqrt{\mathbf{1}_2^{\top}\,\Phi\,\mathbf{1}_2}},
$$

where $\mathbf{1}_k$ is a k-element vector of 1s, $(Z_1, Z_2)$ is a row vector made up of $Z_1$ and $Z_2$, and $Z$ follows the standard normal distribution asymptotically under the null hypothesis. When $Z_1$ and $Z_2$ are independent, $\Phi$ is a 2 × 2 identity matrix. When $Z_1$ and $Z_2$ are correlated [14], $\Phi$ can be empirically estimated as the correlation matrix of the N × 2 matrix whose jth column (j = 1, 2) is an N-element vector of test statistics for test j obtained when the phenotypes are permuted N times. The combined linkage and association p values were calculated using the Liptak method with and without correlation correction.
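The two steps described in this subsection, converting a LOD score to a p value and combining p values with the unweighted Liptak method, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code. The function names, the clipping constant `EPS`, the convention that a LOD score of zero or below maps to p = 1, and the use of upper-tail Z scores are all assumptions made for the sketch.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)
EPS = 1e-15                # assumed clipping bound so p = 0 or p = 1 stays finite


def lod_to_p(lod):
    """p value for a LOD score under the 0.5:0.5 chi-square(1)/point-mass mixture."""
    if lod <= 0:
        return 1.0  # assumed convention: no evidence for linkage at the point mass
    chi2 = 2.0 * math.log(10.0) * lod
    # The chi-square(1) survival function is erfc(sqrt(x / 2)); halve it for the mixture.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


def liptak_combine(p_values, corr=None):
    """Unweighted Liptak combination of one-sided p values.

    corr is the k x k correlation matrix of the Z scores; None means identity,
    that is, the tests are treated as independent.
    """
    k = len(p_values)
    if corr is None:
        corr = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    # Inverse-normal transform; -inv_cdf(p) equals inv_cdf(1 - p) but is
    # numerically safer for small p.
    z = [-STD_NORMAL.inv_cdf(min(max(p, EPS), 1.0 - EPS)) for p in p_values]
    variance = sum(sum(row) for row in corr)  # 1' * corr * 1
    z_combined = sum(z) / math.sqrt(variance)
    return STD_NORMAL.cdf(-z_combined)        # upper-tail p value
```

For a gene with average LOD score `lod` and gene-based FBAT p value `p_fbat`, the combined p value without correlation correction would be `liptak_combine([lod_to_p(lod), p_fbat])`. Passing a 2 × 2 correlation matrix estimated from phenotype permutations as `corr` gives the corrected version.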
The linkage analysis showed that chromosome 3 had an LOD score >1.5 in 3 and 9 of simulations 1 to 10 for the traits Q1 and mean SBP, respectively. Most of the linkage regions for mean SBP were mapped around 55 to 70 cM, whereas for Q1 the linkage regions were quite scattered, being 0 to 30 cM, 125 cM, and 165 to 220 cM for the 3 simulations with LOD scores >1.5. It turned out that chromosome 3 had the strongest linkage signal.
FBAT was applied to 8047 genes among 11 chromosomes that have more than 1 nonsynonymous SNP. We mimicked the fast validation strategy used in practice, which takes the top 50 candidates for validation in independent samples. Because we performed gene-based analyses, we chose a p value threshold so that the top 50 genes were checked against the simulated disease model. For mean SBP, on average, 49 of 8047 genes had combined p values less than 0.001 among simulations 1 to 10. Only 2 causal genes, MAP4 and FLNB on chromosome 3, were ever among the top 49, so we investigated their detection power. For Q1, on average, there were 9.5 and 9.1 genes out of 8047 with FBAT p values and combined p values smaller than 0.001, corresponding to empirical false-positive rates of 0.0012 and 0.0011, respectively.
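The arithmetic behind the empirical false-positive rates quoted above can be reproduced with a short sketch. The function names are hypothetical, and the calculation assumes, as the figures in the text imply, that the average count of genes passing the threshold is divided by the number of genes tested.

```python
def top_k_threshold(p_values, k=50):
    """Largest p value among the k most significant genes."""
    return sorted(p_values)[min(k, len(p_values)) - 1]


def empirical_false_positive_rate(mean_hits, n_tests):
    """Average number of genes passing the threshold divided by genes tested."""
    return mean_hits / n_tests


# Counts quoted in the text for trait Q1 at a p value threshold of 0.001.
print(round(empirical_false_positive_rate(9.5, 8047), 4))  # 0.0012 (FBAT)
print(round(empirical_false_positive_rate(9.1, 8047), 4))  # 0.0011 (combined)
```

Both rates are close to the nominal threshold of 0.001.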
Table 1. Ranks of 2 causal genes (MAP4 and FLNB) for trait Q1 and mean SBP based on FBAT p values and combined linkage and association p values for simulations 1 to 10.
For the trait of mean SBP, the combined p values were viewed to improve the FBAT p values if the rank of the causal gene based on the latter was beyond 49, and the rank based on the former was within 49. There were 5 and 4 improvements for MAP4 and FLNB, respectively (highlighted in Table 1). On the contrary, there was no such improvement for the trait Q1.
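The improvement criterion and the resulting detection power can be expressed as a small sketch. The function names and the default cutoff argument are assumptions for illustration; the ranks in the usage lines are the ones the text reports for simulation 8.

```python
def rank_of(gene, p_values):
    """1-based rank of a gene; a smaller p value gives a better rank."""
    target = p_values[gene]
    return 1 + sum(1 for p in p_values.values() if p < target)


def is_improvement(fbat_rank, combined_rank, cutoff=49):
    """True if the gene misses the cutoff with FBAT but makes it when combined."""
    return fbat_rank > cutoff and combined_rank <= cutoff


def detection_power(ranks, cutoff=49):
    """Fraction of simulation replicates in which the gene ranks within the cutoff."""
    return sum(r <= cutoff for r in ranks) / len(ranks)


# Ranks reported in the text for simulation 8 (mean SBP).
print(is_improvement(154, 6))     # MAP4: True
print(is_improvement(2372, 343))  # FLNB: False, still outside the top 49
```

Applying `detection_power` to the 10 per-simulation ranks of a gene gives the power figures discussed in the text.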
Generally speaking, the power for detecting the causal genes was low, except for MAP4, which explains a large percentage of SBP variance (7.79%). Combined p values improved the detection power for MAP4 from 50% to 100%. For FLNB, which explains a much lower percentage of SBP variance (0.29%), FBAT had no detection power, whereas the combined p values improved the power to 40%. Moreover, the type I error was well controlled in our combined p values. These results indicate a promising strategy of combining the linkage and association evidence to improve the true discovery rate/power. Furthermore, our method combines the linkage and association p values in a simple way; thus it is applicable to large pedigrees as long as they can be accommodated in the linkage analyses. The option -e in the FBAT software forces an estimation of association in the presence of linkage; thus the association signal detected is expected to be independent of the linkage signal. This was verified by the fact that the combined p values with and without correlation correction were very similar (correlation coefficient >0.99, data not shown).
The combined p values we propose to calculate depend on the strength of both linkage and association signals. Moderate signals in both linkage and association will generate a more significant combined p value than a significant signal in one test but a null signal in the other. To maximize the association power, we analyzed only nonsynonymous SNPs in gene-based association tests, as we know that the nonsynonymous SNPs are enriched with causal variants with relatively large effects from the released disease model. In real sequencing projects, especially whole genome sequencing studies, we may select other functional variants to analyze, such as deleterious or regulatory SNPs, to improve the association power.
In our opinion, the combined test is more powerful because linkage and association analyses investigate different parts of the phenotype-genotype correlation, thus providing nonredundant information. Combining these 2 p values makes some causal genes that have moderate support in both tests stand out. For example, for simulation 8, chromosome 3 had an LOD score of <1.5. However, the regions to which MAP4 and FLNB were mapped still had moderate linkage evidence, with LOD scores of 0.82 and 0.53, respectively. As a result, the ranks improved from 154 (FBAT p value = 0.0137) to 6 (combined p value = 0.00166) for MAP4, and from 2372 (FBAT p value = 0.195) to 343 (combined p value = 0.0430) for FLNB.
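As a check on the simulation 8 example, the MAP4 numbers can be traced through the combination by hand. The working below assumes independent tests (identity correlation matrix) and upper-tail Z scores. $F$ denotes the standard normal distribution function and $T$ the likelihood ratio statistic; both symbols are introduced here only for illustration.

$$
\begin{aligned}
T &= 2\ln(10)\times 0.82 \approx 3.776,\\
p_{\text{linkage}} &= \tfrac{1}{2}\Pr\!\left(\chi^2_1 > T\right) \approx 0.026, \qquad Z_1 = \sqrt{T} \approx 1.943,\\
Z_2 &= F^{-1}(1 - 0.0137) \approx 2.206,\\
Z &= \frac{Z_1 + Z_2}{\sqrt{2}} \approx 2.934, \qquad p_{\text{combined}} = 1 - F(Z) \approx 0.0017 .
\end{aligned}
$$

This agrees with the reported combined p value of 0.00166 up to rounding of the quoted LOD score and FBAT p value. The same steps applied to FLNB (LOD score 0.53, FBAT p value 0.195) give approximately 0.043.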
We proposed a simple method to combine the linkage and family-based association evidence that is applicable to large pedigrees. Our results showed that the combined linkage and FBAT p values do improve the causal gene detection power remarkably. The improved true discovery rate gives the top genes a higher chance of being validated.
We acknowledge the support of the Agency for Science, Technology, and Research (A*STAR) of Singapore. The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
- Biernacka JM, Cordell HJ: Exploring causality via identification of SNPs or haplotypes responsible for a linkage signal. Genet Epidemiol. 2007, 31: 727-740. 10.1002/gepi.20236.
- Chen MH, Van Eerdewegh P, Dupuis J: Identification of polymorphisms explaining a linkage signal: application to the GAW14 simulated data. BMC Genet. 2005, 6 (Suppl 1): S88. 10.1186/1471-2156-6-S1-S88.
- Dupuis J, Van Eerdewegh P: Identification of polymorphisms that explain a linkage peak: conditioning on parental genotypes. Genet Epidemiol. 2003, 25: 247-
- Göring HH, Terwilliger JD: Linkage analysis in the presence of errors IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified. Am J Hum Genet. 2000, 66: 1310-1327. 10.1086/302845.
- Li C, Scott LJ, Boehnke M: Assessing whether an allele can account in part for a linkage signal: the Genotype-IBD Sharing Test (GIST). Am J Hum Genet. 2004, 74: 418-431. 10.1086/381712.
- Li M, Boehnke M, Abecasis GR: Joint modeling of linkage and association: identifying SNPs responsible for a linkage signal. Am J Hum Genet. 2005, 76: 934-949.
- Roeder K, Bacanu S, Wasserman L, Devlin B: Using linkage genome scans to improve power of association in genome scans. Am J Hum Genet. 2006, 78: 243-252. 10.1086/500026.
- Sun L, Cox NJ, McPeek MS: A statistical method for identification of polymorphisms that explain a linkage result. Am J Hum Genet. 2002, 70: 399-411. 10.1086/338660.
- Thornton T, McPeek MS: ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure. Am J Hum Genet. 2010, 86: 172-184. 10.1016/j.ajhg.2010.01.001.
- Liptak T: On the combination of independent tests. Magyar Tud Akad Mat Kutato Int Kozl. 1958, 3: 171-196.
- Levy D, DeStefano AL, Larson MG, O'Donnell CJ, Lifton RP, Gavras H, Cupples LA, Myers RH: Evidence for a gene influencing blood pressure on chromosome 17. Genome scan linkage results for longitudinal blood pressure phenotypes in subjects from the Framingham Heart Study. Hypertension. 2000, 36: 477-483. 10.1161/01.HYP.36.4.477.
- Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 1198-1211. 10.1086/301844.
- Xu X, Rakovski C, Xu X, Laird N: An efficient family-based association test using multiple markers. Genet Epidemiol. 2006, 30: 620-626. 10.1002/gepi.20174.
- Pesarin F: Multivariate Permutation Tests: With Applications in Biostatistics. New York: Wiley. 2001.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.