Identification of rare variants for hypertension with incorporation of linkage information
© Chiu et al.; licensee BioMed Central Ltd. 2014
Published: 17 June 2014
We conducted linkage analysis using the genome-wide association study data on chromosome 3, and then assessed association between hypertension and rare variants of genes located in the regions showing evidence of linkage. The rare variants were collapsed if their minor allele frequencies were less than or equal to the thresholds: 0.01, 0.03, or 0.05. In the collapsing process, they were either unweighted or weighted by the nonparametric linkage log of odds scores in 2 different schemes: exponential weighting and cumulative weighting. Logistic regression models using the generalized estimating equations approach were used to assess association between the collapsed rare variants and hypertension adjusting for age and gender. Evidence of association from the weighted and unweighted collapsing schemes with minor allele frequencies ≤0.01, after accounting for multiple testing, was found for genes DOCK3 (p = 0.0090), ARMC8 (p = 1.29E-5), KCNAB1 (p = 5.8E-4), and MYRIP (p = 5.79E-6). DOCK3 and MYRIP are newly discovered. Incorporating linkage scores as weights was found to help identify rare causal variants with a large effect size.
Linkage studies have high power to detect loci harboring variants with a large effect size, even though such variants are often rare in the population. In contrast, association studies generally have high power to detect common variants with a small effect size for diseases or traits. Recently, next-generation sequencing techniques have made it feasible to sequence all exons or the whole genome of enough individuals to obtain meaningful results. Rare variant analysis is challenging because of sequencing-based uncertainties in variant calling, the large search space of rare variants, and the inherently low carrier frequency of each variant. Collectively, however, rare variants are quite common in the general population. Therefore, it could be helpful to apply linkage analysis to these new DNA sequencing data to identify rare causal variants with a large effect size. In the present study, we conducted linkage analysis using genome-wide association studies (GWAS) data to identify disease susceptibility loci, then applied logistic regression models using generalized estimating equations (GEEs) to assess the associations between hypertension and rare variants of the susceptibility genes in the linked regions.
GWAS and phenotype data
Linkage analysis was conducted on chromosome 3 GWAS data. A total of 65,519 single-nucleotide polymorphisms (SNPs) were genotyped on chromosome 3 for 959 individuals from 20 original pedigrees; of these individuals, 344 had hypertension and 506 did not. Because of the limitations of our computing facility for linkage analysis, PedCut [4] was used to split pedigrees larger than 20 bits into smaller pedigrees that MERLIN [5] could analyze. Consequently, we analyzed a total of 138 pedigrees with 1495 individuals (missing parents were added in); the divided pedigrees ranged in size from 3 to 25 individuals. Five SNPs were removed for failing the Hardy-Weinberg equilibrium test (p value < 10⁻⁴), which was performed using PLINK 1.07 [6] based on 56 unrelated subjects. A total of 22,056 genotypes flagged as genotyping errors (a genotyping error rate of approximately 3.51 × 10⁻⁴) were further excluded by the MERLIN 1.1.2 computing package [5]. Subjects diagnosed with hypertension at 1 or more of the 4 time points were considered affected.
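The Hardy-Weinberg filter described above can be illustrated with a minimal sketch. It uses the asymptotic 1-degree-of-freedom chi-square test as an assumption for illustration only; it is not necessarily the test that PLINK applied, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the three genotypes at one SNP
    among unrelated subjects.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts for 56 unrelated subjects.
threshold = 1e-4
keep_snp = hwe_chi_square_p(14, 28, 14) >= threshold   # genotypes in exact HWE proportions
drop_snp = hwe_chi_square_p(28, 0, 28) < threshold     # no heterozygotes at all
```

With only 56 unrelated subjects the asymptotic approximation is rough for low-frequency SNPs, which is one reason an exact test may be preferred in practice.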
Linkage and association analysis
Linkage screens on chromosome 3 GWAS data were conducted using MERLIN 1.1.2; linkage evidence was assessed based on the nonparametric linkage (NPL) log of odds (LOD) scores of Kong and Cox [7], in which identity-by-descent sharing in affected relative pairs was computed. Because of the heavy computational load from the large number of markers, linkage analyses were run in intervals of 1000 SNPs, with each interval overlapping the following interval by 5 SNPs. One-LOD support intervals were constructed for each linkage peak with NPL LOD scores ≥4.0. Genes located in the 1-LOD support intervals were identified and annotated based on the genetic map NCBI build 36.
Rare variants (defined as variants with minor allele frequency [MAF] ≤0.01, ≤0.03, or ≤0.05) in each gene were collapsed, either unweighted or weighted by the LOD scores in 2 ways: exponential weighting and cumulative weighting [8]. Namely, a rare variant i with LOD score z_i was weighted by [...] if a subject carried at least 1 minor allele of SNP i and by [...] otherwise. Here, [...] for the exponential weighting, [...] for the cumulative weighting, [...] is the standard normal cumulative distribution, and [...] where m is the total number of rare variants per gene, i = 1,..., m. For individual k, his or her collapsed rare variants (CRVs) were then equal to [...], assuming the total number of rare variants the individual carries was m_k.
The association between hypertension and the CRVs was assessed by logistic regression models adjusted for age and gender. The GEE approach implemented in the SAS computing package (SAS Inc., Cary, NC) was used to account for within-family correlations in the association analysis, based on the original 20 families and under an exchangeable covariance structure. Multiple testing corrections were made using the false discovery rate (FDR) as implemented in SAS.
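The collapsing step can be sketched as follows. The exact weight expressions are not reproduced in the text above, so this sketch assumes weights in the style of linkage-weighted association testing: exp(z_i) for exponential weighting and Φ(z_i) for cumulative weighting, each divided by the mean weight over the m rare variants of the gene, with a weight of 0 for variants the subject does not carry. These forms, and all function names, are assumptions made for illustration and may differ from the formulas the authors used.

```python
import math
from statistics import NormalDist
from typing import List, Sequence


def raw_weight(lod: float, scheme: str) -> float:
    """Unnormalized weight of one rare variant given its NPL LOD score."""
    if scheme == "unweighted":
        return 1.0
    if scheme == "exponential":
        return math.exp(lod)              # assumed form: exp(z_i)
    if scheme == "cumulative":
        return NormalDist().cdf(lod)      # assumed form: Phi(z_i)
    raise ValueError(f"unknown scheme: {scheme}")


def collapsed_rare_variants(carrier: Sequence[bool],
                            lods: Sequence[float],
                            scheme: str) -> float:
    """Collapsed rare variant (CRV) score of one subject for one gene.

    carrier[i] is True if the subject carries at least 1 minor allele of
    rare variant i; lods[i] is the LOD score at that variant.
    """
    if len(carrier) != len(lods) or not lods:
        raise ValueError("carrier and lods must be non-empty and equal in length")
    raw: List[float] = [raw_weight(z, scheme) for z in lods]
    mean_weight = sum(raw) / len(raw)     # normalizer: weights average to 1
    return sum(w / mean_weight for w, c in zip(raw, carrier) if c)


# Hypothetical gene with 3 rare variants; the subject carries the first two.
lods = [4.0, 4.5, 5.0]
carrier = [True, True, False]
crv_unweighted = collapsed_rare_variants(carrier, lods, "unweighted")
crv_exponential = collapsed_rare_variants(carrier, lods, "exponential")
crv_cumulative = collapsed_rare_variants(carrier, lods, "cumulative")
```

Under this normalization the unweighted score reduces to the number of rare variants carried, and a subject carrying every rare variant in the gene scores m under all three schemes, which keeps the schemes on a comparable scale.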
Estimates for the effects of the CRVs in individual genes with 3 weighting schemes for variants with MAF ≤0.01
Linkage scores from GWAS data can be useful to narrow down regions for detecting rare variants associated with disease. Therefore, using linkage scores as weights for collapsing rare variants may improve the power of detection. In the present study, the effects of rare variants were assumed to be in the same direction during collapsing. It is nevertheless important to take the directions of effects into consideration, because the effects of significant variants can be diluted or eliminated when they are collapsed with other variants having neutral or opposite effects. One way to address this problem is to test the variation of individual variant effects, rather than their mean effect, in mixed-effects models [11]. Studying CRVs from different collapsing categories helped identify the MAF category yielding consistent results across genes, because the significance of a CRV depends on the threshold used for collapsing. Intuitively, a variant may be functional if its MAF is below a certain threshold; a varying-threshold approach has therefore proved helpful in identifying functional variants [12], and incorporating such an approach may improve power to detect functional rare variants. In general, the changes in effect sizes resulting from collapsing additional variants or from weighting decreased as MAF thresholds increased. Collapsing additional variants often reduced the effect size (results not shown), whereas weighting usually increased the effect size, particularly when the MAF threshold was small. In addition, the significance of ZNF621, with an effect size of −0.44 (SE = 0.021) under MAF ≤0.01, resulted from only a single allele. The effect size changed to 0.045 (SE = 0.057) and became nonsignificant after collapsing with the other 2 variants under MAF ≤0.03 or ≤0.05. This observation suggests that significant results based on only a few rare variants should be carefully reexamined and interpreted.
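The dependence of a collapsed variable on the MAF threshold can be made concrete with a small sketch. The variant identifiers, MAF values, and genotypes below are hypothetical; they only mimic the ZNF621 pattern, in which 1 variant qualifies under MAF ≤0.01 and 2 more join under MAF ≤0.03.

```python
from typing import Dict, List


def variants_below(mafs: Dict[str, float], threshold: float) -> List[str]:
    """Variants of one gene whose MAF is at or below the threshold."""
    return sorted(v for v, maf in mafs.items() if maf <= threshold)


def unweighted_crv(minor_allele_counts: Dict[str, int],
                   selected: List[str]) -> int:
    """Number of selected rare variants at which the subject carries
    at least 1 minor allele (the unweighted collapsed variable)."""
    return sum(1 for v in selected if minor_allele_counts.get(v, 0) > 0)


# Hypothetical gene with 3 variants and 1 hypothetical subject.
mafs = {"var_a": 0.008, "var_b": 0.020, "var_c": 0.025}
subject = {"var_a": 0, "var_b": 1, "var_c": 0}

collapsing_sets = {t: variants_below(mafs, t) for t in (0.01, 0.03, 0.05)}
subject_crvs = {t: unweighted_crv(subject, s) for t, s in collapsing_sets.items()}
```

Here the subject scores 0 under the 0.01 threshold but 1 under the 0.03 and 0.05 thresholds, so the same gene can yield different regression inputs, and therefore different effect estimates, depending on the threshold chosen.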
After accounting for multiple testing, the CRVs from the following 4 genes were identified for hypertension: DOCK3, ARMC8, KCNAB1, and MYRIP. KCNAB1 was the only gene previously identified in a GWAS, specifically as being associated with blood pressure. The other 3 genes were novel for hypertension/blood pressure in the present association analysis. Our proposed method focused on rare variant detection; common variants were not analyzed in the association analyses. Therefore, we did not expect a large proportion of our findings to replicate GWAS results.
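The multiple-testing adjustment can be sketched as below, assuming the Benjamini-Hochberg step-up procedure for controlling the FDR; the exact SAS option used is not stated in the text, so this is an illustration rather than a reproduction of the analysis. The p values in the example are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """FDR-adjusted p values, returned in the same order as the input."""
    m = len(p_values)
    # Indices of the p values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical gene-level p values from the CRV association tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]
```

A gene is then declared significant when its adjusted p value is at or below the chosen FDR level, here 0.05.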
The 20 families varied in size, ranging from 22 to 86 individuals, so an exchangeable correlation structure may not be entirely reasonable in the GEE approach. However, independent and exchangeable correlation structures, which involve fewer covariance parameters, were better options than others given such a small number of families. An exchangeable correlation structure was adopted here because it is often more appropriate for a family study than other structures [13]. GEE approaches have robust variance estimators for extended pedigrees in a genome-wide association study setting [14, 15]. However, because of the limited sample size in the present study, some of the collapsed variants were not as robust as the others (data not shown). After applying an independent correlation structure, we observed that the KCNAB1 gene became nonsignificant, whereas the genes GBE1 and GK5 were significant under the unweighted scheme after accounting for multiple testing. In such conditions, it may be helpful to apply a statistical method to select an appropriate variance-covariance structure [16]. This possibility will be investigated in a future study.
In summary, it is helpful to apply linkage analysis to GWAS or sequencing data, and then incorporate the linkage information into association analyses under certain scenarios. The benefits of using this method were seen particularly in cases where the collapsed variant had a large effect size. A powerful collapsing method should consider the effect size and direction of a rare variant, as well as the threshold of MAF during collapsing. We are currently systematically studying and modifying this proposed method under different scenarios to improve its power to detect functional rare variants.
This project was supported by a grant from the National Science Council, Taiwan (NSC98-2118-M-400-002) and a grant from the National Health Research Institutes, Taiwan (PH-099-pp-04). We thank Ms. Karen Klein (Office of Research, Wake Forest School of Medicine) and Miss Tamara Adams for their editorial contributions to this manuscript. The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
1. Laird NM, Lange C: Family-based methods for linkage and association analysis. Adv Genet. 2008, 60: 219-252.
2. Dering C, Pugh E, Ziegler A: Statistical analysis of rare sequence variants: an overview of collapsing methods. Genet Epidemiol. 2011, 35 (Suppl 1): S12-S17.
3. Bailey-Wilson JE, Wilson AF: Linkage analysis in the next-generation sequencing era. Hum Hered. 2011, 72: 228-236. 10.1159/000334381.
4. Liu F, Kirichenko A, Axenovich TI, Duijn CMV, Aulchenko YS: An approach for cutting large and complex pedigrees for linkage analysis. Eur J Hum Genet. 2008, 16: 854-860. 10.1038/ejhg.2008.24.
5. MERLIN. [http://www.sph.umich.edu/csg/abecasis/merlin/tour/linkage.html]
6. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC: PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
7. Kong A, Cox J: Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet. 1997, 61: 1179-1188. 10.1086/301592.
8. Roeder K, Bacanu SA, Wasserman L, Devlin B: Using linkage genome scans to improve power of association in genome scans. Am J Hum Genet. 2006, 78: 243-252. 10.1086/500026.
9. Linkage results from the RGD website. [http://rgd.mcw.edu/rgdweb/search/qtls.html?100]
10. GWAS results from the GWAS CENTRAL website. [http://www.gwascentral.org/browser/genome?r=HGVRS568,HGVRS567,HGVRS566,HGVRS710,HGVRS709,HGVRS708&t=]
11. Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJ: Testing for an unusual distribution of rare variants. PLoS Genet. 2011, 7: e1001322. 10.1371/journal.pgen.1001322.
12. Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ, Sunyaev SR: Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet. 2010, 86: 832-838. 10.1016/j.ajhg.2010.04.005.
13. Ballinger GA: Using generalized estimating equations for longitudinal data analysis. Organ Res Methods. 2004, 7: 127-150. 10.1177/1094428104263672.
14. Suktitipat B, Mathias RA, Vaidya D, Yanek LR, Young JH, Becker LC, Becker DM, Wilson AF, Fallin MD: The robustness of generalized estimating equations for association tests in extended family data. Hum Hered. 2012, 74: 17-26. 10.1159/000341636.
15. Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. 10.1093/biomet/73.1.13.
16. Pan W, Connett JE: Selecting the working correlation structure in generalized estimating equations with application to the lung health study. Stat Sin. 2002, 12: 475-490.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.