Multiphase analysis by linkage, quantitative transmission disequilibrium, and measured genotype: systolic blood pressure in complex Mexican American pedigrees
© Chen et al.; licensee BioMed Central Ltd. 2014
Published: 17 June 2014
We apply a multiphase strategy for pedigree-based genetic analysis of systolic blood pressure data collected in a longitudinal study of large Mexican American pedigrees. In the first phase, we conduct variance-components linkage analysis to identify regions that may harbor quantitative trait loci. In the second phase, we carry out pedigree-based association analysis in a selected region with common and low-frequency variants from genome-wide association studies and whole genome sequencing data. Using sequencing data, we compare approaches to pedigree analysis in a 10 megabase candidate region on chromosome 3 harboring a gene previously identified by a consortium for blood pressure genome-wide association studies. We observe that, as expected, the measured genotype analysis tends to provide larger signals than the quantitative transmission disequilibrium test. We also observe that while linkage signals are contributed by common variants, strong associations are found mainly at rare variants. Multiphase analysis can improve computational efficiency and reduce the multiple testing burden.
In pedigree-based studies, discovery of genomic regions harboring genetic determinants of quantitative traits such as systolic blood pressure (SBP) has conventionally been conducted using linkage analysis based on identity-by-descent allele sharing. In the genome-wide association studies (GWAS) era of cost-effective high-throughput genotyping technology, the mapping of the genetic basis of complex traits/diseases in human populations has been population-based in unrelated individuals, and largely case-control or cross-sectional in design. With the advent of next-generation sequencing technology, investigators are able to examine each single base pair (bp) and test for association with a trait, but the massive amount of variant information available for analysis can be overwhelming. With the development of techniques for pedigree-based imputation from sequence data on selected pedigree members, pedigree-based analysis of whole genome sequencing data is feasible.
SAFS pedigree data
From a total of 1389 participants in 20 pedigrees, 932 have SBP measurements at 1 or more of up to 4 study exams. Characteristics recorded include sex, year of exam, age at each exam, current use of antihypertensive medications, and current tobacco smoking. GWAS genotypes were assayed in a total of 959 individuals, with a total of 65,519 GWAS SNPs on chromosome 3 available for analysis. Among these individuals, 464 were also sequenced at an average 60× coverage, resulting in 1,215,399 sequence variants on chromosome 3. For the remaining 495 individuals, the missing genotypes at the sequence variants were imputed using a novel population-based imputation approach. Because the program SOLAR required genotype data, we used the imputed "best guess" sequence genotypes in the focused association analysis following the linkage scan. Subsequent analyses ignored imputation uncertainty.
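To make the "best guess" step concrete, the following is a minimal sketch, assuming the imputation output gives three posterior probabilities per variant (for 0, 1, or 2 copies of the alternate allele). The function names and the example probabilities are hypothetical and are not taken from the GAW18 data or from SOLAR.

```python
def best_guess_genotype(probs):
    """Return the allele count (0, 1, or 2) with the highest posterior probability."""
    if len(probs) != 3 or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("expected three probabilities summing to 1")
    return max(range(3), key=lambda g: probs[g])


def expected_dosage(probs):
    """Expected allele count; unlike the best guess, this retains imputation uncertainty."""
    return probs[1] + 2.0 * probs[2]


# Hypothetical posterior probabilities for one imputed individual at one variant.
probs = (0.10, 0.55, 0.35)
print(best_guess_genotype(probs))  # hard call used as the genotype
print(expected_dosage(probs))      # dosage, shown only for contrast
```

Using the hard call discards the difference between a confident and an uncertain imputation, which is what ignoring imputation uncertainty amounts to in the downstream tests.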
Antihypertensive medication complicates the analysis of SBP, because patients prescribed medication tend to have elevated underlying SBP values. Based on a novel extension developed by Konigorski et al, we treated medication as a right-censoring indicator, such that the unmodified SBP for an individual under medication is higher than the observed value, and fit a censored normal regression model to the observed SBP measurements for each exam, assuming noninformative censoring. In addition, we took into account the between-pedigree variation by incorporating a pedigree-specific random component. Analyzing each of the first 3 visits separately, we included sex, exam-specific age, and smoking status as covariates. Let $Y$ be the observed SBP and $\hat{Y}$ be the fitted SBP from the censored model given exam-specific covariates and pedigree-specific random effects. For an individual receiving medication, let $E(Y^* \mid Y^* > Y)$ be the conditional expectation of the underlying SBP $Y^*$ given exam-specific covariates and pedigree-specific random effects, assuming that the underlying unmodified SBP is greater than the observed value; for details see Konigorski et al. We computed residuals at each exam by $Y - \hat{Y}$ if an individual was not under medication, and by $E(Y^* \mid Y^* > Y) - \hat{Y}$ otherwise. The mean of the residuals at exams 1 to 3, denoted by R, was then used as an adjusted phenotype for each individual in subsequent stages of linkage and association analysis.
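The residual calculation can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the fitted value and the residual standard deviation `sigma` have already been estimated by the censored model, and it uses the standard truncated-normal mean for the conditional expectation of the underlying SBP. All function names and numeric inputs are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z):
    """Standard normal upper-tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_underlying_sbp(observed, fitted, sigma):
    """E[Y* | Y* > observed] for Y* ~ Normal(fitted, sigma^2).

    Note: for extremely large z the tail probability can underflow to zero.
    """
    z = (observed - fitted) / sigma
    return fitted + sigma * normal_pdf(z) / normal_sf(z)


def exam_residual(observed, fitted, sigma, medicated):
    """Residual at one exam: observed minus fitted if unmedicated,
    conditional expectation minus fitted if medicated (right-censored)."""
    if medicated:
        return expected_underlying_sbp(observed, fitted, sigma) - fitted
    return observed - fitted


def adjusted_phenotype(exams):
    """Mean residual over the available exams.

    Each exam is a tuple (observed, fitted, sigma, medicated).
    """
    residuals = [exam_residual(*exam) for exam in exams]
    return sum(residuals) / len(residuals)


# Hypothetical individual: unmedicated at exams 1 and 2, medicated at exam 3.
exams = [(128.0, 122.0, 15.0, False),
         (131.0, 124.0, 15.0, False),
         (135.0, 126.0, 15.0, True)]
print(adjusted_phenotype(exams))
```

Because the conditional expectation always exceeds the observed value, a medicated exam contributes a larger residual than the naive difference between observed and fitted SBP would.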
Variance component linkage analysis
In the variance-components model, the covariance matrix of the phenotype among the n members of a pedigree is decomposed as

$$\Omega = \Pi\,\sigma_q^2 + 2\Phi\,\sigma_a^2 + I_n\,\sigma_e^2,$$

where $\sigma_q^2$, $\sigma_a^2$, and $\sigma_e^2$ denote the locus-specific, additive genetic, and unshared environmental variance components; the elements of the structuring matrix for the locus-specific variance, $\Pi$, are proportions representing the identity-by-descent (IBD) sharing of alleles for each relative pair at this locus; the structuring matrix for the additive genetic variance component, $2\Phi$, is twice the kinship coefficient matrix; and the matrix for the variance resulting from unshared environmental effects is specified by the identity matrix $I_n$.

To examine the influence of GWAS SNP density on linkage analysis, we sampled 3 sets of SNPs. Initially, a total of 988 SNP markers was randomly sampled from chromosome 3 GWAS SNPs with MAF ≥5%. To allay concerns about adequacy of SNP density, in the second and third samplings, we randomly sampled 1620 and 2999 SNPs, respectively, excluding previously sampled SNPs and using the same MAF criteria.

We first performed quantitative genetic analysis to create a suitable null model for each selected marker. Applying the genetic analysis software SOLAR to the sampled GWAS data, we estimated IBD allele sharing for all pairs of relatives in each pedigree, using single-marker estimation to ease computation in the very complex pedigrees. We also performed 2-point rather than multipoint linkage analysis and computed the log of odds (LOD) score for each marker. Regions with LOD >1.2 were considered interesting for subsequent fine mapping analyses. For demonstration purposes, in this paper we focused fine-mapping analyses on the candidate region 165 to 175 megabases (Mb) on chromosome 3.
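The covariance structure described above (locus-specific, additive genetic, and unshared environmental components) and the conversion of a likelihood ratio to a LOD score can be illustrated with a small sketch. The sib-pair matrices and variance values are toy inputs chosen for illustration, and the function names are hypothetical; SOLAR performs the actual likelihood maximization, which is not reproduced here.

```python
import math


def covariance_matrix(pi_hat, kinship, var_qtl, var_additive, var_env):
    """Build Omega = Pi*var_qtl + 2*Phi*var_additive + I*var_env element by element."""
    n = len(kinship)
    omega = []
    for i in range(n):
        row = []
        for j in range(n):
            value = pi_hat[i][j] * var_qtl + 2.0 * kinship[i][j] * var_additive
            if i == j:
                value += var_env  # unshared environment contributes only on the diagonal
            row.append(value)
        omega.append(row)
    return omega


def lod_score(loglik_linkage, loglik_null):
    """LOD = log10 of the likelihood ratio, computed from natural-log likelihoods."""
    return (loglik_linkage - loglik_null) / math.log(10.0)


# Toy sib pair: kinship 0.25 between sibs, 0.5 with oneself;
# estimated IBD sharing proportion of 0.5 at the marker.
kinship = [[0.5, 0.25], [0.25, 0.5]]
pi_hat = [[1.0, 0.5], [0.5, 1.0]]
omega = covariance_matrix(pi_hat, kinship, var_qtl=20.0, var_additive=30.0, var_env=50.0)
print(omega)

# Hypothetical maximized log-likelihoods with and without the locus-specific component.
print(lod_score(-1500.0, -1503.0))
```

Under the null model the locus-specific variance is fixed at zero, so the LOD score compares the fit with and without the $\Pi$ term.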
Family-based association analysis
In a candidate region on chromosome 3 identified with some evidence for linkage in the sampled GWAS data and previously reported in GWAS meta-analysis, we compared the linkage signals to the association analyses implemented in SOLAR: measured genotype (MG) analysis and the quantitative transmission disequilibrium test (QTDT), in which the phenotype, R, is modeled as a linear combination of fixed effects (ie, genotype scores) and random effects (ie, polygenic and linkage components). The genotype scores are decomposed into between-family (b) and within-family (w) components, resulting in the fixed-effect model $E[R] = \mu + \beta_b b + \beta_w w$. The MG approach estimates regression coefficients with the constraint $\beta_b = \beta_w$. The QTDT approach estimates both $\beta_b$ and $\beta_w$, and tests whether the within-family parameter $\beta_w$ is significantly different from 0. QTDT reflects the correlation between SNP genotype and phenotype within families and is robust to population stratification effects, which can be a concern for MG, but QTDT is less powerful than MG. We computed the IBD allele sharing among pedigree members at each sequence variant in the candidate region, and then performed association tests simultaneously modeling linkage as a variance component based on the IBD sharing estimates. When linkage is present, including the linkage component in the association analysis helps control type I error.
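The between-family and within-family decomposition can be sketched as follows. This is a deliberate simplification: it assumes sibships without genotyped parents and takes the between-family score as the sibship mean, whereas the handling of extended pedigrees in SOLAR is more general. The assumed fixed-effect form is mu + beta_b * b + beta_w * w, and all names and genotype values are hypothetical.

```python
from collections import defaultdict


def between_within(genotypes, family_ids):
    """Split genotype scores into a family-mean part (b) and a deviation part (w)."""
    by_family = defaultdict(list)
    for g, fam in zip(genotypes, family_ids):
        by_family[fam].append(g)
    family_mean = {fam: sum(gs) / len(gs) for fam, gs in by_family.items()}
    between = [family_mean[fam] for fam in family_ids]
    within = [g - b for g, b in zip(genotypes, between)]
    return between, within


def fixed_effect_mean(mu, beta_b, beta_w, b, w):
    """Expected phenotype under the assumed model mu + beta_b*b + beta_w*w."""
    return mu + beta_b * b + beta_w * w


# Toy data: allele counts for five individuals in two sibships.
genotypes = [0, 1, 2, 1, 1]
family_ids = ["A", "A", "A", "B", "B"]
between, within = between_within(genotypes, family_ids)
print(between, within)

# Under the MG constraint beta_b == beta_w, the model reduces to mu + beta * genotype.
beta = 2.0
mg_means = [fixed_effect_mean(0.0, beta, beta, b, w) for b, w in zip(between, within)]
print(mg_means)
```

Because b + w equals the genotype score, constraining the two coefficients to be equal recovers an ordinary regression on genotype, while freeing them lets the within-family coefficient be tested on its own, which is the source of the robustness to stratification.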
Results and discussion
Results of 2-point linkage analysis with LOD >1.20, ordered by position, using 3 sets of randomly sampled common GWAS SNPs (MAF ≥0.05) from chromosome 3. LOD scores in bold denote values > 1.35 (column 5).
Top 5 linkage signals and top 5 associations with SBP are indicated in bold in the 165- to 175-Mb region on chromosome 3 (ordered by position)
The main purpose of the proposed multiphase design is to first identify interesting genomic regions for a complex quantitative trait, and then to fine-map those regions in follow-up studies, reducing both the number of tests for association conducted at null variants and the computational processing time. With randomly sampled common GWAS SNP data for large Mexican American pedigrees from SAFS, we identified 4 linkage regions for SBP on chromosome 3. Especially for 2-point linkage, high-density SNP analysis is desirable. In linkage analysis in an identified region, we observed higher LOD scores using imputed sequence data compared to GWAS SNP data, particularly for common variants (Figure 3, top panel). In family-based association analysis of sequence variants, however, we observed stronger association signals at rare variants compared to common variants.

As is typical in fine-mapping studies, we examined association with sequence variants under linkage peaks obtained from a chromosome-wide scan. Depending on the inherent power in a study, it may be advisable to establish a fairly liberal criterion for identification of linkage regions. Although the linkage strategy we used reduces the multiple testing burden in phase 2, it may miss regions of interest that would have been detected by a GWAS association analysis.

For purposes of comparison, albeit in a single data set, we examined the results from a complete, dense GWAS scan of chromosome 3 that used mixed models to account for the pedigree structure. We observed that both strategies identified regions near 150 Mb and 175 Mb using a linkage criterion of LOD >1.0 and a GWAS criterion of $p < 10^{-5}$; the chromosome-wide maxima near 150 Mb agreed quite well. Our linkage scan also identified regions at other locations, including those near 10, 27, and 100 Mb, that would have required more liberal GWAS criteria for identification.
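The effect of the two-phase design on the multiple testing burden can be sketched with toy numbers. The marker positions, LOD values, variant positions, and the 5-Mb flanking window below are all made up for illustration and are not results from this study; the Bonferroni correction is used only as a simple stand-in for whatever correction an analyst would apply.

```python
def linkage_regions(scan, lod_threshold=1.2, flank_mb=5.0):
    """Merge windows of +/- flank_mb around markers whose LOD exceeds the threshold.

    `scan` is a list of (position_mb, lod) pairs.
    """
    hits = sorted(pos for pos, lod in scan if lod > lod_threshold)
    regions = []
    for pos in hits:
        start, end = max(0.0, pos - flank_mb), pos + flank_mb
        if regions and start <= regions[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def variants_in_regions(positions, regions):
    """Keep variant positions that fall inside any selected region."""
    return [p for p in positions if any(s <= p <= e for s, e in regions)]


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Made-up phase 1 scan and phase 2 variant positions (Mb).
toy_scan = [(12.0, 1.4), (40.0, 1.3), (43.0, 1.6), (90.0, 0.3)]
toy_variants = [3.0, 11.5, 14.0, 36.0, 44.5, 60.0, 88.0, 95.0]

regions = linkage_regions(toy_scan)
selected = variants_in_regions(toy_variants, regions)
print(regions)
print(bonferroni_threshold(len(toy_variants)), bonferroni_threshold(len(selected)))
```

Testing only the variants under linkage peaks gives a less stringent per-test threshold, at the cost of never testing variants outside the selected regions.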
This research was supported by funding from the Canadian Institutes of Health Research: CIHR Operating Grant MOP-84287 (SBB, principal investigator), CIHR Training Grant GET-101831 (ZC). ZC is a Fellow with CIHR STAGE (Strategic Training for Advanced Genetic Epidemiology). The results of this study were obtained with the program packages R and SOLAR. The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
- Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, Smith AV, Tobin MD, Verwoert GC, Hwang SJ, et al: Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature. 2011, 478: 103-109. 10.1038/nature10405.
- Almasy L, Dyer TD, Peralta JM, Jun G, Fuchsberger C, Almeida MA, Kent JW, Fowler S, Duggirala R, Blangero J: Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees. BMC Proc. 2014, 8 (Suppl 1): S2.
- Konigorski S, Yilmaz YE, Bull SB: Bivariate genetic association analysis of systolic and diastolic blood pressure by copula models. BMC Proc. 2014, 8 (Suppl 1): S72.
- Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 1198-1211. 10.1086/301844.
- Abecasis GR, Cookson WO, Cardon LR: Pedigree tests of transmission disequilibrium. Eur J Hum Genet. 2000, 8: 545-551. 10.1038/sj.ejhg.5200494.
- Kent JW, Dyer TD, Göring HHH, Blangero J: Type I error rates in association versus joint linkage/association tests in related individuals. Genet Epidemiol. 2007, 31: 173-177. 10.1002/gepi.20200.
- Nalpathamkalam T, Derkach A, Paterson AD, Merico D: Genetic Analysis Workshop 18 single-nucleotide variant prioritization based on protein impact, sequence conservation, and gene annotation. BMC Proc. 2014, 8 (Suppl 1): S11.
- Wu Y, Briollais L: Mixed-effects models for joint modeling of sequence data in longitudinal studies. BMC Proc. 2014, 8 (Suppl 1): S92.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.