Bivariate association analysis of longitudinal phenotypes in families
BMC Proceedings volume 8, Article number: S90 (2014)
Statistical genetic methods incorporating temporal variation allow for greater understanding of genetic architecture and consistency of biological variation influencing development of complex diseases. This study proposes a bivariate association method jointly testing association of two quantitative phenotypic measures from different time points. Measured genotype association was analyzed for single-nucleotide polymorphisms (SNPs) for systolic blood pressure (SBP) from the first and third visits using 200 simulated Genetic Analysis Workshop 18 (GAW18) replicates. Bivariate association, in which the effect of an SNP on the mean trait values of the two phenotypes is constrained to be equal for both measures and is included as a covariate in the analysis, was compared with a bivariate analysis in which the effect of an SNP was estimated separately for the two measures and univariate association analyses in 9 SNPs that explained greater than 0.001% SBP variance over all 200 GAW18 replicates. The SNP 3_48040283 was significantly associated with SBP in all 200 replicates with the constrained bivariate method providing increased signal over the unconstrained bivariate method. This method improved signal in all 9 SNPs with simulated effects on SBP for nominal significance (p-value < 0.05). However, this appears to be determined by the effect size of the SNP on the phenotype. This bivariate association method applied to longitudinal data improves genetic signal for quantitative traits when the effect size of the variant is moderate to large.
Traditional analyses of genetic variants influencing complex diseases focus on phenotypes and covariate measurements from a single time point. However, the majority of human epidemiologic studies collect information from multiple measurements. This, coupled with the knowledge that many quantitative phenotypes correlated with complex disease change with age or environmental confounders, suggests that inclusion of a temporal component may allow for increased understanding of complex diseases. Given the nature of these longitudinal data, methods jointly using multiple time points when performing association may have increased statistical power over univariate association methods [1–6]. However, although some statistical methods have been proposed for the analysis of longitudinal data, few have been successful in being adopted by the wider genetic epidemiologic community because of the difficulty of implementing them. One potential drawback to the utility of these bivariate methods is the addition of a degree of freedom as a result of the additional phenotype, thereby potentially reducing statistical power to detect genetic signals that do not vary with time or age.
We present a method for bivariate association using longitudinal data from the same phenotype in families using the Genetic Analysis Workshop 18 (GAW18) simulated single-nucleotide polymorphism (SNP) data for the phenotype systolic blood pressure (SBP) from visits 1 and 3. We have previously applied this method to the analysis of different phenotypic measures of heart rate (echo- and electrocardiograms) in American Indian participants of the Strong Heart Family Study  but wish to test its efficacy in a simulated longitudinal data set. To test this method, we first conducted association using measured genotype analysis of all SNPs for SBP from visits 1 and 3 using the GAW18 family data. We then conducted two bivariate analyses within the variance-component framework using 20 SNPs known to influence SBP from the GAW18 SNPs and 20 SNPs that did not explain any of the SBP variance identified in our association analysis. This work was done with knowledge of the GAW18 simulating model.
The GAW18 data set contains 959 individuals from 20 extended Mexican American pedigrees from the Type 2 Diabetes Consortium. Each of the 200 simulated data sets includes the following information for each individual for three time periods along with gender: age, SBP, diastolic blood pressure (DBP), hypertension status, blood pressure medication status, and smoking.
Maximum likelihood methods, taking into account relationships among family members, were used to determine association for the phenotypes SBP at visit 1 (SBP_1) and visit 3 (SBP_3) independently in a polygenic model available in the computer program Sequential Oligogenic Linkage Analysis Routines (SOLAR). Covariates included age, sex, and their interactions as well as smoking for both visits 1 and 3. Variables were carried forward to association models if associated with SBP_1 or SBP_3 at a p-value below 0.05. Measured genotype analysis was conducted for all available GAW18 polymorphic variants. In this analysis, the number of minor alleles is added to the quantitative polygenic genetic model as a covariate to assess the effect of the SNP on the mean of the trait using the equation
where s defines a variate for the ith SNP that takes the values 0, 1, and 2 for the marker genotypes AA, Aa, and aa, respectively; α represents one-half the displacement between homozygous marker means; β represents fixed-effect regression coefficients for any measured covariates x; and g and e are random effects representing residual genetic effects and random environmental effects. This model tests whether α is different from 0 using a likelihood ratio test. Twice the difference in log-likelihoods is distributed as a χ² random variable with 1 degree of freedom.
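Written out, a model consistent with the definitions above takes the following form. This is a sketch rather than the authors' exact notation: the baseline trait mean \mu and the covariate index k are assumptions introduced here for illustration.

```latex
% Measured genotype model for one trait (mu and k are assumed notation)
y = \mu + \alpha s_i + \sum_{k} \beta_k x_k + g + e

% Likelihood ratio test of alpha = 0, referred to chi-square with 1 df
\Lambda = 2\left[\ln L(\hat{\alpha}) - \ln L(\alpha = 0)\right] \sim \chi^2_1
```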
We also applied maximum likelihood methods accounting for familial relationships in bivariate association analyses. This bivariate method investigates two related phenotypes simultaneously, modeling genetic and environmental correlations between them. Our proposed method investigates the effect of an SNP on the mean trait values of two longitudinal phenotypes i and j, constraining the displacement in trait means (α) with each copy of the minor allele to be equal for both measures using the equations
where α, β_i, and β_j are fixed-effect regression coefficients and g and e are modeled through random effects, with the bivariate model allowing for correlations between g_i and g_j (ρ_g) and between e_i and e_j (ρ_e). Twice the difference between the log-likelihoods of a model in which the SNP effect is estimated and one in which it is constrained to zero is then distributed as a χ² distribution with 1 degree of freedom.
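The constrained bivariate model described above can be sketched as a pair of equations sharing a single SNP effect. The trait-specific means \mu_i and \mu_j, the covariate index k, and the unindexed genotype score s are notational assumptions made here; the shared \alpha, the trait-specific \beta, and the correlations \rho_g and \rho_e follow the text.

```latex
\begin{align}
y_i &= \mu_i + \alpha s + \sum_{k} \beta_{ik} x_k + g_i + e_i \\
y_j &= \mu_j + \alpha s + \sum_{k} \beta_{jk} x_k + g_j + e_j \\
\operatorname{corr}(g_i, g_j) &= \rho_g, \qquad \operatorname{corr}(e_i, e_j) = \rho_e
\end{align}
```

Replacing the shared \alpha with separate \alpha_i and \alpha_j gives the unconstrained model, whose test of no SNP effect has 2 degrees of freedom instead of 1.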
For our bivariate analysis, we used the same covariates from the univariate analysis along with 9 variants that explained greater than 0.001 of SBP variance from the GAW18 answers. We then compared these results with univariate association models and with a bivariate model in which the effect of genotype on the mean trait value of the two phenotypes was estimated separately, giving a test statistic distributed as a χ² distribution with 2 degrees of freedom. Results were compared between approaches over 200 GAW18 replicates to determine which method provided the best evidence for genetic signal for these SNPs, tallying the proportion of replicates in which association was detected at each p-value threshold.
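To illustrate why the degrees of freedom matter, the following minimal sketch converts a likelihood ratio statistic to a p-value under 1 and 2 degrees of freedom, using the closed-form χ² tail probabilities for those two cases. The function names and the log-likelihood values are hypothetical and do not come from SOLAR or from the GAW18 results.

```python
import math


def lrt_statistic(loglik_full, loglik_null):
    """Twice the log-likelihood difference between two nested models."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf(stat, df):
    """Upper-tail chi-square probability, using closed forms for df = 1 and 2."""
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Hypothetical log-likelihoods, chosen only to make the arithmetic visible.
stat = lrt_statistic(loglik_full=-1495.0, loglik_null=-1500.0)

p_constrained = chi2_sf(stat, df=1)    # one shared SNP effect
p_unconstrained = chi2_sf(stat, df=2)  # separate SNP effect per visit

print(f"LRT = {stat:.1f}, p(1 df) = {p_constrained:.4g}, p(2 df) = {p_unconstrained:.4g}")
```

For the same statistic, the 1 degree of freedom test gives the smaller p-value. In practice the unconstrained model fits at least as well as the constrained one, so its statistic is never smaller; the constrained test gains only when the SNP effect is similar at both visits.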
The average genetic correlation (ρ_g) for SBP over 200 GAW18 replicates between visits 1 and 3 was with an average environmental correlation of . This high ρ_g value demonstrates that these two phenotypes are measures of the same genetic mechanism and are therefore appropriate for our proposed bivariate association approach.
Table 1 shows results of three different association analyses for 9 SNPs influencing SBP across all 200 GAW18 replicates at three p-value thresholds, including 0.05 and 0.001. All analyses identified the variant 3_48040283 in MAP4 as genome-wide significant. The MAP4 SNP 3_47957996 was significant in 199 of the constrained bivariate tests and 200 of the unconstrained tests, with the number of genome-wide significant replicates dropping slightly for univariate models. Two additional variants, 1_66075952 from LEPR and MAP4 variant 3_28601297, demonstrated low numbers of genome-wide significant associations across the four tested association methods.
When comparing the different methods, the bivariate method in which the effect of genotype on the mean trait values of the two phenotypes is constrained to be equal provided the most robust analysis. It improved association for all 9 variants, compared with both the bivariate analysis in which these values were estimated separately and the univariate analyses of exams 1 and 3, in cases where the p-value is less than 0.001 or below 5.0 × 10^−5. To ensure that the improved power for the constrained bivariate approach did not come at the expense of increased false-positive rates, we chose 20 SNPs that did not explain any of the variance in the simulated model. For these 20 null markers, an average of 8.1 of the 200 replicates (about 4%, close to the nominal 5%) had p-values less than 0.05 for the constrained bivariate method (range, 1-28), indicating no systematic inflation of p-values under the null (data not shown).
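The tallying described above, counting how often a SNP reaches a given p-value threshold across replicates, can be sketched as follows. The function names and the short list of p-values are hypothetical; only the 8.1 average and the 200 replicates are taken from the text.

```python
def tally_replicates(p_values, thresholds):
    """Return, for each threshold, the fraction of replicates with p below it."""
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one replicate")
    return {t: sum(p < t for p in p_values) / n for t in thresholds}


def empirical_rate(mean_hits, n_replicates):
    """Convert an average count of significant replicates into a rate."""
    return mean_hits / n_replicates


# Hypothetical p-values for one SNP over four replicates.
proportions = tally_replicates([0.01, 0.2, 0.0004, 0.04], [0.05, 0.001])

# Null markers: an average of 8.1 of 200 replicates fell below 0.05.
null_rate = empirical_rate(8.1, 200)

print(proportions)
print(f"empirical false-positive rate: {null_rate:.4f} (nominal 0.05)")
```

An empirical rate near or below the nominal threshold is what one would expect from a well-calibrated test applied to markers with no simulated effect.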
The analysis of genetic variants using longitudinal data has the potential to be a valuable resource for determining biological and environmental factors affecting complex disease phenotypes over time. This type of analysis may provide increased power to detect rare genetic variants in complex diseases or to better understand when genetic components contribute to human development. In addition, these types of analyses may allow for the identification of environmental covariates associated with complex diseases. However, although statistical genetic methods for the analysis of longitudinal data have been proposed, they have not been widely adopted. The single degree of freedom association test we propose could also be implemented easily in generalized estimating equations (GEEs) or other mixed-model frameworks. However, a theoretical advantage of the likelihood-based variance component framework is that the bivariate variance component model explicitly allows both shared/stable and unshared/changing genetic and environmental effects across time and age in the random effects portion of the model through the estimation of genetic and environmental correlations.
In this paper, we present a bivariate approach to increase the genetic signal for a variant by constraining the effect of the SNP on the phenotype using a variance-component model. This model is predicated on the assumption that there is no gene-by-age interaction; however, the structure is general and is applicable to other issues in genetic epidemiology. As whole-genome data becomes more affordable for large-scale epidemiologic studies, an important consideration will be to maximize the ability to detect rare variants that have a large effect on complex disease. The easiest way to detect these rare variants will be through large pedigrees because they are amplified in families. However, the sample size of family studies is often small, making it difficult to determine association; therefore, methodologies that maximize the use of the genetic data and phenotypes from longitudinal studies may allow for an increased ability to identify genetic variants associated with complex disease. The model presented in this manuscript can be used as an early step in the analysis of longitudinal data and may lead to the development of more complex models.
Furlotte NA, Eskin E, Eyheramendy S: Genome-wide association mapping with longitudinal data. Genet Epidemiol. 2012, 36: 463-471. 10.1002/gepi.21640.
Fan R, Zhang Y, Albert PS, Liu A, Wang Y, Xiong M: Longitudinal association analysis of quantitative traits. Genet Epidemiol. 2012, Sep 10. doi: 10.1002/gepi.21673. Epub ahead of print
Mukherjee B, Ko YA, Vanderweele T, Roy A, Park SK, Chen J: Principal interactions analysis for repeated measures data: application to gene-gene and gene-environment interactions. Stat Med. 2012, 31: 2531-2551. 10.1002/sim.5315.
Lasky-Su J, Lyon HN, Emilsson V, Heid IM, Molony C, Raby BA, Lazarus R, Klanderman B, Soto-Quiros ME, Avila L, et al: On the replication of genetic associations: timing can be everything!. Am J Hum Genet. 2008, 82: 849-858. 10.1016/j.ajhg.2008.01.018.
Shi G, Rao DC: Ignoring temporal trends in genetic effects substantially reduces power of quantitative trait linkage analysis. Genet Epidemiol. 2008, 32: 61-72. 10.1002/gepi.20263.
Levy D, DeStefano AL, Larson MG, O'Donnell CJ, Lifton RP, Gavras H, Cupples LA, Myers RH: Evidence for a gene influencing blood pressure on chromosome 17. Genome scan linkage results for longitudinal blood pressure phenotypes in subjects from the Framingham Heart Study. Hypertension. 2000, 36: 477-483. 10.1161/01.HYP.36.4.477.
Melton PE, Rutherford S, Voruganti VS, Göring HH, Laston S, Haack K, Comuzzie AG, Dyer TD, Johnson MP, Kent JW, et al: Bivariate genetic association of KIAA1797 with heart rate in American Indians: the Strong Heart Family Study. Hum Mol Genet. 2010, 19: 3662-3671. 10.1093/hmg/ddq274.
Almasy L, Dyer TD, Peralta JM, et al: Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees. BMC Proc. 2014, 8 (suppl 2): S2-
Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 1198-1211. 10.1086/301844.
Blangero J, Göring HH, Kent JW, Williams JT, Peterson CP, Almasy L, Dyer TD: Quantitative trait nucleotide analysis using Bayesian model selection. Hum Biol. 2005, 77: 541-559.
Almasy L, Dyer TD, Blangero J: Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet Epidemiol. 1997, 14: 953-958. 10.1002/(SICI)1098-2272(1997)14:6<953::AID-GEPI65>3.0.CO;2-K.
The Genetic Analysis Workshops are supported by National Institutes of Health (NIH) grant R01 GM031575 from the National Institute of General Medical Sciences. The SOLAR statistical genetics computer package is supported by a grant from the US National Institute of Mental Health (MH059490). The supercomputing facilities used for this work at the AT&T Genetics Computing Center were supported in part by a gift from the SBC Foundation. The GAW18 whole genome sequence data were provided by the T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples) Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
There are no competing interests.
LA designed the overall study. PM conducted the analysis and drafted the manuscript. All authors read and accepted the final manuscript.