Volume 3 Supplement 7
Genetic Analysis Workshop 16
Application of threelevel linear mixedeffects model incorporating geneage interactions for association analysis of longitudinal family data
 Gang Shi^{1}Email author,
 Treva K Rice^{1},
 Chi Charles Gu^{1} and
 Debeeru C Rao^{1}
DOI: 10.1186/175365613S7S89
© Shi et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Abstract
Longitudinal studies that collect repeated measurements on the same subjects over time have long been considered as being more powerful and providing much better information on individual changes than crosssectional data. We propose a threelevel linear mixedeffects model for testing genetic main effects and geneage interactions with longitudinal family data. The simulated Genetic Analysis Workshop 16 Problem 3 data sets were used to evaluate the method. Genomewide association analyses were conducted based on crosssectional data, i.e., each of the three singlevisit data sets separately, and also on the longitudinal data, i.e., using data from all three visits simultaneously. Results from the analysis of coronary artery calcification phenotype showed that the longitudinal association tests were much more powerful than those based on singlevisit data only. Geneage interactions were evaluated under the same framework for detecting genetic effects that are modulated by age.
Background
There is considerable evidence suggesting that genetic effects are modulated by age on some common complex traits. For systolic blood pressure, Rao and colleagues demonstrated age trends in familial effects [1–3]. The effect of apoE genotype on lipid levels was shown to be agedependent [4]. More recently, LaskySu et al. demonstrated the importance of geneage interactions in replication studies of genomewide association results [5]. They showed that the replication of a singlenucleotide polymorphism (SNP) associated with body mass index (BMI) was successful only when geneage interaction was incorporated in the analysis. At a methodological level, longitudinal studies that collect repeated measurements on the same subjects over time have long been considered as being more powerful and providing much better information on individual changes than crosssectional ones. Linear mixedeffects models [6] offer excellent approaches when dealing with longitudinal data. In genetic association analysis, mixedeffects models were used to account for the familial correlation among phenotypes collected from the same pedigree [7]. It was shown by simulations [8] that such regressiontype association test is more powerful than the classical transmissiondisequilibriumbased tests [9]. On the other hand, longitudinal family data is still less exploited. With repeated measurements on family members, phenotypes are typically correlated across both time and pedigree members. In this work, we applied a threelevel hierarchical mixedeffects model in analyzing familybased longitudinal data. Association tests of genetic main effect as well as geneage interactions were formulated under the same framework. We used the simulated phenotype data sets provided by the Genetic Analysis Workshop 16 (GAW16) Problem 3 and had the answers [10] when conducting the analyses.
Methods
where β_{0} is the mean after accounting for covariate and genetic effects, and A_{ ij }represents the age of the i^{th} individual at the j^{th} visit. The first three terms model the trait as a quadratic function of age at a population level, higher order terms or any other functional forms can also be applied if the phenotype so suggests. The fourth terms models the genetic main effect, where the measured genotype g_{ i }can be coded as dominant, additive, or recessive according to different biometric model assumptions. The fifth and sixth terms are the linear and quadratic interactions between age and genetic effects. The random effect term a_{ ij }accounts for familial as well as intervisit correlations, and the last term stands for the residual, which is assumed to be independent and identically normally distributed. In longitudinal familybased studies, repeated measurements taken within a pedigree are correlated in a more complicated fashion compared with crosssectional family studies or longitudinal studies of unrelated individuals. Repeated measurements for the same individual are temporally correlated; measurements on related individuals at each time are subject to familial correlation. More generally, measurements of related individuals at different time points are correlated as well, mostly due to the familial correlation.
where vec() stacks the columns of {a_{ ij }} to form a long vector, and Σ_{visit} and Ω_{family} represent temporal and familial variancecovariance matrices, respectively.
where 1_{ MN }and 1_{ M }are matrices with 1 as elements and of dimension MN × MN and M × M, respectively.
With the simulated GAW 16 Problem 3 data sets we focused on the coronary artery calcification (CAC) phenotype for methodological evaluations. PROC MIXED in the computer program SAS was applied for likelihoodratio tests. More details and examples of modeling individual growth within clusters using linear mixedeffects models may be found in Singer [12]. CAC phenotype was first transformed using a square root function, then adjusted by polynomial function of age separately within sex groups. The residuals were standardized to have a mean of zero and a variance of one. Skewness and kurtosis were evaluated to measure deviations from normality. Genotype data from the Affymetrix 50 k chip were used for genomewide association tests. SNPs were filtered for quality control purposes. If their minor allele frequencies were less than 5%, pvalues of HardyWeinberg equilibrium test were smaller than 10^{6}, or call rates less than 95%, SNPs were removed.
Results
Majorgene SNP tests with the first replication of the simulated GAW16 data using linear mixedeffects models with Kronecker and hierarchical structures
UN+CS^{a}  UN⊗CS^{b}  

Major gene  SNP  AIC  Log P_{1}^{c}  Log P_{2}^{d}  AIC  Log P_{1}  Log P_{2} 
τ _{1}  rs6743961  34355.67  0.03  0.12  34356.60  0.00  0.01 
τ _{2}  rs17714718  34365.19  5.42  6.15  34365.66  4.88  5.63 
τ _{3}  rs1894638  34352.03  0.23  0.04  34352.82  0.24  0.06 
τ _{4}  rs1919811  34354.41  0.02  0.11  34355.51  0.47  0.07 
τ _{5}  rs213952  34372.39  >16  0.21  34373.04  >16  0.29 
Association tests of τ_{2} and τ_{5} with the first replication of the simulated GAW16 data using longitudinal and crosssectional data
3 Visits  Visit 1  Visit 2  Visit 3  

Major gene  SNP  Log P_{1}^{a}  Log P_{2}^{b}  Log P_{1}  Log P_{2}  Log P_{1}  Log P_{2}  Log P_{1}  Log P_{2} 
τ _{2}  rs17714718  5.42  6.15  1.38  1.92  3.28  3.93  1.70  2.28 
τ _{5}  rs213952  >16  0.21  6.86  0.34  6.58  0.58  11.79  1.41 
Discussion
In the two linear mixedeffects models, familial correlations were assumed to have a compound symmetric structure, which is appropriate for pedigrees with firstdegree relative pairs only. For pedigrees with more general relationships, such models may not be accurate and can be further generalized by introducing kinship matrix. Comparing the two linear mixedeffects models with Kronecker and hierarchical structures, the first implicitly treats each family as a single subject and phenotypes are repeatedly measured across time as well as across family members. In the threelevel mixedeffects model, individuals are the subjects and measurements are repeated only across time while individuals are nested in the familial clusters.
Results from the longitudinal analysis of the threevisit phenotype data were found to be more significant than those from crosssectional ones. Out of the five majorgene SNPs of CAC, association with rs17714718 (τ_{2}) was detected only when using the longitudinal data (pvalue = 7.05 × 10^{7}). SNP rs213952 (τ_{5}) were found to be significant with both longitudinal and crosssectional data, and the former yielded the most significant result (pvalue > 10^{16}). None of the other majorgene SNPs were found to be significant. According to the answer to GAW 16 Problem 3 [10], τ_{1} was simulated to display only a minimal main effect, τ_{2} displays a measurable additive main effect, and τ_{3} and τ_{4} were simulated to be purely epistatic SNPs. Hence τ_{1}, τ_{3}, and τ_{4} were not significant when testing of their main effects only.
QQ plots are widely used in genomewide association studies to measure credibility of the results. An inflated distribution of the test statistic or pvalue may indicate possible defects in statistical analysis, e.g., population substructure, loose genotype quality control, or inappropriate statistical method. One underlying assumption of the argument is that the number of true positives is small if there will be any, hence test statistics from a genomewide scan should represent an empirical distribution of test statistics under the null hypothesis. Inflation of the distribution should only happen in the tail due to the truepositive results, not for the whole distribution. In GAW16 Problem 3, no effect of population substructures was simulated in any phenotypes; inflated QQ plots, however, were still observed. For a complex trait like CAC, which has 17 majorgene SNPs involved directly or indirectly and 2000 polygenes spread all over the 22 chromosomes, such assumption may be worth examination. The effect sizes of all of these 2017 SNPs are so small that they may not be able to stand out at the tail of the distribution. On the other hand, the SNPs are true signals, and hence are more likely to have smaller pvalues than other null markers. Considering further that each of these SNPs may be in linkage disequilibrium with some other SNPs, it is possible that empirical distributions may not align to their nominal distributions simply due to the complexity of the phenotype itself. In a recent genomewide association study of dyslipidemia [13] in which population structure was adjusted using principal components, inflated QQ plots were still shown near the tail (see Figure 1 of the paper). As the title suggested, dyslipidemia is indeed a polygenic trait.
CAC is probably the most complex phenotype in the simulated GAW16 data set. All the major genes and polygenes of highdensity lipoprotein, lowdensity lipoprotein, and triacylglyceride affect this phenotype simultaneously. Medication, diet, genegene, and geneage interactions make it even more complex. In the tests of linear and quadratic geneage interactions, no interactions were detected for any major or polygenes at a Bonferronicorrected significance level. This may due to small marginal effects of those major gene SNPs. Also, some major gene SNPs have pure epistatic effects only, which could significantly reduce the power of test of genenotypeage interactions. In addition, because the geneage interaction was simulated in a piecewise linear fashion [10], this suggests that simple tests of linear or quadratic geneage interactions may not be adequate. A function that reflects the underlying interaction form more accurately may be needed.
Conclusion
We applied two linear mixedeffects models to analyze longitudinal family data provided by GAW16 Problem 3. The models with Kronecker and hierarchical structures yielded comparative performance in terms of goodness of fit and significance of theirs results. Longitudinal test that jointly analyze repeated phenotypes were found to be much more powerful than tests based on crosssectional data only. Complexity of the trait itself could be a reason for inflated distribution on QQ plot, and using the correct null phenotype is suggested when evaluating distribution of test statistic under the null hypothesis.
List of abbreviations used
 AICs:

Akaike information criterion
 BMI:

Body mass index
 CAC:

Coronary artery calcification
 GAW:

Genetic Analysis Workshop
 QQ:

Quantilequantile
 ROC:

Receiver operating characteristic
 SNP:

Singlenucleotide polymorphism.
Declarations
Acknowledgements
This study was supported by a grant from the National Institute of General Medical Sciences, GM 28719, and a grant from the National Heart, Lung, and Blood Institute, HL 054473. The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/17536561/3?issue=S7.
Authors’ Affiliations
References
 Rao DC, MacLean CJ, Morton NE, Yee S: Analysis of family resemblance. V. Height and weight in northeastern Brazil. Am J Hum Genet. 1975, 27: 509520.PubMed CentralPubMedGoogle Scholar
 Province MA, Rao DC: A new model for the resolution of cultural and biological inheritance in the presence of temporal trends: application to systolic blood pressure. Genet Epidemiol. 1985, 2: 363374. 10.1002/gepi.1370020405.View ArticlePubMedGoogle Scholar
 Province MA, Tishler P, Rao DC: Repeatedmeasures model for the investigation of temporal trends using longitudinal family studies: application to systolic blood pressure. Genet Epidemiol. 1989, 6: 333347. 10.1002/gepi.1370060204.View ArticlePubMedGoogle Scholar
 Jarvik GP, Goode EL, Austin MA, Auwerx J, Deeb S, Schellenberg GD, Reed T: Evidence that the apolipoprotein Egenotype effects on lipid levels can change with age in males: a longitudinal analysis. Am J Hum Genet. 1997, 61: 171181. 10.1086/513902.PubMed CentralView ArticlePubMedGoogle Scholar
 LaskySu J, Lyon HN, Emilsson V, Heid IM, Molony C, Raby BA, Lazarus R, Klanderman B, SotoQuiros ME, Avila L, Silverman EK, Thorleifsson G, Thorsteinsdottir U, Kronenberg F, Vollmert C, Illig T, Fox CS, Levy D, Laird N, Ding X, McQueen MB, Butler J, Ardlie K, Papoutsakis C, Dedoussis G, O'Donnell CJ, Wichmann HE, Celedon JC, Schadt E, Hirschhorn J, Weiss ST, Stefansson K, Lange C: On the replication of genetic associations: timing can be everything!. Am J Hum Genet. 2008, 82: 849858. 10.1016/j.ajhg.2008.01.018.PubMed CentralView ArticlePubMedGoogle Scholar
 Laird NM, Ware JH: Randomeffects models for longitudinal data. Biometrics. 1982, 38: 963974. 10.2307/2529876.View ArticlePubMedGoogle Scholar
 Kraja AT, Corbett J, Ping A, Lin RS, Jacobsen PA, Crosswhite M, Borecki IB, Province MA: Rheumatoid arthritis, item response theory, Blom transformation, and mixed models. BMC Proc. 2007, 1 (suppl 1): S11610.1186/175365611s1s116.PubMed CentralView ArticlePubMedGoogle Scholar
 Budde LR: Methods to detect SNP associations with family data: a comparative analysis [Masters thesis]. 2005, Washington University in St. Louis School of MedicineGoogle Scholar
 Spielman R, Ewens W: A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am J Hum Genet. 1998, 62: 450458. 10.1086/301714.PubMed CentralView ArticlePubMedGoogle Scholar
 Kraja AT, Culverhouse R, Daw EW, Wu J, Van Brunt A, Province MA, Borecki IB: The Genetic Analysis Workshop 16 Problem 3: simulation of heritable longitudinal cardiovascular phenotypes based on actual genomewide singlenucleotide polymorphisms in the Framingham Heart Study. BMC Proc. 2009, 3 (suppl 7): S410.1186/175365613s7s4.PubMed CentralView ArticlePubMedGoogle Scholar
 Longford NT: A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika. 1987, 74: 817827. 10.1093/biomet/74.4.817.View ArticleGoogle Scholar
 Singer JD: Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. J Educ Behav Stat. 1998, 23: 323355.View ArticleGoogle Scholar
 Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, Schadt EE, Kaplan L, Bennett D, Li Y, Tanaka T, Voight BF, Bonnycastle LL, Jackson AU, Crawford G, Surti A, Guiducci C, Burtt NP, Parish S, Clarke R, Zelenika D, Kubalanza KA, Morken MA, Scott LJ, Stringham HM, Galan P, Swift AJ, Kuusisto J, Bergman RN, Sundvall J, Laakso M, Ferrucci L, Scheet P, Sanna S, Uda M, Yang Q, Lunetta KL, Dupuis J, de Bakker PI, O'Donnell CJ, Chambers JC, Kooner JS, Hercberg S, Meneton P, Lakatta EG, Scuteri A, Schlessinger D, Tuomilehto J, Collins FS, Groop L, Altshuler D, Collins R, Lathrop GM, Melander O, Salomaa V, Peltonen L, OrhoMelander M, Ordovas JM, Boehnke M, Abecasis GR, Mohlke KL, Cupples LA: Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet. 2009, 41: 5665. 10.1038/ng.291.PubMed CentralView ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.