Volume 3 Supplement 7
Genetic Analysis Workshop 16
A genome-wide association analysis of Framingham Heart Study longitudinal data using multivariate adaptive splines
 Wensheng Zhu^{1},
 Kelly Cho^{1},
 Xiang Chen^{1},
 Meizhuo Zhang^{1},
 Minghui Wang^{1} and
 Heping Zhang^{1}
DOI: 10.1186/1753-6561-3-S7-S119
© Zhu et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Abstract
The Framingham Heart Study is a well-known longitudinal cohort study. In recent years, the community-based Framingham Heart Study has embarked on genome-wide association studies. In this paper, we present a genome-wide analysis of the fasting triglycerides trait in the Framingham Heart Study data of Genetic Analysis Workshop 16 Problem 2, using multivariate adaptive splines for the analysis of longitudinal data (MASAL). With MASAL, we are able to analyze genome-wide data with longitudinal phenotypes and covariates, making it possible to identify genes, gene-gene interactions, and gene-environment (including time) interactions associated with the trait of interest. We conducted a permutation test to assess the associations between the MASAL-selected markers and the triglycerides trait, and we report significant gene-gene and gene-environment interaction effects on the trait of interest.
Background
Current advances in genotyping technologies, such as the Affymetrix 500 k GeneChip, make genome-wide association studies (GWAS) feasible for identifying common variants that underlie complex traits. Recent GWAS have identified genetic variants associated with age-related macular degeneration (AMD) [1, 2], inflammatory bowel disease [3], and electrocardiographic QT interval [4]. Data from the 500 k genome-wide scan of the Framingham Heart Study (FHS) are available for use in the Genetic Analysis Workshop (GAW) 16. The FHS, a community-based cohort study initiated in 1948, aims to identify cardiovascular disease risk factors. It provides data from three-generation families whose members have been followed up every 2 or 4 years. This longitudinal feature poses methodological challenges. Applying an efficient approach to analyzing the FHS longitudinal data may help in discovering new genetic variants in GWAS.
Previously, several approaches [5, 6] were proposed to analyze the FHS 100 k data set; however, most of these did not deal directly with longitudinal data. These methods require the longitudinal measures to be summarized into a one-time-point trait, either by averaging several measures or by using the family-based association test (FBAT) principal-components method [6]. Using summary trait values inevitably risks some loss of information [7]. Furthermore, when applying the FBAT principal-components adjustment in GWAS, it is difficult to include environmental factors such as sex and age.
In our study, we use the multivariate adaptive splines for analysis of longitudinal data (MASAL) presented by Zhang [8] to analyze the FHS longitudinal data. MASAL is a nonparametric regression approach that was developed specifically to handle longitudinal data. MASAL not only accommodates time-varying covariates, but also allows interactions between genes and environmental factors and between time and covariates [9]. Here we apply MASAL to identify genes, gene-gene interactions, and gene-environment interactions in relation to the triglyceride (TG) level in a GWAS using the FHS data in GAW 16 Problem 2.
Methods
MASAL
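Suppose that data are collected from n subjects and that subject i is measured on $q_i$ occasions. Let $y_{ij}$ denote the response and $x_{ij}$ the vector of p + 1 covariates (time included) for subject i at the jth occasion (i = 1, ..., n; j = 1, ..., $q_i$). MASAL assumes the model

$$y_{ij} = f(x_{ij}) + \varepsilon_{ij},$$

where f is an unknown smooth function and $\varepsilon_{ij}$ is the error term.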
MASAL approximates f by a sum of basis functions,

$$f(x) = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(x),$$

where $\beta_0$ is the intercept, $\beta_m$ is the regression coefficient and $B_m(x)$ is a special basis function of the p + 1 covariates $x = (x_1, \ldots, x_{p+1})$ (m = 1, ..., M), and M is the number of terms. Specifically, $B_m(x)$ is either one of $(x_k - \tau)^+$ and $x_k$ or a product of such functions (k = 1, ..., p + 1), where $a^+ = \max(a, 0)$ for any number a and $\tau$ is called a knot.
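The basis expansion can be illustrated with a minimal Python sketch. It only evaluates a model of the form "intercept plus weighted basis functions"; the intercept, the function names (`hinge`, `masal_predict`), the covariate names, the knot, and all coefficients are hypothetical choices made for illustration and are not taken from the MASAL software or from the fitted models in this paper.

```python
def hinge(x, knot):
    """Truncated linear basis (x - knot)^+ = max(x - knot, 0)."""
    return max(x - knot, 0.0)


def masal_predict(x, intercept, terms):
    """Evaluate intercept + sum_m beta_m * B_m(x).

    x     : dict mapping a covariate name to its value
    terms : list of (beta_m, factors); each factor is (name, knot), where
            knot=None stands for the raw covariate x_k and a number stands
            for the truncated function (x_k - knot)^+. A term with several
            factors is their product, i.e. an interaction.
    """
    total = intercept
    for beta, factors in terms:
        basis = 1.0
        for name, knot in factors:
            basis *= x[name] if knot is None else hinge(x[name], knot)
        total += beta * basis
    return total


# Hypothetical model (illustration only):
#   1.5 + 0.2 * age + 0.4 * (age - 50)^+ * snp
terms = [
    (0.2, [("age", None)]),
    (0.4, [("age", 50.0), ("snp", None)]),
]
pred = masal_predict({"age": 60.0, "snp": 2.0}, 1.5, terms)
print(pred)
```

The second term is a second-order interaction between a truncated age function and a SNP genotype, which is the kind of gene-environment term the method is meant to detect.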
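In the forward step, terms are added to minimize the (weighted) sum of squared residuals,

$$\mathrm{WLS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^{\top} W_i^{-1} (y_i - \hat{y}_i),$$

where $y_i = (y_{i1}, \ldots, y_{iq_i})^{\top}$ is the vector of the $q_i$ repeated measures for subject i, $\hat{y}_i$ is the predicted value of $y_i$, and $W_i$ is the within-subject covariance matrix for $y_i$, i = 1, ..., n. After the forward step, all knots have been found, and each corresponding basis function is treated as if it were a given predictor. In the backward step, based on generalized cross-validation (GCV), we delete the least significant term from the large model, one term at a time. The final model we select is the one that yields the smallest

$$\mathrm{GCV}(l) = \frac{\mathrm{WLS}_l}{(1 - \lambda l / N)^2},$$

where $\mathrm{WLS}_l$ is the WLS of a reduced model with l terms, N is the total number of observations, and $\lambda$ (usually $\lambda = 4$ [10, 11]) is the penalizing parameter for model complexity.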
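The two criteria used in the forward and backward steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MASAL implementation: the GCV form WLS_l / (1 - lambda * l / N)^2 is an assumed Friedman-style criterion, N is taken to be the total number of observations, and the residuals, covariance matrices, and candidate WLS values are made-up numbers.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination
    with partial pivoting (a is a small square matrix as nested lists)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def wls(residuals, covariances):
    """Sum over subjects of r_i' W_i^{-1} r_i, where r_i = y_i - yhat_i."""
    total = 0.0
    for r, w in zip(residuals, covariances):
        z = solve(w, r)  # z = W_i^{-1} r_i without forming the inverse
        total += sum(ri * zi for ri, zi in zip(r, z))
    return total


def gcv(wls_value, n_terms, n_obs, lam=4.0):
    """Assumed GCV form: WLS_l / (1 - lam * l / N)^2."""
    return wls_value / (1.0 - lam * n_terms / n_obs) ** 2


# Two hypothetical subjects, two repeated measures each, correlation 0.5.
w = [[1.0, 0.5], [0.5, 1.0]]
total = wls([[1.0, -1.0], [2.0, 0.0]], [w, w])

# Backward step: choose among hypothetical reduced models (l, WLS_l), N = 100.
candidates = [(5, 80.0), (4, 82.0), (3, 95.0)]
best_l = min(candidates, key=lambda c: gcv(c[1], c[0], n_obs=100))[0]
print(total, best_l)
```

In this toy comparison the four-term model is preferred: dropping from five to four terms raises the WLS only slightly, while the complexity penalty falls by more.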
GWA analyses with MASAL
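Suppose that the model selected by MASAL is

$$\hat{f}(x) = \hat{\beta}_0 + \sum_{k} \hat{\beta}^{G}_{k} B^{G}_{k}(x) + \sum_{l} \hat{\beta}^{E}_{l} B^{E}_{l}(x),$$

where $B^{G}_{k}$ represents terms containing any genetic component (i.e., single-nucleotide polymorphism (SNP), SNP-SNP interaction, or SNP-covariate interaction), $B^{E}_{l}$ refers to nongenetic covariate terms, and $\hat{\beta}^{G}_{k}$ and $\hat{\beta}^{E}_{l}$ are the estimates of the corresponding regression coefficients.
To assess the association between the selected genetic terms and the trait, we test the null hypothesis that all coefficients of the genetic terms are zero, using the Wald statistic

$$W = (\hat{\beta}^{G})^{\top} \hat{\Sigma}^{-1} \hat{\beta}^{G},$$

where $\hat{\beta}^{G} = (\hat{\beta}^{G}_{1}, \hat{\beta}^{G}_{2}, \ldots)^{\top}$, and $\hat{\Sigma}$ is the estimated covariance matrix of $\hat{\beta}^{G}$. We use a permutation procedure to establish the null distribution of W. The permutation test is done by randomly reassigning the phenotypes among individuals while keeping each individual's set of genotypes intact, and then performing the GWA analysis using MASAL on the permuted data. Note that the nongenetic covariates are permuted together with the phenotype.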
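The permutation scheme can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the simulated data, and the number of permutations are assumptions, and the statistic `slope_wald` is a simple stand-in (the Wald statistic for one SNP in an ordinary linear regression) used in place of a full MASAL fit. What the sketch does preserve is the structure described above: genotypes stay fixed, and each phenotype record, which carries its nongenetic covariates with it, is reassigned at random.

```python
import random


def permutation_pvalue(genotypes, pheno_records, statistic, n_perm=999, seed=0):
    """Permutation p-value for a test statistic.

    genotypes     : list of genotype values, kept in place
    pheno_records : list of dicts holding the trait and nongenetic
                    covariates; whole records are shuffled so covariates
                    travel with the phenotype
    """
    rng = random.Random(seed)
    observed = statistic(genotypes, pheno_records)
    exceed = 0
    for _ in range(n_perm):
        shuffled = pheno_records[:]
        rng.shuffle(shuffled)
        if statistic(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def slope_wald(genotypes, pheno_records):
    """Stand-in statistic: beta_hat^2 / var(beta_hat) for the regression
    of the trait on a single SNP (covariates are ignored here)."""
    y = [rec["trait"] for rec in pheno_records]
    n = len(y)
    gbar = sum(genotypes) / n
    ybar = sum(y) / n
    sxx = sum((g - gbar) ** 2 for g in genotypes)
    sxy = sum((g - gbar) * (yi - ybar) for g, yi in zip(genotypes, y))
    beta = sxy / sxx
    rss = sum((yi - ybar - beta * (g - gbar)) ** 2 for g, yi in zip(genotypes, y))
    return beta ** 2 / ((rss / (n - 2)) / sxx)


# Simulated toy data (hypothetical): 30 individuals, additive SNP effect.
data_rng = random.Random(1)
genotypes = [i % 3 for i in range(30)]
records = [
    {"trait": 1.0 + 0.5 * g + data_rng.gauss(0.0, 0.3), "age": 40 + data_rng.randrange(30)}
    for g in genotypes
]
observed, p_value = permutation_pvalue(genotypes, records, slope_wald)
print(observed, p_value)
```

With a genome-wide scan inside each permutation, the cost grows linearly with the number of permutations, which is why the computation time reported for this procedure is substantial.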
Study design
We perform GWA analyses of the TG trait with MASAL. We consider the genotype at every SNP as a covariate in the model, in addition to the sex and age variables. MASAL has the option of setting the maximum order of interactions in the model. We set it to three in our analyses because interactions higher than the third order are difficult to interpret. We first use MASAL to perform GWA analyses in the Offspring Cohort, in which the repeated TG values and the familial correlations are properly accounted for in the analysis. Next, we perform GWA analyses with MASAL in the Original Cohort, in which the subjects are considered to be independent, whereas the longitudinal trait values are considered to be correlated. Then, we examine significant SNPs, SNP-SNP interactions, and SNP-covariate interactions in the two generation data sets and compare the level of concordance of significant associations in the two samples.
In the Offspring Cohort, some pedigrees have more than 100 subjects, who cannot be treated as independent individuals. These families with repeated traits induce a large covariance matrix in MASAL and thus markedly limit its efficiency. Therefore, in the analyses of the Offspring Cohort, we split all pedigrees into sibship units according to the information from the Original Cohort. We obtained 1,767 sibship units, each consisting of one set of siblings and their spouses from a nuclear family. All subjects included in our study have all TG trait values (Exams 1, 3, 5, and 7) and genotypes. In the Original Cohort, we used 146 individuals who have all TG trait values (Exams 7 and 11) and genotypes. All of these subjects were genotyped for 488,146 SNPs.
Results and discussion
We applied MASAL to analyze the FHS 500 k SNP data set (GAW 16 Problem 2). Before the analysis, TG level values were log-transformed to approximate a normal distribution, although MASAL itself does not require normality. Furthermore, in order to minimize false-positive associations due to rarer SNPs and genotyping artifacts, we limited our analyses to SNPs with a minor allele frequency ≥ 10% and a p-value ≥ 0.001 in the test of Hardy-Weinberg equilibrium. After this filtering, a total of 294,434 SNPs remained in our analysis.
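The SNP quality-control filter can be sketched as below. This is an illustration under assumptions rather than the procedure actually run on the FHS data: it assumes the usual convention that SNPs are excluded when the minor allele frequency is below 10% or when the Hardy-Weinberg test p-value is below 0.001, it uses a 1-degree-of-freedom chi-square goodness-of-fit test (the paper does not say which Hardy-Weinberg test was used), and the function names and genotype counts are hypothetical.

```python
import math


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)


def hwe_pvalue(n_aa, n_ab, n_bb):
    """P-value of the 1-df chi-square goodness-of-fit test for
    Hardy-Weinberg equilibrium (requires a polymorphic SNP)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(Z^2 > c) = erfc(sqrt(c / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, min_maf=0.10, min_hwe_p=0.001):
    """Keep a SNP only if it is common enough and consistent with HWE."""
    if maf(n_aa, n_ab, n_bb) < min_maf:
        return False  # also guards hwe_pvalue against monomorphic SNPs
    return hwe_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p


# Hypothetical genotype counts (AA, AB, BB) for three SNPs.
print(passes_qc(250, 500, 250))  # common SNP in exact HWE proportions
print(passes_qc(400, 200, 400))  # strong heterozygote deficit
print(passes_qc(950, 50, 0))     # rare SNP
```

Checking the allele frequency first means the Hardy-Weinberg test is never evaluated on a monomorphic SNP, where the expected counts would be zero.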
For the fitted MASAL model, the value of the corresponding Wald statistic is 359.575.
Significant SNPs selected by MASAL
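Data set          SNP         Locus    Nearest gene(s)
----------------  ----------  -------  -----------------
Offspring Cohort  rs4367528   8q12     RLBP1L1
Offspring Cohort  rs16860145  3q13.2   CD200R2
Offspring Cohort  rs4074863   10q26    FLJ46300, TCERG1L
Offspring Cohort  rs9828013   3p25     WNT7A
Offspring Cohort  rs41442345  4q23     BANK1
Offspring Cohort  rs11150610  16p11.2  ITGAM
Offspring Cohort  rs5015152   3q26.3   NLGN1
Offspring Cohort  rs17117113  5q33     KIF4B
Offspring Cohort  rs1361536   9q21     KRT18P24, CHCHD9
Offspring Cohort  rs17630545  8q23     CSMD3
Offspring Cohort  rs7204454   16q23    CDH13
Offspring Cohort  rs1321130   1q42     FAM89A, FLJ30430
Offspring Cohort  rs2514930   11q21    NAALAD2
Original Cohort   rs6835031   4q22     TIGD2
Original Cohort   rs4984982   16p13.3  LMF1
Original Cohort   rs16995794  20q13.2  RPSAP1
Original Cohort   rs17783132  14q24    BATF
Original Cohort   rs11688196  2p23     TRNALAAG
Original Cohort   rs9643584   8q13     CPA6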
Conclusion
In this report, we proposed a testing procedure to perform GWAS for longitudinal data, using a nonparametric regression method (MASAL) presented by Zhang [8]. In contrast to other GWA methods, our testing procedure has two novel features. First, it can handle longitudinal data without combining longitudinal measures into a one-time-point measure in GWAS. Second, it can accommodate gene-gene, gene-environment, and time-covariate interactions in GWAS. Using MASAL, we analyzed the FHS 500 k genotype data (GAW 16 Problem 2) with TG as the trait of interest and found some significant gene-gene and gene-environment interaction effects on the TG trait. These results indicate that MASAL is useful for exploring gene-gene and gene-environment interactions in the GWAS of longitudinal data.
We used a permutation procedure to establish the null distribution of the Wald statistic and then estimated the significance level. However, the computation time was lengthy, especially for large pedigrees and a large number of exams per subject. Theoretical studies exploring the asymptotic distribution of the statistic would be useful.
List of abbreviations used
AMD: Age-related macular degeneration
FBAT: Family-based association test
FHS: Framingham Heart Study
GAW: Genetic Analysis Workshop
GCV: Generalized cross-validation
GWAS: Genome-wide association studies
MASAL: Multivariate adaptive splines for the analysis of longitudinal data
SNP: Single-nucleotide polymorphism
TG: Triglyceride
Declarations
Acknowledgements
This research is supported in part by grants K02DA017713, R01DA016750, and T32 MH014235 from the National Institutes of Health. The Framingham Heart Study project is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (N01 HC25195). The Genetic Analysis Workshop is supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. The simulated data was supported by the Washington University Institute of Clinical and Translational Sciences, NIH grant 1U54RR023496. The GAW16 Framingham and simulated data used for the analyses described in this manuscript were obtained through dbGaP (accession numbers). The authors acknowledge the investigators that contributed the phenotype, genotype, and simulated data for this study. This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
Authors’ Affiliations
References
1. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott J, Barnstable C, Hoh J: Complement factor H polymorphism in age-related macular degeneration. Science. 2005, 308: 385-389. 10.1126/science.1109557.
2. Chen X, Liu CT, Zhang MZ, Zhang HP: A forest-based approach to identifying gene and gene-gene interactions. Proc Natl Acad Sci USA. 2007, 104: 19199-19203. 10.1073/pnas.0709868104.
3. Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, Abraham C, Regueiro M, Griffiths A, Dassopoulos T, Bitton A, Yang H, Targan S, Datta LW, Kistner EO, Schumm LP, Lee AT, Gregersen PK, Barmada MM, Rotter JI, Nicolae DL, Cho JH: A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science. 2006, 314: 1461-1463. 10.1126/science.1135245.
4. Arking DE, Pfeufer A, Post W, Kao WH, Newton-Cheh C, Ikeda M, West K, Kashuk C, Akyol M, Perz S, Jalilzadeh S, Illig T, Gieger C, Guo CY, Larson MG, Wichmann HE, Marbán E, O'Donnell CJ, Hirschhorn JN, Kääb S, Spooner PM, Meitinger T, Chakravarti A: A common genetic variant in the NOS1 regulator NOS1AP modulates cardiac repolarization. Nat Genet. 2006, 38: 644-651. 10.1038/ng1790.
5. Kathiresan S, Manning AK, Demissie S, D'Agostino RB, Surti A, Guiducci C, Gianniny L, Burtt NP, Melander O, Orho-Melander M, Arnett DK, Peloso GM, Ordovas JM, Cupples LA: A genome-wide association study for blood lipid phenotypes in the Framingham Heart Study. BMC Med Genet. 2007, 8 (suppl 1): S17. 10.1186/1471-2350-8-S1-S17.
6. Ionita-Laza I, McQueen MB, Laird NM, Lang C: Genomewide weighted hypothesis testing in family-based association studies, with an application to a 100 K scan. Am J Hum Genet. 2007, 81: 607-614. 10.1086/519748.
7. Zhang HP, Zhong X: Linkage analysis of longitudinal data and design consideration. BMC Genet. 2006, 7: 37. 10.1186/1471-2156-7-37.
8. Zhang HP: Multivariate adaptive splines for analysis of longitudinal data. J Comput Graph Stat. 1997, 6: 74-91. 10.2307/1390725.
9. Zhang HP: Mixed effects multivariate adaptive splines model for the analysis of longitudinal and growth curve data. Stat Methods Med Res. 2004, 13: 63-82. 10.1191/0962280204sm353ra.
10. Zhang HP: Analysis of infant growth curves using multivariate adaptive splines. Biometrics. 1999, 55: 452-459. 10.1111/j.0006-341X.1999.00452.x.
11. Friedman JH: Multivariate adaptive regression splines. Ann Stat. 1991, 19: 1-141. 10.1214/aos/1176347963.
12. Castelli WP: Epidemiology of triglycerides: a view from Framingham. Am J Cardiol. 1992, 70: 3H-9H. 10.1016/0002-9149(92)91083-G.
13. Kooner JS, Chambers JC, Aguilar-Salinas CA, Hinds DA, Hyde CL, Warnes GR, Gómez Pérez FJ, Frazer KA, Elliott P, Scott J, Milos PM, Cox DR, Thompson JF: Genome-wide scan identifies variation in MLXIPL associated with plasma triglycerides. Nat Genet. 2008, 40: 149-151. 10.1038/ng.2007.61.
14. Kathiresan S, Melander O, Guiducci C, Surti A, Burtt NP, Rieder MJ, Cooper GM, Roos C, Voight BF, Havulinna AS, Wahlstrand B, Hedner T, Corella D, Tai ES, Ordovas JM, Berglund G, Vartiainen E, Jousilahti P, Hedblad B, Taskinen MR, Newton-Cheh C, Salomaa V, Peltonen L, Groop L, Altshuler DM, Orho-Melander M: Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat Genet. 2008, 40: 189-197. 10.1038/ng.75.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.