Volume 3 Supplement 7
Genetic Analysis Workshop 16
Tests for candidate-gene interaction for longitudinal quantitative traits measured in a large cohort
 Dörthe Malzahn^{1},
 Yesilda Balavarca^{1},
 Jingky P Lozano^{1} and
 Heike Bickeböller^{1}
DOI: 10.1186/1753-6561-3-S7-S80
© Malzahn et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Abstract
For the Framingham Heart Study (FHS) and simulated FHS (FHSsim) data, we tested for gene-gene interaction in quantitative traits employing a longitudinal nonparametric association test (LNPT) and, for comparison, a survival analysis. We report results for the Offspring Cohort by LNPT analysis and for all longitudinal cohorts by survival analysis with cohort effect adjustment. We verified that type I errors were not inflated. Using FHSsim data, we compared the power of both methods to detect two sets of gene pairs that interact for the trait coronary artery calcification. In FHS, we tested eight gene pairs from a list of candidate genes for interaction effects on body mass index. Both methods found evidence for pairwise nonadditive effects of mutations in the genes FTO, PON1, and PFKP on body mass index.
Background
The Framingham Heart Study (FHS) cohorts and simulated FHS (FHSsim) data available for Genetic Analysis Workshop 16 (GAW16) provide longitudinal data on quantitative traits and include families. FHSsim data are based on the FHS individuals and family structures, replacing only the phenotypes by simulated values. For both sets of data, we compared two association approaches to test quantitative traits for gene-gene interaction. Both approaches use baseline and available follow-up measurements and require a set of independent individuals with phenotypes and genotypes. The methods are: 1) a longitudinal nonparametric association test for cohorts (LNPT), which we recently developed, and 2) survival analysis with the Cox model. The LNPT is a rank-sum procedure based on the longitudinal trait measurements. Survival analysis models event times derived from the longitudinal traits as the age at the first exam at which the trait crossed a predefined threshold.
Both approaches remain mathematically valid regardless of trait distribution. This is particularly useful because quantities with mixture distributions, such as the simulated trait coronary artery calcification (CAC), remain non-normal despite Blom or log transformation. Also, LNPT and survival analysis are invariant with respect to monotone transformations such as Blom or log: such transformations change neither the order (ranks) of the trait data nor the event times, provided that the threshold is transformed as well.
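The invariance argument above can be checked numerically. The following is a minimal sketch, assuming a small set of hypothetical positive trait values (not FHSsim data) and a log transformation; it compares the ordering of individuals and the threshold-based event indicators before and after transformation.

```python
import math

# Hypothetical, strongly skewed positive trait values (not FHSsim data).
trait = [12.0, 480.5, 3.2, 230.0, 95.1, 0.7]
threshold = 230.0  # event if the trait exceeds this value


def ordering(values):
    """Indices of the individuals sorted by increasing trait value."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Apply a monotone transformation to the trait AND to the threshold.
log_trait = [math.log(v) for v in trait]
log_threshold = math.log(threshold)

# The order of the data, and therefore every rank-based statistic, is unchanged.
same_order = ordering(trait) == ordering(log_trait)

# The event indicators, and therefore the derived event times, are unchanged.
events_raw = [v > threshold for v in trait]
events_log = [v > log_threshold for v in log_trait]
same_events = events_raw == events_log

print(same_order, same_events)
```

If the threshold were left untransformed, the second comparison would generally fail, which is why the threshold has to be transformed together with the trait.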
We analyzed quantitative traits with a skewed distribution: CAC in the FHSsim data and body mass index (BMI) in the FHS data. We verified that the nominal type I error rate of 5% was maintained when applying our methods to the original, untransformed data. CAC is influenced by two separate sets of interacting gene pairs according to the GAW16 answers [1]. We computed the power of the two approaches to detect two-locus interactions for the two interacting gene pairs at the α = 5% level in all 200 data replicates. In FHS, we analyzed BMI for gene-gene interactions based on a set of candidate genes that carry susceptibility variants identified in previous single-locus studies. We report results for the Offspring Cohort by LNPT analysis, and for all longitudinal cohorts by survival analysis with cohort effect adjustment. GAW16 data retrieval and all analyses were carried out in compliance with the Helsinki Declaration.
Materials
The FHS data provide 227 independent individuals and 980 pedigrees. For association analysis, 1631 (233 Original (OC), 1243 Offspring (Off), and 155 Third Generation (Gen3) Cohort) unrelated individuals had available genotype and phenotype data that could be extracted, including founders and one random individual from all remaining multimember pedigrees. Similar selection procedures were applied to FHSsim.
For FHS, Gen3 provides baseline values only. OC and Off provide four consecutive measurements per individual. The LNPT requires that individuals are followed up at the same intervals but tolerates missing trait values. We harmonized follow-up times by allocating five examination time points T = 1, ..., 5 (baseline, 6 yr, 12 yr, 19 yr, 26 yr after baseline). T = 2 is missing for Off, T = 5 is missing for OC. Longitudinal analysis by LNPT used the phenotypes from all time points when both cohorts were examined (T = 1, 3, 4). Some individuals have incomplete longitudinal phenotypes. For FHSsim, each cohort (Gen3, OC, Off) provides simulated baseline trait values and two follow-ups, each 10 years apart. All longitudinal phenotypes are complete.
For survival analysis, time to event was defined as the earliest age at which the individual presented a BMI ≥ 25 kg/m^{2} (FHS) or a value of CAC > 230 (FHSsim, no unit). For censored individuals, time to censoring was age at the last available exam. Censoring occurred when individuals did not pass the threshold either for any of the longitudinal measurements or for an initial sequence of longitudinal measurements after which they were lost to follow-up. We excluded individuals whose baseline trait values were either missing or above the threshold.
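The derivation of event times described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the analysis: the function name `time_to_event` and the example values are hypothetical, and exams with a missing value are simply skipped, a choice the text does not specify.

```python
from typing import List, Optional, Tuple


def time_to_event(
    ages: List[float],
    values: List[Optional[float]],
    threshold: float,
    inclusive: bool = True,
) -> Optional[Tuple[float, bool]]:
    """Return (time, event) for one individual, or None if excluded.

    ages[i] is the age at exam i and values[i] the trait value (None = missing).
    inclusive=True uses value >= threshold (as for BMI >= 25),
    inclusive=False uses value > threshold (as for CAC > 230).
    """
    def crossed(value: float) -> bool:
        return value >= threshold if inclusive else value > threshold

    baseline = values[0]
    if baseline is None or crossed(baseline):
        return None  # excluded: baseline missing or already beyond threshold

    last_age = ages[0]  # age at the last exam with an available value
    for age, value in zip(ages[1:], values[1:]):
        if value is None:
            continue  # assumption: missing exams are skipped
        if crossed(value):
            return (age, True)  # event at the earliest age beyond threshold
        last_age = age
    return (last_age, False)  # censored at the last available exam


print(time_to_event([30, 36, 42], [22.0, 24.5, 26.1], 25.0))   # event at 42
print(time_to_event([30, 36, 42], [22.0, 23.0, None], 25.0))   # censored at 36
```

Each returned pair can be passed to any standard Cox regression routine as event time and event indicator.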
For analysis of CAC (FHSsim), we selected the two interacting SNP pairs from the GAW answers [1] (single-nucleotide polymorphisms (SNPs) τ_{1} and τ_{2}; τ_{3} and τ_{4}). τ_{1} displays only a minimal main effect, τ_{2} a measurable additive main effect. τ_{3} and τ_{4} are epistatic SNPs. All four SNPs have minor allele frequencies (MAF) of approximately 0.5.
For gene-gene interaction analysis of BMI (FHS), we selected five candidate SNPs previously found to be associated with alteration of BMI [2]. The SNPs are rs6602024 (MAF 11%, chromosome 10, gene PFKP); rs1121980 and rs9930506 (both: MAF 44%, chromosome 16, gene FTO); rs854560 (MAF 37%, chromosome 7, gene PON1); and rs6971091 (MAF 22%, chromosome 7, gene FAM137A).
Methods
LNPT analysis
The LNPT tests nonparametric null hypotheses of the form H_{0}: CF = 0, where C is a contrast matrix and F = {F_{kls}^{(t)}} is the set of distribution functions ordered by observational time point t and the influencing factors (kls) of interest. For gene-gene interaction analysis, two factors k, l are used to code for SNP genotype at the two loci (k, l = 0, 1, 2 for biallelic loci) and we stratified by sex s ∈ {m, w}. The test statistic structurally resembles a heteroscedastic repeated-measures ANOVA performed on the midranks of the longitudinal traits [3], estimating longitudinal covariance from the ranks without assuming any structure. Trait vectors Y_{i} can be arbitrarily dependent between exams/time points t of the same individual i, while they are assumed to be independent between individuals. Individuals should be followed up at the same time intervals. The LNPT yields a set of adjusted p-values for tests of average effects of the loci, sex, and number of exam and for tests of all interactions. The LNPT can handle missing values for the longitudinal phenotype: when values are missing, the test statistic is computed from the existing phenotype measurements. The LNPT can model covariates as additional factors. For all tested interactions, each stratum must contain a minimum of ten observations per exam to ensure consistent estimates of the test statistic.
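To make the role of the midranks concrete, the sketch below pools all available measurements, assigns midranks (tied values share the average rank), and averages them per stratum and time point. It follows the general rank-based framework of [3], in which a relative effect is estimated as (mean rank − 1/2)/N; it does not compute the LNPT statistic or its covariance estimate. The stratum labels and trait values are hypothetical.

```python
from collections import defaultdict


def midranks(values):
    """Midrank of each value: tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


# Hypothetical records: (stratum label, time point, trait value or None).
records = [
    ("k0_l0_m", 1, 24.1), ("k0_l0_m", 3, 25.0), ("k0_l0_m", 4, None),
    ("k1_l0_m", 1, 27.3), ("k1_l0_m", 3, 27.3), ("k1_l0_m", 4, 29.8),
]

# Missing measurements are dropped; all remaining ones are ranked jointly.
observed = [r for r in records if r[2] is not None]
ranks = midranks([r[2] for r in observed])
n_total = len(observed)

# Collect the ranks per (stratum, time point) cell.
cells = defaultdict(list)
for (stratum, t, _), rank in zip(observed, ranks):
    cells[(stratum, t)].append(rank)

# Estimated relative effect per cell: (mean rank - 1/2) / N.
relative_effects = {
    cell: (sum(rs) / len(rs) - 0.5) / n_total for cell, rs in cells.items()
}
print(ranks)
print(relative_effects)
```

Because only ranks enter the computation, an individual with a missing exam still contributes all remaining measurements.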
Prior to genetic analysis, we tested for a cohort effect (FHS: OC, Off; FHSsim: OC, Off, Gen3) by using factors for cohort type and for sex, omitting the two genetic factors k, l. Gene-gene interaction analysis was restricted to the large Offspring sample (N = 1243) and, within it, to individuals with no cholesterol treatment (79% of the Off sample). For BMI (FHS), we restricted baseline age to the interval 25-46 yr to avoid effects of BMI-related mortality (manifest as stagnating median BMI at old age). We adjusted for age by using a two-level factor, testing only age-averaged interaction effects. Levels are assigned according to baseline age, with median baseline age as the break point.
Survival analysis: Cox proportional hazards model and time-varying covariates
Survival analysis evaluates the time to the occurrence of an event of interest and its dependence on particular characteristics such as sex, genotype, and cohort. We employ the Cox proportional hazards model and its extension for time-varying factors [4], which are smoking status and cholesterol treatment. The extended Cox model compares the risk of an event between two or more subgroups at each event time, where the risk group of an individual changes according to the time-varying factor. The model was adjusted for a cohort effect using all cohorts with follow-up measurements (FHS: OC, Off; FHSsim: OC, Off, Gen3). The likelihood-ratio test was used to test for the gene-gene interaction by comparing the log-likelihood of the model with interaction against the null hypothesis model of no interaction.
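The likelihood-ratio comparison can be illustrated with a short sketch. It assumes that the two maximized (partial) log-likelihoods are already available from a Cox regression routine and that both SNPs are coded as three-level factors, so that the interaction adds 4 parameters; the coding is an assumption, since it is not stated here, and fewer parameters are estimable when genotype cells are empty. The log-likelihood values are hypothetical. The chi-square tail probability uses the closed form that exists for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    Uses P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # term is now (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_null: float, loglik_full: float, df: int):
    """Return (LR statistic, p-value) for two nested models."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods: main effects only vs. main effects + interaction.
stat, p_value = likelihood_ratio_test(-1000.0, -995.256, df=4)
print(round(stat, 3), round(p_value, 4))
```

For an odd number of degrees of freedom, a general chi-square routine from a statistics library would be needed instead of this closed form.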
Results
A significant main effect of cohort type and a significant interaction of cohort type with sex were detected in FHS BMI data (LNPT) and FHSsim CAC data (LNPT and Cox model). We report gene-gene interaction analysis on the Offspring Cohort (LNPT) and on all longitudinal cohorts (Cox model with cohort effect adjustment).
Rates of false positives (i.e., type I errors) for p-values ≤ 0.05 were estimated for the interaction tests for SNP pairs rs854560 and rs1121980 (FHS) and τ_{1}, τ_{2} (FHSsim) after permutation of the assignment between longitudinal phenotypes and individuals. Permutation destroys associations but retains the distributional properties of the trait.
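The permutation scheme can be sketched as below. Whole longitudinal phenotype vectors are reassigned to individuals, so the within-individual dependence between exams is kept while any link to genotype is broken. The callable `interaction_pvalue` is a hypothetical placeholder for whichever interaction test is applied (LNPT or Cox model); it is not an existing library function, and the number of permutations is left to the user.

```python
import random


def permute_phenotypes(phenotypes, rng):
    """Randomly reassign complete phenotype vectors to individuals."""
    shuffled = list(phenotypes)  # copy; each vector stays intact
    rng.shuffle(shuffled)
    return shuffled


def type_one_error_rate(genotypes, phenotypes, interaction_pvalue,
                        n_permutations, alpha=0.05, seed=0):
    """Fraction of permuted data sets with an interaction p-value <= alpha.

    interaction_pvalue(genotypes, phenotypes) -> float is a user-supplied
    (hypothetical) function that runs the interaction test.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        permuted = permute_phenotypes(phenotypes, rng)
        if interaction_pvalue(genotypes, permuted) <= alpha:
            hits += 1
    return hits / n_permutations


# Hypothetical toy input: one phenotype vector (three exams) per individual.
phenotypes = [(22.0, 23.1, 24.0), (30.2, 31.0, None), (25.5, 25.9, 26.3)]
genotypes = [(0, 1), (1, 1), (2, 0)]  # two-locus genotypes (k, l)
print(permute_phenotypes(phenotypes, random.Random(1)))
```

With a valid test, the returned fraction should be close to the nominal level α.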
FHSsim: results for CAC
The tested SNP interactions τ_{1}, τ_{2} and τ_{3}, τ_{4} are known to contribute to the analyzed trait CAC [1]. For each SNP pair, power is estimated as the percentage of significant results (p ≤ 0.05) of the interaction test on the 200 replicates. The LNPT detected both interactions with 100% power in the largest ascertainable sample (n = 856 on average) and with 99% (τ_{1}, τ_{2}) and 100% (τ_{3}, τ_{4}) for sample size n = 400. The Cox model had a power of 94% (τ_{1}, τ_{2}) and 100% (τ_{3}, τ_{4}) for the largest ascertainable sample (n = 808 on average, 49% events) but only 76% (τ_{1}, τ_{2}) and 97% (τ_{3}, τ_{4}) for sample size n = 400. The Cox model benefited from cohort effect adjustment: SNP interaction τ_{1}, τ_{2} was found with 94% power after adjustment instead of 69% with no adjustment (n = 808). For CAC, the LNPT is more powerful than survival analysis. Estimates of type I error at the α = 5% level were 4.76% (LNPT) and 4.90% (Cox).
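The power estimate used above is the fraction of replicates with a significant interaction test. A minimal sketch follows; the binomial standard error is an addition for orientation only and is not reported in the text, and the p-values are hypothetical stand-ins for the results on the 200 replicates.

```python
import math


def empirical_power(pvalues, alpha=0.05):
    """Fraction of replicates with p <= alpha and its binomial standard error."""
    n = len(pvalues)
    power = sum(p <= alpha for p in pvalues) / n
    standard_error = math.sqrt(power * (1.0 - power) / n)
    return power, standard_error


# Hypothetical p-values: 152 of 200 replicates significant.
pvalues = [0.01] * 152 + [0.50] * 48
power, se = empirical_power(pvalues)
print(power, round(se, 4))
```

With 200 replicates, an estimated power of 76% has a standard error of about 3 percentage points under this binomial approximation.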
FHS: results for BMI
A significant cohort effect was detected by LNPT (p_{cohort} = 0.0001, p_{cohort × sex} = 0.03). It is attributable to men (LNPT, stratified analysis). The Cox model yields threshold-dependent results (event BMI > 25: p_{cohort} = 0.15, p_{cohort × sex} = 0.13; event BMI > 30: p_{cohort} = 0.02, p_{cohort × sex} = 0.80), detecting no interaction between cohort and sex. In the remainder of this paper, we report results on the Cox model with threshold BMI > 25 (yielding more events) and cohort-effect adjustment.
Gene-gene interaction for BMI in FHS data^{a}

| | LNPT | LNPT | Survival analysis | Survival analysis |
| --- | --- | --- | --- | --- |
| Sample | Independent individuals in Offspring Cohort^{b} (n = 824) | Independent individuals in Offspring Cohort^{b} (n = 824) | Independent individuals in Original and Offspring Cohorts (n = 858) | Independent individuals in Original and Offspring Cohorts (n = 858) |
| Trait | Longitudinal BMI | Longitudinal BMI | Event overweight (BMI ≥ 25) | Event overweight (BMI ≥ 25) |
| Modelled factors | Sex and two gene factors | Sex and two gene factors | Sex, cohort, and two gene factors | Sex, cohort, and two gene factors |
| Age adjustment | No | Yes | Age is event time | Age is event time |
| Time-varying covariates | No^{b} | No^{b} | No | Yes^{h} |
| SNP pair | | | | |
| rs854560 and rs6602024^{c, e}, analyzing exams (T = 1, 4)^{d} | 0.022^{g} | 0.040^{g} | 0.003 | 0.007 |
| rs854560 and rs6602024^{c, e}, analyzing exams (T = 1, 3, 4)^{d} | 0.092^{g} | n.s. | 0.003 | 0.007 |
| rs854560 and rs1121980 | 0.034^{f} | 0.003^{f} | 0.030 | 0.053 |
| rs854560 and rs9930506 | 0.048^{f} | 0.004^{f} | 0.005 | 0.008 |
| rs6971091^{c} and rs6602024^{c, e} | n.s. | n.s. | n.s. | n.s. |
| rs6971091^{c} and rs1121980 | n.s. | n.s. | n.s. | n.s. |
| rs6971091^{c} and rs9930506 | n.s. | n.s. | n.s. | n.s. |
| rs6602024^{c, e} and rs1121980 | 0.053^{g} | 0.049^{g} | 0.020 | 0.028 |
| rs6602024^{c, e} and rs9930506 | n.s. | n.s. | 0.077 | n.s. |
Discussion
Transformation from follow-up data to event-time data can be viewed as a reduction of information content. It requires a well-reasoned threshold value chosen a priori. Results of survival analysis depend on the choice of this threshold. We illustrated this for the detection of the cohort effect (BMI, FHS). For some scenarios, it may well be that longitudinal data cannot be analyzed by survival analysis. For example, if a particular two-locus genotype efficiently protected against overweight, the threshold might either provide insufficient contrast or not yield sufficient numbers of events for analysis of the two-locus genotypes of interest. For BMI, an established threshold value of 25 kg/m^{2} for the status overweight was available. In contrast to CAC, the distribution of BMI is almost normal (only slightly skewed), and candidate SNPs for BMI tend to be rare, yielding small two-locus genotype subgroups. For the SNP pair with the smallest subgroup size (rs854560 and rs6602024), the likelihood-ratio test performed for survival analysis yielded significance, although the computed confidence intervals for the hazard ratios (HRs) did not indicate significance but included very high HRs.
Survival analysis has the benefit that it yields hazard ratios as readily interpretable estimates of effect size. The LNPT tests for a much more general feature, namely differences between whole trait distributions, without making assumptions about an underlying model. Consequently, it primarily provides a tool for testing for group differences but does not offer readily interpretable estimates of effect size. When a set of factors is chosen, the LNPT by default also tests their interactions, with the consequence that interactions with weak marginal effects are less likely to be missed.
Conclusion
CAC has a strongly skewed distribution. For parametric approaches, this would cause loss of power or inflated rates of false positives. Neither is the case for the LNPT and survival analysis. We have shown that they both keep the rate of false positives at the 5% level while successfully detecting gene-gene interactions for simulated and real phenotypes with good power. The LNPT was found to be more powerful than survival analysis for detection of a cohort effect in BMI (FHS) and for detection of gene-gene interaction, particularly for CAC (FHSsim). The LNPT is a longitudinal approach. In its ability to incorporate loss to follow-up, the LNPT is better than survival analysis: the LNPT uses all remaining trait measurements, whereas for survival analysis these individuals are likely to become censored or must even be excluded if the baseline value is missing.
For high-dimensional data such as genome-wide studies, sophisticated boosting techniques exist for survival analysis [5]. They yield sparse models with high explanatory power. For the LNPT, data partition techniques similar to tree approaches or multifactor dimensionality reduction could be used. Further research is needed on this issue.
List of abbreviations used
 BMI: Body mass index
 CAC: Coronary artery calcification
 FHS: Framingham Heart Study
 FHSsim: Framingham Heart Study simulated data
 GAW16: Genetic Analysis Workshop 16
 Gen3: Third Generation Cohort
 HR: Hazard ratio
 LNPT: Longitudinal nonparametric association test
 MAF: Minor allele frequency
 OC: Original Cohort
 Off: Offspring Cohort
 SNP: Single-nucleotide polymorphism
Declarations
Acknowledgements
The Framingham Heart Study project is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (N01 HC25195). The simulated data was supported by the Washington University Institute of Clinical and Translational Sciences, NIH grant 1U54RR023496. The GAW16 Framingham and simulated data used for the analyses described in this manuscript were obtained through dbGaP (accession number phs000128.v1.p1). The authors acknowledge the investigators that contributed the phenotype, genotype, and simulated data for this study. This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI. The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. This work was supported by the German Federal Ministry of Education and Research BMBF, German National Genome Research Net NGFN2 and NGFNplus (grants 01GR0462, 01GS0422, 01GS0837), and by the German Research Society DFG (grant GRK1034) and by the EU (grant MRTNCT2004512253). We thank the referees for their helpful remarks.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
References
 1. Kraja AT, Culverhouse R, Daw EW, Wu J, Van Brunt A, Province MA, Borecki IB: The Genetic Analysis Workshop 16 Problem 3: simulation of heritable longitudinal cardiovascular phenotypes based on actual genome-wide single-nucleotide polymorphisms in the Framingham Heart Study. BMC Proc. 2009, 3 (suppl 7): S4. doi:10.1186/1753-6561-3-s7-s4.
 2. SNPedia. [http://www.snpedia.com]
 3. Brunner E, Domhof S, Langer F: Nonparametric Analysis of Longitudinal Data in Factorial Experiments. 2001, New York, Wiley.
 4. Therneau T, Grambsch P: Modeling Survival Data. Extending the Cox Model. 2000, New York, Springer-Verlag.
 5. Binder H, Schumacher M: Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 2009, 10: 18. doi:10.1186/1471-2105-10-18.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.