- Open Access
Using a latent growth curve model for an integrative assessment of the effects of genetic and environmental factors on multiple phenotypes
BMC Proceedings volume 3, Article number: S44 (2009)
We propose the use of latent growth curve model to assess the influence of genetic, environmental, demographic, and lifestyle factors on multiple phenotypes related to coronary heart disease. We model four quantitative traits (systolic blood pressure, high-density lipoprotein, low-density lipoprotein, and triglycerides) simultaneously in a multivariate framework that allows us to study their change over time, assess individual variation, and investigate cross-phenotype relationships. Environmental, demographic, and lifestyle covariates are included at different levels of the model as time-varying or time-invariant, as appropriate. To investigate the change over time attributed to genetic factors, we use candidate markers that have previously been shown to be associated with the quantitative traits. We illustrate our approach using independent observations from the offspring cohort of the Framingham Heart Study data.
Numerous studies have identified environmental, demographic, and genetic factors that increase the risk of coronary heart disease (CHD). A notable major study that led to the identification of several risk factors for heart disease is the Framingham Heart Study (FHS), which began in 1948. The study provides measurements of major risk factors such as blood pressure and lipid levels taken over a long period of time, offering the opportunity to model developmental trajectories. Very recently, FHS genotyped individuals, which permits researchers to perform genome-wide association and/or linkage analyses to identify potential genetic factors that may influence the development of CHD.
Environmental and genetic variables influencing auantitative traits related to CHD such as systolic blood pressure have been studied extensively. Methods ranging from simple regression to more complicated multilevel models have been used to model the longitudinal aspects of blood pressure and other quantitative traits of interest . However, few studies looked at more than one phenotype simultaneously, and cross-phenotype relationships are not often investigated. In this paper, we consider longitudinal measurements taken from four different phenotypes known to be associated with CHD, namely: systolic blood pressure (SBP), low-density lipoprotein (LDL), high-density lipoprotein (HDL), and triglycerides (TG). We propose the use of latent growth curve (LGC) to simultaneously model these quantitative traits in a multivariate framework that allows us to investigate cross-phenotype correlations as well as to study the effect of environmental, genetic, and other covariates on the change of these phenotypes over time.
We included data from the FHS offspring cohort provided by Genetic Analysis Workshop 16 (GAW16). We restricted our analysis to independent members of the offspring cohort, which were selected as follows. Starting with the original 1538 families, the Generation 3 cohort was removed, which split the pedigrees into 3379 independent sub-pedigrees. The maximal set of independent samples was obtained, among the samples that belonged to the offspring cohort, consented to have their phenotype data used, and had genotype data, which resulted in 1488 individuals. An additional 171 samples without family data were added for a total of 1659 independent (kinship coefficient = 0) individuals. Among them, 221 individuals had one or more element of missing genotype information and were excluded. We considered time-varying covariates: smoking, hypertension, and cholesterol treatments. Other variables related to CHD including age, sex, body mass index (BMI), and diabetes status were included in the analysis as time-invariant covariates. Selected markers that have been previously identified to be linked and/or associated with the traits are included in the model to account for genetic contribution. The authors have adhered to the data use agreement for FHS data and this agreement has been reviewed and approved by the Research Ethics Board at the Research Institute, The Hospital for Sick Children, Toronto, Canada.
For the lipid traits, eight markers were selected from genes or gene regions that showed evidence for association with lipoprotein or lipid concentrations and were confirmed in a meta-analysis . Two of these eight markers were not present in either the 500 k or 50 k marker sets, and were also not in strong linkage disequilibrium with any marker. Marker rs11591147 (chromosome 1, in PCSK9) was replaced with rs11206510. Marker rs4420638 (chromosome 19, in the APOE-C1-C4-C2 gene cluster), which showed a weaker association in Willer et al. , was replaced with rs10402771. Similarly, rs1800775 was replaced with rs1150802. No genome-wide study has shown evidence of significant association with either blood pressure or hypertension. However, we include two markers with the smallest p-values from a genotypic test in the Wellcome Trust Case-Control Study . Information about the markers is provided in Table 1.
LGC modelling is used to study the effect of genetic and environmental factors on the change of SBP, HDL, LDL, and TG over time. One of the strengths of LGC modelling is that it allows us to study multiple outcomes over time in a multivariate framework, which is particularly useful in investigating the change in the levels of phenotypes simultaneously and assessing cross-phenotype relationships.
Suppose y pit is a measurement taken from individual i in pedigree p at exam t, where i = 1, 2, ..., n p , p = 1, 2, ..., k, t = 1, 2, ..., q, then the general growth curve model is described as,
where α pi and β pi are the intercept and the slope . Time-varying covariates such as v pit are included in the model at individual level as in Eq. (1), whereas time-invariant covariates such as w pi enter the model through the growth parameters (intercept and slope) as in Eq. (2). Covariates affecting the phenotypes at the pedigree level such as z p are included at the family (or pedigree) level as in Eq. (3). In our case, the measurements corresponding to y are SBP, HDL, LDL, and TG, and these four phenotypes are modelled simultaneously as parallel processes. Moreover, we do not have pedigree level parameters α p and β p because we considered unrelated individuals. We analyzed data using Mplus statistical software .
The path diagram given in Figure 1 describes the growth curve used in modeling the longitudinal measurements of SBP, HDL, LDL, and TG. Paths with one arrow represent casual relationships, whereas those with two arrows indicate correlations between the traits involved. For simplicity, we have not included all cross-trait relationships in the diagram; however, the results are provided in Tables 2 and 3. Considerable amount of variation in the intercepts are explained by the time-invariant variables sex, age, baseline BMI, and diabetes status (Table 2). For SBP and HDL, 35.6% and 33.6% of the variations in the intercepts, respectively, are explained by these covariates (Table 2). However, a significant amount of the variations (64.5%, p-value < 0.0001 and 66.4%, p-value <0.0001, for SBP and HDL, respectively) have not been accounted for. On the other hand, only a small amount of the variation in the slopes is explained by the time invariant covariates, where the largest explained variance is for LDL slopes (24.0%).
For the genetic factors, the results from our analysis are in agreement with previous association findings indicated in Table 1. We found strong associations between HDL and markers rs28927680 (p-value < 0.0001), rs1800588 (p-value = 0.002), and rs1150802 (p-value < 0.0001) (through the slope). A weak association between HDL slope and marker rs328 was also observed. Markers rs693 and rs10402271 are shown to be strongly associated with the intercept of LDL (both with p-value < 0.0001), whereas marker rs11206510 showed a weak association (p-value = 0.026). Markers rs28927680 and rs328 are also shown to be strongly associated with the intercept and slope of TG, respectively. For blood pressure, no marker was associated with the intercept of the model; however, a strong association between marker rs1800588 and SBP slope was found (p-value = 0.001). This marker is previously liked to HDL , but there has not been any study that linked the marker with SBP. It is important to note that markers rs2820037 and rs2398162, with smallest p-values from a genotypic test (for SBP) in the Wellcome Trust Case-Control Study , did not show any association in our data.
In general, a small amount of variation for all the quantitative traits is attributed to the genetic covariates, where the largest explained variation (3.9%) was for the intercept of HDL (Table 2). The variation in the latent variables explained by the combined model with both the environmental and genetic factors is shown in Table 2. It can be seen that 37.1% of the variation in the slope of HDL is explained by the model; however, a significant amount (86 out of the total 136.93, p-value < 0.0001) is left unexplained. Further analysis with more environmental and genetic factors is needed to explain this variation. Moreover, the slope and/or intercept of one or more of the phenotypes could be included as a covariate in the analysis to account for a possible casual dependence between the phenotypes. We plan to consider these analyses in future studies. Here we only investigated the cross-phenotype relationships via correlations. Model estimated correlations for the latent variables are given in Figure 1 using curved, double-arrow lines. Table 3 shows the correlation (along with p-values) explained by the environmental and genetic covariates. The residual correlations (data not shown) show that a significant percent of the correlations are not explained by the model, indicating that there are other common factors affecting these phenotypes simultaneously.
Our results show that a significant amount of the variations in the intercepts of the traits are explained by environmental and demographic factors. Moreover, the results identified markers that have been previously associated with the traits. We also found a novel association between marker rs1800588 and SBP. In general, however, only a small percent of the variations in the traits were attributed to the genetic factors.
In our LGC modelling, we considered unrelated individuals (with kinship coefficient = 0) from the offspring cohort of the FHS data. However, one might be interested to know how the intercepts and slopes vary not only at the individual level but also at the family level. Therefore, it is important to use models that take the correlation among family members into account. This will also allow us to explain some of the residual variances and correlations. One can use two approaches in dealing with this challenge 1) adjust for the dependency when the familial correlation is considered as a nuisance parameter and standard errors and goodness-of-fit statistics are estimated using the sandwich estimator or 2) use a two-level LGC model that allows modelling not only average change in the values of the phenotypes over time but also allows us to assess how the these changes vary between individuals in the same family and between families. We plan to address these issues in subsequent studies.
Body mass index
Coronary heart disease
Framingham Heart Study
Genetic Analysis Workshop 16
Latent growth curve
Systolic blood pressure
Pinnaduwage D, Beyene J, Fallah S: Genome-wide linkage analysis of systolic blood pressure slope using the Genetic Analysis Workshop 13 data. BMC Genetics. 2003, 4 (suppl 1): S86-10.1186/1471-2156-4-S1-S86.
Kathiresan S, Melander O, Guiducci C, Surti A, Burtt NP, Rieder MJ, Cooper GM, Roos C, Voight BF, Havulinna AS, Wahlstrand B, Hedner T, Corella D, Tai ES, Ordovas JM, Berglund G, Vartiainen E, Jousilahti P, Hedblad B, Taskinen MR, Newton-Cheh C, Salomaa V, Peltonen L, Groop L, Altshuler DM, Orho-Melander M: Six new loci associated with blood low-density lipoprotein cholesterol high-density lipoprotein cholesterol, or triglycerides in humans. Nat Genet. 2008, 40: 189-197. 10.1038/ng.75.
Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, Clarke R, Heath SC, Timpson NJ, Najjar SS, Stringham HM, Strait J, Duren WL, Maschio A, Busonero F, Mulas A, Albai G, Swift AJ, Morken MA, Narisu N, Bennett D, Parish S, Shen H, Galan P, Meneton P, Hercberg S, Zelenika D, Chen WM, Li Y, Scott LJ, Scheet PA, Sundvall J, Watanabe RM, Nagaraja R, Ebrahim S, Lawlor DA, Ben-Shlomo Y, Davey-Smith G, Shuldiner AR, Collins R, Bergman RN, Uda M, Tuomilehto J, Cao A, Collins FS, Lakatta E, Lathrop GM, Boehnke M, Schlessinger D, Mohlke KL, Abecasis GR: Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet. 2008, 40: 161-169. 10.1038/ng.76.
Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007, 447: 661-678. 10.1038/nature05911.
Hox J, Stoel RD: Multilevel and SEM approaches to growth curve modeling. Encyclopedia of Statistics in Behavioral Science. Edited by: Everitt BS, Howell D. 2005, New York, Wiley, 1296-1305.
Muthén LK, Muthén BO: Mplus Statistical Analysis with Latent Variables. User's Guide. 2007, Los Angeles, Muthén and Muthén, [http://www.statmodel.com]5
This work was partially supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC), the Mathematics of Information Technology and Complex Systems (MITACS), Canadian Institute of Health Research (CIHR) (grant number 84392), and Genome Canada through the Ontario Genomics Institute. The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
The authors declare that they have no competing interests.
JSH contributed to the conception and design of the study, carried out the phenotype modeling, and drafted the manuscript. NMR performed marker selection and helped in drafting the manuscript. ADP participated in drafting the manuscript and helped in the biological interpretation of the results. JB contributed to the conception and design of the study, participated in the phenotype modeling and drafting of the manuscript. All authors read and approved the final manuscript.
About this article
Cite this article
Hamid, J.S., Roslin, N.M., Paterson, A.D. et al. Using a latent growth curve model for an integrative assessment of the effects of genetic and environmental factors on multiple phenotypes. BMC Proc 3, S44 (2009). https://doi.org/10.1186/1753-6561-3-S7-S44
- Coronary Heart Disease
- Systolic Blood Pressure
- Framingham Heart Study
- Kinship Coefficient
- Latent Growth Curve