Volume 3 Supplement 7

Genetic Analysis Workshop 16

Genome-wide association analysis of cardiovascular-related quantitative traits in the Framingham Heart Study

Abstract

Multivariate linear growth curves were used to model high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides (TG), and systolic blood pressure (SBP) measured at four exams in 1659 independent individuals from the Framingham Heart Study. The slopes and intercepts from each of two phenotype models were tested for association with 348,053 autosomal single-nucleotide polymorphisms (SNPs) from the Affymetrix GeneChip 500 k set. Three regions were associated with LDL intercept, TG slope, and SBP intercept (p < 1.44 × 10⁻⁷). We observed results consistent with previously reported associations between rs599839, on chromosome 1p13, and LDL; the association was significant with LDL intercept but not with LDL slope. Markers on chromosome 17q25 were associated with TG slope, and a SNP on chromosome 7p11 was associated with SBP intercept. When longitudinal data have been collected, growth curve models can provide more insight into the relationships between SNPs and traits than traditional association analysis. The power to detect association with changes over time may be limited if the subjects are not followed over a long enough time period.

Background

Cardiovascular disease (CVD) is the leading cause of death in the United States and is a significant cause of disability. Worldwide, heart disease is on the rise, and is predicted to become the leading cause of death and disability by 2020 [1].

In order to identify common risk factors leading to CVD, several large-scale epidemiological studies have been undertaken. Most notable is the Framingham Heart Study (FHS), a prospective study which was started over 50 years ago and is still ongoing. It was designed to follow the development of CVD over time in a large group of participants who had not yet developed symptoms of CVD. The original participants underwent extensive medical testing approximately every 2 years, and more recently recruited individuals have also been followed regularly. Through the data arising from the FHS, several major risk factors for CVD have been identified, including hypertension, high blood cholesterol, smoking, obesity, and diabetes. In addition, results from the study have demonstrated significant effects of demographic factors such as age and sex.

The longitudinal design of the FHS allows for the study of how certain traits change over time. The analysis of time-dependent data can range from simple plots to complex survival or multilevel modeling. In this paper, we use a latent growth curve (LGC) model to examine the change over time in levels of systolic blood pressure (SBP), high-density lipoprotein (HDL), low-density lipoprotein (LDL), and triglycerides (TG), and to explore the relationships among these four traits, which are known to affect the risk of developing CVD. Using subjects from the FHS, we performed association analysis to identify genetic factors associated with the mean baseline value of each trait and with its change over time.

Methods

Details about sample recruitment can be found in Cupples et al. [2] and Splansky et al. [3]. Briefly, 5209 subjects aged 29 to 62 were recruited between 1948 and 1953 from the town of Framingham, Massachusetts (Original Cohort). Between 1971 and 1975, an additional 5124 individuals were recruited, who were the offspring of the Original Cohort and the offspring's spouses (Offspring Cohort). Finally, between 2002 and 2005, 4095 third-generation individuals (children of the Offspring Cohort) were recruited (Generation 3). Data from four examinations are available for each of the Original and Offspring Cohorts, while data from a single examination are available for Generation 3.

Samples used for analysis

Because individuals from the Original Cohort fasted before only one of the four exams, the lipid profiles obtained from these individuals were not ideal for longitudinal analyses. We therefore restricted our analysis to members of the Offspring Cohort, who fasted before all four exams. We selected independent members of the Offspring Cohort as follows. Starting with the original 1538 families, the Generation 3 cohort was removed, which split the pedigrees into 3379 independent sub-pedigrees. Individuals belonging to the Offspring Cohort who had phenotype and genotype data were considered for inclusion in the analysis. We selected independent individuals from this set. In sub-pedigrees in which multiple sets of individuals could be chosen, we randomly selected a set that gave the largest number of independent individuals (as determined by the kinship coefficient). This resulted in 1488 individuals. An additional 171 samples without family data were added, for a total of 1659 independent individuals.

Phenotypic modeling

A linear LGC model was fit to longitudinal measurements of SBP, HDL, LDL, and TG. One of the strengths of LGC modeling is that it allows the study of multiple outcomes over time in a multivariate framework, which is particularly useful for investigating the change in phenotype values and assessing cross-phenotype relationships [4]. Two models were analyzed, corresponding to two sets of covariates. In the first set, sex, baseline age, body mass index (BMI), and a variable indicating a diagnosis of diabetes at any time during the study were included as time-invariant covariates, and the number of cigarettes smoked per day was treated as a time-varying covariate. The second covariate set was identical, except that BMI was allowed to vary over time.

In both models, individuals who reported taking medication at the time of examination had their relevant trait values adjusted by the addition of a constant. This procedure has been shown to provide good power to detect a genetic effect and to have little bias in effect size estimation, while being relatively robust to the exact value of the constant [5]. For individuals taking medication for hypertension, SBP was increased by 10 mm Hg [6]. For individuals taking lipid-lowering medication, the reported HDL values were adjusted by -2.15 mg/dL, LDL by +43.23 mg/dL, and TG by +24.92 mg/dL, based on previously reported average effects of statins and fibric acid derivatives in primarily White subjects [7]. Because information on the type of medication was not provided for our sample, the effects of the two types of drugs were combined using a weighted average.

Missing values were imputed under the missing-at-random assumption. The models were fit using the software Mplus version 5 [8]. Because the distribution of TG was skewed, the robust maximum-likelihood method was used. LDL was calculated for each observation using the Friedewald equation [9], and was considered missing if TG > 400 mg/dL.
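The medication adjustment and the Friedewald calculation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the offsets are the constants quoted in the text, and the Friedewald estimate takes total cholesterol as an input (a requirement of the equation itself, in mg/dL). The text does not state whether the LDL offset was applied before or after the Friedewald step, so the two operations are kept separate here.

```python
import math

# Constants quoted in the text (mm Hg for SBP, mg/dL for lipids).
MEDICATION_OFFSETS = {"SBP": 10.0, "HDL": -2.15, "LDL": 43.23, "TG": 24.92}
TG_LIMIT = 400.0  # LDL is treated as missing above this TG level


def friedewald_ldl(total_chol, hdl, tg):
    """Friedewald estimate of LDL in mg/dL; NaN when TG > 400 mg/dL."""
    if tg > TG_LIMIT:
        return math.nan
    return total_chol - hdl - tg / 5.0


def adjust_for_medication(value, trait, on_medication):
    """Add the trait-specific constant when medication use was reported."""
    if on_medication:
        return value + MEDICATION_OFFSETS[trait]
    return value


# Hypothetical observation, for illustration only.
ldl = friedewald_ldl(total_chol=200.0, hdl=50.0, tg=150.0)
sbp = adjust_for_medication(130.0, "SBP", on_medication=True)
```

Returning NaN rather than raising an error keeps observations with high TG in the data set as missing values, which matches how the text describes their treatment.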

Genotypes

Genotyping was conducted using the Affymetrix GeneChip Human Mapping 500 k Array Set, using the 250 k Sty and 250 k Nsp platforms. Only autosomal markers were considered for analysis. The original set of markers consisted of 487,014 autosomal markers with known chromosomal assignments and physical positions. We removed 32,594 (6.7%) markers where the call rate was < 95%, based on all 6848 genotyped individuals. Markers were also removed if the minor allele frequency was < 5% (101,422 markers) or the p-value from an exact test for Hardy-Weinberg equilibrium [10] was < 10⁻⁶ (4945 markers). Thus, the final marker set consisted of 348,053 single-nucleotide polymorphisms (SNPs).
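The marker counts above can be checked with simple arithmetic. The sketch below assumes that the three filters were applied in sequence, so that no marker is counted under more than one exclusion; the dictionary keys are illustrative labels, not names from the study.

```python
initial_markers = 487_014

# Markers removed at each quality-control step, as reported in the text.
removed = {
    "call_rate_below_95pct": 32_594,
    "maf_below_5pct": 101_422,
    "hwe_p_below_1e-6": 4_945,
}

remaining = initial_markers - sum(removed.values())
call_rate_fraction = removed["call_rate_below_95pct"] / initial_markers

print(remaining)                           # 348053
print(round(100 * call_rate_fraction, 1))  # 6.7
```

Under that assumption the exclusions sum to 138,961 markers, which reproduces the final set of 348,053 SNPs.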

Association analysis

Individual-specific intercepts and slopes were obtained from the growth curve models, for each of the phenotypes SBP, HDL, LDL, and TG. These phenotypic summaries were examined for association with each marker separately using linear regression, as implemented in the program PLINK v1.04 [11], assuming an additive genetic model. Markers associated with slope traits affect the change over time of a particular phenotype, while markers associated with intercept traits affect the initial value.
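Under an additive genetic model, each phenotypic summary (an intercept or a slope) is regressed on the number of copies of one allele (0, 1, or 2). The sketch below is a plain ordinary least squares fit written with the Python standard library to illustrate the idea; it is not PLINK's implementation, the function name is hypothetical, and the example data are invented. It returns the t statistic only; a p-value would be obtained from a t distribution with n - 2 degrees of freedom.

```python
import math


def additive_association(allele_counts, phenotype):
    """OLS of a phenotype on allele count (0/1/2).

    Returns (beta, standard_error, t_statistic). Pairs with a missing
    genotype or phenotype (None) are dropped.
    """
    pairs = [(g, y) for g, y in zip(allele_counts, phenotype)
             if g is not None and y is not None]
    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    beta = sxy / sxx                      # change in phenotype per allele copy
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in pairs)
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    return beta, se, beta / se


# Invented data: six subjects, one SNP, one growth-curve intercept each.
genotypes = [0, 1, 2, 0, 1, 2]
intercepts = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
beta, se, t_stat = additive_association(genotypes, intercepts)
```

In the study this regression is repeated for every marker and for each intercept and slope, which is why a multiple-testing correction is needed.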

Results and discussion

Description of phenotypes and covariates

The sample was composed of 1659 independent subjects who were examined at four different time points. Details about the raw variables and covariates are described in Table 1, before and after adjustment for medication use. The amount of missing phenotypic data varied between exams. Exam 1 was the most complete, with a missing rate of 0.7% for the variables used in this study. Exam 3 was the least complete, with a missing rate of 13.9%, mostly due to 208 individuals who did not attend this examination. Exams 5 and 7 had missing rates of 6.3% and 5.0%, respectively. Summaries of the slopes and intercepts obtained after fitting the growth curve models to HDL, LDL, TG, and SBP are shown in Table 2. With the exception of TG, the trait values did not change much over the four time points, as indicated by the small slopes. The time-invariant covariates age, sex and diabetes were significant at the 5% level for the intercept of all traits, but not necessarily the slopes (data not shown). Comparing the model in which BMI was treated as a time-invariant covariate with the model in which BMI was allowed to vary over time, the distributions of the estimates were shifted in location for all traits, but the variances were largely unchanged.

Table 1 Sample characteristics

Table 2 Characteristics of the intercepts and slopes from the growth curve models

Association analysis

We tested 348,053 markers for association with the intercepts and slopes of HDL, LDL, TG, and SBP. A total of six markers in three regions were associated in at least one of the two models, using a Bonferroni cutoff of p < 1.44 × 10⁻⁷ (Table 3). The most significant association was between LDL intercept and rs599839 on chromosome 1p13 (p = 3.04 × 10⁻¹⁰ in the model where BMI was time-varying). The minor G allele was associated with a reduction in LDL intercept values. This marker is approximately 250 kb upstream of PSRC1, which encodes a proline-rich protein. This marker was recently shown to be associated with LDL measured at a single examination in a meta-analysis [12], and was also associated with coronary artery disease [13]. In these two studies, the major A allele was associated with an increase in LDL levels, or risk of coronary artery disease, consistent with the direction we report here. Although this SNP was associated with LDL intercept, it was not associated with LDL slope (p = 0.14), indicating that it may not play a role in the change of LDL levels over time, at least in the age range of individuals studied here. This SNP also showed the most significant association with LDL intercept in the model in which BMI was time-invariant (p = 2.62 × 10⁻⁹).
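The Bonferroni cutoff can be reproduced by dividing a family-wise error rate by the number of markers tested. The sketch below assumes a family-wise rate of 0.05, which is not stated in the text but yields the reported value; the correction then appears to account for the number of markers only, not for the number of traits or models.

```python
n_markers = 348_053
alpha = 0.05  # assumed family-wise error rate (not stated in the text)

threshold = alpha / n_markers  # per-marker significance cutoff


def is_significant(p_value):
    """True when a p-value passes the per-marker Bonferroni cutoff."""
    return p_value < threshold


# The two SBP intercept p-values reported for rs11976165.
bmi_time_invariant = is_significant(6.88e-8)
bmi_time_varying = is_significant(1.45e-7)
```

With these inputs the cutoff rounds to 1.44 × 10⁻⁷, so the SBP result passes in the BMI time-invariant model and falls just short in the BMI time-varying model.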

Table 3 Association results for markers with p < 1.44 × 10⁻⁷

SNP rs6501683 on 17q25 was associated with TG slope (p = 3.89 × 10⁻⁸ and 2.98 × 10⁻⁹ for BMI time-invariant and BMI time-varying models, respectively). Three other markers in the same region also showed association, although at lower levels of significance. All four markers are in nearly complete linkage disequilibrium (r² = 0.99 or 1 for all pairs of markers) in these data. TG intercept was not associated with these markers (p = 0.19 and 0.20 for BMI time-invariant and BMI time-varying models, respectively, for rs6501683).

On chromosome 7p11, rs11976165 was associated with SBP intercept (p = 6.88 × 10⁻⁸) in the model in which BMI was treated as a time-invariant covariate. In the model in which BMI was treated as a time-varying covariate, the p-value for this SNP falls just above the significance cutoff used here (p = 1.45 × 10⁻⁷). This region is peri-centromeric, and is not near any known genes. The observed association was only with the intercept (p = 0.20 for SBP slope, BMI time-invariant model).

Because there may be genes in common to the change in BMI, lipid levels, and SBP, two models were fit differing only in the treatment of the covariate BMI. One model was fit using baseline BMI as a covariate, and another in which BMI was included as a time-varying covariate. The model in which BMI was time-invariant may be better able to detect association with regions affecting both BMI and the traits of interest. However, in the dataset used here, the association results for the two models were very similar.

In order to account for the effect of trait-altering medications, we adjusted medicated values by adding a constant to the relevant observations. Adjusting for medication use by including it as a time-varying covariate in the model did not change the association patterns between the traits and markers, although a slight shrinkage bias was observed in the effect size estimates (data not shown). Because only a small proportion of individuals reported taking trait-altering medication, particularly in Exams 1, 3, and 5, we did not expect the choice of medication adjustment method to have a strong effect on the overall results.

Because the distribution of TG was non-normal, we used robust maximum-likelihood estimation [14], as implemented in the Mplus statistical software [8]. The distributions of the resulting intercepts and slopes were less skewed, but still showed a long upper tail, which could affect inference of the association tests. Additionally, the distribution of the TG intercepts from the model in which BMI was time-varying tended to be negative. This may indicate that the departure from normality was too extreme to be modeled well by robust techniques, or that a linear trajectory was not an appropriate assumption. A log transformation could normalize the distribution, at the cost of decreased interpretability of the estimates.

We did not explicitly model time, as measured by age at exam, in the growth model. This implicitly assumes that the exams were evenly spaced, an assumption which seems reasonable, on average, between Exams 3, 5, and 7, but not between Exams 1 and 3. The model could be improved by including a more appropriate measure of time.
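The consequence of the equal-spacing assumption can be illustrated with a single subject. The sketch below fits an ordinary least squares slope twice, once against the exam index and once against elapsed years. It is a per-subject simplification, not the LGC model fit in Mplus, and the exam spacing and SBP values are hypothetical.

```python
def ols_slope(times, values):
    """Least-squares slope of values regressed on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


exam_index = [0, 1, 2, 3]            # equal spacing, as implicitly assumed
years_since_exam1 = [0, 8, 12, 16]   # hypothetical uneven spacing

# A trait that rises by exactly 0.5 units per year.
sbp = [120 + 0.5 * t for t in years_since_exam1]

slope_per_exam = ols_slope(exam_index, sbp)
slope_per_year = ols_slope(years_since_exam1, sbp)
```

With these invented values the fit against elapsed years recovers the true rate of 0.5 per year, while the fit against exam index gives 2.6 per exam and no longer passes through every point, because the longer first interval is treated as if it were the same length as the others.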

The estimates of the slopes tended to be small for all traits, with the exception of TG. Thus, in this sample, the trait values tended to remain stable over time, and perhaps it is not surprising that significant association with a slope trait was only observed with TG. The heritability of the intercepts and slopes from this study cannot be calculated because independent samples were used. However, other analyses of the FHS data showed that the heritability of lipid levels taken at a single exam is moderate (h² = 0.52, 0.59, and 0.48 for HDL, LDL, and TG, respectively) [15], while the heritability of SBP at a single exam is lower (h² = 0.28) [16]. Studies investigating the heritability of the change in phenotypes over time are less common, although the heritability of the SBP slope was estimated to be similar to that of a single exam (h² = 0.23) in the FHS data [17].

Conclusion

The longitudinal nature of the FHS data was exploited using LGC models, which allowed the study of multiple phenotypes simultaneously. Consequently, the effects of the phenotypes on one another were accounted for in the model through pairwise correlations. An association between a marker and a particular trait can therefore be interpreted as an association with that trait after the effects of the covariates and the remaining three traits have been accounted for. We used phenotypic summaries from these models to search across the genome for SNPs associated with the mean value or the change over time of lipid and blood pressure phenotypes. Because long-term averages of lipid phenotypes have been shown to be heritable [16], this strategy may allow us to distinguish genes contributing to overall levels of traits from those contributing to changes over time.

Abbreviations

BMI: Body mass index
CVD: Cardiovascular disease
FHS: Framingham Heart Study
HDL: High-density lipoprotein
LDL: Low-density lipoprotein
LGC: Latent growth curve
SBP: Systolic blood pressure
SNP: Single-nucleotide polymorphism
TG: Triglyceride

References

1. Mitka M: Heart disease a global health threat. JAMA. 2004, 291: 2533. 10.1001/jama.291.21.2533.

2. Cupples LA, Arruda HT, Benjamin EJ, D'Agostino RB, Demissie S, DeStefano AL, Dupuis J, Falls KM, Fox CS, Gottlieb DJ, Govindaraju DR, Guo CY, Heard-Costa NL, Hwang SJ, Kathiresan S, Kiel DP, Laramie JM, Larson MG, Levy D, Liu CY, Lunetta KL, Mailman MD, Manning AK, Meigs JB, Murabito JM, Newton-Cheh C, O'Connor GT, O'Donnell CJ, Pandey M, Seshadri S, Vasan RS, Wang ZY, Wilk JB, Wolf PA, Yang Q, Atwood LD: The Framingham Heart Study 100 K SNP genome-wide association study resource: overview of 17 phenotype working group reports. BMC Med Genet. 2007, 8 (suppl 1): S1. 10.1186/1471-2350-8-S1-S1.

3. Splansky GL, Corey D, Yang Q, Atwood LD, Cupples LA, Benjamin EJ, D'Agostino RB, Fox CS, Larson MG, Murabito JM, O'Donnell CJ, Vasan RS, Wolf PA, Levy D: The Third Generation Cohort of the National Heart, Lung, and Blood Institute's Framingham Heart Study: design, recruitment, and initial examination. Am J Epidemiol. 2007, 165: 1328-1335. 10.1093/aje/kwm021.

4. Duncan TE, Duncan SC, Strycker LA, Li F, Alpert A: An Introduction to Latent Variable Growth Curve Modeling: Concepts, Issues, and Applications. 1999, Mahwah, Lawrence Erlbaum.

5. Tobin MD, Sheehan NA, Scurrah KJ, Burton PR: Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat Med. 2005, 24: 2911-2935. 10.1002/sim.2165.

6. Law MR, Wald NJ, Morris JK, Jordan RE: Value of low dose combination treatment with blood pressure lowering drugs: analysis of 354 randomised trials. BMJ. 2003, 326: 1427-1431. 10.1136/bmj.326.7404.1427.

7. Wu J, Province MA, Coon H, Hunt SC, Eckfeldt JH, Arnett DK, Heiss G, Lewis CE, Ellison RC, Rao DC, Rice T, Kraja AT: An investigation of the effects of lipid-lowering medications: genome-wide linkage analysis of lipids in the HyperGEN study. BMC Genet. 2007, 8: 60. 10.1186/1471-2156-8-60.

8. Muthén LK, Muthén BO: Mplus User's Guide. 2008, Los Angeles, Muthén & Muthén.

9. Friedewald WT, Levy RI, Fredrickson DS: Estimation of the concentration of low-density lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clin Chem. 1972, 18: 499-502.

10. Wigginton JE, Cutler DJ, Abecasis GR: A note on exact tests of Hardy-Weinberg equilibrium. Am J Hum Genet. 2005, 76: 887-893. 10.1086/429864.

11. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.

12. Kathiresan S, Melander O, Guiducci C, Surti A, Burtt NP, Rieder MJ, Cooper GM, Roos C, Voight BF, Havulinna AS, Wahlstrand B, Hedner T, Corella D, Tai ES, Ordovas JM, Berglund G, Vartiainen E, Jousilahti P, Hedblad B, Taskinen MR, Newton-Cheh C, Salomaa V, Peltonen L, Groop L, Altshuler DM, Orho-Melander M: Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat Genet. 2008, 40: 189-197. 10.1038/ng.75.

13. Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, Mayer B, Dixon RJ, Meitinger T, Braund P, Wichmann HE, Barrett JH, König IR, Stevens SE, Szymczak S, Tregouet DA, Iles MM, Pahlke F, Pollard H, Lieb W, Cambien F, Fischer M, Ouwehand W, Blankenberg S, Balmforth AJ, Baessler A, Ball SG, Strom TM, Braenne I, Gieger C, Deloukas P, Tobin MD, Ziegler A, Thompson JR, Schunkert H, WTCCC and the Cardiogenics Consortium: Genomewide association analysis of coronary artery disease. New Engl J Med. 2007, 357: 443-453. 10.1056/NEJMoa072366.

14. Huber PJ: The behavior of maximum likelihood estimation under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Edited by: Le Cam LM, Neyman J. 1967, Berkeley and Los Angeles: University of California Press, 1: 221-233.

15. Kathiresan S, Manning AK, Demissie S, D'Agostino RB, Surti A, Guiducci C, Gianniny L, Burtt NP, Melander O, Orho-Melander M, Arnett DK, Peloso GM, Ordovas JM, Cupples LA: A genome-wide association study for blood lipid phenotypes in the Framingham Heart Study. BMC Med Genet. 2007, 8 (suppl 1): S17. 10.1186/1471-2350-8-S1-S17.

16. Levy D, Larson MG, Benjamin EJ, Newton-Cheh C, Wang TJ, Hwang S-J, Vasan RS, Mitchell GF: Framingham Heart Study 100 K Project: genome-wide associations for blood pressure and arterial stiffness. BMC Med Genet. 2007, 8 (suppl 1): S3. 10.1186/1471-2350-8-S1-S3.

17. Pinnaduwage D, Beyene J, Fallah S: Genome-wide linkage analysis of systolic blood pressure using the Genetic Analysis Workshop 13 data sets. BMC Genet. 2003, 4 (suppl 1): S86. 10.1186/1471-2156-4-S1-S86.

Acknowledgements

This work was partially supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC), the Mathematics of Information Technology and Complex Systems (MITACS), the Canadian Institutes of Health Research (CIHR, grant 84392), and Genome Canada through the Ontario Genomics Institute. We thank Mathieu Lemire for helpful statistical discussions. The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences.

This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.

Author information

Corresponding author

Correspondence to Joseph Beyene.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

NMR participated in the design of the study, performed the association analysis, and drafted the manuscript. JSH participated in the design of the study, performed the growth curve modeling, and helped to draft the manuscript. ADP participated in the biological interpretation of the data and analysis results. JB participated in the design of the study. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Roslin, N.M., Hamid, J.S., Paterson, A.D. et al. Genome-wide association analysis of cardiovascular-related quantitative traits in the Framingham Heart Study. BMC Proc 3, S117 (2009). https://doi.org/10.1186/1753-6561-3-S7-S117

Keywords

  • Body Mass Index
  • Systolic Blood Pressure
  • Framingham Heart Study
  • Growth Curve Model
  • Latent Growth Curve