- Open Access
Direct and indirect genetic effects on triglycerides through omics and correlated phenotypes
© The Author(s). 2018
- Published: 17 September 2018
Even though there has been great success in identifying lipid-associated single-nucleotide polymorphisms (SNPs), the mechanisms through which the SNPs act on each trait are poorly understood. The emergence of large, complex biological data sets in well-characterized cohort studies offers an opportunity to investigate the genetic effects on trait variability as a way of informing the causal genes and biochemical pathways that are involved in lipoprotein metabolism. However, methods for simultaneously analyzing multiple omics, environmental exposures, and longitudinally measured, correlated phenotypes are lacking. The purpose of our study was to demonstrate the utility of the structural equation modeling (SEM) approach to inform our understanding of the pathways by which genetic variants lead to disease risk. With the SEM method, we examine multiple pathways directly and indirectly through previously identified triglyceride (TG)-associated SNPs, methylation, and high-density lipoprotein (HDL), including sex, age, and smoking behavior, while adding in biologically plausible direct and indirect pathways. We observed significant SNP effects (P < 0.05 and directionally consistent) on TGs at visit 4 (TG4) for five loci, including rs645040 (DOCK7), rs964184 (ZPR1/ZNF259), rs4765127 (ZNF664), rs1121980 (FTO), and rs10401969 (SUGP1). Across these loci, we identify three with strong evidence of an indirect genetic effect on TG4 through HDL, one with evidence of pleiotropic effect on HDL and TG4, and one variant that acts on TG4 indirectly through a nearby methylation site. Such information can be used to prioritize candidate genes in regions of interest, inform mechanisms of action of methylation effects, and highlight possible genes with pleiotropic effects.
Lipid traits, such as triglyceride (TG) and high-density lipoprotein (HDL) cholesterol concentrations, are highly heritable; estimates range from 20 to > 70%, with common variants estimated to explain approximately 30 to 33% of the variance for these traits [1, 2]. Genome-wide association studies (GWAS) have identified more than 100 SNPs associated with lipid traits, many of which are shared across more than one lipid trait [1–8]. Even though there has been great success in identifying lipid-associated SNPs, the mechanisms through which these SNPs act on each trait are poorly understood. The emergence of large, complex biological data sets in well-characterized cohort studies offers an opportunity to investigate the genetic effects on trait variability as a way of informing the causal genes and biochemical pathways that are involved in lipoprotein metabolism. However, methods for simultaneously analyzing multiple omics, environmental exposures, and longitudinally measured, correlated phenotypes are lacking.
The purpose of our study was to demonstrate the utility of the structural equation modeling (SEM) approach to inform our understanding of the pathways by which genetic variants lead to disease risk. With the SEM method, we can examine multiple pathways directly and indirectly through previously identified TG-associated SNPs, methylation, and HDL, including sex, age, and smoking behavior, while adding in biologically plausible direct and indirect pathways. Although SEM has been used to examine the influence of genetic variants on disease through environmental exposures, on gene expression, and on pleiotropy, to our knowledge this is the first study to investigate pathways between GWAS-established SNPs and a disease risk factor while accounting for environmental exposures, correlated phenotypes, and epigenetic markers simultaneously within an SEM framework. Thus, using the GAW20 data, we show the usefulness of SEM results for prioritizing candidate genes in regions of association, informing mechanisms of action of methylation effects, and identifying possible genes with pleiotropic effects.
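The distinction between direct and indirect pathways can be illustrated with a single-mediator example (SNP → HDL → TG). The following is a minimal sketch, not the authors' model: it uses two ordinary least-squares regressions on synthetic data rather than a jointly fitted SEM, and the variable names, effect sizes, and sample size are all assumptions chosen for illustration.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients for y ~ 1 + predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic data only; none of these values come from the GOLDN study.
rng = np.random.default_rng(42)
n = 500
snp = rng.binomial(2, 0.3, size=n).astype(float)        # additive genotype coding 0/1/2
hdl = 0.5 * snp + rng.normal(size=n)                    # mediator depends on the SNP
tg = -0.4 * hdl + 0.1 * snp + rng.normal(size=n)        # outcome depends on both

a = ols(hdl, snp)[1]                 # SNP -> HDL
_, direct, b = ols(tg, snp, hdl)     # SNP -> TG (direct) and HDL -> TG
total = ols(tg, snp)[1]              # SNP -> TG ignoring the mediator
indirect = a * b                     # SNP -> HDL -> TG

print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

In this linear setting the total effect equals the direct effect plus the indirect effect. An SEM generalizes the same decomposition to many mediators and repeated measures estimated together.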
GAW20 data were provided by the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study. Individual genetic and phenotypic data are drawn from a total of 1105 adults from 188 families. Of these, 810 individuals from 172 families were genotyped on the Affymetrix 6.0 array (Affymetrix, Inc., Santa Clara, CA, USA).
Our analyses focused on fasting TG as the primary outcome measure. Secondary outcomes included DNA methylation and HDL. Of those with genotype data, 707 participants had whole-genome methylation data measured from CD4+ T cells at visit 2. DNA methylation was measured with the HM450K array (Illumina, Inc., San Diego, CA, USA) following bisulphite conversion. The platform detects the methylation status of 485,577 CpG (cytosine-phosphate-guanine) sites by sequencing-based genotyping of bisulphite-treated DNA. The methylation score for each CpG is reported as a β value, ranging from 0 (nonmethylated) to 1 (completely methylated), according to the intensity ratio of detected methylation. We calculated principal components (PCs) from the methylation β values across all CpGs in R (v3.3.1) and adjusted for the first four PCs to control for cell purity and batch effects before performing association analyses. Covariates included sex, baseline age, and study center. Additionally, we adjusted for baseline smoking status, as had been done in previous genetic and methylation association analyses [12–15].
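The PC adjustment described above can be sketched as follows. The analysis itself was run in R; this Python version is an illustrative stand-in under stated assumptions: the β matrix is random synthetic data, the function names `top_pcs` and `residualize` are hypothetical, and regressing each CpG on the PCs is one common way to adjust, not necessarily the exact procedure used in the study.

```python
import numpy as np

def top_pcs(beta, n_components=4):
    """Sample scores on the first principal components of a samples x CpGs matrix."""
    centered = beta - beta.mean(axis=0, keepdims=True)   # center each CpG
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def residualize(y, covariates):
    """Residuals of y after least-squares regression on an intercept plus covariates."""
    X = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Synthetic beta values in [0, 1]; 30 samples x 200 CpGs (assumed sizes).
rng = np.random.default_rng(0)
beta = rng.uniform(0.0, 1.0, size=(30, 200))

pcs = top_pcs(beta, n_components=4)      # first four PCs, as in the text
adjusted = residualize(beta[:, 0], pcs)  # one CpG adjusted for the four PCs
```

After adjustment, the residual for each CpG is uncorrelated with the four PCs, so variation captured by those components (for example, batch) no longer contributes to downstream associations.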
SNP and CpG selection
Table: TG-associated SNPs and nearby CpGs included in SEM models
Columns: # of CpGs ± 10 kb; CpGs included in final model [a]; # variables (dependent/independent)
CpGs included in final model, by row:
- cg03143046, cg01353538, cg20940044, cg25373579, cg23879496, cg12682870
- cg06595719, cg14815609, cg05862431, cg11835342, cg14371153, cg17490921
- cg19078769, cg00201185, cg10922530, cg02647265
- cg16724811, cg06978461, cg03245889, cg03928410, cg26985681
- cg00477287, cg01313994, cg01559787, cg08112740, cg19643441
Our sample included up to 707 participants of the GOLDN study (50% women); the average age of participants at the baseline examination was 48 years. It is worth noting that all participants in the GOLDN study received treatment with fenofibrate between visits 2 and 3, which resulted in a decrease in both the mean and the variance of TG (mean [SD]: TG1 = 106.35 [106.35]; TG2 = 140.16 [99.34]; TG3 = 92.26 [57.41]; TG4 = 90.14 [55.07]).
Parameter estimates for significant pathways from SNP, through intermediate exposure, to TG4
SNP → HDL3 → HDL4 → TG4
SNP → HDL3 → TG3 → TG4
SNP → TG1 → TG2 → TG3 → TG4
SNP → HDL1 → HDL2 → TG2 → TG3 → TG4
SNP → HDL1 → HDL2 → HDL3 → TG3 → TG4
SNP → HDL1 → HDL2 → HDL3 → HDL4 → TG4
SNP → HDL1 → TG1 → TG2 → TG3 → TG4
SNP → cg02647265 → TG2 → TG3 → TG4
SNP → HDL3 → TG3 → TG4
SNP → HDL1 → TG1 → TG2 → TG3 → TG4
SNP → HDL1 → HDL2 → HDL3 → TG3 → TG4
SNP → HDL1 → HDL2 → HDL3 → HDL4 → TG4
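In path analysis, the effect carried by a single pathway such as those listed above is conventionally the product of the coefficients along that path, and the total indirect effect is the sum over paths. The sketch below shows this arithmetic for two of the listed pathways. The coefficient values are hypothetical placeholders, not estimates from this study.

```python
from math import prod

# Hypothetical standardized path coefficients (NOT estimates from the study).
coef = {
    ("SNP", "HDL3"): 0.10,
    ("HDL3", "HDL4"): 0.80,
    ("HDL4", "TG4"): -0.30,
    ("HDL3", "TG3"): -0.20,
    ("TG3", "TG4"): 0.70,
}

def path_effect(path, coef):
    """Product of the coefficients on consecutive edges of a path."""
    return prod(coef[(src, dst)] for src, dst in zip(path, path[1:]))

paths = [
    ["SNP", "HDL3", "HDL4", "TG4"],
    ["SNP", "HDL3", "TG3", "TG4"],
]

# Effect of each pathway, keyed by a readable label.
indirect = {" -> ".join(p): path_effect(p, coef) for p in paths}
total_indirect = sum(indirect.values())

for label, effect in indirect.items():
    print(f"{label}: {effect:+.3f}")
print(f"total indirect effect: {total_indirect:+.3f}")
```

Because each pathway contributes a product, a long chain through repeated measures is only detectable when every link along it is reasonably strong.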
For rs964184 (ZPR1/ZNF259), which was previously associated with both HDL and TG, we identified direct effects on both HDL1 and TG1 through which the SNP displays a significant (P < 0.05 and directionally consistent) effect on TG4 (Fig. 2b). The effect of this locus on TG across all visits also remains significant after Bonferroni correction (P value < 0.004, 0.05/12 loci examined), as does the total effect of the SNP on TG4 accounting for all proposed pathways. In contrast to the three loci mentioned above, we do find evidence of an independent direct association of rs964184 with both TG and HDL. This suggests a model in which a single common mechanism affects multiple lipoprotein concentrations, which is indicative of true pleiotropy. Indeed, the complexity of lipoprotein metabolism means that genes and biochemical pathways can be involved in the metabolism of several lipoprotein classes.
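The Bonferroni threshold quoted above follows from dividing the nominal significance level by the number of loci examined. A minimal sketch of that arithmetic is given below. The function name `passes_bonferroni` and the example P values in the usage lines are hypothetical.

```python
alpha = 0.05     # nominal significance level, from the text
n_loci = 12      # number of loci examined, from the text
threshold = alpha / n_loci   # about 0.0042, reported as 0.004 in the text

def passes_bonferroni(p_value, alpha=0.05, n_tests=12):
    """True when a P value is below the Bonferroni-corrected threshold."""
    return p_value < alpha / n_tests

# Usage with made-up P values.
print(f"threshold = {threshold:.4f}")
print(passes_bonferroni(0.001))
print(passes_bonferroni(0.01))
```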
Finally, we also observed a nominally significant (P value < 0.05) and directionally consistent indirect effect for rs4765127 (ZNF664) on TG4 through a nearby CpG (cg02647265), which lies 4 kb upstream of our tag SNP in the 5′UTR (untranslated region) of CCDC92, the gene adjacent to ZNF664. While our tag SNP lies within ZNF664, variants within CCDC92 have also been associated with multiple lipid levels and lipoprotein size, which suggests CCDC92 as a candidate gene for this region [22, 23]. Even though this variant was associated with both HDL and TG in previous GWAS, we found no evidence of a direct association of this SNP with either phenotype (Fig. 2c). Although we do observe an effect of the SNP on TG mediated through a nearby CpG, we did not explicitly model an association of the CpG with HDL. Because of the proximity of cg02647265 to the 5′ end of two genes (CCDC92 and ZNF664), further investigation into the association of this CpG with expression is warranted to elucidate the causal gene underlying this GWAS association signal.
We aimed to highlight the utility and flexibility of SEM for adding to our understanding of the genetic underpinnings of disease risk by investigating pathways between GWAS-established SNPs and a disease risk factor, TG, through correlated phenotypes and epigenetic markers simultaneously. There is substantial interest in the field in approaches that integrate multiple types of phenotypic and omics data so that a better understanding of disease mechanisms can be achieved. Using the GAW20 data, we were able to determine whether the previously observed SNP effects on TG could be explained by an indirect effect of the SNP through HDL and nearby CpG sites. We identified three loci where associations with TG were indirect through HDL and one locus where the effect of the SNP on TG was mediated through methylation. Such information can be used to prioritize candidate genes in regions of association, inform mechanisms of action of methylation effects, and highlight possible genes with pleiotropic effects.
Although the examples highlighted herein demonstrate the utility and flexibility of SEM to inform mechanistic underpinnings of GWAS loci, our study is limited by the small set of variables available to investigate in the complex SEM. For example, only direct genotypes, methylation, HDL, and TG values were available. Lastly, it is also possible for nominally significant associations between CpGs and TG to be mediated through HDL, but because of model complexity and limited sample size, we did not test this explicitly.
The majority of investigations into the genetic underpinnings of TG do not take advantage of the wealth of longitudinal data available in many large cohort studies. Additionally, there is a dearth of comprehensive studies that incorporate genetic, epigenetic, and correlated phenotypic data to investigate pathways through which genetic variants influence trait variance. To address this important research gap, we capitalized on extant genotypic data at known TG-associated loci, longitudinal assessments of TG and HDL, smoking exposure, and methylation available through GAW20 to explore the utility and flexibility of the SEM framework for informing mechanistic insights at GWAS loci. In future investigations, the proposed approach can be easily extended to accommodate additional and longitudinal omics data, ultimately assisting researchers in better identifying the mechanistic pathways through which genetic variants influence trait variance.
We would like to thank the GAW20 workshop organizers and participants, as well as the participants of the GOLDN study.
Publication of this article was supported by NIH R01 GM031575. AEJ is funded on NIH 5K99HL130580–02, and LFR by the NIH T32-HD007168.
Availability of data and materials
The data that support the findings of this study are available from the Genetic Analysis Workshop (GAW) but restrictions apply to the availability of these data, which were used under license for the current study. Qualified researchers may request these data directly from GAW.
About this supplement
This article has been published as part of BMC Proceedings Volume 12 Supplement 9, 2018: Genetic Analysis Workshop 20: envisioning the future of statistical genetics by exploring methods for epigenetic and pharmacogenomic data. The full contents of the supplement are available online at https://bmcproc.biomedcentral.com/articles/supplements/volume-12-supplement-9.
AEJ and AGH led the analysis. AEJ and LFR were presenters at GAW. AEJ conceived the study and organized and designed the manuscript. All authors refined the study design, revised the manuscript critically, and read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Aulchenko YS, Ripatti S, Lindqvist I, Boomsma D, Heid IM, Pramstaller PP, Penninx BW, Janssens AC, Wilson JF, Spector T, et al. Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts. Nat Genet. 2009;41(1):47–55.
- Willer CJ, Schmidt EM, Sengupta S, Peloso GM, Gustafsson S, Kanoni S, Ganna A, Chen J, Buchkovich ML, Mora S, et al. Global lipids genetics consortium: discovery and refinement of loci associated with lipid levels. Nat Genet. 2013;45(11):1274–83.
- Comuzzie AG, Cole SA, Laston SL, Voruganti VS, Haack K, Gibbs RA, Butte NF. Novel genetic loci identified for the pathophysiology of childhood obesity in the Hispanic population. PLoS One. 2012;7(12):e51954.
- Coram MA, Duan Q, Hoffmann TJ, Thornton T, Knowles JW, Johnson NA, Ochs-Balcom HM, Donlon TA, Martin LW, Eaton CB, et al. Genome-wide characterization of shared and distinct genetic components that influence blood lipid levels in ethnically diverse human populations. Am J Hum Genet. 2013;92(6):904–16.
- Ko A, Cantor RM, Weissglas-Volkov D, Nikkola E, Reddy PM, Sinsheimer JS, Pasaniuc B, Brown R, Alvarez M, Rodriguez A, et al. Amerindian-specific regions under positive selection harbour new lipid variants in Latinos. Nat Commun. 2014;5:3983.
- Weissglas-Volkov D, Aguilar-Salinas CA, Nikkola E, Deere KA, Cruz-Bautista I, Arellano-Campos O, Muñoz-Hernandez LL, Gomez-Munguia L, Ordoñez-Sánchez ML, Reddy PM, et al. Genomic study in Mexicans identifies a new locus for triglycerides and refines European lipid loci. J Med Genet. 2013;50(5):298–308.
- Below JE, Parra EJ, Gamazon ER, Torres JM, Krithika S, Candille S, Lu Y, Manichaikul A, Peralta-Romero J, Duan Q, et al. Meta-analysis of lipid-traits in Hispanics identifies novel loci, population-specific effects, and tissue-specific enrichment of eQTLs. Sci Rep. 2016;6:19429.
- Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, Pirruccello JP, Ripatti S, Chasman DI, Willer CJ, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010;466(7307):707–13.
- Mi X, Eskridge KM, George V, Wang D. Structural equation modeling of gene-environment interactions in coronary heart disease. Ann Hum Genet. 2011;75(2):255–65.
- Xiong M. Structural equation models for pathway identification. Nat Genet. 2001;27:96.
- Li R, Tsaih SW, Shockley K, Stylianou IM, Wergedal J, Paigen B, Churchill GA. Structural model analysis of multiple quantitative traits. PLoS Genet. 2006;2(7):e114.
- Cole CB, Nikpay M, McPherson R. Gene-environment interaction in dyslipidemia. Curr Opin Lipidol. 2015;26(2):133–8.
- Dumitrescu L, Carty CL, Taylor K, Schumacher FR, Hindorff LA, Ambite JL, Anderson G, Best LG, Brown-Gentry K, Bůžková P, et al. Genetic determinants of lipid traits in diverse populations from the population architecture using genomics and epidemiology (PAGE) study. PLoS Genet. 2011;7(6):e1002138.
- Li X, Monda KL, Goring HH, Haack K, Cole SA, Diego VP, Almasy L, Laston S, Howard BV, Shara NM, et al. Genome-wide linkage scan for plasma high density lipoprotein cholesterol, apolipoprotein A-1 and triglyceride variation among American Indian populations: the strong heart family study. J Med Genet. 2009;46(7):472–9.
- Braun KVE, Dhana K, de Vries PS, Voortman T, van Meurs JBJ, Uitterlinden AG, et al. Epigenome-wide association study (EWAS) on lipids: the Rotterdam study. Clin Epigenetics. 2017;9:15.
- Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, Clarke R, Heath SC, Timpson NJ, Najjar SS, Stringham HM, et al. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet. 2008;40(2):161–9.
- White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48(4):817–38.
- Hu Lt, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model. 1999;6(1):1–55.
- Tucker L, Lewis C. A reliability coefficient for maximum likelihood factor analysis. Psychometrika. 1973;38(1):1–10.
- Bentler PM. Comparative fit indexes in structural models. Psychol Bull. 1990;107(2):238–46.
- Johansen CT, Kathiresan S, Hegele RA. Genetic determinants of plasma triglycerides. J Lipid Res. 2011;52(2):189–206.
- Latsuzbaia A, Jaddoe VW, Hofman A, Franco OH, Felix JF. Associations of genetic variants for adult lipid levels with lipid levels in children. The generation R study. J Lipid Res. 2016;57(12):2185–92.
- Chasman DI, Pare G, Mora S, Hopewell JC, Peloso G, Clarke R, Cupples LA, Hamsten A, Kathiresan S, Mälarstig A, et al. Forty-three loci associated with plasma lipoprotein size, concentration, and cholesterol content in genome-wide analysis. PLoS Genet. 2009;5(11):e1000730.