Direct and indirect genetic effects on triglycerides through omics and correlated phenotypes

Even though there has been great success in identifying lipid-associated single-nucleotide polymorphisms (SNPs), the mechanisms through which the SNPs act on each trait are poorly understood. The emergence of large, complex biological data sets in well-characterized cohort studies offers an opportunity to investigate the genetic effects on trait variability as a way of informing the causal genes and biochemical pathways that are involved in lipoprotein metabolism. However, methods for simultaneously analyzing multiple omics, environmental exposures, and longitudinally measured, correlated phenotypes are lacking. The purpose of our study was to demonstrate the utility of the structural equation modeling (SEM) approach to inform our understanding of the pathways by which genetic variants lead to disease risk. With the SEM method, we examine multiple pathways directly and indirectly through previously identified triglyceride (TG)-associated SNPs, methylation, and high-density lipoprotein (HDL), including sex, age, and smoking behavior, while adding in biologically plausible direct and indirect pathways. We observed significant SNP effects (P < 0.05 and directionally consistent) on TGs at visit 4 (TG4) for five loci, including rs645040 (DOCK7), rs964184 (ZPR1/ZNF259), rs4765127 (ZNF664), rs1121980 (FTO), and rs10401969 (SUGP1). Across these loci, we identify three with strong evidence of an indirect genetic effect on TG4 through HDL, one with evidence of pleiotropic effect on HDL and TG4, and one variant that acts on TG4 indirectly through a nearby methylation site. Such information can be used to prioritize candidate genes in regions of interest, inform mechanisms of action of methylation effects, and highlight possible genes with pleiotropic effects.


Background
Lipid traits, such as triglyceride (TG) and high-density lipoprotein (HDL) cholesterol concentrations, are highly heritable; estimates range from 20 to > 70%, with common variants estimated to explain approximately 30 to 33% of the variance for these traits [1,2]. Genome-wide association studies (GWAS) have identified more than 100 SNPs associated with lipid traits, many of which are shared across more than one lipid trait [1][2][3][4][5][6][7][8]. Even though there has been great success in identifying lipid-associated SNPs, the mechanisms through which these SNPs act on each trait are poorly understood. The emergence of large, complex biological data sets in well-characterized cohort studies offers an opportunity to investigate the genetic effects on trait variability as a way of informing the causal genes and biochemical pathways that are involved in lipoprotein metabolism. However, methods for simultaneously analyzing multiple omics, environmental exposures, and longitudinally measured, correlated phenotypes are lacking.
The purpose of our study was to demonstrate the utility of the structural equation modeling (SEM) approach to inform our understanding of the pathways by which genetic variants lead to disease risk. With the SEM method, we can examine multiple pathways directly and indirectly through previously identified TG-associated SNPs, methylation, and HDL, including sex, age, and smoking behavior, while adding in biologically plausible direct and indirect pathways. Although SEM has been used to examine the influence of genetic variants on disease through environmental exposures [9], on gene expression [10], and pleiotropy [11], to our knowledge this will be the first study to investigate pathways between GWAS-established SNPs and a disease risk factor while accounting for environmental exposures, correlated phenotypes, and epigenetic markers simultaneously using an SEM framework. Thus, using the GAW20 data, we will show the usefulness of the SEM results to inform the prioritization of candidate genes in regions of association, inform mechanisms of action of methylation effects, and inform possible genes with pleiotropic effects.

Study sample
GAW20data were provided by the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study. Individual genetic and phenotypic data are drawn from a total of 1105 adults from 188 families. Of these, 810 individuals from 172 families have been genotyped on the Affymetrix 6.0 (Affymetrix, Inc., Santa Clara, CA, USA).

Phenotypes/covariates
Our analyses focused on fasting TG as the primary outcome measure. Secondary outcomes included DNA methylation and HDL. Of those with genotype data, 707 participants had whole-genome methylation data measured from CD4+ T cells at visit 2. The HM450K array was used to measure DNA methylation (Illumina, Inc., San Diego, CA, USA) following bisulphite conversion. The platform detects methylation status of 485,577 CpG (cystine-phosphate-guanine) sites by sequencing-based genotyping of bisulphite-treated DNA. The methylation score for each CpG is reported as a β value, ranging from 0 (nonmethylated) to 1 (completely methylated), according to the intensity ratio of detected methylation. We calculated principal components (PCs) using methylation β values across all CpGs in R (v3.3.1) and used the first four PCs to adjust to control for cell purity and batch effects prior to performing association analyses. Covariates included sex, baseline age, and study center. Additionally, we adjusted for baseline smoking status, as had been done in previous genetic and methylation association analyses [12][13][14][15].

SNP and CpG selection
We selected established TG-associated SNPs [2,16] available on the Affymetrix 6.0, array (Table 1) to evaluate direct and indirect SNP effects on TG in our SEM framework. To evaluate indirect effects through methylation, we searched for CpG sites near our tag variant (±10 kb) for inclusion; however, as the focus of this  [17]. Root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI) were examined to ensure appropriate model fit. As a consequence of the small sample size compared to model complexity we considered RMSEA values ≥0.10 and CFI or TLI ≤0.9 as an indicator of poor model fit [18][19][20]. For each locus, if our full model did not meet these fit criteria, nonsignificant CpGs (P > 0.2) were removed from the model and reevaluated for model fit. For all loci, model fit criteria were met following this step. Figure 1 illustrates the full models and shows all segments along pathways. Mplus Version 7.4 was used for all SEM; maximum likelihood estimation was used.
Even though our SEM framework simultaneously estimated the effect of SNP across each segment of the modeled pathways, we report only direct and indirect effects on TG4 for the SNPs that reached nominal significance (P < 0.05) and displayed directionally consistent effects with previous GWAS findings.

Results
Our sample included up to 707 participants of the GOLDN study (50% women); the average age of participants at the time of the baseline examination was 48 years. It is worth noting that all participants in the GOLDN study received treatment with fenofibrates between visits 2 and 3, which resulted in a decrease in mean TG and a decrease in the variance ( We observed significant indirect SNP effects (P < 0.05 and directionally consistent) on TG4 for five loci (Table 2), including rs645040 (DOCK7), rs964184 (ZPR1/ZNF259), rs4765127 (ZNF664), rs1121980 (FTO), andrs10401969 (SUGP1). We did not identify any significant direct effects of any SNP on TG4 after accounting for indirect effects through early measured TG, HDL or CpG methylation. For both rs645040 (Fig. 2a) and rs1121980, we observe a significant direct effect on HDL at visit three (HDL3) through which the SNP has a significant (P value < 0.05 and directionally consistent) indirect effect on TG4. Both variants were associated with TG in previous GWAS; rs1121980 was also associated with HDL [2]. Although rs645040 was not previously associated with HDL, rs483465, which lies only approximately 120 kb upstream of rs645040 (R 2 = 0.904), has been associated with HDL. For both of these loci, the effect previously observed on TG and HDL does not appear to be a result of pleiotropy, Fig. 1 Diagram illustrating the full SEM model but rather an indirect effect of the SNP on TG through its effects on HDL. Similarly, rs10401969, which displays a significant indirect effect through HDL at visit 1 (HDL1) on TG4, does not display a significant independent direct effect on TG. Even though this locus has not been associated with HDL, it has been associated with TG, low-density lipoprotein (LDL) cholesterol, and total cholesterol (TC).
For rs964184 (ZPR1/ZNF259), which was previously associated with both HDL and TG [2], we identified direct effects on both HDL1 and TG1 through which the SNP displays a significant (P < 0.05 and directionally consistent) effect on TG4 (Fig. 2b). Also, the effect of this locus on TG across all visits remains significant after Bonferroni correction (P value < 0.004, 0.05/12 loci examined), as does the total effect of the SNP on TG4 accounting for all proposed pathways. Contrary to the three loci mentioned above, we do find evidence of an independent direct association for rs964184 with both TG and HDL. This suggests a model whereby a common single mechanism affects multiple lipoprotein concentrations, which is indicative of true pleiotropy. Indeed, the complexity of lipoprotein metabolism means that genes and biochemical pathways can be involved in metabolism of several lipoprotein classes [21].
Finally, we also observed a nominally significant (P value < 0.05) and directionally consistent indirect effect for rs4765127 (ZNF664) on TG4through a nearby CpG (cg02647265), which lies 4 kb upstream of our tag SNP in the 5′UTR (untranslated region) of CCDC92, the gene adjacent to ZNF664. While our tag SNP lies within ZNF664, variants within CCDC92 also have been associated with multiple lipid levels and lipoprotein size, also suggesting CCDC92 as a candidate gene for this region [22,23]. Even though this variant was associated with both HDL and TG in previous GWAS [2], we found no evidence of a direct association of this SNP on either phenotype (Fig. 2c). Although we do observe an effect of the SNP mediated through a nearby CpG on TG, we did not explicitly model an association with the CpG on HDL. Because of the proximity of cg02647265 to the 5′ end of two genes (CCDC92and ZNF664), further investigation into the association of this CpG with expression is warranted to further elucidate the causal gene underlying this GWAS association signal.

Discussion
We aimed to highlight the utility and flexibility of SEM for adding to our understanding of the genetic underpinnings of disease risk by investigating pathways between GWAS-established SNPs and a disease risk factor, TG, through correlated phenotypes and epigenetic markers simultaneously. There is substantial interest in the field for approaches to integrate multiple types of phenotypic and omics data so that a better understanding of disease mechanisms can be achieved. Using the GAW data, we were able to determine if the previously observed SNP effects on TG could be explained by an indirect effect of the SNP through HDL and nearby CpG sites. We identified three loci where associations with TG were indirect through HDL and one We highlight all significant pathways with a focus only on significant SNP effects (P value < 0.05 and directionally consistent). Bonferroni significant pathways (P value < 0.004) are bolded locus where the effects of SNPs on TG were mediated through methylation. Such information can be used to prioritize candidate genes in regions of association, inform mechanisms of action of methylation effects, and highlight possible genes with pleiotropic effects.
Although the examples highlighted herein demonstrate the utility and flexibility of SEM to inform mechanistic underpinnings of GWAS loci, our study is limited by the small set of variables available to investigate in the complex SEM. For example, only direct genotypes, methylation, HDL, and TG values were available. Lastly, it is also possible for nominally significant associations between CpGs and TG to be mediated through HDL, but because of model complexity and limited sample size, we did not test this explicitly.

Conclusions
The majority of investigations into the genetic underpinnings of TG do not take advantage of the wealth of longitudinal data available in many large cohort studies. Additionally, there is a dearth of comprehensive studies that incorporate genetic, epigenetic, and correlated phenotypic data to investigate pathways through which genetic variants influence trait variance. To address this important research gap, we capitalized on extant genotypic data at known TG-associated loci, longitudinal assessments of TG and HDL, smoking exposure, and methylation available through the GAW20 to explore the utility and flexibility of the SEM framework for informing mechanistic insights at GWAS loci. In future investigations, the proposed approach can be easily extended to accommodate additional and longitudinal omics data, ultimately assisting researchers in better identifying the mechanist pathways through which genetic variants influence trait variance.