Epigenome-wide association study of SNP–CpG interactions on changes in triglyceride levels after pharmaceutical intervention: a GAW20 analysis

In the search for an understanding of how genetic variation contributes to the heritability of common human disease, the potential role of epigenetic factors, such as methylation, is being explored with increasing frequency. Although standard analyses test for associations between methylation levels at individual cytosine-phosphate-guanine (CpG) sites and phenotypes of interest, some investigators have begun testing how methylation may modulate the effects of genetic polymorphisms on phenotypes. In our analysis, we used both a genome-wide and a candidate gene approach to investigate potential single-nucleotide polymorphism (SNP)–CpG interactions on changes in triglyceride levels. Although we identified numerous loci of interest when using an exploratory significance threshold, we did not identify any significant interactions using a strict genome-wide significance threshold. We also identified numerous loci using the candidate gene approach, in which we focused on 18 genes with prior evidence of association with triglyceride levels. In particular, we identified GALNT2 loci as containing potential CpG sites that moderate the impact of genetic polymorphisms on triglyceride levels. Further work is needed to provide clear guidance on analytic strategies for testing SNP–CpG interactions, although leveraging prior biological understanding may be needed to improve statistical power in data sets with smaller sample sizes.


Background
Methylation plays a major role in gene regulation through epigenetic modifications at specific cytosine-phosphate-guanine (CpG) residues within the regulatory regions of genes and, consequently, may influence transcriptional activity [1]. In brief, methylation occurs when a methyl group is transferred to the DNA via a family of DNA methyltransferases. The majority of DNA methylation occurs on cytosines that immediately precede a guanine nucleotide (ie, a CpG site). These CpG sites occur frequently throughout the genome and have been linked to both single-nucleotide polymorphisms (SNPs) and epigenetic changes [2]. In particular, DNA methylation may influence gene activity differently depending on the surrounding genetic sequence [3]. Because SNPs near a CpG site may alter methylation levels, the statistical interaction between SNPs and CpG sites may explain varying gene expression across individuals. Prior research shows that DNA methylation in the interleukin-4 receptor is associated with asthma, but this association is further explained by the presence or absence of a nearby SNP [4]. A study focusing on obesity found that CpG sites in an enhancer region interact with CpG-creating SNPs in an obesity-risk haplotype, which helps explain obesity and type 2 diabetes [5].
As part of GAW20, we were provided access to a data set of methylation, SNPs, and triglyceride levels over 2 time periods, along with numerous related covariates. In particular, the study measured triglyceride levels before and after pharmaceutical intervention. Given the well-known relationship between triglycerides and many different cardiometabolic diseases, including cardiovascular disease [6], we chose to look for evidence of methylation at CpG sites that potentially modulate the impact of nearby SNPs on changes in triglyceride levels.

Sample population and variables
The sample consisted of 670 individuals from a pedigree sample provided as part of GAW20 for whom all analyzed variables were available. We considered 6 covariates (

Models
The modeling process was done in 2 stages. The first-stage model produced a single residual triglyceride (TG) value for each person, while the second stage produced approximately 700,000 models, one for each SNP that passed standard genome-wide association study (GWAS) quality control (QC) criteria: Hardy-Weinberg equilibrium p value > 1 × 10⁻⁶, minor allele frequency > 1%, and SNP missing data rate < 5%.
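The three QC criteria can be sketched as a per-SNP filter. This is a minimal illustration in Python rather than the software actually used for QC, which the text does not name. The function names are hypothetical, and the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square goodness-of-fit test, which is an assumption (an exact test is a common alternative). Like most such tests, it also ignores relatedness between pedigree members.

```python
import math

def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) Hardy-Weinberg p value from genotype counts (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of the first allele
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0                    # monomorphic: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def snp_passes_qc(genotypes, hwe_min_p=1e-6, maf_min=0.01, max_missing=0.05):
    """genotypes: minor-allele counts (0, 1, 2), with None marking a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    n_aa, n_ab, n_bb = (called.count(k) for k in (0, 1, 2))
    freq_b = (n_ab + 2 * n_bb) / (2 * len(called))
    maf = min(freq_b, 1 - freq_b)
    return (missing_rate < max_missing
            and maf > maf_min
            and hwe_p_value(n_aa, n_ab, n_bb) > hwe_min_p)
```

The default thresholds mirror the values quoted above; everything else is illustrative.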
In the first stage, we used the lmekin function from the coxme package in R [7] to predict the change in log-transformed TG levels [y = ln(baseline)]. In cases where 2 separate TG measurements were available for the baseline, we natural-log (ln)-transformed the data before averaging. Baseline ln-transformed TG levels were predicted by the 6 covariates listed earlier, and familial relationships were accounted for in the model through the use of the kinship matrix. We then saved the resulting "residual" value ($r_i = \hat{y}_i - y_i$) for each of the $i = 1, \ldots, 670$ individuals in our analysis.
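The order of operations in the baseline calculation matters: averaging after the ln transform gives the log of the geometric mean, not the log of the arithmetic mean. A minimal Python sketch of that step, and of the residual sign convention stated above, follows. The function names are hypothetical, and the mixed-model fit itself (lmekin with the kinship matrix) is not reproduced here.

```python
import math

def baseline_ln_tg(measurements):
    """Mean of ln-transformed baseline TG values (one or two measurements)."""
    logs = [math.log(m) for m in measurements if m is not None]
    if not logs:
        raise ValueError("no baseline TG measurement available")
    return sum(logs) / len(logs)

def stage_one_residual(fitted, observed):
    """Residual with the sign convention used in the text: fitted minus observed."""
    return fitted - observed
```

For example, baseline measurements of 100 and 144 give ln(120), the log of their geometric mean, rather than ln(122).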
The second stage predicted the residuals ($r_i$) from stage 1 based on the number of minor alleles ($SNP_j$ = 0, 1, 2) and methylation scores ($CPG_j \in [0, 1]$), with a separate model for each ($SNP_j$, $CPG_j$) pair fit using the lm function in R [7]. In particular, the second-stage model for the ($SNP_j$, $CPG_j$) pair was:

$$ r_i = \beta_0 + \beta_{S_j}\,SNP_{ij} + \beta_{C_j}\,CPG_{ij} + \beta_{S_j C_j}\,(SNP_{ij} \times CPG_{ij}) + \varepsilon_i \qquad (1) $$

where $\beta_{S_j C_j}$ is the estimate of the interaction effect between $SNP_j$ and $CPG_j$. ($SNP_j$, $CPG_j$) pairs were made by assigning each SNP passing QC to its nearest CpG site, resulting in approximately 700,000 pairs, with some CpG sites assigned to multiple SNPs.
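The SNP-to-nearest-CpG assignment can be sketched with a binary search over sorted CpG positions. This is an illustrative Python sketch, not the code used in the analysis. It assumes that "nearest" means base-pair distance on the same chromosome and that ties go to the lower-coordinate site; the text specifies neither. All identifiers are hypothetical.

```python
import bisect

def pair_snps_to_nearest_cpg(snps, cpgs):
    """Map each SNP id to the id of its nearest CpG site.

    snps, cpgs: dicts of id -> base-pair position on a single chromosome.
    Ties are broken in favor of the lower-coordinate CpG (an assumption).
    """
    if not cpgs:
        raise ValueError("no CpG sites available on this chromosome")
    sites = sorted((pos, cpg_id) for cpg_id, pos in cpgs.items())
    positions = [pos for pos, _ in sites]
    pairs = {}
    for snp_id, snp_pos in snps.items():
        i = bisect.bisect_left(positions, snp_pos)
        # Only the CpG just below and the one at or above the SNP can be nearest.
        candidates = sites[max(i - 1, 0): i + 1]
        pairs[snp_id] = min(candidates, key=lambda s: abs(s[0] - snp_pos))[1]
    return pairs
```

Because the mapping is many-to-one, several SNPs can share a CpG site, which matches the observation that some CpG sites were assigned to multiple SNPs.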

Statistical analysis
Statistical significance of the interaction term in Eq. 1 was assessed using an F test, essentially testing whether the statistical interaction provided significantly more evidence of association with changes in TG levels than a model with only main-effects terms. Versions of Eq. 1 without the interaction term were also run. We started with the generally accepted, conservative genome-wide significance level of 5 × 10⁻⁸. We followed up this analysis by using a more liberal, exploratory significance level of 1 × 10⁻⁴ in our genome-wide interaction analysis.
We followed this genome-wide analysis with a candidate gene study focusing on 18 gene regions (containing 423 unique SNP-CpG sites) that have been shown to be associated with TGs in previous genome-wide association studies, identified via searches at http://www.ebi.ac.uk/gwas. Throughout the candidate gene analysis, we used a significance level of 0.05. As part of the candidate gene analysis, we also collapsed all the CpG sites within each gene region (50 kb on either side of the gene) by using 5 different methods (mean, minimum, maximum, median, and sum of squares of the CpG values as the CPG value in the model) to evaluate the potential impact of different ways of summarizing methylation evidence for each SNP. For the SNPs that demonstrated a significant interaction for more than one of the collapsing methods used, we then looked at the interactions between all CpG sites within the region and those SNPs.
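The F test on the interaction term compares the residual sum of squares (RSS) of the main-effects model with that of the model including the SNP × CpG product. The sketch below is a dependency-free Python stand-in for the lm fit in R, not the authors' code. It returns the interaction estimate and the F statistic on (1, n − 4) degrees of freedom. The function names are hypothetical, and converting F to a p value needs an F-distribution tail function (for example `scipy.stats.f.sf`), which is left out to keep the block standard-library only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_rss(design, y):
    """Ordinary least squares via the normal equations; returns (coefficients, RSS)."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, yi in zip(design, y))
    return beta, rss

def interaction_f_test(resid, snp, cpg):
    """F statistic for adding the SNP x CpG term to the main-effects model."""
    main = [[1.0, s, c] for s, c in zip(snp, cpg)]
    full = [[1.0, s, c, s * c] for s, c in zip(snp, cpg)]
    _, rss_main = ols_rss(main, resid)
    beta_full, rss_full = ols_rss(full, resid)
    df_resid = len(resid) - 4  # intercept, SNP, CpG, and interaction
    f_stat = (rss_main - rss_full) / (rss_full / df_resid)
    return beta_full[3], f_stat, df_resid
```

With a single added term, this F statistic equals the square of the t statistic for the interaction coefficient, so both tests give the same p value.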

Genome-wide approach
No interaction term p values were significant at the conservative 5 × 10⁻⁸ threshold. However, 58 SNP-CpG pairs showed significant interactions at the more liberal 1 × 10⁻⁴ significance level. Table 1 summarizes 25 loci that include regions of SNPs that are colocalized and within genes (total of 44 interactions). The median p value of the interaction term across all sites was 0.504, and the genomic inflation factor (lambda) was 1.02, showing no inflation of test statistics.
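A common way to compute a genomic inflation factor from a set of p values is to convert each to a 1-degree-of-freedom chi-square statistic and divide the median by its expected value under the null (about 0.455). The Python sketch below illustrates that convention. The text does not say how lambda was calculated here, so the method is an assumption and the function name is hypothetical.

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Median-based inflation factor from two-sided p values in (0, 1]."""
    norm = NormalDist()
    # A two-sided p value p corresponds to a 1-df chi-square statistic z**2
    # with z = inv_cdf(p / 2); working in the lower tail is stable for small p.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = norm.inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.455
    return median(chi2) / expected_median
```

Values near 1 indicate that the bulk of the test statistics follow the null distribution; values well above 1 suggest systematic inflation.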

Candidate gene approach
In our data, there are 18 genes (containing 423 SNPs for which data were available) previously shown to be associated with TG levels. Table 2 summarizes the results of fitting Eq. 1 with an interaction term, as well as a version of Eq. 1 without the interaction term. Thirteen of the 18 candidate genes show at least modest (p < 0.05) evidence of statistical interaction between nearby methylation values and SNPs within the gene. The most significant SNP is in FADS3 (rs1675102) and has a minor allele frequency of 0.28. The interaction is such that additional copies of the minor allele lead to a decreased impact of methylation on changes in TG levels. Table 3 shows the results of collapsing all the CpG sites within each gene region through the minimum method, which uses the minimum CpG value of all CpG sites within 50 kb of the gene. The minimum method resulted in more significant interactions (44) than did the other 4 collapsing methods, which on average yielded only 23 significant interactions (detailed results not shown).
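The region-level collapsing can be sketched as selecting the CpG sites within 50 kb of a gene and reducing each individual's methylation values at those sites to a single number. The Python sketch below is illustrative only. The function names and coordinates are hypothetical, and "sum of squares" is taken to mean the sum of squared methylation values, which is an assumption about the authors' definition.

```python
from statistics import mean, median

# The 5 summaries named in the text, applied to one individual's CpG values.
COLLAPSERS = {
    "mean": mean,
    "minimum": min,
    "maximum": max,
    "median": median,
    "sum_squares": lambda values: sum(v * v for v in values),  # assumed definition
}

def cpgs_in_region(gene_start, gene_end, cpg_positions, flank=50_000):
    """Ids of CpG sites within `flank` bp of the gene (same chromosome assumed)."""
    lo, hi = gene_start - flank, gene_end + flank
    return [cpg_id for cpg_id, pos in cpg_positions.items() if lo <= pos <= hi]

def collapse_region(methylation, region_cpgs, method):
    """Collapse one individual's methylation values (dict id -> beta) over a region."""
    values = [methylation[cpg_id] for cpg_id in region_cpgs]
    return COLLAPSERS[method](values)
```

The collapsed value then takes the place of the single-site CPG term in Eq. 1, giving one model per SNP and collapsing method.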
We identified 176 unique SNPs in significant interactions for more than 1 of the 5 different CpG collapsing methods, as shown in Table 4. In total, there are 176 unique significant SNP × CpG interactions. GALNT2 had the largest number of significant results with 69 interactions, the most significant of which had a p value of 0.000142. The SNP in this interaction (rs6677241) has a minor allele frequency of 0.026. The interaction results in an increased impact of methylation on TG levels for every additional copy of the minor allele.

Discussion
Although no significant SNP-CpG interactions were identified when using a strict genome-wide significance threshold (5 × 10⁻⁸), use of a more exploratory threshold (1 × 10⁻⁴) identified many genes previously shown to be associated with cardiometabolic traits. A candidate gene approach, using a significance level of 0.05, identified loci in 13 genes with modest evidence for SNP-CpG interactions on baseline TG levels. Furthermore, by using the collapsing methods, we were able to identify potentially interesting SNPs for additional exploration. Using only these SNPs, our examination of all CpG sites within each gene region resulted in 176 significant unique SNP-CpG pairs. In every case, the SNP-CpG p value was smaller than both the SNP and CpG p values from the noninteraction model. This suggests that using SNP-CpG pairs may identify SNPs that would not be identified by traditional GWAS techniques. The gene GALNT2 had the most significant interactions, with 69. SNPs in GALNT2 were previously identified as associated with TG levels and with high- and low-density lipoprotein cholesterol [8]. One study shows that promoter methylation of GALNT2 is associated with a higher risk of coronary heart disease [9].
There are some limitations to our analysis. First, to manage computational resources, we began by predicting baseline TG levels by kinship and covariates, yielding residuals for each individual, which we used for assessing the impact of methylation and genetic variation. Alternatives to this methodology may exist. Second, we used an exploratory significance threshold for the genome-wide analysis that is liberal relative to the vast majority of GWAS-type analyses published today. Although this can lead to more false-positive results, we did find a number of "subthreshold" loci of potential interest, suggesting the need for studies with larger sample sizes and more sensitive statistical methods to draw out these loci. The minimum method of summarizing methylation in a region near a gene showed promise, although further work is needed to more fully evaluate the many options. Regardless, leveraging prior biological evidence (eg, via the candidate gene approach) may be of potential benefit when testing for SNP-CpG interactions.

Conclusions
Even with only "subthreshold" significance, our results point to the need for statistical models that leverage prior biological information. Our study shows that a mediated effect of SNPs on methylation is a possible explanation for changes in TG levels. With this knowledge, studies with larger sample sizes, as well as wet lab experiments, can be performed to confirm the relationship. As we learn more about the effect an individual's genotype has on their health, there is greater opportunity for personalized medicine to be an effective treatment strategy.

Funding
Publication of this article was supported by NIH R01 GM031575.

Availability of data and materials
The data that support the findings of this study are available from the Genetic Analysis Workshop (GAW), but restrictions apply to the availability of these data, which were used under license for the current study. Qualified researchers may request these data directly from GAW.

About this supplement
This article has been published as part of BMC Proceedings Volume 12 Supplement 9, 2018: Genetic Analysis Workshop 20: envisioning the future of statistical genetics by exploring methods for epigenetic and pharmacogenomic data. The full contents of the supplement are available online at https://bmcproc.biomedcentral.com/articles/supplements/volume-12-supplement-9.