
Epigenome wide association study of SNP–CpG interactions on changes in triglyceride levels after pharmaceutical intervention: a GAW20 analysis

Abstract

In the search for an understanding of how genetic variation contributes to the heritability of common human disease, the potential role of epigenetic factors, such as methylation, is being explored with increasing frequency. Although standard analyses test for associations between methylation levels at individual cytosine-phosphate-guanine (CpG) sites and phenotypes of interest, some investigators have begun testing whether methylation modulates the effects of genetic polymorphisms on phenotypes. In our analysis, we used both a genome-wide and a candidate gene approach to investigate potential single-nucleotide polymorphism (SNP)–CpG interactions on changes in triglyceride levels. Although we identified numerous loci of interest using an exploratory significance threshold, no interactions were significant at a strict genome-wide significance threshold. We also identified numerous loci using the candidate gene approach, which focused on 18 genes with prior evidence of association with triglyceride levels. In particular, we identified GALNT2 loci as containing potential CpG sites that moderate the impact of genetic polymorphisms on triglyceride levels. Further work is needed to provide clear guidance on analytic strategies for testing SNP–CpG interactions, although leveraging prior biological understanding may be needed to improve statistical power in data sets with smaller sample sizes.

Background

Methylation plays a major role in gene regulation through epigenetic modifications at specific cytosine-phosphate-guanine (CpG) residues within the regulatory regions of genes and, consequently, may influence transcriptional activity [1]. In brief, methylation occurs when a methyl group is transferred to DNA by a family of DNA methyltransferases. Most DNA methylation occurs on cytosines that immediately precede a guanine nucleotide (ie, a CpG site). These CpG sites occur frequently throughout the genome and have been linked to both single-nucleotide polymorphisms (SNPs) and epigenetic changes [2]. In particular, DNA methylation may influence gene activity differently depending on the surrounding genetic sequence [3]. Because SNPs near a CpG site may alter methylation levels, the statistical interaction between SNPs and CpG sites may explain variation in gene expression across individuals. Prior research shows that DNA methylation in the interleukin-4 receptor gene is associated with asthma, but that this association is further explained by the presence or absence of a nearby SNP [4]. A study focusing on obesity found that CpG sites in an enhancer region interact with SNPs that create CpG sites in an obesity-risk haplotype, which helps explain susceptibility to obesity and type 2 diabetes [5].

As part of GAW20, we were provided access to a data set of methylation, SNPs, and triglyceride levels over 2 time periods, along with numerous related covariates. In particular, the study measured triglyceride levels before and after pharmaceutical intervention. Given the well-known relationship between triglycerides and many different cardiometabolic diseases, including cardiovascular disease [6], we chose to look for evidence of methylation at CpG sites that potentially modulate the impact of nearby SNPs on changes in triglyceride levels.

Methods

Sample population and variables

The sample consisted of 670 individuals from the GAW20 pedigree sample for whom all analyzed variables were available. We considered 6 covariates (age, observation center, smoking status, mass spectrometry DX client [MSDX] International Diabetes Federation [IDF] score, fasting time at baseline, and high-density lipoprotein [HDL] at baseline), the majority of which were significantly associated with baseline triglyceride (TG) levels in this sample. The primary response variable was TG level at baseline (visit 1 or 2). For variables with up to 2 measurements at baseline (baseline HDL and baseline TG), we used the average if both measurements were available, or the single available measurement otherwise.


The modeling was done in 2 stages. The first-stage model produced a single residual TG value for each person, while the second stage comprised approximately 700,000 models, one for each SNP that passed standard genome-wide association study (GWAS) quality control (QC) criteria: Hardy-Weinberg equilibrium p value > 1 × 10−6, minor allele frequency > 1%, and SNP missing data rate < 5%.
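
The QC rule above amounts to a simple per-SNP filter. The sketch below is illustrative only: it assumes the Hardy-Weinberg p value is computed upstream (for example by a standard GWAS toolkit), that genotypes are coded as 0/1/2 counts of one allele with None marking a missing call, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# QC thresholds as stated in the text.
HWE_P_MIN = 1e-6     # keep SNPs with Hardy-Weinberg p value > 1e-6
MAF_MIN = 0.01       # keep SNPs with minor allele frequency > 1%
MISSING_MAX = 0.05   # keep SNPs with missing data rate < 5%


def maf_and_missing(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (minor allele frequency, missing rate) from 0/1/2 allele counts; None = missing call."""
    called = [g for g in genotypes if g is not None]
    missing_rate = 1.0 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing_rate
    allele_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(allele_freq, 1.0 - allele_freq), missing_rate


@dataclass
class SnpSummary:
    snp_id: str
    hwe_p: float        # assumed to be computed elsewhere
    maf: float
    missing_rate: float


def passes_qc(s: SnpSummary) -> bool:
    return s.hwe_p > HWE_P_MIN and s.maf > MAF_MIN and s.missing_rate < MISSING_MAX


def filter_snps(snps: List[SnpSummary]) -> List[SnpSummary]:
    """Keep only the SNPs that satisfy all three QC criteria."""
    return [s for s in snps if passes_qc(s)]
```

Each criterion is a strict inequality, matching the wording of the text; SNPs exactly at a threshold are dropped.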

In the first stage, we used the lmekin function from the coxme package in R [7] to model log-transformed baseline TG levels [y = ln(baseline TG)]. When 2 separate TG measurements were available at baseline, we natural-log (ln)-transformed each measurement before averaging. Baseline ln-transformed TG levels were predicted by the 6 covariates listed earlier, and familial relationships were accounted for through the kinship matrix. We then saved the resulting residual, \( r_i=\hat{y}_i-y_i \), for each of the i = 1, …, 670 individuals in our analysis.
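
A minimal sketch of the per-individual preprocessing described above, assuming baseline TG values arrive as a list with None for a missing visit: each available value is ln-transformed before averaging, and the stage 1 residual follows the sign convention printed in the text (predicted minus observed). The kinship-adjusted lmekin fit itself is not reproduced, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def baseline_ln_tg(measurements: Sequence[Optional[float]]) -> Optional[float]:
    """ln-transform each available baseline TG value, then average (None = not measured)."""
    logs = [math.log(m) for m in measurements if m is not None]
    if not logs:
        return None  # no baseline TG value for this individual
    return sum(logs) / len(logs)


def stage1_residual(predicted: float, observed: float) -> float:
    """Residual using the sign convention written in the text: r_i = yhat_i - y_i."""
    return predicted - observed
```

Because the logs are averaged, baseline values of 100 and 400 yield ln(200), the log of their geometric mean, rather than ln(250).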

The second stage modeled the stage 1 residuals (\( r_i \)) as a function of the number of minor alleles (SNPj = 0, 1, 2) and the methylation score (CPGj, ranging from 0 to 1), fitting a separate model for each SNPj, CPGj pair with the lm function in R [7]. Specifically, the second-stage model for the SNPj, CPGj pair was:

$$ r_i=\beta_{S_j}\,\mathrm{SNP}_{ij}+\beta_{C_j}\,\mathrm{CPG}_{ij}+\beta_{S_jC_j}\,\mathrm{SNP}_{ij}\,\mathrm{CPG}_{ij}+\varepsilon_i \qquad (1) $$

where \( \beta_{S_jC_j} \) is the interaction effect between SNPj and CPGj. SNPj, CPGj pairs were formed by assigning each SNP passing QC to its nearest CpG site, resulting in approximately 700,000 pairs; some CpG sites were assigned to multiple SNPs.
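
The nearest-CpG pairing can be done with a binary search over sorted CpG positions on each chromosome. This is a sketch under stated assumptions: loci are given as (chromosome, base-pair position), SNPs on chromosomes without CpG sites are skipped, and ties are broken toward the lower-position CpG site, a rule the text does not specify. Names are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

Locus = Tuple[str, int]  # (chromosome, base-pair position)


def pair_snps_to_nearest_cpg(snps: Dict[str, Locus], cpgs: Dict[str, Locus]) -> Dict[str, str]:
    """Map each SNP ID to the ID of the closest CpG site on the same chromosome."""
    # Group CpG sites by chromosome and sort them by position for binary search.
    grouped: Dict[str, List[Tuple[int, str]]] = {}
    for cpg_id, (chrom, pos) in cpgs.items():
        grouped.setdefault(chrom, []).append((pos, cpg_id))
    by_chrom: Dict[str, Tuple[List[int], List[str]]] = {}
    for chrom, sites in grouped.items():
        sites.sort()
        by_chrom[chrom] = ([p for p, _ in sites], [c for _, c in sites])

    pairs: Dict[str, str] = {}
    for snp_id, (chrom, pos) in snps.items():
        if chrom not in by_chrom:
            continue  # no CpG sites on this chromosome
        positions, ids = by_chrom[chrom]
        i = bisect.bisect_left(positions, pos)
        # The nearest site is either just before, or at/after, the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        # Assumed tie-break: prefer the lower-position (upstream) site.
        best = min(candidates, key=lambda j: (abs(positions[j] - pos), positions[j]))
        pairs[snp_id] = ids[best]
    return pairs
```

Nothing prevents two SNPs from mapping to the same CpG site, which reproduces the many-to-one pairing noted above.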

Statistical analysis

Statistical significance of the interaction term in Eq. 1 was assessed using an F test, which tests whether adding the interaction term provides significantly more evidence of association with TG levels than a model containing only the main-effect terms. Versions of Eq. 1 without the interaction term were also fit. We first used the generally accepted, conservative genome-wide significance level of 5 × 10−8, and then a more liberal, exploratory significance level of 1 × 10−4 in the genome-wide interaction analysis.
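
For readers who want to see the nested-model comparison concretely, here is a minimal pure-Python sketch for one SNPj, CPGj pair. It fits the main-effects model and the model with the interaction by ordinary least squares and returns the F statistic for the single added term. It includes an intercept, as R's lm includes by default (Eq. 1 as printed omits one); the helper names are hypothetical. In R the equivalent comparison is anova(lm(r ~ snp + cpg), lm(r ~ snp * cpg)).

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _rss(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Residual sum of squares from an OLS fit of y on the given columns (normal equations)."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(beta[i] * columns[i][t] for i in range(k)) for t in range(len(y))]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))


def interaction_f_statistic(r: Sequence[float], snp: Sequence[float], cpg: Sequence[float]) -> float:
    """F statistic for adding SNP x CpG to a main-effects model of the stage 1 residuals."""
    n = len(r)
    ones = [1.0] * n  # intercept column
    interaction = [s * c for s, c in zip(snp, cpg)]
    rss_main = _rss([ones, snp, cpg], r)
    rss_full = _rss([ones, snp, cpg, interaction], r)
    df_resid = n - 4  # residual degrees of freedom of the full model
    return (rss_main - rss_full) / (rss_full / df_resid)
```

The p value would then come from the upper tail of an F(1, n − 4) distribution, for example scipy.stats.f.sf(F, 1, n - 4), and be compared with the 5 × 10−8 or 1 × 10−4 threshold.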

We followed this genome-wide analysis with a candidate gene study focusing on 18 gene regions (containing 423 unique SNP-CpG sites) that have been associated with TG levels in previous genome-wide association studies (identified via searches at …). Throughout the candidate gene analysis, we used a significance level of 0.05. As part of the candidate gene analysis, we also collapsed all CpG sites within each gene region (the gene plus 50 kb on either side) using 5 different methods (taking the mean, minimum, maximum, median, or sum of squares of the CpG values as the CPG value in the model) to evaluate how different ways of summarizing methylation evidence affect the results for each SNP. For SNPs that showed a significant interaction under more than one collapsing method, we then tested interactions between those SNPs and every CpG site within the region.
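
The 5 collapsing methods can be sketched as interchangeable summary functions applied to one individual's methylation values within the gene window. Assumptions: CpG sites are given as (position, methylation value) pairs on the gene's chromosome, the window is inclusive at both ends, and "sum of squares" is read as the sum of squared methylation values (it could also mean squared deviations from the mean). Names are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

FLANK_BP = 50_000  # window: gene body plus 50 kb on either side

# The 5 summaries named in the text (sum of squares interpretation is an assumption).
COLLAPSE_METHODS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.mean,
    "minimum": min,
    "maximum": max,
    "median": statistics.median,
    "sum_squares": lambda values: sum(v * v for v in values),
}


def collapse_region(gene_start: int, gene_end: int, cpgs: List[Tuple[int, float]], method: str) -> float:
    """Summarize one individual's CpG methylation values in a gene region as a single CPG value."""
    lo, hi = gene_start - FLANK_BP, gene_end + FLANK_BP
    values = [beta for pos, beta in cpgs if lo <= pos <= hi]
    if not values:
        raise ValueError("no CpG sites within the gene region")
    return COLLAPSE_METHODS[method](values)
```

Applying the chosen method to every individual produces one CPG value per person, which then replaces CPGj in Eq. 1.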

Results

Genome-wide approach

No interaction term p values were significant at the conservative 5 × 10−8 threshold. However, 58 SNP-CpG pairs showed significant interactions at the more liberal 1 × 10−4 significance level. Table 1 summarizes the 25 loci in which significant SNPs are colocalized and lie within genes (44 interactions in total). The median interaction term p value across all sites was 0.504 and the genomic inflation factor (λ) was 1.02, indicating no inflation of test statistics.
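
The text does not say how λ was computed. A common convention, assumed here, converts each p value to the 1-df chi-square statistic with the same upper-tail probability and divides the median of those statistics by the median expected under the null (about 0.455). A standard-library sketch with hypothetical names:

```python
from statistics import NormalDist, median
from typing import Sequence

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df: the square of the normal 75th percentile.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2_1df(p: float) -> float:
    """1-df chi-square statistic whose upper-tail p value is p (requires 0 < p <= 1)."""
    z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation(pvalues: Sequence[float]) -> float:
    """lambda = median observed chi-square / median expected chi-square under the null."""
    return median([p_to_chi2_1df(p) for p in pvalues]) / CHI2_1DF_MEDIAN
```

Under the null hypothesis p values are uniform, so λ close to 1 suggests no systematic inflation, while λ well above 1 points to confounding or miscalibrated tests.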

Table 1 Summary of 25 loci with significant interactions between SNP and CpG site at the 1 × 10−4 significance levelᵃ

Candidate gene approach

In our data, 18 genes (containing 423 SNPs with available data) have previously been associated with TG levels. Table 2 summarizes the results of fitting Eq. 1 both with and without the interaction term.

Table 2 Summary of 18 genes with previous evidence of association with triglyceride levels

Thirteen of the 18 candidate genes show at least modest (p < 0.05) evidence of statistical interaction between nearby methylation values and SNPs within the gene. The most significant SNP is in FADS3 (rs1675102) and has a minor allele frequency of 0.28. For this SNP, each additional copy of the minor allele decreases the impact of methylation on TG levels.
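
Under Eq. 1, the slope of the residual on methylation depends on genotype. Writing g for an individual's number of minor alleles at SNPj, the short derivation below (our illustration, not part of the original analysis) makes the description of the FADS3 interaction concrete:

$$ \frac{\partial r}{\partial\,\mathrm{CPG}_j}=\beta_{C_j}+\beta_{S_jC_j}\,g, \qquad g\in\{0,1,2\} $$

Each additional copy of the minor allele shifts the methylation slope by \( \beta_{S_jC_j} \). One natural reading of "a decreased impact of methylation" is an interaction coefficient opposite in sign to \( \beta_{C_j} \), so that \( |\beta_{C_j}+\beta_{S_jC_j}\,g| \) shrinks as g increases (assuming the slope does not change sign for g = 0, 1, 2).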

Table 3 shows the results of collapsing all the CpG sites within each gene region using the minimum method, which takes the minimum value across all CpG sites within 50 kb of the gene. The minimum method yielded more significant interactions (44) than the other 4 collapsing methods, which averaged only 23 significant interactions (detailed results not shown).

Table 3 Summary of CpG results after collapsing using the minimum method

Using the SNPs that showed significant interactions under more than 1 of the 5 CpG collapsing methods, we tested interactions with every CpG site in the corresponding gene region and identified 176 unique significant SNP × CpG interactions (Table 4). GALNT2 had the largest number of significant results, with 69 interactions, including the most significant of the 176 interactions (p = 0.000142). The SNP in this interaction (rs6677241) has a minor allele frequency of 0.026, and each additional copy of its minor allele increases the impact of methylation on TG levels.
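
The SNP selection step and the follow-up pairing can be sketched as a count over per-method result sets. This assumes the significant SNPs for each collapsing method are already available as lists of IDs and that each SNP is assigned to one gene region; all names, including the gene and CpG IDs in any example, are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def snps_in_multiple_methods(significant_by_method: Dict[str, Iterable[str]], min_methods: int = 2) -> Set[str]:
    """SNPs with a significant interaction under at least `min_methods` collapsing methods."""
    counts: Counter = Counter()
    for snps in significant_by_method.values():
        counts.update(set(snps))  # count each SNP at most once per method
    return {snp for snp, k in counts.items() if k >= min_methods}


def follow_up_pairs(selected: Set[str], snp_gene: Dict[str, str],
                    gene_cpgs: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Every (SNP, CpG) pair to test: each selected SNP against all CpG sites in its gene region."""
    return [(snp, cpg) for snp in sorted(selected) for cpg in gene_cpgs[snp_gene[snp]]]
```

Each resulting pair would then be fit with Eq. 1 and tested at the 0.05 level used throughout the candidate gene analysis.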

Table 4 Summary of 176ᵃ interaction pairs

Discussion

Although no significant SNP–CpG interactions were identified at the strict genome-wide significance threshold (5 × 10−8), a more exploratory threshold (1 × 10−4) identified many genes previously shown to be associated with cardiometabolic traits. A candidate gene approach, using a significance level of 0.05, identified loci in 13 genes with modest evidence for SNP-CpG interactions on baseline TG levels. Furthermore, the collapsing methods allowed us to identify potentially interesting SNPs for additional exploration. Examining all CpG sites within each gene region for only these SNPs yielded 176 significant unique SNP-CpG pairs. In every case, the SNP-CpG interaction p value was smaller than both the SNP and the CpG p values from the noninteraction model. This suggests that testing SNP-CpG pairs may identify SNPs that traditional GWAS techniques would miss. GALNT2 had the most significant interactions, with 69. SNPs in GALNT2 were previously associated with TG, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol levels [8], and promoter methylation of GALNT2 has been associated with a higher risk of coronary heart disease [9].

Our analysis has some limitations. First, to manage computational resources, we began by predicting baseline TG levels from kinship and covariates, yielding a residual for each individual that we then used to assess the impact of methylation and genetic variation; alternatives to this two-stage approach may exist. Second, the significance threshold we used for the genome-wide analysis is exploratory, far more liberal than those used in the vast majority of published GWAS-type analyses. Although this can produce more false-positive results, we did find a number of “subthreshold” loci of potential interest, suggesting the need for studies with larger sample sizes and more sensitive statistical methods to detect these loci. Finally, the minimum method of summarizing methylation in a region near a gene showed promise, although further work is needed to evaluate the many alternatives more fully. Regardless, leveraging prior biological evidence (eg, via the candidate gene approach) may be beneficial when testing for SNP–CpG interactions.

Conclusions

Although our findings reached only “subthreshold” significance, they highlight the need for statistical models that leverage prior biological information. Our study suggests that SNP-dependent effects of methylation (ie, SNP–CpG interactions) are a possible explanation for variation in TG levels. Studies with larger sample sizes, as well as wet lab experiments, are needed to confirm this relationship. As we learn more about how an individual’s genotype affects their health, there is greater opportunity for personalized medicine to become an effective treatment strategy.

References

  1. Rösl F, Arab A, Klevenz B, zur Hausen H. The effect of DNA methylation on gene regulation of human papillomaviruses. J Gen Virol. 1993;74(Pt 5):791–801.

  2. Zhi D, Aslibekyan S, Irvin MR, Claas SA, Borecki IB, Ordovas JM, Absher DM, Arnett DK. SNPs located at CpG sites modulate genome-epigenome interaction. Epigenetics. 2013;8(8):802–6.

  3. Moore LD, Le T, Fan G. DNA methylation and its basic function. Neuropsychopharmacology. 2013;38(1):23–38.

  4. Soto-Ramírez N, Arshad SH, Holloway JW, Zhang H, Schauberger E, Ewart S, Patil V, Karmaus W. The interaction of genetic variants and DNA methylation of the interleukin-4 receptor gene increase the risk of asthma at age 18 years. Clin Epigenetics. 2013;5(1):1.

  5. Bell CG, Finer S, Lindgren CM, Wilson GA, Rakyan VK, Teschendorff AE, Akan P, Stupka E, Down TA, Prokopenko I, et al. Integrated genetic and epigenetic analysis identifies haplotype-specific methylation in the FTO type 2 diabetes and obesity susceptibility locus. PLoS One. 2010;5(11):e14040.

  6. Lindman AS, Veierød MB, Tverdal A, Pedersen JI, Selmer R. Nonfasting triglycerides and risk of cardiovascular death in men and women from the Norwegian counties study. Eur J Epidemiol. 2010;25(11):789–98.

  7. R-Project: “R,” 2016. [Online]. Accessed Feb 2017.

  8. Guo T, Yin RX, Huang F, Yao LM, Lin WX, Pan SL. Association between the DOCK7, PCSK9 and GALNT2 gene polymorphisms and serum lipid levels. Sci Rep. 2016;6:19079.

  9. Peng P, Wang L, Yang X, Huang X, Ba Y, Chen X, Guo J, Lian J, Zhou J. A preliminary study of the relationship between promoter methylation of the ABCG1, GALNT2 and HMGCR genes and coronary heart disease. PLoS One. 2014;9(8):e102265.

  10. Comuzzie AG, Cole SA, Laston SL, Voruganti VS, Haack K, Gibbs RA, Butte NF. Novel genetic loci identified for the pathophysiology of childhood obesity in the Hispanic population. PLoS One. 2012;7(12):e51954.

  11. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, Powell C, Vedantam S, Buchkovich ML, Yang J, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518(7538):197–206.

  12. Williams SR, Hsu FC, Keene KL, Chen WM, Nelson S, Southerland AM, Madden EB, Coull B, Gogarten SM, Furie KL, et al. Shared genetic susceptibility of vascular-related biomarkers with ischemic and recurrent stroke. Neurology. 2016;86(4):351–9.

  13. Sung YJ, de Las Fuentes L, Schwander KL, Simino J, Rao DC. Gene-smoking interactions identify several novel blood pressure loci in the Framingham heart study. Am J Hypertens. 2015;28(3):343–54.

  14. Carty CL, Keene KL, Cheng YC, Meschia JF, Chen WM, Nalls M, Bis JC, Kittner SJ, Rich SS, Tajuddin S, et al. Meta-analysis of genome-wide association studies identifies genetic risk factors for stroke in African Americans. Stroke. 2015;46(8):2063–8.

  15. Zheng JS, Arnett DK, Lee YC, Shen J, Parnell LD, Smith CE, Richardson K, Li D, Borecki IB, Ordovás JM, et al. Genome-wide contribution of genotype by environment interaction to variation of diabetes-related traits. PLoS One. 2013;8(10):e77442.

  16. Rühle F, Witten A, Barysenka A, Huge A, Arning A, Heller C, Krümpel A, Mesters R, Franke A, Lieb W, et al. Rare genetic variants in SMAP1, B3GAT2, and RIMS1 contribute to pediatric venous thromboembolism. Blood. 2017;129(6):783–90.

  17. Yu B, Zheng Y, Alexander D, Manolio TA, Alonso A, Nettleton JA, Boerwinkle E. Genome-wide association study of a heart failure related metabolomic profile among African Americans in the atherosclerosis risk in communities (ARIC) study. Genet Epidemiol. 2013;37(8):840–5.

  18. Smith NL, Felix JF, Morrison AC, Demissie S, Glazer NL, Loehr LR, Cupples LA, Dehghan A, Lumley T, Rosamond WD, et al. Association of genome-wide variation with the risk of incident heart failure in adults of European and African ancestry: a prospective meta-analysis from the cohorts for heart and aging research in genomic epidemiology (CHARGE) consortium. Circ Cardiovasc Genet. 2010;3(3):256–66.

  19. Christophersen IE, Rienstra M, Roselli C, Yin X, Geelhoed B, Barnard J, Lin H, Arking DE, Smith AV, Albert CM, et al. Large-scale analyses of common and rare variants identify 12 new loci associated with atrial fibrillation. Nat Genet. 2017;49(6):946–52.

  20. Nagy R, Boutin TS, Marten J, Huffman JE, Kerr SM, Campbell A, Evenden L, Gibson J, Amador C, Howard DM, et al. Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 generation Scotland participants. Genome Med. 2017;9(1):23.

  21. Wang KS, Liu X, Zheng S, Zeng M, Pan Y, Callahan K. A novel locus for body mass index on 5p15.2: a meta-analysis of two genome-wide association studies. Gene. 2012;500(1):80–4.


Funding

Publication of this article was supported by NIH R01 GM031575.

Availability of data and materials

The data that support the findings of this study are available from the Genetic Analysis Workshop (GAW), but restrictions apply to the availability of these data, which were used under license for the current study. Qualified researchers may request these data directly from GAW.

About this supplement

This article has been published as part of BMC Proceedings Volume 12 Supplement 9, 2018: Genetic Analysis Workshop 20: envisioning the future of statistical genetics by exploring methods for epigenetic and pharmacogenomic data. The full contents of the supplement are available online at

Author information




All authors participated in the conception of this idea and have read and approved the final manuscript. JVW wrote most of the code. JV and AK analyzed the data. JV wrote the manuscript and made revisions. NLT provided support throughout the project.

Corresponding author

Correspondence to Nathan L. Tintle.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Veenstra, J., Kalsbeek, A., Koster, K. et al. Epigenome wide association study of SNP–CpG interactions on changes in triglyceride levels after pharmaceutical intervention: a GAW20 analysis. BMC Proc 12 (Suppl 9), 58 (2018).

Keywords

  • Epigenome-wide Association Studies
  • Single Nucleotide Polymorphisms (SNPs)
  • Candidate Gene Analysis
  • Genome-wide Interaction Analysis
  • Sensitive Statistical Method