Skip to content


  • Proceedings
  • Open Access

Gene-methylation epistatic analyses via the W-test identifies enriched signals of neuronal genes in patients undergoing lipid-control treatment

  • 1, 2,
  • 1, 2,
  • 1, 2,
  • 1, 2,
  • 1, 2,
  • 3,
  • 1, 2 and
  • 1, 2Email author
Contributed equally
BMC Proceedings201812 (Suppl 9) :53

  • Published:


An increasing number of studies are focused on the epigenetic regulation of DNA to affect gene expression without modifications to the DNA sequence. Methylation plays an important role in shaping disease traits; however, previous studies were mainly experiment, based, resulting in few reports that measured gene–methylation interaction effects via statistical means. In this study, we applied the data set adaptive W-test to measure gene–methylation interactions. Performance was evaluated by the ability to detect a given set of causal markers in the data set obtained from the GAW20. Results from simulation data analyses showed that the W-test was able to detect most markers. The method was also applied to chromosome 11 of the experimental data set and identified clusters of genes with neuronal and retinal functions, including MPPED2I, GUCY2E, NAV2, and ZBTB16. Genes from the TRIM family were also identified; these genes are potentially related to the regulation of triglyceride levels. Our results suggest that the W-test could be an efficient and effective method to detect gene–methylation interactions. Furthermore, the identified genes suggest an interesting relationship between lipid levels and the etiology of neurological disorders.


Genetic variants, such as single-nucleotide polymorphisms (SNPs), have been found to influence risk for human diseases. Recent studies show that epigenetics affect SNPs in genes and subsequently influence the gene function and disease trait [1]. Epigenetic mechanisms consist of DNA methylation, histone modifications, and noncoding RNAs, all of which represent the patterns of chemical and structural modifications to DNA [2]. There are an increasing number of laboratory experiments that provide evidence of DNA methylation and gene expression regulation [35]. Only a few studies, however, have evaluated the genome–epigenome interactions through statistical means, which may potentially provide novel findings for the joint effects of SNPs and cytosine-phosphate-guanine (CpG) sites [69]. The search for SNP-CpG epistasis is usually conducted through multistage or integrated analyses, where the genome and methylation data are first analyzed separately and the results then combined [10, 11]. Some studies apply existing interaction-effect methods, such as regressions, to perform the joint analysis of methylation and genome data. The advantages of the W-test method are data set adaptive probability distributions and robustness for complicated genetic architectures, such as moderate data sparsity and population stratifications [12]. By applying the W-test to gene–methylation data directly, epistasis can be measured without a preselection of biomarkers, while also relying less on significant main effects for detecting important CpG–SNP interactions. The GAW20 provided an opportunity to study methylation and genome-wide association study (GWAS) data from participants who have undergone lipid control treatment. The W-test was applied in the detection of gene–methylation interactions, resulting in interesting findings with biological implications.


GAW20 experimental and simulated data sets

GAW20 provided the study data. The study participants were patients with diabetes who had undergone lipid-control treatments with the drug fenofibrate and were recruited from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) clinical trial project. The analyzed data sets consisted of a simulated and experimental data sets. The triglyceride (TG) levels were collected at 4 clinical visits, with 2 measurements before treatment and 2 measurements after treatment. Age, sex, smoking status, and location were recorded. Genome-wide association data were sequenced with the Affymetrix Genome-wide Human SNP array 6.0, and DNA methylation profiling was performed with the Illumina Infinium Human Methylation 450 K Bead Chip Array, using the buffy coat harvest from blood samples collected at the second and fourth visits. In the simulated data, the phenotype of the simulated data set was generated using experimental genetic data under a hypothetical model [13]. The TG levels were generated from 5 SNPs with major effects and 5 CpG sites in physical proximity. A set of 5 SNP-CpG pairs with relatively high heritability but not related to TG levels was given as noise for testing the statistical methods. The simulated data contained 680 subjects after excluding individuals with missing phenotypes. For simulated data, the 84th replicate was used as suggested by GAW20. In the experimental data set, a total of 523 participants had complete genomic and clinical measurements. Participants with missing values were removed during the quality-control process, resulting in a remaining sample size of 476. The method was applied to chromosome 11 of the experimental data.

Defining drug response

The TG levels can be used as a measure of drug response. Because common clinical standards regard a 30% decrease in TG levels as an effective control of lipids [14], we adopted the same criteria in this study. First, the average pretreatment TG levels (TG_pre) were calculated by averaging the measurements from the first and second visits. The average posttreatment TG levels (TG_post) were calculated by averaging the measurements from the third and fourth visits. Next, a percentage of change was calculated as: ΔTG% = (TG_pre–TG_post)/TG_pre. If the percentage of change was greater than 30%, then the drug treatment was defined as effective; if less than 30%, treatment was defined as ineffective. The effectiveness of the drug response was the outcome variable for both the simulated and experimental data.

The epistasis measure: The W-test

The W-test measures the probability distributional differences for a set of biomarkers between the 2 groups of participants such as the 2 drug-response groups [12]. Under an additive genetic model, a SNP variable can be coded into 3 levels with the counts of the minor alleles. The quantitative CpG variable can be divided into high and low methylation levels by two-mean clustering. A SNP-CpG pair can form a genetic combination of 6 categories. The empirical distributions are compared through a sum of the square of the log odds ratio by:
$$ W=h\sum \limits_{i=1}^k{\left[\log \frac{{\widehat{p}}_{1i}/\left(1-{\widehat{p}}_{1i}\right)}{{\widehat{p}}_{0i}/\left(1-{\widehat{p}}_{0i}\right)}/{SE}_i\right]}^2\sim {\chi}_f^2 $$

where \( {\widehat{p}}_{1i} \) and \( {\widehat{p}}_{0i} \) are the proportion of cases and controls in the ith category out of total cases or controls, respectively. SEi is the standard error of the log of odds ratios. The test statistics follows a chi-squared distribution with f degrees of freedom. Two parameters, h and f, are estimated using large-sample approximation by drawing smaller bootstrap samples under a null hypothesis. Consequently, the testing distribution is robust for complicated genetic architectures, as it adaptively adjusts to the data structure of the working data [12]. For detecting the cis-regulation patterns in the gene-methylation data, the SNPs and CpG sites located within a 10-kb genome distance on chromosome 11 were evaluated exhaustively [1].

Two types of logistic-regression models were applied as accompanying benchmarks to the W-test. The first logistic-regression model, LR-m1, considered the CpG site as a binary variable like the W-test, and the second logistic-regression model, LR-m2, included the CpG sites as a continuous variable using the original methylation values. Both logistic-regression models incorporated the main and interaction effects of SNP and CpG sites. In short, we denote:
  • LR-m1: Y = SNP + CpG + SNP × CpG, where CpG is a binary variable;

  • LR-m2: Y = SNP + CpG + SNP × CpG, where CpG is a continuous variable.

The type I error rate is an average false-positive proportion using a permuted phenotype on a pair of gene–methylation markers in 2000 replicates. A total of 140,501 epistatic pairs were tested, and a Bonferroni correction resulted in a significance level of 3.56E-7 at a family-wise error rate of 5%.


Performance of the W-test with simulated data

In the simulated data set, the W-test, LR-m1, and LR-m2 were applied to the given causal and noise pairs. Table 1 displays the p values obtained from alternative methods. Generally, the W-test gave smaller p values than LR-m1 in most answer pairs, and also had comparable p values to LR-m2. The top 3 answer pairs were all identified to be significant by the 3 methods. The W-test also found the fourth answer pair (cg00045910, rs10828412) with a p value = 0.0475, which was slightly smaller than the p values from the LR-m1 (p value = 0.0532) and LR-m2 (p value = 0.0597). The results suggested that the W-test could be sensitive to small signals with lower heritability. In terms of the performance for the noise pairs, all methods yielded noise p values greater than 0.05. The Type I error rate of the W-test was 2.95%, less than the family-wise error rate of 5%. Meanwhile, the Type I error rates of LR-m1 was 5.40% and of LR-m2 was 5.43%. The results showed that the W-test was able to distinguish between signal and noise in the simulated data set.
Table 1

p Values of 5 answers and 5 noises by the W-test and the logistic regression models LR-m1 and LR-m2 in simulated data



Marker information

p Value














4.93E − 5

1.88E − 4

2.37E − 5






6.61E − 4

2.17E − 3

3.72E − 4






7.67E − 4

2.04E − 4

8.24E − 4






4.75E − 2

5.32E − 2

5.97E − 2






3.76E − 1

6.33E − 1

4.95E − 1






5.11E − 1

1.84E − 1

1.32E − 1





6.30E − 1

6.72E − 1

4.19E − 1





1.61E − 1

2.06E − 1

1.10E − 1





4.18E − 1

1.46E − 1

5.56E − 1





7.33E − 1

8.03E − 1

4.19E − 1

Computing time

Computing time was calculated on a laptop computer with a 1.6 GHz chipset and 4 GB of random access memory using 2000 replicates on 1 pair of markers. The W-test was 4 times faster than logistic regression on a general laptop (2.28 s by the W-test, 10.12 s by the LR-m1, and 9.37 s by the LR-m2).

Identification of gene–methylation interaction in experimental data

The W-test was applied to test the gene–methylation interactions for GAW20 experimental data on chromosome 11. No significant interaction pair passed the Bonferroni correction significance level of 3.56E-07 (Table 2). We checked the functions of the top 15 identified epistatic pairs and found interesting biological implications. The top 3 SNP-CpG pairs all resided in the gene MPPED2 (11p14.1; p value = 8E-06), which encoded the protein metallophosphoesterase and was reported to be related to neuronal function [15]. Previous GWAS studies and biomedical experiments reported that MPPED2 was associated with chronic kidney disease, and knockdown of this gene in zebrafish embryos suggested a role for it in renal function [16]. GUCY2E was ranked fourth and has been reported to function in the central nervous system and retinal [17, 18]. NAV2 (ranked 6th; p value = 1.78E-05) is a neuron navigator that induces neurite outgrowth for all-trans retinoic acid, and plays an essential role in the development of the cranial nerve and the regulation of blood pressure in humans [19]. ZBTB16 at 11q23.2 (ranked 15th; p value = 7.04E-05) also has been reported as an inhibitor of neurite outgrowth in the adult central nervous system [20]. Other genes in the top 15 identified pairs include TRIM5, TRIM6-TRIM34, and TRIM3 (smallest p value = 5.22E-05), which were highly correlated with TG levels in mice [21]. The quantile–quantile (Q-Q) plot of the gene–methylation tests showed no inflation in spurious relations for the experimental data (Fig. 1).
Table 2

Top 15 gene–methylation pairs identified by the W-test in experimental dataa




Distance (kb)



p Value







7.49E − 06







7.49E − 06







8.68E − 06







1.57E − 05







1.65E − 05







1.78E − 05






2.78E − 05







3.94E − 05







4.06E − 05







4.86E − 05







5.22E − 05







5.86E − 05







6.04E − 05






6.17E − 05







7.04E − 05

aBonferroni corrected significance threshold: 3.56E − 7

Fig. 1
Fig. 1

Q-Q plot of gene–methylation interaction using experimental data

Discussion and conclusions

There has been increasing evidence for the contribution of epigenetics in regulating gene expressions implicated in diseases. Previous studies were mainly focused on experimentally studying gene–methylation interactions. In this study, we demonstrated that the W-test can be used as an effective method to identify the epistatic interactions between SNPs and CpG sites in the GAW20 simulated and experimental data sets. One common obstacle in the analysis of epistasis in the genome and epigenome is the large number of pairwise tests, the volume of which is determined by the size of the cis-regulatory region. Existing methods solve the challenge by using stage-wise and integrated analyses, in which the SNPs are separately selected and then the epistatic interactions with CpG sites are jointly evaluated in regression-based approaches [10, 11]. The stage-wise analysis may potentially miss the markers that have weak main effects but strong epistasis effects. Previous studies also made a linear assumption about the relationship between the epistatic pairs and a transformed form of the response variable, while having the advantages of covariate and population structure control. Some nonparametric methods, such as the Mann-Whitney U-test, have been applied for the analysis of methylation data [22]. However, these nonparametric tests cannot handle the potential complicated genetic architectures such as sparse data or population stratification. The W-test has the advantage of being model-free and does not assume any form of interaction effect. It also follows a chi-squared distribution in which the degrees of freedom is estimated from the working data by bootstrapped sampling. In this way, the W-test is able to correct potential bias of the probability distribution caused by complicated data structures. This method is very efficient such that it can be applied directly on SNP-CpG evaluations without prior filtering with the main effect.

Application of this method on the experimental data from patients who had undergone treatment for managing TG levels via fenofibrate identified genes that played roles in renal function, the central nervous system, and retinal functions. The enriched signals found in neuronal-related genes suggest that the blood lipid levels could be related to the neurological dysfunction in the brain, which is the most cholesterol-rich organ in the body. By performing an epistatic evaluation between SNPs and CpG sites, we identified MPPED2, GUCY2E, NAV2, and ZBTB16 as associated with hyperlipidemia. Among these 4 genes, MPPED2 was the most significant; it plays a role in neural development, and genetic variations in this gene are reported to be related to migraines, a common disease of the neural system disease [23]. Furthermore, mutations of CUCY2E are reported to be related to retinal disorders [24, 25]. ZBTB16 encodes a protein that is highly expressed in the brain, and polymorphisms in this gene are used as a marker for attention deficit hyperactivity disorder, a neuropsychiatric condition [26]. It is intriguing to note that the enriched signaling in neuronal and retinal genes are identified through epistasis evaluation between SNPs and CpG sites, but not through separate analysis of the main effect in those data sets. This shines light on the importance of integrated analysis of omics data: considering multiple facets or measurement of a common object may improve the chance of catching the underlining signal. Further studies on these threads are necessary to discover the underlying biological mechanism.




We would like to thank GAW20 for providing the epigenome data, and both reviewers for their insightful comments.


Publication of this article was supported by NIH R01 GM031575. This work is supported by the National Science Foundation of China (81473035, 31401124 to MHW).

Availability of data and materials

The data that support the findings of this study are available from the Genetic Analysis Workshop (GAW), but restrictions apply to the availability of these data, which were used under license for the current study. Qualified researchers may request these data directly from GAW.

About this supplement

This article has been published as part of BMC Proceedings Volume 12 Supplement 9, 2018: Genetic Analysis Workshop 20: envisioning the future of statistical genetics by exploring methods for epigenetic and pharmacogenomic data. The full contents of the supplement are available online at

Authors’ contributions

RS designed the study and performed the analysis. RS, HW, RM, and MHW wrote the manuscript. MHW contributed to the study design. XXX assisted in the data analysis. BZ coordinated and approved the study. All authors read and approved the final manuscript.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Authors’ Affiliations

Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong, Hong Kong, Special Administrative Region of China
The Chinese University of Hong Kong Shenzhen Research Institute, Shenzhen, China
Department of Anaesthesia and Intensive Care, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong, Hong Kong, Special Administrative Region of China


  1. Zhi D, Aslibekyan S, Irvin MR, Claas SA, Borecki IB, Ordovas JM, Absher DM, Arnett DK. SNPs located at CpG sites modulate genome-epigenome interaction. Epigenetics. 2013;8(8):802–6.View ArticleGoogle Scholar
  2. Voisin S, Almén MS, Zheleznyakova GY, Lundberg L, Zarei S, Castillo S, Eriksson FE, Nilsson EK, Blüher M, Böttcher Y, et al. Many obesity-associated SNPs strongly associate with DNA methylation changes at proximal promoters and enhancers. Genome Med. 2015;7(1):103.View ArticleGoogle Scholar
  3. Jaenisch R, Bird A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet. 2003;33(Suppl):245–54.View ArticleGoogle Scholar
  4. Osborn TC, Pires JC, Birchler JA, Auger DL, Chen ZJ, Lee HS, Comai L, Madlung A, Doerge RW, Colot V, et al. Understanding mechanisms of novel gene expression in polyploids. Trends Genet. 2003;19(3):141–7.View ArticleGoogle Scholar
  5. Polansky JK, Kretschmer K, Freyer J, Floess S, Garbe A, Baron U, Olek S, Hamann A, von Boehmer H, Huehn J. DNA methylation controls Foxp3 gene expression. Eur J Immunol. 2008;38(6):1654–63.View ArticleGoogle Scholar
  6. Gutierrez-Arcelus M, Lappalainen T, Montgomery SB, Buil A, Ongen H, Yurovsky A, Bryois J, Giger T, Romano L, Planchon A, et al. Passive and active DNA methylation and the interplay with genetic variation in gene regulation. Elife. 2013;2:e00523.View ArticleGoogle Scholar
  7. Soto-Ramírez N, Arshad SH, Holloway JW, Zhang H, Schauberger E, Ewart S, Patil V, Karmaus W. The interaction of genetic variants and DNA methylation of the interleukin-4 receptor gene increase the risk of asthma at age 18 years. Clin Epigenetics. 2013;5(1):1.View ArticleGoogle Scholar
  8. Bell AF, Carter CS, Steer CD, Golding J, Davis JM, Steffen AD, Rubin LH, Lillard TS, Gregory SP, Harris JC, et al. Interaction between oxytocin receptor DNA methylation and genotype is associated with risk of postpartum depression in women without depression in pregnancy. Front Genet. 2015;6:243.View ArticleGoogle Scholar
  9. Bani-Fatemi A, Howe AS, Matmari M, Koga A, Zai C, Strauss J, De Luca V. Interaction between methylation and CpG single-nucleotide polymorphisms in the HTR2A gene: association analysis with suicide attempt in schizophrenia. Neuropsychobiology. 2016;73(1):10–5.View ArticleGoogle Scholar
  10. Hannon E, Dempster E, Viana J, Burrage J, Smith AR, Macdonald R, St Clair D, Mustard C, Breen G, Therman S, et al. An integrated genetic-epigenetic analysis of schizophrenia: evidence for co-localization of genetic associations and differential DNA methylation. Genome Biol. 2016;17(1):176.View ArticleGoogle Scholar
  11. Li R, Kim D, Dudek SM, Ritchie MD. An integrated analysis of genome-wide DNA methylation and genetic variants underlying etoposide-induced cytotoxicity in European and African populations. In: European Conference on the Applications of Evolutionary Computation. Berlin: Springer; 2014. p. 928–38.Google Scholar
  12. Wang MH, Sun R, Guo J, Weng H, Lee J, Hu I, Sham PC, Zee BC. A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Res. 2016;44(12):e115.View ArticleGoogle Scholar
  13. Irvin MR, Zhi D, Joehanes R, Mendelson M, Aslibekyan S, Claas SA, Thibeault KS, Patel N, Day K, Jones LW, et al. Epigenome-wide association study of fasting blood lipids in the genetics of lipid-lowering drugs and diet network study. Circulation. 2014;130(7):565–72.View ArticleGoogle Scholar
  14. Balfour JA, McTavish D, Heel RC. Fenofibrate. Drugs. 1990;40(2):260–90.View ArticleGoogle Scholar
  15. Pattaro C, Köttgen A, Teumer A, Garnaas M, Böger CA, Fuchsberger C, Olden M, Chen MH, Tin A, Taliun D, et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet. 2012;8:e1002584.View ArticleGoogle Scholar
  16. Witasp A, Ekström TJ, Schalling M, Lindholm B, Stenvinkel P, Nordfors L. How can genetics and epigenetics help the nephrologist improve the diagnosis and treatment of chronic kidney disease patients? Nephrol Dial Transplant. 2014;29(5):970–80.Google Scholar
  17. DuVal MG, Allison WT. Impacts of the retinal environment and photoreceptor type on functional regeneration. Neural Regen Res. 2017;12(3):376.View ArticleGoogle Scholar
  18. Boye SL, Peterson JJ, Choudhury S, Min SH, Ruan Q, McCullough KT, Zhang Z, Olshevskaya EV, Peshenko IV, Hauswirth WW, et al. Gene therapy fully restores vision to the all-cone Nrl(−/−) Gucy2e(−/−) mouse model of Leber congenital amaurosis-1. Hum Gene Ther. 2015;26(9):575–92.View ArticleGoogle Scholar
  19. Marzinke MA, Mavencamp T, Duratinsky J, Clagett-Dame M. 14-3-3ε and NAV2 interact to regulate neurite outgrowth and axon elongation. Arch Biochem Biophys. 2013;540(1):94–100.View ArticleGoogle Scholar
  20. Simpson MT, Venkatesh I, Callif BL, Thiel LK, Coley DM, Winsor KN, Wang Z, Kramer AA, Lerch JK, Blackmore MG. The tumor suppressor HHEX inhibits axon growth when prematurely expressed in developing central nervous system neurons. Mol Cell Neurosci. 2015;68:272–83.View ArticleGoogle Scholar
  21. Orozco LD, Cokus SJ, Ghazalpour A, Ingram-Drake L, Wang S, van Nas A, Che N, Araujo JA, Pellegrini M, Lusis AJ. Copy number variation influences gene expression and metabolic traits in mice. Hum Mol Genet. 2009;18(21):4118–29.View ArticleGoogle Scholar
  22. Fleischer T, Edvardsen H, Solvang HK, Daviaud C, Naume B, Børresen-Dale AL, Kristensen VN, Tost J. Integrated analysis of high-resolution DNA methylation profiles, gene expression, germline genotypes and clinical end points in breast cancer patients. Int J Cancer. 2014;134(11):2615–25.View ArticleGoogle Scholar
  23. Gormley P, Anttila V, Winsvold BS, Palta P, Esko T, Pers TH, Farh K-H, Cuenca-Leon E, Muona M, Furlotte NA. Meta-analysis of 375,000 individuals identifies 38 susceptibility loci for migraine. Nat Genet. 2016;48(8):856–66.View ArticleGoogle Scholar
  24. Semple-Rowland SL, Lee NR, Van Hooser JP, Palczewski K, Baehr W. A null mutation in the photoreceptor guanylate cyclase gene causes the retinal degeneration chicken phenotype. Proc Natl Acad Sci U S A. 1998;95(3):1271–6.View ArticleGoogle Scholar
  25. Perrault I, Rozet JM, Calvas P, Gerber S, Camuzat A, Dollfus H, Châtelin S, Souied E, Ghazi I, Leowski C. Retinal-specific guanylate cyclase gene mutations in Leber’s congenital amaurosis. Nat Genet. 1996;14(4):461–4.View ArticleGoogle Scholar
  26. Zayats T, Athanasiu L, Sonderby I, Djurovic S, Westlye LT, Tamnes CK, Fladby T, Aase H, Zeiner P, Reichborn-Kjennerud T. Genome-wide analysis of attention deficit hyperactivity disorder in Norway. PLoS One. 2015;10(4):e0122501.View ArticleGoogle Scholar


© The Author(s). 2018