Discovering pure gene-environment interactions in blood pressure genome-wide association studies data: a two-step approach incorporating new statistics
© Wang et al.; licensee BioMed Central Ltd. 2014
Published: 17 June 2014
Environment has long been known to play an important part in disease etiology. However, few genome-wide association studies take environmental factors into consideration. There is also a need for new methods to identify gene-environment interactions. In this study, we propose a 2-step approach incorporating an influence measure that captures pure gene-environment effect. We found that, for systolic blood pressure, pure gene-age interaction showed a stronger association than the genetic effect alone, as measured by the number of single-nucleotide polymorphisms (SNPs) reaching a given significance level. We analyzed the subjects by dividing them into two age groups and found no overlap in the top identified SNPs between them. This suggested that age might have a nonlinear effect on genetic association. Furthermore, for systolic blood pressure, the scores of the top SNPs in the two age subgroups were about 3 times those obtained when using all subjects. In addition, the scores in the older age subgroup were much higher than those in the younger subgroup. The results suggest that genetic effects are stronger in older age and that genetic association studies should take environmental effects into consideration, especially age.
Gene-environment interactions (G × E) have long been known to play an important role in complex disease etiology. Understanding these interactions will reduce the bias in variable selection that arises when cohorts differ in their exposure to the environment. Previous methods of studying G × E effects have mainly included candidate genes, case-only design, and family-based association studies [1, 2]. These methods make their respective assumptions in terms of biological knowledge, independence of gene and environment, and kinship information. There is an urgent need for new methods to detect gene-environment effects. With the emergence of genome-wide association studies (GWAS), data mining methods, such as generalized linear models incorporating G × E terms, are becoming popular [3, 4]. We do not know, however, how much of the association identified is a result of main effects and how much is a result of pure G × E interactions. In this study, we used a 2-step method that first aggressively removed main effects of both gene and environment, and then tested for the strength of pure G × E interaction. We found that, for systolic blood pressure (SBP), the pure gene-age interaction was stronger than the main effect of single-nucleotide polymorphisms (SNPs) alone. We also analyzed the genetic association separately in two age groups to test the effect of age. We found that the marker profiles were quite distinct in different age cohorts. This suggested that age might have a strong nonlinear effect on genetic association.
The dataset adopted in this study was provided by Genetic Analysis Workshop 18 (GAW18), which used real phenotypes and genotypes from the San Antonio Family Studies. We focused on chromosome 3, which includes 62,915 SNPs. There were 142 unrelated individuals, for whom information was available on SBP, diastolic blood pressure (DBP), age, smoking status, and use of antihypertension medication. Although there were 4 longitudinal measurements of the phenotypes, we considered only the first measurement, which had the fewest missing values. After removing the missing values, the data for 130 unrelated individuals were retained for further analysis.
Detecting pure G × E effects
Step 1: Removal of main effect of gene and environment
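In its standard form, the projection pursuit regression (PPR) model [5] is $y = \sum_{m=1}^{M} f_m(\alpha_m^{T} x) + \varepsilon$, where $f_m$ is the mth smooth function of a linear combination $\alpha_m^{T} x$ of the predictors x. Because PPR does not assume a linear relation between the predictors and the response, both nonlinear and linear effects can be removed, leaving them out of the residual. The fit was calculated using the ppr function in R in a stepwise manner, first removing the main effect of the environment and then the main effect of the gene, without considering the interactions among them.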
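The stepwise order described above (environment first, then gene) can be illustrated with a deliberately simpler stand-in. The sketch below is not projection pursuit regression: it swaps in plain category-mean subtraction, which removes the main effect of a single discrete predictor whether that effect is linear or not. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def remove_group_means(values, groups):
    """Return residuals after subtracting each group's mean from its members."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    means = {group: sums[group] / counts[group] for group in sums}
    return [value - means[group] for value, group in zip(values, groups)]


def remove_main_effects(phenotype, environment, genotype):
    """Stepwise residualization: environment main effect first, then genotype."""
    after_environment = remove_group_means(phenotype, environment)
    return remove_group_means(after_environment, genotype)


# Toy data (made up): the phenotype is purely additive, with no interaction.
env = [0, 0, 0, 1, 1, 1]
geno = [0, 1, 2, 0, 1, 2]
pheno = [10 * e + 2 * g for e, g in zip(env, geno)]

residual = remove_main_effects(pheno, env, geno)
```

Because the toy phenotype contains only main effects and the design is balanced, the residual is zero for every subject; any signal remaining after this step in real data would be a candidate for interaction.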
Step 2: Evaluation of interactions by an influence measure
An influence measure was introduced by Lo and Zheng to capture interaction effects based on partitions defined by a variable subset. It has been shown to be very effective in capturing joint effects, even when main effects are weak. Important SNPs were found for inflammatory bowel disease and confirmed by later experimental results. The measure also worked in a classification algorithm that achieved the lowest error rates in predicting several cancer datasets.
where i runs through the partition cells, $n_i$ is the number of observations in cell i, n is the total number of observations, $\bar{Y}_i$ is the local mean of phenotypes in cell i, and $\bar{Y}$ is the overall mean. When the partition contains no association information, the cell mean $\bar{Y}_i$ should be very close to the overall mean $\bar{Y}$. By contrast, when a subset of variables has a joint influence on Y, the difference between $\bar{Y}_i$ and $\bar{Y}$ will be large. The effect is captured by the squared deviation, weighted by $n_i^2$, resulting in an elevated I-score. The proposed method complements main-effect methods: one can first find main effects by using another method and then add the interaction features back.
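As a minimal sketch of how such a score can be computed, the code below sums the squared deviation of each cell mean from the overall mean, weighted by the squared cell size, over the cells of a partition. The normalization by n times the phenotype variance is an assumption made here for illustration; the exact scaling used in the study is not recoverable from the text. The function name and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def i_score(phenotype, cells):
    """Partition-based influence score.

    Computes sum_i n_i^2 * (cell mean_i - overall mean)^2, divided by
    n * variance. The division by n * variance is an assumed normalization.
    Raises ZeroDivisionError if the phenotype has zero variance.
    """
    n = len(phenotype)
    overall_mean = fmean(phenotype)
    variance = pvariance(phenotype)

    by_cell = defaultdict(list)
    for y, cell in zip(phenotype, cells):
        by_cell[cell].append(y)

    weighted = sum(
        len(ys) ** 2 * (fmean(ys) - overall_mean) ** 2
        for ys in by_cell.values()
    )
    return weighted / (n * variance)


# Toy data (made up). A cell can be any hashable label, for example a
# (genotype, age code) pair when scoring a SNP jointly with dichotomized age.
pheno = [1.0, 1.0, 3.0, 3.0]
informative = i_score(pheno, [0, 0, 1, 1])    # cells separate the phenotype
uninformative = i_score(pheno, [0, 1, 0, 1])  # cell means equal overall mean
```

In the toy example the informative partition scores 2.0 and the uninformative one scores 0.0, matching the intuition that the score rises only when cell means depart from the overall mean.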
The significance of the I-score was evaluated by permuting the phenotype of the data set $10^{7}$ times.
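A permutation test of this kind can be sketched as follows. The phenotype is shuffled while the partition is held fixed, and the p value is estimated as the fraction of shuffles whose statistic is at least as large as the observed one. Permuting the phenotype changes neither n nor the phenotype variance, so the unnormalized weighted deviation is enough for ranking. The add-one estimator, the small default number of permutations, and all names are assumptions chosen to keep the example quick.

```python
import random
from collections import defaultdict
from statistics import fmean


def weighted_deviation(phenotype, cells):
    """Sum over cells of n_i^2 * (cell mean - overall mean)^2."""
    overall_mean = fmean(phenotype)
    by_cell = defaultdict(list)
    for y, cell in zip(phenotype, cells):
        by_cell[cell].append(y)
    return sum(
        len(ys) ** 2 * (fmean(ys) - overall_mean) ** 2
        for ys in by_cell.values()
    )


def permutation_p_value(phenotype, cells, n_perm=999, seed=0):
    """Estimate a p value by shuffling the phenotype against fixed cells."""
    rng = random.Random(seed)  # seeded for reproducibility
    observed = weighted_deviation(phenotype, cells)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if weighted_deviation(shuffled, cells) >= observed - 1e-12:
            hits += 1
    # Add-one estimator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): the two cells separate the phenotype perfectly.
cells = [0] * 10 + [1] * 10
pheno = [float(v) for v in range(20)]
p_value = permutation_p_value(pheno, cells)
```

The smallest p value this estimator can return is 1 / (n_perm + 1), which is why resolving very small significance levels requires a very large number of permutations.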
Dichotomization of age
Smoking and medication are both discrete variables. We dichotomized age by a 2-means clustering method (k-means in R). The cutoff value was found to be 55 years. Thus, age >55 years was coded as 1; otherwise, age was coded as 0.
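For a single variable, 2-means clustering reduces to finding two centers and splitting at their midpoint. The sketch below is a simplified, deterministic version that starts from the minimum and maximum values; the k-means routine in R may use different initialization and update rules, so this is an illustration of the idea rather than a reproduction of it. The ages are made up, and the function name is hypothetical.

```python
def two_means_cutoff(values, max_iter=100):
    """Split 1-D data into two clusters and return the midpoint of the centers."""
    low, high = min(values), max(values)
    for _ in range(max_iter):
        # Assign each value to the nearer center (ties go to the lower one).
        lower = [v for v in values if abs(v - low) <= abs(v - high)]
        upper = [v for v in values if abs(v - low) > abs(v - high)]
        if not upper:  # all values identical, nothing to split
            break
        new_low = sum(lower) / len(lower)
        new_high = sum(upper) / len(upper)
        if new_low == low and new_high == high:  # converged
            break
        low, high = new_low, new_high
    return (low + high) / 2


# Toy ages (made up), not the study data.
ages = [30, 35, 40, 60, 65, 70]
cutoff = two_means_cutoff(ages)
age_code = [1 if age > cutoff else 0 for age in ages]
```

Here the centers settle at 35 and 65, so the cutoff is 50 and the three older subjects are coded 1.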
Nonlinear gene-age association
Current GWAS assume that a biomarker affects disease independently of age, so most SNPs identified in the literature are those with strong association across the whole age range. But what if some genetic effects are nonlinear with age, so that in one's youth one group of SNPs influences the phenotype, whereas in old age some other group of SNPs takes effect? To test this hypothesis, we divided the individuals into two groups by the same 2-means clustering threshold, at age 55 years. There were 76 subjects age ≤55 years (the younger group) and 54 subjects age >55 years (the older group). We selected the top SNPs (G effect) within each group by I-score and checked to what extent these SNPs overlapped.
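The overlap check can be sketched as a comparison of the top-ranked SNPs from two per-group score tables. The SNP identifiers and scores below are made up, and the function names are hypothetical; the sketch only shows the ranking and set intersection, not how the scores are obtained.

```python
def top_k(scores, k):
    """Return the k SNP identifiers with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [snp for snp, _ in ranked[:k]]


def top_k_overlap(scores_a, scores_b, k):
    """Return the SNPs that appear in the top k of both groups."""
    return set(top_k(scores_a, k)) & set(top_k(scores_b, k))


# Toy scores (made up) for a younger and an older group.
younger = {"snpA": 9.0, "snpB": 7.0, "snpC": 1.0, "snpD": 0.5}
older = {"snpC": 8.0, "snpD": 6.0, "snpA": 2.0, "snpB": 0.1}

shared_top2 = top_k_overlap(younger, older, 2)
shared_top3 = top_k_overlap(younger, older, 3)
```

In this toy case the top 2 lists share nothing, while widening to the top 3 brings in two shared SNPs, which is the kind of pattern the group comparison is designed to reveal.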
Detecting pure gene-environment effects
Pure gene-age association is stronger than SNP main effect for SBP
Table 1. The number of SNPs reaching three levels of significance (via permutation). Columns: significance level reached; G × age*.
Nonlinear gene-age association
Analysis for SBP
Table 2. Nonlinear age effect on genetic association for SBP. Panels: (a) age ≤55 years (76 observations); (b) age >55 years (54 observations).
There was no overlap between the top SNPs from the two age groups (the first overlap occurred at the 202nd and 92nd SNP in the two groups, respectively).
The I-scores of the top SNPs in the age subgroups were about 3 times as great as the overall I-scores calculated disregarding age (using all subjects) (see Table 2). We know that under the null hypothesis, when no association exists for a marker subset, the expected I-score is 1. The result suggests that, in this dataset, most SNPs did not affect blood pressure uniformly across all age ranges. The top-ranked marker, rs16851260, which had an I-score of 90.44 when identified by pure SNP-age interaction using all 130 subjects, ranked only 6th in the subgroup of age >55 years but had a much higher I-score there, 142.42. This means that the marker has a stronger genetic association in the older age group and that calculating its score in the general population dilutes its association effect.
Moreover, for SBP, the average I-score in the older age group was much higher than in the younger group. For example, over the top 10 markers, the average I-score in the older group was 2.2 times that in the younger group. The result suggests that genetic association for SBP is much stronger in the older age group than in the younger age group.
Analysis for DBP
Table 3. Nonlinear age effect on genetic association for DBP. Panels: (a) age ≤55 years (76 observations); (b) age >55 years (54 observations).
Considering SBP and DBP separately in GWAS
Many epidemiological studies have indicated that SBP and DBP differ in physiology and in how they develop over time. It has been reported that systolic pressure is related to the elasticity of the great vessels and diastolic pressure to peripheral resistance resulting from muscle stiffness. Consistent with this knowledge, the important SNPs identified for SBP and DBP in our study had few overlaps, either marginally or interactively. The results suggested that it might be better to study the two component blood pressures separately when analyzing hypertension.
Considering age group separately in GWAS
In addition to finding that pure SNP-age interaction was stronger than the main genetic effect, we found that the genetic effect on blood pressure was nonlinear with respect to age: the top identified SNPs were completely different between the age groups.
If the number of true markers is assumed to be 100, this probability is much more significant at $10^{-5}$.
Also, the strength of genetic association was much stronger in the older group than in the younger, especially for SBP. The results suggest that age has a nonlinear impact on genetic association and that the nonlinear effect of age should be considered in GWAS, perhaps by conducting studies in separate age groups. Because this study has a limited sample size, further research on larger numbers of subjects should be conducted.
Pure G × E interaction-identified SNPs
The SNP identified by pure G × age interaction for SBP with a p value reaching $10^{-6}$ is rs6446285, in the gene BSN. The gene is involved in the organization of the cytomatrix at the nerve terminal's active zone that regulates neurotransmitter release, and is involved in the formation of retinal photoreceptor ribbon synapses. The 4 SNP-age interactions reaching $10^{-6}$ for DBP are all from the gene PBRM1. Mutations at this location have been associated with renal cell carcinoma.
This study demonstrated strong G × E interactions for blood pressure. Even after main effects were removed, the pure G × E effect could be stronger than the main effect alone for SBP. The study also preliminarily explored the nonlinear age effect on genetic association and supported the hypothesis that some SNPs have a strong influence in a particular age range, and that the genetic effect might not be uniform across a person's lifespan. The results suggest that past GWAS might have captured only a small group of very influential SNPs that are effective regardless of age or other environmental factors. There might be many more SNPs, such as those shown in this study, that are "turned on" only in a particular age range and remain to be identified. These SNPs might help account for the missing heritability in GWAS.
MHW's research was partially supported by the Chinese University of Hong Kong Direct Grant 2041755. IH's research was partially supported by Hong Kong Research Grants Council grant 601312 and grants from Hong Kong University of Science and Technology PRC11BM03, FSGRF12BM04, and SBI12BM05. MHW would like to thank Li KaShing Institute of Health Sciences for providing the computing facility and technical support to perform this study.
The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
1. Thomas D: Gene-environment-wide association studies: emerging approaches. Nat Rev Genet. 2010, 11: 259-272. 10.1038/nrg2764.
2. Hunter DJ: Gene-environment interactions in human diseases. Nat Rev Genet. 2005, 6: 287-298.
3. Kraft P: Exploiting gene-environment interaction in genome-wide association scans. Ann Hum Genet. 2007, 71: 557-558.
4. Murcray CE, Lewinger JP, Gauderman WJ: Gene-environment interaction in genome-wide association studies. Am J Epidemiol. 2009, 169 (2): 219-226.
5. Friedman JH, Stuetzle W: Projection pursuit regression. J Am Stat Assoc. 1981, 76: 817-823.
6. Chernoff H, Lo SH, Zheng TA: Discovering influential variables: a method of partitions. Ann Appl Stat. 2009, 3 (4): 1335-1369. 10.1214/09-AOAS265.
7. Lo SH, Zheng T: A demonstration and findings of a statistical approach through reanalysis of inflammatory bowel disease data. Proc Natl Acad Sci USA. 2004, 101 (28): 10386-10391. 10.1073/pnas.0403662101.
8. Wang HT, Lo SH, Zheng T, Hu IC: Interaction-based feature selection and classification for high-dimensional biological data. Bioinformatics. 2012, 28: 2834-2842. 10.1093/bioinformatics/bts531.
9. Kannel WB, Gordon T, Schwartz MJ: Systolic versus diastolic blood pressure and risk of coronary heart disease-Framingham Study. Am J Cardiol. 1971, 27: 335-337. 10.1016/0002-9149(71)90428-0.
10. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B Met. 1995, 57 (1): 289-300. 10.2307/2346101.
11. NCBI Gene Database. [http://www.ncbi.nlm.nih.gov/gene]
12. PBRM1 gene cards. [http://www.genecards.org/cgi-bin/carddisp.pl?gene=PBRM1]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.