Proceedings  Open  Published:
A powerful statistical method identifies novel loci associated with diastolic blood pressure triggered by nonlinear geneenvironment interaction
BMC Proceedingsvolume 8, Article number: S61 (2014)
Abstract
The genetic basis of blood pressure often involves multiple genetic factors and their interactions with environmental factors. Geneenvironment interaction is assumed to play an important role in determining individual blood pressure variability. Older people are more prone to high blood pressure than younger ones and the risk may not display a linear trend over the life span. However, which gene shows sensitivity to aging in its effect on blood pressure is not clear. In this work, we allowed the genetic effect to vary over time and propose a varyingcoefficient model to identify potential genetic players that show nonlinear response across different age stages. We detected 2 novel loci, gene MIR1263 (a microRNA coding gene) on chromosome 3 and gene UNC13B on chromosome 9, that are nonlinearly associated with diastolic blood pressure. Further experimental validation is needed to confirm this finding.
Background
The genetic basis of a complex trait often involves multiple genetic factors functioning in a coordinated manner. The extent to which our genetic blueprint expresses also depends on the interactions between genetic and environmental factors. Increasing evidence shows the importance of geneenvironment (G × E) interactions in determining the risk of a variety of diseases such as respiratory diseases [1], obesity [2], and psychiatric disorders [3]. For a review of G × E interaction, see the work of Hunter [4]. The empirical evidence underscores the importance of developing novel statistical approaches to identify major genetic players that are sensitive to environmental stimuli and to further understand how they function.
Blood pressure is a heritable trait influenced by several biological pathways sensitive to environmental stimuli. High blood pressure, or hypertension, affects more than 1 billion people worldwide. It damages an individual's body in many ways over time, leading to heart disease, stroke, kidney failure, and other health problems [5]. Age is known to be a risk factor for high blood pressure. Systolic blood pressure rises with age, whereas the diastolic blood pressure tends to fall. For people with preexisting high blood pressure, this agerelated pattern occurs even if the blood pressure is well controlled with medication [6]. The reasons why blood pressure changes with age are still poorly understood, but are a topic of intense research. Thus, age should be an important predictor when searching for genetic players responsible for hypertension. However, few studies have considered an agedependent mechanism in their analysis.
The genetic response to age in blood pressure fits in well with the classical G × E interaction framework. G × E interaction typically refers to the manner in which genotypes influence phenotypes differently in different environments [7]. From a biological point of view, G × E interaction can be better viewed as the genetic responses to environment changes or stresses [8]. Statistically, interaction is considered as a departure from additivity when fitting a linear regression model with 1 or more product terms, for example,
where Y is a quantitative trait (diastolic blood pressure in this analysis), G is the genetic variable, X is the environmental variable (age), and is the error term. This is a classical linear model for G × E interaction analysis. As can be seen, equation (1) automatically assumes a linear interaction mechanism between G and X because the coefficient for G is a linear function in X. However, the contribution of the same gene to blood pressure may be quite different at different age levels. This nonlinear penetrance can be well understood by a statistical varyingcoefficient (VC) model [9]. VC models allow the coefficients to change smoothly and nonlinearly with other variables so that one can explore the dynamic feature of a response over time with great flexibility and nice interpretability [10].
In this work, we applied VC models to detect genetic variants associated with diastolic blood pressure from the Genetic Analysis Workshop 18 (GAW18) data with 142 unrelated individuals. We allowed the contribution of genetic variants to blood pressure to vary over time via varying coefficients. We further proposed a sequence of hypothesis tests to evaluate whether the effect of a genetic variant is sensitive to aging, and if it is, is it in a linear or nonlinear fashion? Using this analysis, we identified 2 novel loci that show nonlinear effects over time to affect blood pressure.
Methods
The model
The nonlinear VC model is defined as
for given $\left(X,G\right)$ and the response $Y$ with $E\left(\u03f5X,G\right)=0$ and $Var\left(\u03f5\text{}X,G\right)=1;$${\sigma}^{2}\left(X\right)=Var\left(YX,G\right)$ is the conditional variance function. The mean function is defined as $m\left(X,G\right)=\alpha \left(X\right)+\beta \left(X\right)G,$ where $\beta \left(X\right)$ is a smoothing function in $X$. Under the VC modeling framework, the effect of a gene is allowed to vary as a function of environmental factors, either linearly or nonlinearly, captured by the model itself. Thus, the VC model has the potential to dissect the nonlinear penetrance of genetic variants. Here we also allow nonlinear function of $X$ with $Y$ modeled by $\alpha \left(X\right)$. This nonlinear term adjusts the nonlinear effect of $X$ when estimating the nonlinear effect of $\beta \left(X\right)$. If we take $\alpha \left(X\right)={\alpha}_{0}+{\alpha}_{1}X,$ equation (1) is just a special case of the VC model when $\beta \left(X\right)={\beta}_{1}+{\beta}_{2}X$.
Hypothesis testing
The following list shows all 4 mean models involved in our analysis.

Model 1: $m\left(X,G\right)=\alpha \left(X\right),$ no genetic effect at all;

Model 2: $m\left(X,G\right)=\alpha \left(X\right)+\beta G,$ linear genetic effect without interaction;

Model 3: $m\left(X,G\right)=\alpha \left(X\right)+\left({\beta}_{0}+{\beta}_{1}X\right)G,$ linear genetic effect with interaction; and

Model 4: $m\left(X,G\right)=\alpha \left(X\right)+\beta \left(X\right)G,$ nonlinear genetic effect.
We first assess whether the genetic coefficients vary with × by formulating the following hypotheses,
Rejecting the null indicates that potential geneage (G × age) interaction may exist. Otherwise, we conclude there is no G × age effect and we fit mean model 2 to test for association. Because the traditional linear interaction model given in equation (1) is a special case of the proposed VC model, we next test significance of a linear effect if the above null is rejected, by formulating the following hypotheses,
Failure to reject the null indicates that there is a linear G × age effect, so we fit mean model 3 to assess association. Otherwise, we conclude that the G × age interaction is nonlinear. We then assess the nonlinear genetic effect over age by formulating the hypotheses,
The rejection of the null indicates that the genetic effect is sensitive to age in a nonlinear fashion. The sequence of hypothesis tests stated above was suggested by Ma et al [9] for optimal power to detect association.
Model implementation
We fit the varying coefficients with a Bspline technique for both $\alpha \left(\cdot \right)$ and $\beta \left(\cdot \right)$ functions. The $X$ variable was first transformed to make it more evenly distributed on each subinterval used in the Bspline smoothing technique. The great advantages of Bspline estimation over other nonparametric techniques are simple implementation and fast computing [9]. For each singlenucleotide polymorphism (SNP), $\alpha \left(X\right)$ in mean model 1 was estimated by considering the following least square problem,
The estimated $\alpha \left(X\right)$ has the form $\widehat{\alpha}\left(x\right)={\sum}_{s=1}^{N+p+1}{\lambda}_{s}{B}_{s}\left(x\right),$ where N is the number of interior knots, p is the degree of Bsplines, and ${G}_{p}={\left\{{B}_{s}\right\}}_{s=1,2,\dots ,N+p+1}$ is the set of basis Bsplines with degree p. For selecting the number of knots N and the degree p of the Bsplines, we used the Bayesian information criterion (BIC),
where ${\hat{\tau}}^{2}=1/n{\sum}_{i=1}^{n}{\left\{{Y}_{i}\hat{m}\left({X}_{i},{G}_{i}\right)\right\}}^{2}$ Then the same number of knots ${N}_{\alpha}$ and degree ${p}_{\alpha}$ were applied to estimate function $\alpha \left(X\right)$ when fitting mean models 2 to 4.
For mean model 4, the coefficient functions $\alpha \left(x\right)$ and $\beta \left(x\right)$ were estimated by,
Thus we have $\widehat{\alpha}\left(x\right)={\sum}_{t=1}^{{N}_{\alpha}+{p}_{\alpha}+1}{\widehat{\theta}}_{t}{B}_{t}\left(x\right)$ and $\widehat{\beta}\left(x\right)={\sum}_{s=1}^{{N}_{\beta}+{p}_{\beta}+1}{\widehat{\lambda}}_{s}{B}_{s}\left(x\right),$ where ${N}_{\beta}$ and ${p}_{\beta}$ are also selected following the above BIC criterion.
The error term $\sigma \left(X\right)$ can be assumed homogeneous following a normal distribution or heterogeneous without assuming a parametric distribution. When the homogeneous assumption is made, the likelihood ratio test can be applied to assess the significance of ${H}_{0}^{3}$. Otherwise ${\sigma}^{2}\left(x\right)$ can be nonparametrically estimated using the spline approximation ${\sigma}^{2}\left(x\right)\approx {\sum}_{s=1}^{N+p+1}{v}_{s}{B}_{s}\left(x\right)$ and defining ${\hat{\sigma}}^{2}\left(x\right)={\sum}_{s=1}^{N+p+1}{\hat{v}}_{s}{B}_{s}\left(x\right)$ as the spline estimate, where ${\hat{v}}_{s}={\left({\hat{v}}_{1},{\hat{v}}_{2},\cdots \phantom{\rule{0.3em}{0ex}},{\hat{v}}_{N+p+1}\right)}^{T}$ minimizes ${\sum}_{i=1}^{n}{\left\{{\hat{\u03f5}}^{2}\left({X}_{i}\right){\sum}_{s=1}^{N+p+1}{v}_{s}{B}_{s}\left({X}_{i}\right)\right\}}^{2};$ that is, $\hat{v}=argmi{n}_{v}{\left({\hat{\u03f5}}^{2}Bv\right)}^{T}\left({\hat{\u03f5}}^{2}Bv\right),$ where ${\hat{\u03f5}}^{2}={\left({\left({Y}_{1}\hat{m}\left({X}_{1},{G}_{1}\right)\right)}^{2},\cdots \phantom{\rule{0.3em}{0ex}},{\left({Y}_{n}\hat{m}\left({X}_{n},{G}_{n}\right)\right)}^{2}\right)}^{T}$ and
Thus we have $\hat{v}={\left({B}^{T}B\right)}^{1}{B}^{T}{\hat{\u03f5}}^{2},$ and ${\left({\hat{\sigma}}^{2}\left({X}_{1}\right),\cdots \phantom{\rule{0.3em}{0ex}},{\hat{\sigma}}^{2}\left({X}_{n}\right)\right)}^{T}=B\hat{v}=B{\left({B}^{T}B\right)}^{1}{B}^{T}{\hat{\u03f5}}^{2}.$ Wild bootstrap can be applied to assess the significance of ${H}_{0}^{3}$[11].
Results
We applied the above models to the GAW18 genomewide association data. We focused our analysis on diastolic blood pressure (DBP) to identify any genetic players that can explain the variability of DBP triggered by nonlinear genetic penetrance over time. We treated DBP as the response Y and age as the X variable. The genetic variable G is coded following an additive model, that is, G = 1, 0, −1, corresponding to genotype AA, Aa, aa, respectively. In total, 142 individuals and 388,099 SNPs were left after removing SNPs with a minor allele frequency less than 0.05. These SNPs are distributed on oddnumbered chromosomes from chromosome 1 to chromosome 21.
Figure 1 shows the Manhattan plots of p values when assessing significance by fitting different models. The overall p value patterns for the 3 models are quite similar. Two known and 1 unknown gene show strong nonlinear genetic effects (indicated by small p values in columns 7 and 8 in Table 1). A strong signal was detected in chromosome 3 containing a microRNA coding gene, MIR1263, and in chromosome 9 containing the gene UNC13B. MIR1263 may play a regulatory role. The signals at gene UNC13B are quite consistent for the 3 models. This gene was reported to be related with increased risk of nephropathy in patients with type 1 diabetes. Nephropathy accounts for 40% of endstage renal disease and is associated with high cardiovascular morbidity and mortality [12].
Table 1 lists SNPs with p values less than 5 × 10^{−7}. These SNPs show strong nonlinear effects over time to affect DBP, especially for SNPs in chromosome 3 (indicated by small ${p}_{42}$ and ${p}_{43}$ in Table 1). These SNPs can be easily missed by fitting traditional linear models. For illustration purposes, Figure 2 shows the fitted mean DBP function and the genetic effects of 2 SNPs in genes MIR1263 and UNC13B. For SNP rs9863717 in gene MIR1263, DBP decreases after age 55 years for individuals carrying genotype GG, whereas it increases for individuals carrying genotype AA. Thus, for a senior person who carries genotype AA at this locus, the chance to develop hypertension is higher than for others. For SNP rs10972462 in gene UNC13B, large DBP variability among the 3 genotype groups is observed after age 50 years, and a decreasing pattern is observed roughly after age 65 years. Among the 3 genotype groups, DBP is higher in the GG group, followed by the GA and AA groups. From the prevention and therapeutic point of view, people carrying genotype GG at rs10972462 locus should pay special attention after age 50 years, and so should those carrying AA genotype at rs9863717 locus after age 65 years.
Discussion and conclusions
In this work, we proposed to model the genetic effect as a nonlinear function of age. It is clear that the classical linear model, with or without interaction, is just a special case of the VC model. However, the VC model has the flexibility to capture potential nonlinear genetic effects over time. Evidence of nonlinear genetic effects has been reported previously. For example, Laitala et al [13] reported the curvilinear genetic effect on interindividual differences in coffee consumption over age. In a study of congenital scoliosis in mice [14], the authors found that mutations in genes HES7 and MESP2 are sensitive to different degrees of hypoxia, which is responsible for a nonlinear increase in the severity and penetrance of vertebral defects. Our analysis identified 2 novel loci associated with DBP with nonlinear genetic effects. They can be missed by the traditional linear interaction model. However, because statistical significance does not necessarily imply causality, further experimental validation is needed to confirm the finding.
As shown in Ma et al [9], the VC model loses power because of high degrees of freedom in the test in cases where the genetic effect is not very complex, such as in a linear form. Thus one should assess constant or linear effects first, followed by fitting the corresponding model suggested by the results of the tests. In this analysis, we found that the coefficients are constant for most SNPs.
Note that the function $\alpha \left(X\right)$ models the overall mean of DBP over time when there is no genetic effect. When a linear structure for $\alpha \left(X\right)\left(={\alpha}_{0}+{\alpha}_{1}X\right)$ is forced, we observe inflated signals for testing ${H}_{0}:\beta \left(X\right)=0$. Thus, the incorporation of this nonlinear function can largely reduce false positives. In this analysis we coded the genetic variable G in an additive fashion, although other disease models such as dominant or recessive can also be assumed, while the optimal one can be selected based on a model selection criterion such as BIC.
References
 1.
Kleeberger SR, Peden D: Geneenvironment interactions in asthma and other respiratory diseases. Annu Rev Med. 2005, 56: 383400. 10.1146/annurev.med.56.062904.144908.
 2.
Qi L, Cho YA: Geneenvironment interaction and obesity. Nutr Rev. 2008, 66: 684694. 10.1111/j.17534887.2008.00128.x.
 3.
Caspi A, Moffitt TE: Geneenvironment interactions in psychiatry: joining forces with neuroscience. Nat Rev Neurosci. 2006, 7: 583590.
 4.
Hunter DJ: Geneenvironment interactions in human diseases. Nat Rev Genet. 2005, 6: 287298.
 5.
The International Consortium for Blood Pressure GenomeWide Association Studies: Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature. 2011, 478: 103109. 10.1038/nature10405.
 6.
Kannel WB: Blood pressure as a cardiovascular risk factor: prevention and treatment. JAMA. 1996, 275: 15711576. 10.1001/jama.1996.03530440051036.
 7.
Falconer DS: The problem of environment and selection. Am Nat. 1952, 86: 293298. 10.1086/281736.
 8.
Hoffman AA, Parsons PA: Evolutionary Genetics and Environmental Stress. 1991, New York, Oxford University Press
 9.
Ma S, Yang L, Romero R, Cui Y: Varying coefficient model for geneenvironment interaction: a nonlinear look. Bioinformatics. 2011, 27: 21192126. 10.1093/bioinformatics/btr318.
 10.
Fan J, Zhang W: Statistical methods with varying coefficient models. Stat Interface. 2008, 1: 179195. 10.4310/SII.2008.v1.n1.a15.
 11.
Härdle W, Mammen E: Comparing nonparametric versus parametric regression fits. Ann Stat. 1993, 21: 19261947. 10.1214/aos/1176349403.
 12.
EURAGEDIC Consortium: G/T Substitution in intron 1 of the UNC13B gene is associated with increased risk of nephropathy in patients with type 1 diabetes. Diabetes. 2008, 57: 28432850. 10.2337/db080073.
 13.
Laitala VS, Kaprio J, Silventoinen K: Genetics of coffee consumption and its stability. Addiction. 2008, 103: 20542061. 10.1111/j.13600443.2008.02375.x.
 14.
Sparrow DB, Chapman G, Smith AJ, Mattar MZ, Major JA, O'Reilly VC, Saga Y, Zackai EH, Dormans JP, Alman BA, et al: A mechanism for geneenvironment interaction in the etiology of congenital scoliosis. Cell. 2012, 149: 295306. 10.1016/j.cell.2012.02.054.
Acknowledgements
We greatly appreciate the GAW18 data provider and the National Institutes of Health grant that supports GAW18. We also wish to thank the anonymous reviewer for the insightful comments that greatly improved the manuscript. This work was partially supported by National Science Foundation grants DMS1209112 and IOS1237969. The GAW18 whole genome sequence data were provided by the T2DGENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
HLW conducted the analysis and drafted the manuscript; TH, CW, and PSZ participated the discussion and helped with the analysis; YHC conceived the idea and wrote the manuscript. All authors read and approved the final manuscript.
Rights and permissions
About this article
Published
DOI
Keywords
 Diastolic Blood Pressure
 Genetic Effect
 Bayesian Information Criterion
 Gene MIR1263
 Wild Bootstrap