Bivariate genetic association analysis of systolic and diastolic blood pressure by copula models

Konigorski, Stefan; Yilmaz, Yildiz E; Bull, Shelley B

doi:10.1186/1753-6561-8-S1-S72

Volume 8 Supplement 1

Genetic Analysis Workshop 18: Human sequence data in extended pedigrees

Proceedings
Open access
Published: 17 June 2014

BMC Proceedings volume 8, Article number: S72 (2014)

Abstract

We conduct genetic association analysis in the subset of unrelated individuals from the San Antonio Family Studies pedigrees, applying a two-stage approach to take account of the dependence between systolic and diastolic blood pressure (SBP and DBP). In the first stage, we adjust blood pressure for the effects of age, sex, smoking, and use of antihypertensive medication based on a novel modification of censored regression. In the second stage, we model the bivariate distribution of the adjusted SBP and DBP phenotypes by a copula function with interpretable SBP-DBP correlation parameters. This allows us to identify genetic variants associated with each of the adjusted blood pressures, as well as variants that explain the association between the two phenotypes. Within this framework, we define a pleiotropic variant as one that reduces the SBP-DBP correlation. Our results for whole genome sequence variants in the gene ULK4 on chromosome 3 suggest that inference obtained from a copula model can be more informative than findings from the SBP-specific and DBP-specific univariate models alone.

Background

A number of genome-wide association studies (GWAS) involving large populations have been conducted to identify genetic variants associated with various single blood pressure (BP) measures: systolic blood pressure (SBP), diastolic blood pressure (DBP), or a linear function of them. Although the correlation between SBP and DBP is high, the results of the GWAS for each separately indicate only partially overlapping sets of variants associated with SBP and DBP. In this report, we model SBP and DBP jointly, taking the association between them into account. Constructing a bivariate model for these two phenotypes can increase the power to detect causal variants for one or both phenotypes, shedding more light onto the complex underlying genetic processes.

We apply copula functions [1] to model the bivariate distribution of SBP and DBP conditional on genetic variants. Copulas are functions used to construct a joint distribution by combining the marginal distributions with a dependence structure, and they allow investigation of the dependence structure between the phenotypes SBP and DBP separately from the marginal distributions. This property of copula models is very useful in identifying genetic variants that explain the dependence between SBP and DBP. It is well known that the Pearson correlation coefficient effectively measures the linear dependence of two random variables coming from a bivariate normal distribution. However, it may not be a good measure for other bivariate distributions where the conditional mean of $Y_{i}$ given $Y_{j}$ is not linear in $Y_{j}$ . Hence, we prefer a nonparametric correlation measure. One frequently used measure based on concordance and discordance is Kendall's tau, which is the probability of concordance minus the probability of discordance. We also use upper and lower tail dependence measures, which measure the level of dependence in the upper-right quadrant tail and lower-left quadrant tail of a bivariate distribution, respectively, as it might be especially interesting to find pleiotropic variants explaining association between high SBP and high DBP or low SBP and low DBP.
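The concordance-based definition of Kendall's tau can be illustrated with a small sketch. This is a plain sample version (tau-a, which ignores ties) written for illustration; the function name `kendall_tau` is an assumption of this example and is not taken from the analysis described here.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Sample Kendall's tau-a: (concordant - discordant) / number of pairs.

    A pair of observations is concordant when both coordinates move in the
    same direction, and discordant when they move in opposite directions.
    Tied pairs count as neither.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

Because the statistic depends only on the ordering of the observations, it is unchanged by monotone increasing transformations of either variable, which is why it pairs naturally with copula models.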

Our analysis has two objectives: (a) to investigate the association of some common variants with SBP and DBP under the joint model of SBP and DBP, and (b) to identify pleiotropic variants, which we define as variants that explain the association between SBP and DBP.

Methods

San Antonio family studies data

The Genetic Analysis Workshop 18 (GAW18) data set includes 153 unrelated individuals from the San Antonio family studies pedigrees having SBP and DBP measurements at one or more study exams, information regarding current use of antihypertensive medication, and nongenetic covariates sex, age, and current tobacco smoking status at some examinations. To retain all subjects, we imputed missing age values by adding the mean time interval between measurements to the last known age of a subject. Some missing values of smoking status were also imputed by examining the smoking patterns of individuals over the four time points. We later verified that these imputations did not lead to any significant differences in parameter estimates and inference. Among the 153 unrelated individuals with measured phenotype data, 100 have whole genome sequence data. We conducted our genetic analysis on chromosome 3 in this group, considering only the ULK4 gene previously found to be associated with DBP [2]. We analyzed 1771 variants with minor allele frequency (MAF) ≥0.05.

Phenotype definition

Before modeling the joint distribution of SBP and DBP conditional on genetic variants, we first adjusted the observed BPs for the effect of antihypertensive medication and other nongenetic covariates. The GAW18 unrelated pedigree members include hypertensive individuals (ie, with high BP) and some taking antihypertensive medication (Table 1). Adjusting BP for the effect of BP-lowering medication is crucial when the objective is to identify genes associated with high or low BP. Based on a simulation study comparing several methods [3], the use of a censored regression model conditional on both nongenetic and genetic covariates was recommended, assuming that a treated individual's true "underlying" BP is higher than that observed. For our analysis, we extend their censored regression approach by deriving the maximum likelihood estimate (MLE) of a conditional expectation that is in the form of fitted BP plus a nonnegative adjustment term depending on the observed BP and the nongenetic covariates. It provides a more intuitive adjustment of the BP of treated individuals than, for example, the nonparametric method or assuming that treatment has the same constant effect for each individual as in Tobin et al [3].

Table 1 Number of unrelated individuals, hypertensive individuals, and individuals on antihypertensive medication

At each examination point $j$ $(j = 1, 2, 3, 4)$ , we separately fitted censored regression models of BP conditional on nongenetic covariates age, sex, and smoking status with medication use as the censoring indicator. After conducting standard residual analysis and model selection, we specified the models as

$$\mathrm{SBP}_{i,j} = γ_{0(j)} + γ_{1(j)}\,\mathrm{sex}_{i} + γ_{2(j)}\,\mathrm{smoke}_{i,j} + γ_{3(j)}\,(\mathrm{age}_{i,j} - \overline{\mathrm{age}}_{j}) + ϵ_{i,j} \tag{1}$$

$$\mathrm{DBP}_{i,j} = γ'_{0(j)} + γ'_{1(j)}\,\mathrm{sex}_{i} + γ'_{2(j)}\,\mathrm{smoke}_{i,j} + γ'_{3(j)}\,|\mathrm{age}_{i,j} - \overline{\mathrm{age}}_{j}| + ϵ'_{i,j} \tag{2}$$

where $ϵ_{i,j} \sim N(0, σ_{\mathrm{SBP},j}^{2})$, $ϵ'_{i,j} \sim N(0, σ_{\mathrm{DBP},j}^{2})$, $\overline{\mathrm{age}}_{j} = \frac{1}{n_{j}} \sum_{i=1}^{n_{j}} \mathrm{age}_{i,j}$, and $i = 1, \dots, n_{j}$ $(n_{1} = 141, n_{2} = 97, n_{3} = 98, n_{4} = 37)$. This formulation of the age covariates reflects previous findings (see, eg, Ref. [4]) that SBP increases with age whereas DBP decreases after the age of 55 to 60 years, a turning point that we approximate here by the sample mean age. We used the "survreg" function in the "survival" package of R to fit the censored regression models.
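To make the role of medication use as the censoring indicator concrete, the following sketch writes out the log-likelihood of a Gaussian regression in which a treated individual's observed BP is treated as a lower bound on the underlying BP (right-censoring). It is an illustration in Python, not the R code used for the analysis; the function names and the `rows` layout are assumptions of this example.

```python
import math


def norm_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def norm_sf(y, mu, sigma):
    """Survival function 1 - F(y) of N(mu, sigma^2)."""
    return 0.5 * math.erfc((y - mu) / (sigma * math.sqrt(2.0)))


def censored_loglik(gamma, sigma, rows):
    """Log-likelihood for one examination.

    rows: iterable of (bp, z, treated), where z is the covariate vector
    (with a leading 1 for the intercept) and treated is True when the
    individual was on antihypertensive medication at that examination.
    """
    total = 0.0
    for bp, z, treated in rows:
        mu = sum(g * zk for g, zk in zip(gamma, z))
        if treated:
            # Underlying BP is only known to exceed the observed value.
            total += math.log(norm_sf(bp, mu, sigma))
        else:
            # Untreated BP is observed exactly.
            total += norm_logpdf(bp, mu, sigma)
    return total
```

Maximizing this function over `gamma` and `sigma` with a general-purpose optimizer would give estimates of the kind that a Gaussian censored regression fit reports.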

For individuals who received antihypertensive medication, we estimate the underlying BP with the MLE of the conditional expectation of BP, given that the observed BP is lower than the true underlying BP. For illustration, under the model (1), the conditional expectation is

$$E[\mathrm{SBP}_{i,j} \mid \mathrm{SBP}_{i,j} > \mathrm{SBP}_{\mathrm{obs},i,j}, Z_{i,j} = z_{i,j}] = γ_{(j)} z_{i,j} + \frac{σ_{\mathrm{SBP},j}^{2}\, f(\mathrm{SBP}_{\mathrm{obs},i,j} \mid z_{i,j})}{1 - F(\mathrm{SBP}_{\mathrm{obs},i,j} \mid z_{i,j})} \tag{3}$$

where $z_{i,j}$ denotes the vector of nongenetic covariates with associated regression parameter $γ_{(j)} = (γ_{0(j)}, γ_{1(j)}, γ_{2(j)}, γ_{3(j)})$, and $f$, $F$ are the normal probability density and cumulative distribution functions, respectively, with mean $γ_{(j)} z_{i,j}$ and variance $σ_{\mathrm{SBP},j}^{2}$. The effects of the adjustment are evident in Figure 1, with the adjusted BP of treated individuals always higher than their observed BP. At each of the first 3 examination time points, we obtained residuals from fitting the censored regression models (1) and (2). We disregarded the last examination time because few BP measurements were available (see Table 1). An untreated individual's residual is the difference between observed and fitted BPs; a treated individual's residual is the difference between adjusted and fitted BP. Finally, we averaged the residuals (over $j = 1, 2, 3$) separately for SBP and DBP, and took these mean residuals as our adjusted phenotypes.
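The adjustment in (3) is the mean of a normal distribution truncated from below at the observed BP. A minimal Python sketch follows, assuming the fitted mean and residual standard deviation are already available from the censored regression; the function name `adjusted_bp` and its arguments are illustrative only.

```python
import math


def adjusted_bp(observed, fitted, sigma):
    """Conditional expectation (3): E[BP | BP > observed].

    fitted is the linear predictor (gamma * z) and sigma the residual
    standard deviation. The added term sigma^2 * f / (1 - F) is always
    positive, so the result always exceeds the observed value.
    Note: for observed values far above fitted (many sigmas), 1 - F
    underflows and this simple version would need a more careful
    tail approximation.
    """
    z = (observed - fitted) / sigma
    pdf = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    sf = 0.5 * math.erfc(z / math.sqrt(2.0))
    return fitted + sigma ** 2 * pdf / sf
```

For a treated individual, the residual described above would then be `adjusted_bp(observed, fitted, sigma) - fitted`, whereas for an untreated individual it is simply `observed - fitted`.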

Bivariate copula modeling

In the second stage, we first constructed the marginal models for our adjusted phenotypes $Y_{1}$ and $Y_{2}$ given a genetic variant $X = x$, assuming that the genetic variants $X$ are independent of the nongenetic variants $Z$. In the marginal models

$$Y_{1,i} = α_{0} + α_{1} x_{i} + ϵ_{i} \quad \text{and} \quad Y_{2,i} = β_{0} + β_{1} x_{i} + ϵ'_{i} \tag{4}$$

we observed no evidence against the normality assumptions for the error terms. We then used a copula function $C$ to build the bivariate distribution of $Y_{1}$ and $Y_{2}$ conditional on genetic variants by combining the 2 marginal distribution functions $F_{1}(y_{1} \mid X = x)$ and $F_{2}(y_{2} \mid X = x)$. More specifically, we consider the bivariate distribution function

$$F(y_{1}, y_{2} \mid X = x) = C_{ψ}(F_{1}(y_{1} \mid X = x), F_{2}(y_{2} \mid X = x)) \tag{5}$$

where $F_{1}$ and $F_{2}$ are the normal cumulative distribution functions with variances $σ_{1}^{2}$ and $σ_{2}^{2}$, respectively, and $ψ$ is the vector of copula parameters. To illustrate the approach, consider the 2-parameter copula family

$$C_{ψ}(u_{1}, u_{2}) = \left\{ \left[ (u_{1}^{-φ} - 1)^{θ} + (u_{2}^{-φ} - 1)^{θ} \right]^{1/θ} + 1 \right\}^{-1/φ} \tag{6}$$

with $0 \leq u_{1}, u_{2} \leq 1$, and the copula (or dependence) parameters $ψ = (φ, θ)$, $φ > 0$, $θ \geq 1$. To explain the association between $Y_{1}$ and $Y_{2}$, we use Kendall's tau $(τ)$, which is a measure of overall association based on concordance and discordance, and we use lower and upper tail dependence measures ($λ_{L}$ and $λ_{U}$, respectively), which explain the amount of dependence between extreme values and can give more insight in identifying pleiotropic variants. For the copula family in (6), these dependence measures become [1]

$$τ = 1 - \frac{2}{θ(φ + 2)}, \quad λ_{L} = 2^{-1/(θφ)}, \quad λ_{U} = 2 - 2^{1/θ} \tag{7}$$
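Equations (5) to (7) can be evaluated directly. The sketch below is a Python illustration, not the authors' implementation: the function names, the argument layout, and the parameter values used in the usage note are assumptions made for this example.

```python
from statistics import NormalDist


def bb1_cdf(u1, u2, phi, theta):
    """Copula (6) evaluated at (u1, u2), for phi > 0 and theta >= 1."""
    if not (phi > 0.0 and theta >= 1.0):
        raise ValueError("requires phi > 0 and theta >= 1")
    if u1 <= 0.0 or u2 <= 0.0:
        return 0.0  # boundary value of any copula
    # Note: extremely small u can overflow in u ** -phi; this sketch
    # does not guard against that case.
    a = (u1 ** -phi - 1.0) ** theta
    b = (u2 ** -phi - 1.0) ** theta
    return ((a + b) ** (1.0 / theta) + 1.0) ** (-1.0 / phi)


def dependence_measures(phi, theta):
    """Kendall's tau and lower/upper tail dependence from (7)."""
    tau = 1.0 - 2.0 / (theta * (phi + 2.0))
    lambda_l = 2.0 ** (-1.0 / (theta * phi))
    lambda_u = 2.0 - 2.0 ** (1.0 / theta)
    return tau, lambda_l, lambda_u


def joint_cdf(y1, y2, x, alpha, beta, sigma1, sigma2, phi, theta):
    """Bivariate distribution (5) with the normal margins of model (4).

    alpha = (alpha0, alpha1) and beta = (beta0, beta1) are the marginal
    regression coefficients; x is the genotype value.
    """
    u1 = NormalDist(alpha[0] + alpha[1] * x, sigma1).cdf(y1)
    u2 = NormalDist(beta[0] + beta[1] * x, sigma2).cdf(y2)
    return bb1_cdf(u1, u2, phi, theta)
```

With the illustrative values `phi = 1.0` and `theta = 1.5`, `dependence_measures` returns approximately `(0.556, 0.630, 0.413)`. Setting `theta = 1` removes upper tail dependence entirely, since $λ_{U} = 2 - 2^{1} = 0$.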

We obtain MLEs of the marginal parameters $α = (α_{0}, α_{1}, σ_{1})$, $β = (β_{0}, β_{1}, σ_{2})$ in equation (4) and the copula parameters $ψ = (φ, θ)$ in equation (6) by maximizing the likelihood function [1] with the general optimization software implemented in the nlm function in R. Variance estimates for the MLEs are obtained from the inverse of the observed information matrix.
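For reference, the log-likelihood being maximized can be written out from model (5). This is the standard form for a copula model with continuous margins; the symbols $\ell$, $c_{ψ}$, $f_{1}$, and $f_{2}$ are notation introduced for this sketch and do not appear elsewhere in the text:

$$\ell(α, β, ψ) = \sum_{i=1}^{n} \left[ \log c_{ψ}\big(F_{1}(y_{1,i} \mid x_{i}), F_{2}(y_{2,i} \mid x_{i})\big) + \log f_{1}(y_{1,i} \mid x_{i}) + \log f_{2}(y_{2,i} \mid x_{i}) \right], \qquad c_{ψ}(u_{1}, u_{2}) = \frac{\partial^{2} C_{ψ}(u_{1}, u_{2})}{\partial u_{1}\, \partial u_{2}}$$

Here $f_{1}$ and $f_{2}$ denote the normal densities implied by the marginal models (4), and $c_{ψ}$ is the density of the copula (6). The first term carries all information about the dependence parameters $ψ$, while the last two terms are the log-likelihoods of the two univariate regressions.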

To address aim (a) concerning the marginal association of a variant with each of SBP and DBP under the bivariate model (5), we test the null hypotheses $H_{0} : α_{1} = 0$ (vs. $H_{A} : α_{1} \neq 0$) and $H_{0} : β_{1} = 0$ (vs. $H_{A} : β_{1} \neq 0$) with the large sample Wald test statistic. We expect improved inference under the bivariate model compared to inference obtained by separate analysis of SBP and DBP, which we refer to as the working independence model.
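The large-sample Wald test used for aim (a) takes only an estimate and its standard error. A minimal Python sketch, assuming a two-sided alternative and a standard normal reference distribution; the function name `wald_test` is chosen for this example:

```python
import math


def wald_test(estimate, std_error, null_value=0.0):
    """Two-sided Wald test of H0: parameter = null_value.

    Returns the z statistic and its p value under a standard normal
    reference distribution: p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = (estimate - null_value) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

The standard error would come from the inverse of the observed information matrix, so a smaller standard error under the joint model translates directly into a larger statistic for the same effect estimate.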

In contrast, for aim (b), which is to identify a variant that explains association between SBP and DBP, the copula model dependence parameters $φ$ and $θ$, and the dependence measures (7), are of interest. We compare estimates of Kendall's $τ$, $λ_{L}$, and $λ_{U}$ under the full bivariate model (5) that includes the genetic variant with the corresponding estimates obtained under the bivariate model without the variant (ie, the null model with $H_{0} : α_{1} = β_{1} = 0$). Using the delta method, we construct a confidence interval (CI) for each dependence measure from large-sample standard errors. When the CIs for a given association measure under the null and the full model do not overlap, we conclude that the given variant is pleiotropic. Use of CIs in this way is quite conservative. We also check whether the CI for $λ_{L}$ or $λ_{U}$ under the full model includes 0. Note that the copula model (6) only becomes an independent copula $C(u_{1}, u_{2}) = u_{1} u_{2}$ when $θ = 1$ and $φ$ goes to 0. However, because it is practically impossible to identify all variants, instead of testing independence, we search for variants that reduce the magnitude of the overall dependence measures, such as Kendall's $τ$.
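As an illustration of the delta-method interval, consider $λ_{U} = 2 - 2^{1/θ}$ from (7), which depends on $θ$ alone, so its standard error is approximately $|dλ_{U}/dθ|$ times the standard error of $\hat{θ}$, with $dλ_{U}/dθ = 2^{1/θ} \ln 2 / θ^{2}$. The Python sketch below assumes an estimate and standard error for $θ$ are available; the function names and the input values in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def lambda_u_ci(theta_hat, se_theta, level=0.99):
    """Delta-method CI for the upper tail dependence lambda_U.

    Returns the point estimate and a (lower, upper) interval. The
    interval is not truncated to [0, 1], so the lower limit can fall
    below 0, which is how a CI comes to include 0.
    """
    lam = 2.0 - 2.0 ** (1.0 / theta_hat)
    gradient = 2.0 ** (1.0 / theta_hat) * math.log(2.0) / theta_hat ** 2
    se_lam = abs(gradient) * se_theta
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return lam, (lam - z * se_lam, lam + z * se_lam)


def intervals_overlap(ci_a, ci_b):
    """True when two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
```

For example, `lambda_u_ci(1.5, 0.1)` gives an estimate of about 0.413 with a 99% interval of roughly (0.287, 0.539). For $τ$ and $λ_{L}$, which depend on both $φ$ and $θ$, the same idea applies with the full gradient vector and the estimated covariance matrix of $(\hat{φ}, \hat{θ})$.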

Results and discussion

For model selection, we note that the Akaike information criterion (AIC) value under the copula model (6) is much lower than the AIC under a bivariate normal model, indicating that the copula model is a better fit. For example, the AIC value under the copula model (6) reported in Table 2 is 1227.6 compared to an AIC value of 1337.8 under the bivariate normal model (not shown). These AICs are comparable to those obtained when conditioning on other variants. The aim (a) results (Table 2) are thus limited to the Wald test p values for the MLEs of the coefficients $α_{1}$, $β_{1}$ in (4) for testing $H_{0} : α_{1} = 0$ and $H_{0} : β_{1} = 0$ under two models: the working independence model and the bivariate copula model (6) for single-variant analysis. We observed some variants, including less common (0.05 ≤ MAF ≤ 0.10) and more common (MAF > 0.10) variants, that are identified by both models, but the p values for testing $H_{0} : α_{1} = 0$ and $H_{0} : β_{1} = 0$ under the copula model (with minimum p values 1.7 × 10⁻⁴ and 5.1 × 10⁻⁵, respectively) are smaller than the p values under the working independence model (with minimum p values 5.5 × 10⁻³ and 7.0 × 10⁻⁴, respectively). This includes variants significantly associated with both BP phenotypes under the joint copula model, although they are not significantly associated at the 1% level with either under the univariate phenotypic models (see Table 2). Overall, the estimated genetic effect sizes are larger and the estimated standard errors are slightly smaller under the bivariate model.

Table 2 Results of testing $H_{0} : α_{1} = 0$ or $H_{0} : β_{1} = 0$ for the variant at base-pair position 41,984,243

Table 3 displays 2 of 10 variants yielding a substantial reduction in point estimates of the upper tail dependence measure $λ_{U}$ under the bivariate model (5) conditional on the variant. When conditioning on a variant in the gene ULK4, Kendall's tau and lower tail dependence do not differ markedly from their values under the null model. We observe that without conditioning on any variant $(H_{0} : α_{1} = β_{1} = 0)$, the 2 phenotypes are moderately correlated, with a Kendall's tau estimate of 0.578 and higher lower tail dependence than upper tail dependence (Table 3). Conditioning on the variant at base-pair position 41,984,243 diminishes the upper tail dependence measure $λ_{U}$ at the 0.01 level of significance (the 99% CI for $λ_{U}$ includes 0); this variant is also associated with SBP and DBP (see Table 2). Figure 2 illustrates how it achieves a reduction in upper tail dependence. Tail dependence can also be reduced in the absence of strong marginal BP associations. For example, the variant at base-pair position 41,971,559 is only modestly associated with SBP (p value = 0.040) and DBP (p value = 0.055), but the upper tail dependence is reduced from 0.449 under the null model to 0.289, with a CI that includes 0 (Table 3).

Table 3 Estimates of dependence measures $(τ, λ_{L}, λ_{U})$ under the null model $H_{0} : α_{1} = β_{1} = 0$ and under model (5) in a single-variant analysis

Conclusions

In this report, we demonstrate how to model the bivariate distribution of SBP and DBP with copulas and conduct appropriate inference for genetic association. The proposed method is shown to be applicable by considering a single gene, and crude estimates of computation time suggest that it is feasible to process 1 million variants in less than a day, for example, by using one hundred 2.5-GHz cores. Although estimating the bivariate distribution is computationally more intensive than fitting the working independence model, given the high correlation between the phenotypes, a potential advantage is that genetic associations can be detected with higher power under a plausible joint model. Using joint copula models, we were also able to identify candidate variants explaining the upper tail dependence of SBP and DBP. We generally observed strong linkage disequilibrium between variants identified. By conducting joint analyses of multiple variants in moderate linkage disequilibrium, we achieved a much more significant reduction in upper tail dependence (data not shown), and although we observed some reduction in point estimates of lower tail dependence and Kendall's tau, the CIs still overlap with those under the null model. Calling a comparison significant when the CIs fail to overlap is a conservative approach, but it is computationally efficient. As an alternative, a nonparametric bootstrap procedure could be used to estimate the variance of the estimated difference between dependence measures under the null and full models, and to construct an approximate CI. To allow multiple testing adjustments, instead of checking whether the CI for $λ_{L}$ or $λ_{U}$ under the full model includes 0, it would be desirable to test each of the null hypotheses $H_{0} : φ = 0$ or $H_{0} : θ = 1$ , respectively, to obtain p values [5].
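The bootstrap alternative mentioned above could be organized along the following lines. This is a generic Python sketch under stated assumptions: `estimate_null` and `estimate_full` are hypothetical callables that refit the null and full models on a resampled data set and return the dependence measure of interest, and the interval uses a normal approximation with the bootstrap variance.

```python
import random
from statistics import NormalDist, variance


def bootstrap_difference_ci(data, estimate_null, estimate_full,
                            n_boot=500, level=0.95, seed=0):
    """Nonparametric bootstrap for a difference in dependence measures.

    data: list of per-individual records, resampled with replacement.
    Returns the point estimate of (null - full), the bootstrap variance
    of that difference, and an approximate normal-theory CI.
    """
    rng = random.Random(seed)
    n = len(data)
    diffs = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Both models are refitted on the same resample so that the
        # difference reflects their joint sampling variability.
        diffs.append(estimate_null(resample) - estimate_full(resample))
    point = estimate_null(data) - estimate_full(data)
    var = variance(diffs)
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) * var ** 0.5
    return point, var, (point - half_width, point + half_width)
```

Because each replicate requires two model fits, this procedure multiplies the per-variant computation by roughly twice the number of replicates, which is the trade-off against the cheaper CI-overlap rule.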

In principle, the extension of our approach to 3 or more quantitative traits is straightforward; however, the copula model (6) may not be ideal in this setting. It involves some restrictions on the association structure, and generally the Gaussian copula is used when there are 3 or more traits. The approach could also be extended to binary traits, but with some caution because there is no unique copula identifying the joint distribution function of discrete variables [6].

References

1. Joe H: Multivariate Models and Multivariate Dependence Concepts. London, Chapman & Hall. 1997.
2. Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, Smith AV, Tobin MD, Verwoert GC, Hwang S, et al: Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature. 2011, 478: 103-109. 10.1038/nature10405.
3. Tobin MD, Sheehan NA, Scurrah KJ, Burton PR: Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat Med. 2005, 24: 2911-2935. 10.1002/sim.2165.
4. Sesso HD, Stampfer MJ, Rosner B, Hennekens CH, Gaziano JM, Manson JE, Glynn RJ: Systolic and diastolic blood pressure, pulse pressure, and mean arterial pressure as predictors of cardiovascular disease risk in men. Hypertension. 2000, 36: 801-807. 10.1161/01.HYP.36.5.801.
5. Yilmaz YE, Lawless JF: Likelihood ratio procedures and tests of fit in parametric and semiparametric copula models with censored data. Lifetime Data Anal. 2011, 17: 386-408. 10.1007/s10985-011-9192-2.
6. Genest C, Neslehova J: A primer on copulas for count data. ASTIN Bull. 2007, 37: 475-515. 10.2143/AST.37.2.2024077.

Acknowledgements

YEY was supported by a MITACS Network Industrial Fellowship, the Syd Cooper Fund and is a CIHR Fellow in Genetic Epidemiology and Statistical Genetics with CIHR STAGE (Strategic Training for Advanced Genetic Epidemiology). This work was supported in part by grants from the MITACS Network of Centres of Excellence in Mathematical Sciences and the Natural Sciences and Engineering Research Council of Canada. The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.

This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.

Author information

Stefan Konigorski
Present address: Molecular Epidemiology Group, Max Delbrück Center for Molecular Medicine (MDC), Robert-Rössle-Straße 10, 13125, Berlin, Germany
Yildiz E Yilmaz
Present address: Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, NL, A1C 5S7, Canada

Authors and Affiliations

Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, ON, M5T 3M7, Canada
Stefan Konigorski, Yildiz E Yilmaz & Shelley B Bull
Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, 60 Murray Street, Box 18, Toronto, ON, M5T 3L9, Canada
Yildiz E Yilmaz & Shelley B Bull

Corresponding author

Correspondence to Yildiz E Yilmaz.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

YEY and SBB designed the overall study, SK conducted statistical analyses, and SK and YEY drafted the manuscript. All authors read and approved the final manuscript.

Stefan Konigorski and Yildiz E Yilmaz contributed equally to this work.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Konigorski, S., Yilmaz, Y.E. & Bull, S.B. Bivariate genetic association analysis of systolic and diastolic blood pressure by copula models. BMC Proc 8 (Suppl 1), S72 (2014). https://doi.org/10.1186/1753-6561-8-S1-S72
