Volume 8 Supplement 1

## Genetic Analysis Workshop 18: Human sequence data in extended pedigrees

# Bivariate genetic association analysis of systolic and diastolic blood pressure by copula models

- Stefan Konigorski†
^{1, 2}, - Yildiz E Yilmaz†
^{1, 3, 4}Email author and - Shelley B Bull
^{1, 3}

**8(Suppl 1)**:S72

https://doi.org/10.1186/1753-6561-8-S1-S72

© Konigorski et al.; licensee BioMed Central Ltd. 2014

**Published: **17 June 2014

## Abstract

We conduct genetic association analysis in the subset of unrelated individuals from the San Antonio Family Studies pedigrees, applying a two-stage approach to take account of the dependence between systolic and diastolic blood pressure (SBP and DBP). In the first stage, we adjust blood pressure for the effects of age, sex, smoking, and use of antihypertensive medication based on a novel modification of censored regression. In the second stage, we model the bivariate distribution of the adjusted SBP and DBP phenotypes by a copula function with interpretable SBP-DBP correlation parameters. This allows us to identify genetic variants associated with each of the adjusted blood pressures, as well as variants that explain the association between the two phenotypes. Within this framework, we define a pleiotropic variant as one that reduces the SBP-DBP correlation. Our results for whole genome sequence variants in the gene *ULK4* on chromosome 3 suggest that inference obtained from a copula model can be more informative than findings from the SBP-specific and DBP-specific univariate models alone.

## Keywords

## Background

A number of genome-wide association studies (GWAS) involving large populations have been conducted to identify genetic variants associated with various single blood pressure (BP) measures: systolic blood pressure (SBP), diastolic blood pressure (DBP), or a linear function of them. Although the correlation between SBP and DBP is high, the results of the GWAS for each separately indicate only partially overlapping sets of variants associated with SBP and DBP. In this report, we model SBP and DBP jointly, taking the association between them into account. Constructing a bivariate model for these two phenotypes can increase the power to detect causal variants for one or both phenotypes, shedding more light onto the complex underlying genetic processes.

We apply copula functions [1] to model the bivariate distribution of SBP and DBP conditional on genetic variants. Copulas are functions used to construct a joint distribution by combining the marginal distributions with a dependence structure, and they allow investigation of the dependence structure between the phenotypes SBP and DBP separately from the marginal distributions. This property of copula models is very useful in identifying genetic variants that explain the dependence between SBP and DBP. It is well known that the Pearson correlation coefficient effectively measures the linear dependence of two random variables coming from a bivariate normal distribution. However, it may not be a good measure for other bivariate distributions where the conditional mean of ${Y}_{i}$ given ${Y}_{j}$ is not linear in ${Y}_{j}$. Hence, we prefer a nonparametric correlation measure. One frequently used measure based on concordance and discordance is Kendall's tau, which is the probability of concordance minus the probability of discordance. We also use upper and lower tail dependence measures, which measure the level of dependence in the upper-right quadrant tail and lower-left quadrant tail of a bivariate distribution, respectively, as it might be especially interesting to find pleiotropic variants explaining association between high SBP and high DBP or low SBP and low DBP.

Our analysis has two objectives: (a) to investigate the association of some common variants with SBP and DBP under the joint model of SBP and DBP, and (b) to identify pleiotropic variants, which we define as variants that explain the association between SBP and DBP.

## Methods

### San Antonio family studies data

The Genetic Analysis Workshop 18 (GAW18) data set includes 153 unrelated individuals from the San Antonio family studies pedigrees having SBP and DBP measurements at one or more study exams, information regarding current use of antihypertensive medication, and nongenetic covariates sex, age, and current tobacco smoking status at some examinations. To retain all subjects, we imputed missing age values by adding the mean time interval between measurements to the last known age of a subject. Some missing values of smoking status were also imputed by examining the smoking patterns of individuals over the four time points. We later verified that these imputations did not lead to any significant differences in parameter estimates and inference. Among the 153 unrelated individuals with measured phenotype data, 100 have whole genome sequence data. We conducted our genetic analysis on chromosome 3 in this group, considering only the *ULK4* gene previously found to be associated with DBP [2]. We analyzed 1771 variants with minor allele frequency (MAF) ≥0.05.

### Phenotype definition

Number of unrelated individuals, hypertensive individuals, and individuals on antihypertensive medication

Exam 1 | Exam 2 | Exam 3 | Exam 4 | |
---|---|---|---|---|

Total* | 141 | 97 | 98 | 37 |

Hypertensive† | 42 | 49 | 52 | 27 |

Receiving meds | 27 | 30 | 45 | 25 |

where ${\u03f5}_{i,j}~N\left(0,{{\sigma}_{SBP,j}}^{2}\right)$, ${{\u03f5}_{i,j}}^{\prime}~N\left(0,{{\sigma}_{DBP,j}}^{2}\right)$, $\overline{ag{e}_{j}}=\frac{1}{{n}_{j}}\sum _{i=1}^{{n}_{j}}ag{e}_{i,j}$, and $i=1,\dots ,{n}_{j}$$({n}_{1}=141,{n}_{2}=97,{n}_{3}=98,{n}_{4}=37)$. This formulation of the age covariates reflects previous findings (see, eg, Ref. [4]) that SBP increases with age whereas DBP decreases after the age of 55 to 60 years, which can be approximated here with the sample mean age. We used the "survreg" function in the "survival" package of R to fit the censored regression models.

*j*= 1,2,3) separately for SBP and DBP, and took these mean residuals as our adjusted phenotypes.

### Bivariate copula modeling

*λ*

_{ L },

*λ*

_{ U }, respectively), which explain the amount of dependence between extreme values, and can give more insight in identifying pleiotropic variants. For the copula family in (6), these dependence measures become [1]

We obtain MLEs of the marginal parameters $\alpha =\left({\alpha}_{0},{\alpha}_{1},{\sigma}_{1}\right)$, $\beta =\left({\beta}_{0},{\beta}_{1},{\sigma}_{2}\right)$ in equation (4) and the copula parameters $\psi =\left(\phi ,\theta \right)$ in equation (6) by maximizing the likelihood function [1] with the general optimization software implemented in the nlm function in R. Variance estimates for the MLEs are obtained from the inverse of the observed information matrix.

To address aim (a) concerning the marginal association of a variant with each SBP and DBP under the bivariate model (5), we test the null hypotheses ${H}_{0}:{\alpha}_{1}=0$ (vs. ${H}_{A}:{\alpha}_{1}\ne 0$) and ${H}_{0}:{\beta}_{1}=0$ (vs. ${H}_{A}:{\beta}_{1}\ne 0$) with the large sample Wald test statistic. We expect improved inference under the bivariate model compared to inference obtained by separate analysis of SBP and DBP, which we refer to as the working independence model.

In contrast, for aim (b), which is to identify a variant that explains association between SBP and DBP, the copula model dependence parameters $\phi $ and $\theta $, and dependence measures (7) are of interest. We compare estimates of Kendall's $\tau \phantom{\rule{0.3em}{0ex}}{\lambda}_{L}$ and ${\lambda}_{U}$ under the full bivariate model (5) that includes the genetic variant with the corresponding estimates obtained under the bivariate model without the variant (ie, the null model with ${H}_{0}:{\alpha}_{1}={\beta}_{1}=0$). According to the delta method, we construct a confidence interval (CI) for the dependence measures using large-sample standard errors. When the CIs for a given association measure under the null and the full model do not overlap, we conclude that the given variant is pleiotropic. Use of CIs in this way is quite conservative. We also check whether the CI for ${\lambda}_{L}$ or ${\lambda}_{U}$ under the full model includes 0. Note that the copula model (6) only becomes an independent copula $C\left({u}_{1},{u}_{2}\right)={u}_{1}{u}_{2}$ when $\theta =1$ and $\phi $ goes to 0. However, because it is practically impossible to identify all variants, instead of testing independence, we search for variants that reduce the magnitude of the overall dependence measures, such as Kendall's $\tau $.

## Results and discussion

*p*values of the MLE estimates of the coefficients ${\alpha}_{1}$, ${\beta}_{1}$ in (4) for testing ${H}_{0}:{\alpha}_{1}=0$ and ${H}_{0}:{\beta}_{1}=0$ under two models: the working independence model and the bivariate copula model (6) for single-variant analysis. We observed some variants, including less common (0.05 ≤ MAF ≤0.10) and more common (MAF >0.10) variants, that are identified by both models, but the

*p*values for testing ${H}_{0}:{\alpha}_{1}=0$ and ${H}_{0}:{\beta}_{1}=0$ under the copula model (with minimum

*p*values 1.7 × 10

^{−4}and 5.1 × 10

^{−5}, respectively) are smaller than the

*p*values under the working independence model (with minimum

*p*values 5.5 × 10

^{−3}and 7.0 × 10

^{−4}, respectively). This includes variants significantly associated with both BP phenotypes under the joint copula model, although they are not significantly associated at the 1% level with either under the univariate phenotypic models (see Table 2). Overall, the estimated genetic effect sizes are larger and the estimated standard errors are slightly smaller under the bivariate model.

Results of testing ${\mathbf{\text{H}}}_{0}:{\alpha}_{1}=0$ or ${\mathbf{\text{H}}}_{0}:{\beta}_{1}=0$ for variant at 41,984,243 base-pair position

SBP | DBP | |||
---|---|---|---|---|

Working independence | Bivariate copula | Working independence | Bivariate copula | |

Coefficient Estimate (SE) | 11.1 (5.3) | 16.7 (4.7) | 7.2 (2.9) | 8.4 (2.7) |

| 4.0 × 10 | 3.8 × 10 | 1.5 × 10 | 1.6 × 10 |

*ULK4*, Kendall's tau and lower tail dependence do not differ markedly. We observe that without conditioning on any variant $({H}_{0}:{\alpha}_{1}={\beta}_{1}=0)$, the 2 phenotypes are moderately correlated with a Kendall's tau estimate of 0.578, and higher lower tail dependence than upper tail dependence (Table 3). Conditioning on the variant at 41,984,243 base-pair position diminishes upper tail dependence measure ${\lambda}_{U}$ at 0.01 level of significance (99% CI for ${\lambda}_{U}$ includes 0); this variant is also associated with SBP and DBP (see Table 2). Figure 2 illustrates how it achieves a reduction in upper tail dependence. Tail dependence can also be reduced in the absence of strong marginal BP associations. For example, the variant at 41,971,559 base-pair position is only modestly associated with SBP (

*p*value = 0.040) and DBP (

*p*value = 0.055), but the upper tail dependence is reduced from 0.449 obtained under the null model to 0.289 with a CI that includes 0 (Table 3).

Estimates of dependence measures $\left(\tau ,{\lambda}_{L},{\lambda}_{u}\right)$ under the null model ${H}_{0}:{\alpha}_{1}={\beta}_{1}=0$ and under model (5) in a single-variant analysis

Variant | $\hat{\tau}$ (SE) | (99% CI for τ) | $\hat{{\lambda}_{L}}$ (SE) | (99% CI for ${\lambda}_{L}$) | $\hat{{\lambda}_{U}}$ (SE) | (99% CI for ${\lambda}_{u}$) |
---|---|---|---|---|---|---|

Null | 0.578 (0.05) | (0.449, 0.707) | 0.650 (0.07) | (0.470, 0.830) | 0.449 (0.10) | (0.191, 0.707) |

41,984,243 | 0.555 (0.06) | (0.400, 0.710) | 0.703 (0.06) | (0.548, 0.858) | 0.268 (0.21) | (0.000, 0.809) |

41,971,559 | 0.570 (0.05) | (0.441, 0.699) | 0.715 (0.06) | (0.560, 0.870) | 0.289 (0.18) | (0.000, 0.753) |

## Conclusions

In this report, we demonstrate how to model the bivariate distribution of SBP and DBP with copulas and conduct appropriate inference for genetic association. The proposed method is shown to be applicable by considering a single gene, and crude estimates of computation time suggest that it is feasible to process 1 million variants in less than a day, for example, by using one hundred 2.5-GHz cores. Although estimating the bivariate distribution is computationally more intensive than fitting the working independence model, given the high correlation between the phenotypes, a potential advantage is that genetic associations can be detected with higher power under a plausible joint model. Using joint copula models, we were also able to identify candidate variants explaining the upper tail dependence of SBP and DBP. We generally observed strong linkage disequilibrium between variants identified. By conducting joint analyses of multiple variants in moderate linkage disequilibrium, we achieved a much more significant reduction in upper tail dependence (data not shown), and although we observed some reduction in point estimates of lower tail dependence and Kendall's tau, the CIs still overlap with those under the null model. Calling a comparison significant when the CIs fail to overlap is a conservative approach, but it is computationally efficient. As an alternative, a nonparametric bootstrap procedure could be used to estimate the variance of the estimated difference between dependence measures under the null and full models, and to construct an approximate CI. To allow multiple testing adjustments, instead of checking whether the CI for ${\lambda}_{L}$ or ${\lambda}_{U}$ under the full model includes 0, it would be desirable to test each of the null hypotheses ${H}_{0}:\phi =0$ or ${H}_{0}:\theta =1$, respectively, to obtain *p* values [5].

In principle, the extension of our approach to 3 or more quantitative traits is straightforward; however, the copula model (6) may not be ideal in this setting. It involves some restrictions on the association structure, and generally the Gaussian copula is used when there are 3 or more traits. The approach could also be extended to binary traits, but with some caution because there is no unique copula identifying the joint distribution function of discrete variables [6].

## Notes

## Declarations

### Acknowledgements

YEY was supported by a MITACS Network Industrial Fellowship, the Syd Cooper Fund and is a CIHR Fellow in Genetic Epidemiology and Statistical Genetics with CIHR STAGE (Strategic Training for Advanced Genetic Epidemiology). This work was supported in part by grants from the MITACS Network of Centres of Excellence in Mathematical Sciences and the Natural Sciences and Engineering Research Council of Canada. The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.

This article has been published as part of *BMC Proceedings* Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.

## Authors’ Affiliations

## References

- Joe H: Multivariate Models and Multivariate Dependence Concepts. London, Chapman & Hall. 1997Google Scholar
- Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, Smith AV, Tobin MD, Verwoert GC, Hwang S, et al: Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature. 2011, 478: 103-109. 10.1038/nature10405.View ArticlePubMedGoogle Scholar
- Tobin MD, Sheehan NA, Scurrah KJ, Burton PR: Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat Med. 2005, 24: 2911-2935. 10.1002/sim.2165.View ArticlePubMedGoogle Scholar
- Sesso HD, Stampfer MJ, Rosner B, Hennekens CH, Gaziano JM, Manson JE, Glynn RJ: Systolic and diastolic blood pressure, pulse pressure, and mean arterial pressure as predictors of cardiovascular disease risk in men. Hypertension. 2000, 36: 801-807. 10.1161/01.HYP.36.5.801.View ArticlePubMedGoogle Scholar
- Yilmaz YE, Lawless JF: Likelihood ratio procedures and tests of fit in parametric and semiparametric copula models with censored data. Lifetime Data Anal. 2011, 17: 386-408. 10.1007/s10985-011-9192-2.View ArticlePubMedGoogle Scholar
- Genest C, Neslehova J: A primer on copulas for count data. ASTIN Bull. 2007, 37: 475-515. 10.2143/AST.37.2.2024077.View ArticleGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.