Volume 8 Supplement 1

## Genetic Analysis Workshop 18

# Testing optimally weighted combination of variants for hypertension

- Xingwang Zhao†^{2}
- Qiuying Sha†^{1}
- Shuanglin Zhang^{1}
- Xuexia Wang^{2}

**8(Suppl 1)**:S59

**DOI:** 10.1186/1753-6561-8-S1-S59

© Zhao et al.; licensee BioMed Central Ltd. 2014

**Published:** 17 June 2014

## Abstract

Testing rare variants directly is possible with next-generation sequencing technology. In this article, we propose a sliding-window-based optimal-weighted approach to test for the effects of both rare and common variants across the whole genome. We measured the genetic association between a disease and a combination of variants of a single-nucleotide polymorphism window using the newly developed tests TOW and VW-TOW and performed a sliding-window technique to detect disease-susceptible windows. By applying the new approach to unrelated individuals of Genetic Analysis Workshop 18 on replicate 1 chromosome 3, we detected 3 highly susceptible windows across chromosome 3 for diastolic blood pressure and identified 10 of 48,176 windows as the most promising for both diastolic and systolic blood pressure. Seven of 9 top variants influencing diastolic blood pressure and 8 of 9 top variants influencing systolic blood pressure were found in or close to our top 10 windows.

## Background

Hypertension is a common, chronic, destructive disease with a complex and largely unknown etiology [1]. More than 1 billion people worldwide have hypertension, defined as blood pressure (BP) ≥140 mm Hg systolic (SBP) or ≥90 mm Hg diastolic (DBP) [2]. It is a major risk factor for stroke, myocardial infarction, and heart failure, and a cause of chronic kidney disease [3–5]. Both genetic and environmental factors are likely to contribute to this disease. Ehret *et al*. conducted a large-scale genome-wide association study of hypertension in 2011 and identified 10 novel loci related to BP physiology [6]. Although numerous common genetic variants with small effects on BP have been identified [6–8], the identified variants account for only a small fraction of disease heritability [9]. One potential source of missing heritability is the contribution of rare variants. Recently, next-generation sequencing technology has enabled the sequencing of the whole genome of large groups of individuals, which makes directly testing rare variants feasible. The Genetic Analysis Workshop 18 (GAW18) data, which include a whole genome sequencing data set, come from a large-scale pedigree-based sample of 959 individuals, 464 directly sequenced and the rest imputed.

Several statistical methods have been proposed to detect associations of rare variants, including the combined multivariate and collapsing (CMC) method [10] and the weighted sum statistic (WSS) [11]. We have proposed a novel test for measuring the effect of an optimally weighted combination of variants (TOW) [12]. In addition, based on TOW, we proposed a variable weight-TOW (VW-TOW) that aims to test the effects of both rare and common variants. Both TOW and VW-TOW are applicable to quantitative and qualitative traits, allow covariates, and are robust to the directions of effects of causal variants.

In this article, we report a novel whole genome sliding-window approach to detect genetic association between a trait and single-nucleotide polymorphism (SNP) regions across the entire genome. This approach integrates TOW and VW-TOW with the concept of a sliding window [13]. Applied to the GAW18 replicate 1, chromosome 3 data set, our approach yielded results consistent with the top genes influencing simulated SBP and DBP, which were generated from the GAW18 simulation model.

## Methods

Consider a sample of $n$ individuals. Each individual has been genotyped at $M$ variants in a genomic region. Denote ${y}_{i}$ as the quantitative trait value. Denote ${X}_{i}={\left({x}_{i1},...,{x}_{iM}\right)}^{T}$ as the genotypic score of the *i*^{th} individual, where ${x}_{im}\in \left\{0,1,2\right\}$ is the number of minor alleles that the *i*^{th} individual has at the *m*^{th} variant.

Suppose we have $p$ covariates. Let ${\left({z}_{i1},...,{z}_{ip}\right)}^{T}$ denote the covariates of the *i*^{th} individual. We adjust both the trait value ${y}_{i}$ and the genotypic score ${x}_{im}$ for the covariates by applying linear regressions. That is, ${y}_{i}={\alpha}_{0}+{\alpha}_{1}{z}_{i1}+...+{\alpha}_{p}{z}_{ip}+{\epsilon}_{i}$ and ${x}_{im}={\alpha}_{0m}+{\alpha}_{1m}{z}_{i1}+...+{\alpha}_{pm}{z}_{ip}+{\tau}_{im}$.

Let $\tilde{y}_{i}$ and $\tilde{x}_{im}$ denote the residuals of ${y}_{i}$ and ${x}_{im}$, respectively. Denote $\tilde{X}_{i}={\left(\tilde{x}_{i1},\dots ,\tilde{x}_{iM}\right)}^{T}$ as the residuals of the genotypic score of the *i*^{th} individual.
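The covariate adjustment described above can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' implementation: the function name `residualize` and the toy data are assumptions, and ordinary least squares with an intercept is used to match the regressions written in the text.

```python
import numpy as np

def residualize(values, covariates):
    """Return OLS residuals of `values` regressed on an intercept plus `covariates`.

    values: (n,) trait vector or (n, M) genotype matrix.
    covariates: (n, p) covariate matrix.
    """
    values = np.asarray(values, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    n = covariates.shape[0]
    design = np.column_stack([np.ones(n), covariates])  # intercept + covariates
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef

# Toy data (hypothetical): 6 individuals, covariates age and sex, 3 variants.
rng = np.random.default_rng(0)
z = np.column_stack([rng.integers(30, 70, size=6), rng.integers(0, 2, size=6)])
y = rng.normal(120, 10, size=6)
x = rng.integers(0, 3, size=(6, 3))

y_res = residualize(y, z)   # residuals of the trait
x_res = residualize(x, z)   # residuals of each variant, column by column
```

Each column of the genotype matrix is regressed separately on the same covariates, so the trait and every variant are adjusted in the same way.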

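In rare variant association studies, to test for the effect of the weighted combination of variants, ${x}_{i}=\sum_{m=1}^{M}{w}_{m}\tilde{x}_{im}$, the score test statistic becomes $S\left({w}_{1},\dots ,{w}_{M}\right)=n\frac{{\left(\sum_{i=1}^{n}\left(\tilde{y}_{i}-\bar{\tilde{y}}\right)\left({x}_{i}-\bar{x}\right)\right)}^{2}}{\sum_{i=1}^{n}{\left(\tilde{y}_{i}-\bar{\tilde{y}}\right)}^{2}\sum_{i=1}^{n}{\left({x}_{i}-\bar{x}\right)}^{2}}.$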

Let ${a}_{m}=\frac{\sum_{i=1}^{n}\left(\tilde{y}_{i}-\bar{\tilde{y}}\right)\left(\tilde{x}_{im}-\bar{\tilde{x}}_{m}\right)}{\sqrt{\sum_{i=1}^{n}{\left(\tilde{x}_{im}-\bar{\tilde{x}}_{m}\right)}^{2}}}$ and ${u}_{m}={w}_{m}\sqrt{\sum_{i=1}^{n}{\left(\tilde{x}_{im}-\bar{\tilde{x}}_{m}\right)}^{2}}$.

Then, the score test statistic is approximately equal to ${S}_{0}\left({w}_{1},\dots ,{w}_{M}\right)=n\frac{{\left(\sum_{m=1}^{M}{a}_{m}{u}_{m}\right)}^{2}}{\sum_{i=1}^{n}{\left(\tilde{y}_{i}-\bar{\tilde{y}}\right)}^{2}\sum_{m=1}^{M}{u}_{m}^{2}}.$

As a function of $\left({u}_{1},\dots ,{u}_{M}\right)$, ${S}_{0}\left({w}_{1},\dots ,{w}_{M}\right)$ reaches its maximum when ${u}_{m}={a}_{m}$, that is, when ${w}_{m}=\sum_{i=1}^{n}\left(\tilde{y}_{i}-\bar{\tilde{y}}\right)\left(\tilde{x}_{im}-\bar{\tilde{x}}_{m}\right)/\sum_{i=1}^{n}{\left(\tilde{x}_{im}-\bar{\tilde{x}}_{m}\right)}^{2}$ $\left(m=1,\dots ,M\right)$. We denote this optimal weight by ${w}_{m}^{o}$. Let $\tilde{x}_{i}^{o}=\sum_{m=1}^{M}{w}_{m}^{o}\tilde{x}_{im}$. Then ${S}_{0}\left({w}_{1}^{o},\dots ,{w}_{M}^{o}\right)=n\sum_{i=1}^{n}\left(\tilde{y}_{i}-\bar{\tilde{y}}\right)\left(\tilde{x}_{i}^{o}-\bar{\tilde{x}}^{o}\right)/\sum_{i=1}^{n}{\left(\tilde{y}_{i}-\bar{\tilde{y}}\right)}^{2}.$ We propose the new test statistic TOW to test the effect of the optimally weighted combination of variants $\sum_{m=1}^{M}{w}_{m}^{o}\tilde{x}_{im}$ as ${T}_{T}=\sum_{i=1}^{n}\left(\tilde{y}_{i}-\bar{\tilde{y}}\right)\left(\tilde{x}_{i}^{o}-\bar{\tilde{x}}^{o}\right)$. ${T}_{T}$ is equivalent to ${S}_{0}\left({w}_{1}^{o},\dots ,{w}_{M}^{o}\right)$ because $\sum_{i=1}^{n}{\left(\tilde{y}_{i}-\bar{\tilde{y}}\right)}^{2}$ is a constant. The optimal weight ${w}_{m}^{o}$ puts large weights on variants that are strongly associated with the trait of interest and adjusts for the direction of the association. It also puts large weights on rare variants, because its denominator, the sum of squared deviations of the genotype residuals, is small when the minor allele is rare. TOW targets rare variants and loses power when testing for the effect of both rare and common variants.

For testing the effects of both rare and common variants, we propose a new statistic, VW-TOW. We divide variants into rare (minor allele frequency [MAF] < the rare variant threshold [RVT]) and common (MAF > RVT), and apply TOW to the rare and common variants separately.
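As an illustration of the definitions above, the optimal weights and the TOW statistic can be computed as in the following Python sketch. It is not the authors' code: the function names, the toy data, the handling of monomorphic variants (weight 0), and the assignment of variants with MAF exactly equal to the threshold to the common group are all assumptions made here.

```python
import numpy as np

def tow_statistic(y_res, x_res):
    """Return (T_T, optimal weights) from residualized trait (n,) and genotypes (n, M)."""
    yc = y_res - y_res.mean()
    xc = x_res - x_res.mean(axis=0)
    cov = yc @ xc                     # sum_i (y_i - ybar)(x_im - xbar_m) for each m
    ss = (xc ** 2).sum(axis=0)        # sum_i (x_im - xbar_m)^2 for each m
    keep = ss > 0                     # assumption: monomorphic variants get weight 0
    weights = np.zeros_like(cov)
    weights[keep] = cov[keep] / ss[keep]   # optimal weights w_m^o
    x_opt = xc @ weights              # centered optimally weighted combination
    return float(yc @ x_opt), weights

def split_by_maf(genotypes, rvt):
    """Return boolean masks (rare, common) from minor-allele counts coded 0/1/2."""
    maf = genotypes.mean(axis=0) / 2.0
    maf = np.minimum(maf, 1.0 - maf)
    rare = maf < rvt
    return rare, ~rare                # assumption: MAF == rvt counts as common

# Toy example (hypothetical data): 5 individuals, 2 variants.
y = np.array([1.0, -1.0, 2.0, 0.0, -2.0])
x = np.array([[0, 1], [1, 0], [2, 0], [0, 0], [0, 1]], dtype=float)
t_tow, w_opt = tow_statistic(y, x)
rare_mask, common_mask = split_by_maf(x, rvt=0.25)
```

Because the optimal weight carries the sign of the association, each variant contributes a nonnegative term to $T_T$, which is why the statistic is robust to the direction of effects.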

Define the test statistic of VW-TOW as ${T}_{VW-T}={\mathsf{\text{min}}}_{0\le \lambda \le 1}{p}_{\lambda}$, where ${p}_{\lambda}$ is the *p* value of ${T}_{\lambda}=\lambda \frac{{T}_{r}}{\sqrt{\mathsf{\text{var}}\left({T}_{r}\right)}}+\left(1-\lambda \right)\frac{{T}_{c}}{\sqrt{\mathsf{\text{var}}\left({T}_{c}\right)}}$, and ${T}_{r}$ and ${T}_{c}$ denote the test statistics of TOW for rare and common variants, respectively. We evaluate the minimization by dividing the interval [0, 1] into $K$ subintervals of equal length. Let ${\lambda}_{k}=k/K$ for $k=0,1,\dots ,K$. Then ${\mathsf{\text{min}}}_{0\le \lambda \le 1}{p}_{\lambda}$ is approximated by ${\mathsf{\text{min}}}_{0\le k\le K}{p}_{{\lambda}_{k}}$.

We use permutation tests to evaluate the *p* values of both ${T}_{T}$ and ${T}_{VW-T}$. To evaluate the *p* value of ${T}_{T}$, let ${T}_{T}^{0}$ denote the value of the test statistic based on the original data set. For each permutation, we randomly permute the residuals of the trait values and denote the value of the test statistic based on the permuted data set by ${T}_{T}^{per}$. We repeat the permutation procedure many times. The *p* value of the test is then the proportion of permutations with ${T}_{T}^{per}>{T}_{T}^{0}$. To evaluate the *p* value of ${T}_{VW-T}$, we perform $B$ permutations. Let ${T}_{r}^{\left(b\right)}$ and ${T}_{c}^{\left(b\right)}$ denote the values of ${T}_{r}$ and ${T}_{c}$ based on the ${b}^{th}$ permuted data set, where $b=0$ represents the original data. Based on ${T}_{r}^{\left(b\right)}$ and ${T}_{c}^{\left(b\right)}$ $\left(b=0,1,\dots ,B\right)$, we can calculate ${T}_{{\lambda}_{k}}^{\left(b\right)}$ for $b=0,1,\dots ,B$ and $k=0,1,\dots ,K$, where $\mathsf{\text{var}}\left({T}_{r}\right)$ and $\mathsf{\text{var}}\left({T}_{c}\right)$ are estimated using ${T}_{r}^{\left(b\right)}$ and ${T}_{c}^{\left(b\right)}$ $\left(b=1,\dots ,B\right)$. Then, we transform ${T}_{{\lambda}_{k}}^{\left(b\right)}$ to ${p}_{{\lambda}_{k}}^{\left(b\right)}$ by ${p}_{{\lambda}_{k}}^{\left(b\right)}=\frac{\sum_{i=0}^{B}I\left({T}_{{\lambda}_{k}}^{\left(i\right)}>{T}_{{\lambda}_{k}}^{\left(b\right)}\right)}{B}$, where $I\left(\cdot\right)$ is the indicator function. Let ${p}^{\left(b\right)}={\mathsf{\text{min}}}_{0\le k\le K}{p}_{{\lambda}_{k}}^{\left(b\right)}$. Then the *p* value of ${T}_{VW-T}$ is given by $\frac{\sum_{b=1}^{B}I\left({p}^{\left(b\right)}<{p}^{\left(0\right)}\right)}{B}$.
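The permutation procedure for VW-TOW described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' software: the function names, default values of `n_perm` and `n_grid`, and the simulated data are hypothetical, the inputs are assumed to be already adjusted for covariates, and each of the rare and common groups is assumed to contain at least one polymorphic variant so that the variance estimates are positive.

```python
import numpy as np

def tow(y_c, x_c):
    """TOW statistic for a centered trait (n,) and centered genotypes (n, M)."""
    cov = y_c @ x_c
    ss = (x_c ** 2).sum(axis=0)
    ok = ss > 0                                   # ignore monomorphic variants
    return float((cov[ok] ** 2 / ss[ok]).sum())

def vw_tow_pvalue(y_res, x_rare, x_common, n_perm=1000, n_grid=10, seed=0):
    """Permutation p value of VW-TOW; n_perm plays the role of B, n_grid of K."""
    rng = np.random.default_rng(seed)
    y_c = y_res - y_res.mean()
    xr = x_rare - x_rare.mean(axis=0)
    xc = x_common - x_common.mean(axis=0)

    t_r = np.empty(n_perm + 1)
    t_c = np.empty(n_perm + 1)
    for b in range(n_perm + 1):
        y_b = y_c if b == 0 else rng.permutation(y_c)   # b = 0 is the original data
        t_r[b] = tow(y_b, xr)
        t_c[b] = tow(y_b, xc)

    # var(T_r) and var(T_c) are estimated from the permuted replicates b = 1..B.
    sd_r = t_r[1:].std(ddof=1)
    sd_c = t_c[1:].std(ddof=1)

    lambdas = np.arange(n_grid + 1) / n_grid            # lambda_k = k / K
    t_lam = np.outer(lambdas, t_r / sd_r) + np.outer(1 - lambdas, t_c / sd_c)

    # p_lam[k, b] = #{i in 0..B : T^(i) > T^(b)} / B, computed by sorting each row.
    p_lam = np.empty_like(t_lam)
    for k in range(n_grid + 1):
        sorted_t = np.sort(t_lam[k])
        n_greater = (n_perm + 1) - np.searchsorted(sorted_t, t_lam[k], side="right")
        p_lam[k] = n_greater / n_perm

    p_min = p_lam.min(axis=0)                           # p^(b) = min_k p_{lambda_k}^(b)
    return float(np.mean(p_min[1:] < p_min[0]))

# Simulated example (hypothetical): one common variant with a strong effect.
rng = np.random.default_rng(42)
x_common = rng.binomial(2, 0.3, size=(100, 3)).astype(float)
x_rare = rng.binomial(2, 0.02, size=(100, 5)).astype(float)
y = 2.0 * x_common[:, 0] + rng.normal(size=100)
p_value = vw_tow_pvalue(y, x_rare, x_common, n_perm=500)
```

The same set of permutations is reused both to estimate the variances and to calibrate the minimum over $\lambda_k$, so no second layer of permutations is needed.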

We use TOW and VW-TOW to analyze the data set of unrelated individuals of GAW18 replicate 1 on chromosome 3. To apply TOW and VW-TOW to the entire chromosome 3, we propose a sliding-window approach [13]. We divide all SNPs into contiguous windows and apply TOW and VW-TOW in each window. With a window size of $S$, the SNPs are divided into the windows 1 to $S$, $S+1$ to $2S$, $2S+1$ to $3S$, and so on.
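The window partition described above can be sketched in a few lines of Python. The function name `contiguous_windows` is hypothetical, indices are 0-based with an exclusive end, and letting the last window be shorter when the number of SNPs is not a multiple of the window size is an assumption, because the text does not say how the remainder is handled.

```python
def contiguous_windows(n_snps, window_size):
    """Yield (start, end) index pairs, 0-based and end-exclusive, covering all SNPs."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(0, n_snps, window_size):
        yield start, min(start + window_size, n_snps)

windows = list(contiguous_windows(n_snps=45, window_size=20))
# windows == [(0, 20), (20, 40), (40, 45)]

# With the 963,507 SNPs analyzed in this article and a window size of 20:
n_windows = sum(1 for _ in contiguous_windows(963_507, 20))  # 48176
```

Under this convention the partition gives 48,176 windows, which matches the number of windows reported in this article.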

To analyze the data set of GAW18 replicate 1, chromosome 3 for unrelated individuals, we set the window size to 20. First, we performed quality control tests on the genotype data with the PLINK toolset. We used 10,000,000 permutations to evaluate the empirical *p* values of TOW for DBP and SBP, and 100,000 permutations to evaluate the empirical *p* values of VW-TOW for DBP and SBP. Because the sample of unrelated individuals in GAW18 is relatively small, it is not reasonable to claim significance by either the false-discovery rate or the Bonferroni-corrected threshold. Therefore, we recommend the top 10 most promising windows, those with the smallest *p* values, for follow-up studies.
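To put the permutation counts in context, the following short Python calculation compares the resolution of an empirical *p* value (1 divided by the number of permutations) with a Bonferroni threshold for the 48,176 windows reported in this article. The significance level of 0.05 is an assumption made here for illustration; the text does not state which level was used.

```python
n_windows = 48_176      # number of windows reported in this article
alpha = 0.05            # assumed family-wise significance level (not stated in the text)
bonferroni = alpha / n_windows   # roughly 1.04e-6

permutations = {"TOW": 10_000_000, "VW-TOW": 100_000}
resolution = {name: 1 / b for name, b in permutations.items()}  # smallest nonzero p
can_resolve = {name: r <= bonferroni for name, r in resolution.items()}

for name in permutations:
    print(f"{name}: resolution {resolution[name]:.0e}, "
          f"reaches Bonferroni threshold: {can_resolve[name]}")
```

Under this assumed level, 10,000,000 permutations can resolve *p* values below the threshold, whereas 100,000 permutations cannot, whatever the strength of the signal.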

## Results

We applied TOW and VW-TOW with the sliding-window approach to the GAW18 hypertension data set of unrelated individuals. To facilitate comparisons among GAW18 contributions, we analyzed only replicate 1 on chromosome 3. To evaluate the type I error rates of TOW and VW-TOW, we used all 200 replicates of simulated phenotype data. There are 157 unrelated individuals in the GAW18 pedigree-based sample. Among the 157 individuals, 142 have observations for SBP, DBP, and other demographic/clinical variables at exam 1. Our analysis was based on these 142 individuals and their genotypes, the quantitative traits SBP and DBP, and other characteristics at exam 1.

The total genotyping rate in the 142 individuals is 0.9997. We did not find any duplicated samples or sample contamination. No individual was filtered out from the multidimensional scaling (MDS) analysis. Of the 1,215,399 SNPs on chromosome 3, we removed 251,892 completely missing SNPs and retained 963,507 SNPs for final analysis. Because SBP and DBP varied by sex and increased with age, age and sex were considered as covariates in this study.

*MAP4*, which is the most susceptible gene on chromosome 3 for hypertension. Seven of 9 top variants influencing DBP and 8 of 9 top variants influencing SBP on chromosome 3 were found in or close to our top windows. Tables 1 and 2 show the top 10 most promising windows by TOW that are associated with DBP and SBP, respectively. The *p* values of TOW in the top 3 windows of Table 1 are very small. SNPs 3_47957996, 3_47956424, and 3_47957741 are the third, fourth, and ninth variants in Table 2 of the GAW18 answer sheet. They all fall into the third window in Table 1 and the first window in Table 2.

Table 1. Top 10 most promising windows associated with DBP

WID | Chr | Physical location | Empirical p (TOW) | Empirical p (VW-TOW) | Gene | Reference variants |
---|---|---|---|---|---|---|
1 | 3 | 48117215,48121372 | 2.34 × 10 | 0.0005 | | |
2 | 3 | 48063171,48068858 | 4.95 × 10 | 0.0005 | | |
3 | 3 | 47957289,47961091 | 4.09 × 10 | 0.0006 | | 3_47957996 3_47956424 3_47957741 |
4 | 3 | 48034051,48040240 | 1.42 × 10 | 0.001 | | 3_48040284 3_48040283 |
5 | 3 | 48089115,48094079 | 2.06 × 10 | 0.001 | | |
6 | 3 | 48005035,48009105 | 2.69 × 10 | 0.0015 | | |
7 | 3 | 47929938,47935009 | 5.29 × 10 | 0.001 | | |
8 | 3 | 47912703,47920240 | 9.06 × 10 | 0.001 | | 3_47913455 |
9 | 3 | 4474736,4477687 | 0.036 | 0.071 | | 3_45008742 |
10 | 3 | 56871312,56875674 | 0.03 | 0.058 | | 3_56870810* |

Table 2. Top 10 most promising windows associated with SBP

WID | Chr | Physical location | Empirical p (TOW) | Empirical p (VW-TOW) | Gene | Reference variants |
---|---|---|---|---|---|---|
1 | 3 | 47957289,47961091 | 0.005 | 0.004 | | 3_47957996 3_47956424 3_47957741 |
2 | 3 | 48034051,48040240 | 0.003 | 0.007 | | |
3 | 3 | 47990787,47999337 | 0.01 | 0.013 | | |
4 | 3 | 48040283,48046708 | 0.02 | 0.01 | | 3_48040284 3_48040283 |
5 | 3 | 47912703,47920240 | 0.03 | 0.017 | | 3_47913455 |
6 | 3 | 48121395,48126740 | 0.015 | 0.032 | | |
7 | 3 | 47929938,47935009 | 0.025 | 0.04 | | |
8 | 3 | 48063171,48068858 | 0.015 | 0.01 | | |
9 | 3 | 58104877,58108614 | 0.01 | 0.031 | | 3_58109162 |
10 | 3 | 15664089,15667215 | 0.039 | 0.011 | | 3_15686693 |

*p* value for each replicate and each block. Figure 1 shows the histograms of TOW and VW-TOW. The histograms indicate that the type I error rates of both TOW and VW-TOW are under control.

## Discussion

In this article, we proposed a sliding-window-based optimally weighted approach to test for the effects of both rare and common variants across the whole genome. In each window, our recently developed TOW and VW-TOW were applied to test for genetic association between a disease and a combination of variants. We then applied the method to unrelated individuals of GAW18 on replicate 1, chromosome 3. We detected 3 susceptible windows across chromosome 3 for DBP and identified 10 of 48,176 windows as the most promising windows for DBP and SBP. Because this is a simulated data set, it is possible that other genes that do not appear in the top 10 windows are in fact related to SBP or DBP.

In this study, we used windows of size 20 across the entire chromosome 3. How to choose an appropriate window size is a critical question. We evaluated the effect of window size by also running the analysis with window sizes of 30, 40, and 50. The power of TOW did not increase with a larger window size. Although the power of VW-TOW increased slightly with a larger window size, no window passed the Bonferroni-corrected threshold for the entire chromosome 3.

TOW and VW-TOW can be made robust to population stratification by including the first $K$ principal components (PCs) of genotypes at genomic markers as covariates when calculating the residuals of the trait and of the genotype matrix. In this GAW18 data analysis, we did not adjust for PCs because, based on our MDS analysis, we believed that population stratification was not severe in these data.
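The PC adjustment mentioned above, which was not applied in this analysis, could be set up as in the following Python sketch. It is only an illustration: the function name `top_pcs`, the use of a singular value decomposition of the column-centered genotype matrix, and the simulated markers and covariates are assumptions, not the authors' procedure.

```python
import numpy as np

def top_pcs(genotypes, k):
    """Return the first k principal component scores (n, k) of an (n, M) genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)                       # center each marker
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :k] * s[:k]                      # scores ordered by decreasing variance

# Hypothetical data: 10 individuals, 50 genome-wide markers, covariates age and sex.
rng = np.random.default_rng(1)
markers = rng.integers(0, 3, size=(10, 50))
age_sex = np.column_stack([rng.integers(30, 70, size=10), rng.integers(0, 2, size=10)])

pcs = top_pcs(markers, k=2)
covariates = np.column_stack([age_sex, pcs])     # extended covariate matrix
```

The extended covariate matrix would then replace the original covariates in the regressions used to compute the trait and genotype residuals.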

*MAP4* and DBP. *MAP4* was split into 44 windows (blocks) with 20 variants in each window. In each window, we calculated the power of each method based on 200 replicates. The power comparisons based on the phenotype measurement DBP are given in Figure 2. This figure shows that in most of the windows, TOW is the most powerful test and VW-TOW is the second most powerful test.

## Declarations

### Acknowledgements

We thank Dr. Claire L. Simpson (funded by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health) for helpful PLINK format GAW18 genotype data. QS and SZ were supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number R03HG006155. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.

This article has been published as part of *BMC Proceedings* Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.

## References

1. Beevers G, Lip GYH, O'Brien E: ABC of Hypertension. 2007, London, BMJ Books.
2. Kearney PM, Whelton M, Reynolds K, Muntner P, Whelton PK, He J: Global burden of hypertension: analysis of worldwide data. Lancet. 2005, 365 (9455): 217-223. 10.1016/S0140-6736(05)17741-1.
3. World Health Organization: Global Health Risks: Mortality and Burden of Disease Attributable to Selected Major Risks. 2009, Geneva, WHO Press.
4. Lewington S, Clarke R, Qizilbash N, Peto R, Collins R, Prospective Studies Collaboration: Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet. 2002, 360: 1903-1913.
5. Singer DR, Kite A: Management of hypertension in peripheral arterial disease: does the choice of drugs matter?. Eur J Vasc Endovasc Surg. 2008, 35 (6): 701-708. 10.1016/j.ejvs.2008.01.007.
6. International Consortium for Blood Pressure Genome-Wide Association Studies, Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, Smith AV, Tobin MD, et al: Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature. 2011, 478: 103-109. 10.1038/nature10405.
7. Newton-Cheh C, Johnson T, Gateva V, Tobin MD, Bochud M, Coin L, Najjar SS, Zhao JH, Heath SC, Eyheramendy S, et al: Genome-wide association study identifies eight loci associated with blood pressure. Nat Genet. 2009, 41: 666-676. 10.1038/ng.361.
8. Levy D, Ehret GB, Rice K, Verwoert GC, Launer LJ, Dehghan A, Glazer NL, Morrison AC, Johnson AD, Aspelund T, et al: Genome-wide association study of blood pressure and hypertension. Nat Genet. 2009, 41: 677-687. 10.1038/ng.384.
9. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN: Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008, 9: 356-369. 10.1038/nrg2344.
10. Li B, Leal SM: Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008, 83: 311-321. 10.1016/j.ajhg.2008.06.024.
11. Madsen BE, Browning SR: A group-wise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009, 5: e1000384. 10.1371/journal.pgen.1000384.
12. Sha Q, Wang X, Wang XL, Zhang SL: Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet Epidemiol. 2012, 36: 561-571. 10.1002/gepi.21649.
13. Yang H, Lin C, Fann C: A sliding-window weighted linkage disequilibrium test. Genet Epidemiol. 2006, 30: 531-545. 10.1002/gepi.20165.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.