Testing optimally weighted combination of variants for hypertension

Testing rare variants directly is possible with next-generation sequencing technology. In this article, we propose a sliding-window-based optimal-weighted approach to test for the effects of both rare and common variants across the whole genome. We measured the genetic association between a disease and a combination of variants of a single-nucleotide polymorphism window using the newly developed tests TOW and VW-TOW and performed a sliding-window technique to detect disease-susceptible windows. By applying the new approach to unrelated individuals of Genetic Analysis Workshop 18 on replicate 1 chromosome 3, we detected 3 highly susceptible windows across chromosome 3 for diastolic blood pressure and identified 10 of 48,176 windows as the most promising for both diastolic and systolic blood pressure. Seven of 9 top variants influencing diastolic blood pressure and 8 of 9 top variants influencing systolic blood pressure were found in or close to our top 10 windows.


Background
Hypertension is a common, chronic, destructive disease with a complex and largely unknown etiology [1]. More than 1 billion people worldwide have hypertension, defined as blood pressure (BP) ≥140 mm Hg systolic (SBP) or ≥90 mm Hg diastolic (DBP) [2], which is a major risk factor for stroke, myocardial infarction, and heart failure, and a cause of chronic kidney disease [3][4][5]. Both genetic and environmental factors are likely to contribute to this disease. Ehret et al. conducted a large-scale genome-wide association study of hypertension in 2011 and identified 10 novel loci related to BP physiology [6]. Although numerous common genetic variants with small effects on BP have been identified [6][7][8], the identified variants account for only a small fraction of disease heritability [9]. One potential source of missing heritability is the contribution of rare variants. Recently, next-generation sequencing technology has enabled the sequencing of the whole genome of large groups of individuals, which makes directly testing rare variants feasible. The Genetic Analysis Workshop 18 (GAW18) data, which consist of a whole genome sequencing data set, come from a large-scale pedigree-based sample of 959 individuals, 464 directly sequenced and the rest imputed.
Several statistical methods have been proposed to detect associations of rare variants, including the combined multivariate and collapsing (CMC) method [10] and the weighted sum statistic (WSS) [11]. We have proposed a novel test for measuring the effect of an optimally weighted combination of variants (TOW) [12]. In addition, based on the TOW, we proposed a variable weight-TOW (VW-TOW) aiming to test effects of both rare and common variants. Both TOW and VW-TOW are applicable to quantitative and qualitative traits, allow covariates, and are robust to directions of effects of causal variants.
In this article, we report a novel whole-genome sliding-window approach to detect genetic association between a trait and single-nucleotide polymorphism (SNP) regions across the entire genome. This approach integrates TOW and VW-TOW with the concept of a sliding window [13]. Applied to the GAW18 replicate 1, chromosome 3 data set, our approach yielded results consistent with the top genes influencing simulated SBP and DBP, which were generated from the GAW18 simulation model.

Methods
Consider a sample of $n$ individuals. Each individual has been genotyped at $M$ variants in a genomic region. Denote $y_i$ as the quantitative trait value of the $i$th individual. Denote $X_i = (x_{i1}, \ldots, x_{iM})^T$ as the genotypic score of the $i$th individual, where $x_{im} \in \{0, 1, 2\}$ is the number of minor alleles that the $i$th individual has at the $m$th variant.
Suppose we have $p$ covariates. Let $(z_{i1}, \ldots, z_{ip})^T$ denote the covariates of the $i$th individual. We adjust both the trait value $y_i$ and the genotypic score $x_{im}$ for the covariates by applying linear regressions. That is, $y_i = \alpha_0 + \alpha_1 z_{i1} + \cdots + \alpha_p z_{ip} + \varepsilon_i$ and $x_{im} = \alpha_{0m} + \alpha_{1m} z_{i1} + \cdots + \alpha_{pm} z_{ip} + \tau_{im}$.
Let $\tilde{y}_i$ and $\tilde{x}_{im}$ denote the residuals of $y_i$ and $x_{im}$, respectively. Denote $\tilde{X}_i = (\tilde{x}_{i1}, \ldots, \tilde{x}_{iM})^T$ as the residuals of the genotypic score of the $i$th individual.
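The covariate adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `residualize`, the simulated covariates, and the sample sizes are all hypothetical, and ordinary least squares with an intercept is assumed for both regressions.

```python
import numpy as np

def residualize(values, covariates):
    """Residuals of `values` after least-squares regression on the covariates plus an intercept.

    `values` may be a vector (trait) or a matrix (one column per variant).
    """
    values = np.asarray(values, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    n = covariates.shape[0]
    design = np.column_stack([np.ones(n), covariates])   # intercept + covariates
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef

# Simulated placeholder data (hypothetical sizes and values).
rng = np.random.default_rng(0)
n, M, p = 50, 5, 2
Z = rng.normal(size=(n, p))            # stand-ins for covariates such as age and sex
X = rng.integers(0, 3, size=(n, M))    # minor-allele counts in {0, 1, 2}
y = rng.normal(size=n)                 # quantitative trait

y_res = residualize(y, Z)              # residuals of the trait
X_res = residualize(X, Z)              # residuals of each variant, column by column
```

Because the regressions include an intercept, the residuals have mean zero and are orthogonal to every covariate.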
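Using the generalized linear model (GLM) to model the relationship between trait values and genotypes is equivalent to modeling the relationship between the residuals of trait values and the residuals of genotypes through the GLM

$$g\big(E(\tilde{y}_i \mid \tilde{X}_i)\big) = \beta_0 + \beta^T \tilde{X}_i, \qquad (1)$$

where $\beta = (\beta_1, \ldots, \beta_M)^T$ and $g(\cdot)$ is a monotone "link" function.
Under the GLM, the score test statistic to test the null hypothesis $H_0: \beta = 0$ is given by

$$S = U^T V^{-1} U,$$

where $U = \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(\tilde{X}_i - \bar{X})$, $V = \frac{1}{n} \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})^2 \sum_{i=1}^{n} (\tilde{X}_i - \bar{X})(\tilde{X}_i - \bar{X})^T$, and $\bar{y}$ and $\bar{X}$ are the sample means of $\tilde{y}_i$ and $\tilde{X}_i$. The statistic $S$ asymptotically follows a chi-square distribution with $k = \mathrm{rank}(V)$ degrees of freedom (DF). For rare variants, however, the score test may lose power as a result of the sparse data and a large DF $k$.
In rare variant association studies, to test for the effect of the weighted combination of variants, $x_i^w = \sum_{m=1}^{M} w_m \tilde{x}_{im}$, the score test statistic becomes

$$S(w_1, \ldots, w_M) = n \, \frac{\left( \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(x_i^w - \bar{x}^w) \right)^2}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{y})^2 \, \sum_{i=1}^{n} (x_i^w - \bar{x}^w)^2}.$$

Because rare variants are essentially independent, we have

$$\sum_{i=1}^{n} (x_i^w - \bar{x}^w)^2 \approx \sum_{m=1}^{M} w_m^2 \sum_{i=1}^{n} (\tilde{x}_{im} - \bar{x}_m)^2,$$

where $\bar{x}_m$ is the sample mean of $\tilde{x}_{im}$. Then, the score test statistic is approximately equal to

$$S_0(w_1, \ldots, w_M) = n \, \frac{\left( \sum_{m=1}^{M} w_m \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(\tilde{x}_{im} - \bar{x}_m) \right)^2}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{y})^2 \, \sum_{m=1}^{M} w_m^2 \sum_{i=1}^{n} (\tilde{x}_{im} - \bar{x}_m)^2}.$$

Let $u_m = w_m \sqrt{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{x}_m)^2}$ and $a_m = \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(\tilde{x}_{im} - \bar{x}_m) \big/ \sqrt{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{x}_m)^2}$. As a function of $(u_1, \ldots, u_M)$, $S_0(w_1, \ldots, w_M)$ reaches its maximum when $u_m = a_m$, or equivalently when $w_m = \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(\tilde{x}_{im} - \bar{x}_m) \big/ \sum_{i=1}^{n} (\tilde{x}_{im} - \bar{x}_m)^2$. We denote $w_m^o$ as the optimal weight, which is given by

$$w_m^o = \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(\tilde{x}_{im} - \bar{x}_m)}{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{x}_m)^2}.$$

We propose the new test statistic TOW to test the effect of the optimally weighted combination of variants, $x_i^o = \sum_{m=1}^{M} w_m^o \tilde{x}_{im}$:

$$T_T = \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})(x_i^o - \bar{x}^o).$$

The optimal weight $w_m^o$ puts large weights on the variants that have strong associations with the trait of interest and adjusts for the direction of the association. It also puts large weights on rare variants. TOW targets rare variants and will lose power when testing for the effect of both rare and common variants. For testing the effects of both rare and common variants, we propose a new statistic, VW-TOW. We divide variants into rare (minor allele frequency [MAF] < the rare variant threshold [RVT]) and common (MAF > RVT), and apply TOW to the rare and common variants separately.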
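A minimal sketch of the TOW computation and of the rare/common split follows. It assumes the optimal weight takes the form $w_m^o = \sum_i (\tilde{y}_i - \bar{y})(\tilde{x}_{im} - \bar{x}_m) / \sum_i (\tilde{x}_{im} - \bar{x}_m)^2$, which is the maximizer of the approximated score statistic. The function name `tow_statistic`, the simulated genotypes, and the RVT value of 0.05 are hypothetical choices for illustration only. No covariates are used here, so centering stands in for the residual step.

```python
import numpy as np

def tow_statistic(y_res, X_res):
    """TOW-style statistic and weights from (residualized) trait and genotype arrays."""
    yc = y_res - y_res.mean()
    Xc = X_res - X_res.mean(axis=0)
    cross = Xc.T @ yc                    # sum_i (y_i - ybar)(x_im - xbar_m), one entry per variant
    ss = (Xc ** 2).sum(axis=0)           # sum_i (x_im - xbar_m)^2
    # Assumed optimal weight w_m = cross_m / ss_m; monomorphic variants get weight 0.
    weights = np.divide(cross, ss, out=np.zeros_like(cross), where=ss > 0)
    combined = Xc @ weights              # optimally weighted combination for each individual
    return float(yc @ combined), weights

rng = np.random.default_rng(0)
n = 200
true_maf = np.array([0.01, 0.02, 0.30, 0.40])         # illustrative allele frequencies
X = rng.binomial(2, true_maf, size=(n, 4)).astype(float)
y = rng.normal(size=n)

rvt = 0.05                                             # hypothetical rare variant threshold
rare = X.mean(axis=0) / 2 < rvt                        # sample allele frequency below the RVT
T_r, w_rare = tow_statistic(y, X[:, rare])             # TOW on rare variants
T_c, w_common = tow_statistic(y, X[:, ~rare])          # TOW on common variants
```

With these weights the statistic reduces to a sum of squared, variance-scaled covariances, so it is nonnegative and does not depend on the direction of each variant's effect. Dividing by the per-variant sum of squares is also what gives rare variants larger weights.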
Define the test statistic of VW-TOW as

$$T_{VW\text{-}T} = \min_{0 \le \lambda \le 1} p_\lambda,$$

where $p_\lambda$ is the p value of $T_\lambda = \lambda \, T_r / \sqrt{\mathrm{var}(T_r)} + (1 - \lambda) \, T_c / \sqrt{\mathrm{var}(T_c)}$, and $T_r$ and $T_c$ denote the test statistics of TOW for rare and common variants, respectively. Here, we evaluate the minimization by dividing the interval $[0, 1]$ into $K$ subintervals of equal length. Let $\lambda_k = k/K$ for $k = 0, 1, \ldots, K$. Then, $\min_{0 \le \lambda \le 1} p_\lambda = \min_{0 \le k \le K} p_{\lambda_k}$. We [...], where $I(\cdot)$ is the indicator function.
We use TOW and VW-TOW to analyze the data set of unrelated individuals of GAW18 replicate 1 on chromosome 3. To apply TOW and VW-TOW to the entire chromosome 3, we propose a sliding-window approach [13]: we divide all SNPs into contiguous windows and apply TOW and VW-TOW in each window. With a window size of $S$, the SNPs are divided into the windows 1 to $S$, $S+1$ to $2S$, $2S+1$ to $3S$, and so on.
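The window partition can be sketched as below. The function name `contiguous_windows` is hypothetical, and the text does not say how a final window shorter than the window size is handled. This sketch assumes it is kept.

```python
def contiguous_windows(n_snps, window_size):
    """Split SNP indices 0..n_snps-1 into contiguous, non-overlapping windows.

    Returns (start, end) pairs that are 0-based and half-open, so window k covers
    SNPs start..end-1. A final, shorter window is kept (an assumption).
    """
    return [(start, min(start + window_size, n_snps))
            for start in range(0, n_snps, window_size)]

windows = contiguous_windows(45, 20)
print(windows)  # [(0, 20), (20, 40), (40, 45)]
```

Each `(start, end)` pair can then be used to slice the genotype matrix column-wise before a per-window test is applied.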
To analyze the data set of GAW18 replicate 1, chromosome 3 for unrelated individuals, we set the window size to 20. First, we performed quality control tests on the genotype data with the PLINK toolset. We used 10,000,000 permutations to evaluate the empirical p values of TOW for DBP and SBP, and 100,000 permutations to evaluate the empirical p values of VW-TOW for DBP and SBP. Because the sample of unrelated individuals in GAW18 is relatively small, it is not reasonable to claim significance by either the false discovery rate or the Bonferroni-corrected threshold. Therefore, we recommend the 10 windows with the smallest p values as the most promising for follow-up studies.
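A permutation-based empirical p value for a single window can be sketched as follows. Several details are assumptions rather than the authors' procedure: the trait residuals are permuted while the genotypes stay fixed, the estimate uses the common (count + 1)/(B + 1) correction, and the names `tow_like_statistic` and `permutation_p_value` are hypothetical. Only 200 permutations are used here to keep the example fast.

```python
import numpy as np

def tow_like_statistic(y_res, X_res):
    """Sum over variants of squared covariance scaled by genotype sum of squares (assumed TOW form)."""
    yc = y_res - y_res.mean()
    Xc = X_res - X_res.mean(axis=0)
    cross = Xc.T @ yc
    ss = (Xc ** 2).sum(axis=0)
    scaled = np.divide(cross ** 2, ss, out=np.zeros_like(cross), where=ss > 0)
    return float(scaled.sum())

def permutation_p_value(y_res, X_res, statistic, n_perm=200, seed=0):
    """Empirical p value: share of permuted statistics at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = statistic(y_res, X_res)
    exceed = sum(statistic(rng.permutation(y_res), X_res) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)   # +1 correction is an assumption, avoids p = 0

# Simulated placeholder data: one null trait and one strongly associated trait.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(100, 5)).astype(float)
y_null = rng.normal(size=100)
y_assoc = 3.0 * X[:, 0] + 0.1 * rng.normal(size=100)

p_null = permutation_p_value(y_null, X, tow_like_statistic)
p_assoc = permutation_p_value(y_assoc, X, tow_like_statistic)
```

The smallest p value this estimator can return is 1/(B + 1), which is one reason the number of permutations has to be large when many windows are tested.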

Results
We applied TOW and VW-TOW with the sliding-window approach to the GAW18 hypertension data set of unrelated individuals. To facilitate comparisons among GAW18 contributions, we analyzed only replicate 1 on chromosome 3. To evaluate type I error rates of TOW and VW-TOW, we used all 200 replicates of simulated phenotype data. There are 157 unrelated individuals in the GAW18 pedigree-based sample. Among the 157 individuals, 142 have observations for SBP, DBP, and other demographic/clinical variables at exam 1. Our analysis was based on these 142 individuals and their genotypes, the quantitative traits SBP and DBP, and other characteristics at exam 1.
The total genotyping rate in the 142 individuals is 0.9997. We did not find any duplicated samples or sample contamination. No individual was filtered out from the multidimensional scaling (MDS) analysis. Of the 1,215,399 SNPs on chromosome 3, we removed 251,892 completely missing SNPs and retained 963,507 SNPs for final analysis. Because SBP and DBP varied by sex and increased with age, age and sex were considered as covariates in this study.
We listed the top 10 most promising windows out of 48,176 windows across the entire chromosome 3. The top 8 windows all reside in gene MAP4, which is the most susceptible gene on chromosome 3 for hypertension. Seven of 9 top variants influencing DBP and 8 of 9 top variants influencing SBP on chromosome 3 were found in or close to our top windows. Tables 1 and 2 show the top 10 most promising windows by TOW that are associated with DBP and SBP, respectively. The p values of TOW in the top 3 windows of Table 1 are very small. SNPs 3_47957996, 3_47956424, and 3_47957741 are the third, fourth, and ninth variants in Table 2 of the GAW18 answer sheet. They all fell into our third window in Table 1 and the first window in Table 2.
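As a check on the window count, assume that the 48,176 windows partition the 963,507 retained SNPs with a window size of 20 and that a final, shorter window is kept:

$$\left\lceil \frac{963{,}507}{20} \right\rceil = \lceil 48{,}175.35 \rceil = 48{,}176, \qquad 963{,}507 - 48{,}175 \times 20 = 7.$$

Under these assumptions the count matches the reported number of windows, and the last window would contain 7 SNPs.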
To evaluate the type I error rates of the proposed sliding-window approach, we chose 100 blocks (20 variants in each block) from chromosome 3 that are far from causal variants. In each block, we applied TOW and VW-TOW to each of the 200 replicates to test association between genotypes and the trait DBP. We obtained 1 p value for each replicate and each block. Figure 1 shows the histograms of these p values for TOW and VW-TOW. The histograms indicate that the type I error rates of both TOW and VW-TOW are under control.
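Besides inspecting histograms, the same p values can be summarized as an empirical type I error rate at a chosen significance level. The sketch below is illustrative only: the function name `empirical_type1_error` is hypothetical, and the p values are simulated uniform draws standing in for the 100 blocks by 200 replicates, not results from the GAW18 data.

```python
import numpy as np

def empirical_type1_error(p_values, alpha):
    """Fraction of null p values at or below the significance level alpha."""
    p_values = np.asarray(p_values, dtype=float)
    return float(np.mean(p_values <= alpha))

# Placeholder null p values: 100 blocks x 200 replicates, uniform on [0, 1).
rng = np.random.default_rng(1)
null_p = rng.uniform(size=(100, 200))

rates = {alpha: empirical_type1_error(null_p, alpha) for alpha in (0.05, 0.01)}
```

For a well-calibrated test each rate should be close to its nominal level, which corresponds to an approximately flat p value histogram.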

Discussion
In this article, we proposed a sliding-window-based optimal weighted approach to test for the effects of both rare and common variants across the whole genome. In each window, our recently developed TOW and VW-TOW were applied to test genetic association between a disease and a combination of variants. Then, we applied the method to unrelated individuals of GAW18 on replicate 1, chromosome 3. We detected 3 susceptible windows across chromosome 3 for DBP and identified 10 out of 48,176 windows as the most promising windows for DBP and SBP. Because this is a simulated data set, it is possible that other genes that do not appear in the top 10 windows are nevertheless related to SBP or DBP.
[Table footnote: ... Table 2 of the answers of GAW18; WID, window ID. *The variants are provided in Supplemental Table 1 of the answer sheet of GAW18.]
In this study, we used a window size of 20 across the entire chromosome 3. How to choose an appropriate window size is a critical question. We evaluated the effect of window size by also running window sizes of 30, 40, and 50. However, the power of TOW did not increase with a larger window size. Although the power of VW-TOW increased slightly with a larger window size, no window passed the Bonferroni-corrected threshold for the entire chromosome 3. TOW and VW-TOW can be made robust to population stratification by including the first K principal components (PCs) of genotypes at genomic markers as covariates when calculating the residuals of the trait and of the genotype matrix. In this GAW18 data analysis, we did not adjust for PCs because we believed, based on our MDS analysis, that population stratification was not severe in these data.
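The Bonferroni threshold for the whole chromosome can be compared with the resolution of the permutation tests. This sketch assumes a family-wise significance level of 0.05, which the text does not state, and assumes that an empirical p value is a count divided by the number of permutations, so that its smallest nonzero value is 1/B.

```python
n_windows = 48_176            # windows across chromosome 3 at window size 20
alpha = 0.05                  # assumed family-wise significance level
bonferroni_threshold = alpha / n_windows

# Smallest nonzero empirical p value under the count/B assumption.
min_p_tow = 1 / 10_000_000    # TOW: 10,000,000 permutations
min_p_vwtow = 1 / 100_000     # VW-TOW: 100,000 permutations

tow_can_reach = min_p_tow < bonferroni_threshold
vwtow_can_reach = min_p_vwtow < bonferroni_threshold

print(f"Bonferroni threshold: {bonferroni_threshold:.3e}")
print(f"TOW resolution fine enough: {tow_can_reach}")
print(f"VW-TOW resolution fine enough: {vwtow_can_reach}")
```

Under these assumptions the threshold is on the order of 1e-6, so a nonzero empirical p value from 100,000 permutations cannot fall below it, whereas one from 10,000,000 permutations can.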
To further assess our new approach, we compared the power of TOW, VW-TOW, CMC, and WSS to detect association between gene MAP4 and DBP. MAP4 was split into 44 windows (blocks) with 20 variants in each window. In each window, we calculated the power of each method based on 200 replicates. The power comparisons based on the phenotype measurement DBP are given in Figure 2. This figure shows that in most of the windows, TOW is the most powerful test and VW-TOW is the second most powerful.