Genome-wide association studies of rheumatoid arthritis data via multiple hypothesis testing methods for correlated tests.

Genome-wide association studies often involve testing hundreds of thousands of single-nucleotide polymorphisms (SNPs). These tests may be highly correlated because of linkage disequilibrium among SNPs. Multiple testing correction ignoring the correlation among markers, as is done in the Bonferroni procedure, can cause loss of power. Several multiple testing adjustment methods accounting for correlations among tests have been developed and have shown improved power compared to the Bonferroni procedure. These methods include a Monte Carlo (MC) method and a method of computing p-values adjusted for correlated tests. The objective of this study is to apply these two multiple testing methods to genome-wide association study of the Genetic Analysis Workshop 16 rheumatoid arthritis data from the North American Rheumatoid Arthritis Consortium, to compare the performance of these two methods to the Bonferroni procedure in identifying susceptibility loci underlying rheumatoid arthritis, and to discuss the strengths and weaknesses of these methods. The results show that both the MC method and p-values adjusted for correlated tests method identified more significant SNPs, thus potentially have higher power than the corresponding Bonferroni methods using the same test statistics as in the MC method and p-values adjusted for correlated tests, respectively. Simulation studies demonstrate that the MC method may have slightly higher power than the p-values adjusted for correlated tests method.


Background
Genome-wide association studies (GWAS) for complex diseases involve multiple hypothesis testing. The Bonferroni procedure is commonly used to control familywise error rate (FWER) for multiple hypothesis testing. However, the Bonferroni procedure becomes more conservative as the number of hypotheses tested increases and the test statistics are correlated [1,2]. To handle the correlation among test statistics, a permutation method [3] was proposed based on estimation of the joint distribution of test statistics. However, this approach is computationally intensive and not Open Access appropriate for GWAS. Therefore, several efficient methods that can account for the correlation among test statistics have been developed for multiple testing [1,2]. Lin [2] proposed a Monte Carlo (MC) sampling approach based on approximating the joint distribution of test statistics. This method does not require repeated analyses of simulated datasets as in the permutation method, and therefore is much less computationally demanding. Conneely and Boehnke [1] proposed a method of computing p-values adjusted for correlated tests (p_ACT) by numerical integration of the asymptotic multivariate normal distribution of the test statistics. This approach is very computationally efficient and attains even greater speed. In this study we applied three multiple testing procedures, the Bonferroni procedure, MC method, and p_ACT method, to GWAS of the Genetic Analysis Workshop 16 (GAW16) rheumatoid arthritis (RA) data from the North American Rheumatoid Arthritis Consortium (NARAC). We compared the performance of these three procedures by simulation studies.

Methods
We describe the three multiple hypothesis testing procedures in the context of association studies. Suppose there are n individuals with m markers in the observed case-control data. We test m null hypotheses H 1 , H 2 , ..., H m for the m markers. The corresponding p-values are p 1 , p 2 , ..., p m . In the Bonferroni procedure, if p i ≤ a/m, then H i is rejected (i = 1, ..., m), where a is the pre-set significance level. While in the Bonferroni procedure all tests are assumed to be independent, the MC method and the p_ACT methods described below account for dependence among test statistics by considering the joint distribution of test statistics. All the three methods can control the FWER well.

MC method
The test statistic for the j th marker (corresponding to hypothesis H j ) is defined as . If the hypothesis H j is true, the statistic T j has approximately a c 2 distribution with d j degrees of freedom, where d j is the dimension of U j . For GWAS of the RA data in this article, we only consider an additive genetic model with d j = 1, and X ji = 0, 1, or 2, indicating the number of minor alleles in the genotype of individual i at locus j.
The test statistics (T 1 , T 2 , ..., T m ) may be correlated due to linkage disequilibrium (LD) among markers. The multiple testing procedure using the actual joint distribution of (T 1 , T 2 , ..., T m ) can be computationally intensive. The MC method provides an approach to approximate the actual joint distribution by MC sampling. The MC and G 1 , G 2 , ..., G n are independent standard normal random variables that are independent of the data, and then the method uses the joint distribution of T j values to approximate the joint distribution of T j values based on obtaining realizations from distributions of T j values by repeatedly generating the normal random samples G 1 , are T j variables corresponding to (t (1) , t (2) , ..., t (m) ), respectively. The MC method works as a step-down procedure as follows: starting with hypothesis H (1) , the method rejects H (j) , j = 1, 2, ..., m and removes the corresponding marker and variable T j .., H (j-1) have been tested and rejected. This probability is calculated based on a large number (e.g., 10,000) of realizations of the T j values. We have implemented this method by using the statistical package R [4].
p_ACT method Suppose the test statistics T = (T 1 , T 2 , ..., T m ) for m markers follow multivariate normal distribution N(0, Σ) asymptotically when all null hypotheses are true (i.e., no markers are associated with the disease), where 0 is an mdimensional vectors of zeros and Σ is a m × m correlation matrix. Let p min ≤ p (2) ≤ ... ≤ p (m) be the ordered p-values calculated from the observed data. The probability of observing at least one p-value as small as p min is for two sided tests, 1 1 2 where Z 1 , Z 2 , ..., Z m are random variables from the multivariate normal distribution N(0, Σ). Computation of p ACT requires integration of the multiple normal density function. The current version of the p_ACT method can handle integration of up to 1,000 dimensions (i.e., 1,000 markers) at one time [5]. Although any test statistics T with asymptotic multivariate normal distribution can be used for Eq. (2), Conneely and Boehnke [1] described the test statistic These test statistics (T 1 , T 1 , ..., T m ) asymptotically follow multivariate normal distribution N(0, R), where R = {r jk }, j = 1, ..., m, k = 1, ..., m, and r V V V . T h e p_ACT method also works as a step-down procedure: 1) If p ACT <a, then reject the null hypothesis associated with p min , and remove p min and the marker associated with p min . 2) Let p min = p (2) in Eq. (2), change m into m-1, and repeat Step 1. Continuing in this fashion for the remaining p (j) until for some p (k) , p ACT = a, then accept all remaining hypotheses (including that associated with p (k) ). This p_ACT method has been implemented in a computer program in R [6].
Partitioning of genome-wide single-nucleotide polymorphism (SNP) data As stated earlier, the p_ACT method can only handle up to 1,000 tests at a time, and the MC method can handle more than 1,000 tests but may become computationally intensive when the number of tests is very large. To apply these methods to GWAS of the RA data, we divided the whole genome into small blocks; each block includes hundreds or up to one thousand of SNPs. We assume that tests within each block are dependent, and that tests from different blocks are independent. To control the FWER at a for the whole genome, we apply Bonferroni procedure among blocks, that is, we assign a b to each block such that the sum of these a b equals a (i.e., ∑ a b = a), where a b is proportional to the number of SNPs within the block (i.e., block size). We applied the MC method and p_ACT method separately to each block and control FWER at a b for the tests within the block. In this study, we considered the block sizes of 100, 500, and 1,000 separately to evaluate the effect of block size on the performance of the MC method and p_ACT method.
Application to RA data The RA dataset contains 868 cases and 1,194 controls with 545,080 SNPs after removing duplicated and contaminated samples. If an individual has missing genotype at a marker, we imputed the most-frequent genotype observed in the data at that marker. We removed SNPs with minor allele frequencies (MAF) less than 0.01 and SNPs with p-values less than 1 × 10 -4 in Hardy-Weinberg equilibrium (HWE) test in controls. We did not consider SNPs on sex chromosomes. After these procedures, 515,050 SNPs remained in our analysis.
We applied the three multiple testing procedures to the GWAS of the RA dataset. First, we performed the association analysis on the RA dataset by using the test statistics in Eqs. (1) and (3) separately and obtained p-values for each SNP. We call these p-values raw p-values. For each test statistic, we applied the Bonferroni procedure to the p-values and set the nominal level of FWER as 0.05. The raw p-values calculated from statistic in Eq. (3) were used in the p_ACT method (see below). The MC method is based on the statistic in Eq. (1), and we set the number of replicates of the normal random samples G 1 , G 2 , ..., G n as 25,000. The p_ACT method is based on the test statistic in Eq. (3), and all tests are twosided. In the p_ACT method we set the limit on the number of simulations or integrand values in R function "pmvnorm" as maxpts = 25,000.

Simulation studies
To evaluate the performance of the two statistics in Eqs.
(1) and (3) and of three multiple testing methods, we simulated 10,000 replicated data sets in a manner similar to Lin [2]; each data set included N 1 cases and N 2 controls with 100 SNPs in a chromosomal region (or one block). We considered two sets of values of N 1 and N 2 : 1) N 1 = N 2 = 100, and 2) N 1 = 100, and N 2 = 150. For each individual, the simulated chromosomal region consisted five independent consecutive subregions. Each subregion has 20 biallelic SNPs in LD with coefficient of r 2 = 0.9 between two successive SNPs. At each SNP, HWE was assumed and MAF was 0.3. We chose one SNP in the first subregion as a disease SNP, and determined case/ control status based on an additive disease model with disease prevalence of 0.1 and genotype relative risk of 2. Table 1 shows the estimated FWER and power for the simulated datasets with nominal level of FWER = 0.05 (i.e., a = 0.05). The estimated FWER was calculated based on whether any SNP in any of the last four subregions was significant. The power was estimated based on whether any SNP in the first subregion is significant. In the situation N 1 = N 2 = 100, the two test statistics in Eqs. (1) and (3) with Bonferroni correction generated almost the same results (on FWER and power), and the MC method and p_ACT method also had nearly same results. In the situation N 1 = 100 and N 2 = 150, the test statistic in Eq. (1) had slightly higher power than that in Eq. (3), and consequently, the MC method had slightly higher power than the p_ACT method. In both situations, the MC method and p_ACT method had higher power than the Bonferroni methods using statistics in Eqs.

Results
(1) and (3), respectively. Table 2 reports the number of significant SNPs associated with RA on 22 chromosomes detected by the three procedures, where the overall significant level a across the whole genome was 0.05. By using the Bonferroni procedure, we identified 634 and 589 significant SNPs for the 22 chromosomes based on the statistics defined in Eqs.
(1) identified more significant SNPs. Table 2 also describes the number of significant SNPs on chromosomes 1, 6, 9, 16, and 22, which had more identified significant SNPs than other chromosomes. Based on the statistic in Eq.
(1), the MC method identified 667, 679, and 682 significant SNPs for the 22 chromosomes when the block sizes are 100, 500, and 1,000, respectively. These numbers of identified significant SNPs are greater than the 634 identified by the Bonferroni procedure using the same statistic. Similarly, based on the statistic in Eq. (3), the p_ACT method identified 611, 621, and 635 significant SNPs, when the block sizes were 100, 500, and 1,000, respectively. These numbers of identified significant SNPs are also greater than the 589 identified by the Bonferroni procedure using the statistic in Eq. (3). As the block size increased, the numbers of significant SNP identified by both the MC method and the p_ACT method increased. The MC method using the statistic in Eq.
(1) identified more significant SNPs than the p_ACT method using the statistic in Eq. (3).
We compared computing times of the MC method and p_ACT method. As an example, we only showed the times for chromosome 9. With block sizes of 100, 500, and 1,000, the MC method used about 6.24 hr, 2.89 hr, and 2.48 hr, while the p_ACT method used 0.27 hr, 0.47 hr, and 1.08 hr, respectively. The p_ACT method is faster than the MC method.

Discussion
We have applied two multiple testing methods (MC and p_ACT), which account for correlation among tests by splitting each chromosome into smaller blocks, to the GWAS of the RA dataset and then compared the results of these methods to those of the Bonferroni procedure. Both the MC method and p_ACT method identified more significant SNPs than the Bonferroni procedure. The numbers of significant SNPs identified by the MC and p_ACT methods increased as the block size increased.
The test statistic in Eq. (3) is transformed from a traditional score statistic from a generalized linear model. The essential difference between the statistics in Eqs.
(1) and (3) is that variance V j is estimated by different ways. Our simulation studies show that when the numbers of cases and controls are equal, the two test statistics have almost the same power, and that when the numbers of cases and controls are different, the test statistic in Eq.
(1) can have slightly higher power than that in Eq. (3), and consequently, the power of the MC method can be slightly higher than that of p_ACT method. Our simulation studies were only based on additive model, small sample sizes, and small number of SNPs. More extensive simulation studies are necessary in the future research.
In our analysis, we divided the whole genome into blocks with a fixed number of SNPs. We only accounted for LD within each block, and we assumed independence between tests from different blocks by ignoring LD between blocks. This assumption can cause loss of power. To avoid losing power, each entire chromosome may be treated as a block. However, the MC method will become computationally intensive or infeasible, and the p_ACT cannot handle more than 1,000 SNPs in each block. Another possible solution is to split each chromosome into blocks according to LD pattern, to group a set of consecutive SNPs in strong LD into one block and to ignore weak LD between blocks. As described earlier, the larger the block size we select, the higher the power we can obtain. This is an issue we will pursue in the future. Also we did not consider population stratification, which may cause spurious falsepositive results.