Two-stage analysis strategy for identifying the IgM quantitative trait locus.

Genetic association studies offer an opportunity to find genetic variants underlying complex human diseases. Various tests have been developed to improve their power. However, none of these tests is uniformly best, and it is usually unclear at the outset which test is best for a specific dataset. For example, Hotelling's $T^2$ test is best for normally distributed data, but it can lose considerable power when normality is not met. To achieve satisfactory power in most cases without compromising the overall significance level, we propose a two-stage adaptive analysis strategy: several statistics are compared on a portion of the samples at the first stage, and the most powerful statistic is then used for the remaining samples. We evaluated this procedure by mapping the quantitative trait locus of IgM with the simulated data in Genetic Analysis Workshop 15 Problem 3. The results show that the gain in power of the two-stage adaptive analysis procedure can be considerable when the initial choice of test statistic is wrong, whereas the loss is relatively small when the test chosen initially is in fact optimal.


Background
Association studies currently offer an exciting approach to mapping complex quantitative trait loci (QTLs). Wallace et al. [1] recently recommended a generalized Hotelling's $T^2$ test for QTL linkage disequilibrium (LD) mapping, which is uniformly the best test for normally distributed data. However, if the assumption of a normal distribution is not met, $T^2$ may lose considerable power. When the trait distribution is unclear, some nonparametric tests may be preferred because they are only slightly less powerful than $T^2$ when the trait is normally distributed, but much more powerful than $T^2$ in some cases of non-normality. In general, it is unclear which test is best when the trait distribution is unknown. Some investigators report only the most significant result from several statistics, but the type I error rate cannot be properly controlled when this is done. Nor is it wise in this situation to use an approach such as the Bonferroni method to control the type I error rate, because the various tests are usually highly correlated and the result will therefore be overly conservative.
To achieve satisfactory power in most cases, without compromising the overall significance level, we consider adopting a two-stage adaptive analysis strategy: several statistics are compared on a portion of the samples in the first stage and the statistic that is found to be most powerful is then used for the remaining samples. Previously, two-stage strategies have been adopted in genetic association studies to reduce the cost of genotyping [2,3] or the penalty due to multiple testing when modeling gene × gene interactions [4]. Here, we apply this strategy for a different purpose: to select a powerful test for the data at hand and hence obtain good power overall. We evaluate this procedure of adaptively selecting the optimal test by mapping the IgM QTL with the simulated data of Problem 3 in Genetic Analysis Workshop 15 (GAW15).

Methods
The procedure examines the power of various statistics using a portion of the data in an exploratory first stage and then applies the most powerful test to the rest of the data in the second stage. The statistics from the two stages are combined to make full use of the information. This approach of combining the results of the two stages is equivalent to a more general method of combining p-values. For the procedure of combining these p-values to be valid, however, we need to specify before the analysis which statistic will be used to obtain the p-value ($p_1$) from the exploratory stage in the combination. The p-value from the second stage ($p_2$) is calculated based on the most powerful statistic found at the first stage. Under the null hypothesis, each p-value is, at least asymptotically, uniformly distributed on (0, 1). The final decision then depends on a combining function $f(p_1, p_2)$. The most common such function may be Fisher's combination test [5], which is defined by
$$f(p_1, p_2) = -2 \ln(p_1 p_2),$$
where under the null hypothesis Fisher's statistic will be distributed as a $\chi^2$ with 4 degrees of freedom. Another example is the weighted inverse normal method,
$$f(p_1, p_2) = w_1 \Phi^{-1}(1 - p_1) + w_2 \Phi^{-1}(1 - p_2),$$
where $\Phi$ is the cumulative distribution function of a standard normal distribution, $0 < w_i < 1$ and $w_1^2 + w_2^2 = 1$. Under the null hypothesis, this statistic will be distributed as a standard normal distribution.
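As an illustration of the two combining functions named above, the following minimal sketch computes a combined p-value from $p_1$ and $p_2$ using only the Python standard library. The function names are hypothetical, and the choice of weight $w_1 = \sqrt{0.3}$ in the usage lines is an assumption made for the example (tying the weight to the stage-1 sample proportion), not a value stated in the text. For Fisher's method it relies on the fact that the upper tail of a $\chi^2$ distribution with 4 degrees of freedom has the closed form $e^{-x/2}(1 + x/2)$.

```python
import math
from statistics import NormalDist


def fisher_combination(p1, p2):
    """Fisher's combination of two independent p-values.

    Returns the statistic -2*ln(p1*p2) and its p-value under a chi-square
    distribution with 4 degrees of freedom (closed-form upper tail).
    """
    stat = -2.0 * math.log(p1 * p2)
    p_combined = math.exp(-stat / 2.0) * (1.0 + stat / 2.0)
    return stat, p_combined


def inverse_normal_combination(p1, p2, w1):
    """Weighted inverse normal combination with w1^2 + w2^2 = 1.

    Returns the combined z statistic and its one-sided p-value.
    The inputs must lie strictly between 0 and 1.
    """
    w2 = math.sqrt(1.0 - w1 ** 2)
    std_normal = NormalDist()
    z = w1 * std_normal.inv_cdf(1.0 - p1) + w2 * std_normal.inv_cdf(1.0 - p2)
    return z, 1.0 - std_normal.cdf(z)


# Usage with illustrative p-values (not results from the study).
stat, p_fisher = fisher_combination(0.05, 0.05)  # p_fisher is about 0.0175
z, p_inverse_normal = inverse_normal_combination(0.05, 0.05, w1=math.sqrt(0.3))
print(p_fisher, p_inverse_normal)
```

Both functions assume that $p_1$ and $p_2$ come from disjoint sets of individuals, so that they are independent under the null hypothesis.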
To obtain $p_2$, we have to estimate the power of the various statistics at the exploratory first stage. Traditional power calculation methods require the trait distribution to be known, which is not the case here. A bootstrap method that uses the data from the exploratory stage can be adopted to approximate the power [6,7]. The bootstrap and permutation are two frequently used nonparametric procedures. A permutation procedure is often employed to obtain "exact" p-values by generating the null distribution of the test statistic. Here, on the other hand, we want to estimate the power of a statistic, and for this we need the distribution of the statistic under the alternative hypothesis; a permutation procedure cannot be directly applied for this purpose. Let the trait values of individuals with genotype g be denoted $x_g$, where g = 0, 1, 2 for an additive SNP marker and g = 0, 1 for a recessive or dominant marker. For this example, we assume a dominant model for the rarer allele. We denote the sample size for each genotype $n_g$. We assume that the distributions of trait values for the different genotypes have similar shapes, but that the locations of the distributions are shifted by $d_g$. The hypothesis to detect association between a marker and the trait is then defined as $H_0: d = 0$. The power function of the statistic T for $d = \delta$ at the significance level $\alpha$ is then written $P(T; \delta, \alpha)$. The method of Collings and Hamilton [6] approximates $P(T; \delta, \alpha)$ by a nonparametric bootstrap procedure as follows:
1. For each genotype group g, a random sample of ...
4. Finally, we estimate the power of the different statistics using the weighted average of the estimates from the different genotype groups.
We compared non-adaptive methods and this adaptive method using the simulated data of Problem 3 in GAW15, which has 100 replicates. For the adaptive method, we considered different proportions of samples at the exploratory stage ($\pi_1$), different methods of combining tests (Fisher's and the inverse normal methods) and two statistics (Hotelling's $T^2$ [1] and the nonparametric Wilcoxon statistic [8]). These statistics were calculated using R (version 2.4.1). In each replicate, we sampled 200 independent individuals to map the IgM QTL. To examine the validity of the various tests, we randomly selected from each of the 100 replicates 10 SNPs that are not associated with IgM; the type I error rate is then estimated from the resulting 100 × 10 = 1,000 tests.
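The overall two-stage analysis and the check of its type I error rate can be sketched as follows. This is a minimal illustration under several assumptions, not the analysis code used in the study: all function names are hypothetical, the data are simulated placeholder values rather than GAW15 data, a large-sample test on the mean difference stands in for Hotelling's $T^2$, and the stage-1 samples are taken as the first $\pi_1$ fraction of each genotype group. To keep the example short, the selection rule is pluggable and defaults to a crude stand-in (the candidate with the smaller stage-1 p-value) in place of the bootstrap power comparison described above. The point it illustrates is that $p_2$ uses only second-stage individuals, so any selection rule based on first-stage data alone leaves the combined test valid.

```python
import math
import random
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()


def mean_test_p(x, y):
    """Two-sided test on the mean difference (large-sample normal approximation)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return 1.0
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(mean(y) - mean(x)) / se))


def rank_test_p(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test (normal approximation, ties count 1/2)."""
    u = sum((b > a) + 0.5 * (b == a) for a in x for b in y)
    nx, ny = len(x), len(y)
    z = (u - nx * ny / 2.0) / math.sqrt(nx * ny * (nx + ny + 1) / 12.0)
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def smallest_stage1_p(candidates, x0, x1):
    """Hypothetical stand-in selection rule: smallest p-value on stage-1 data."""
    return min(candidates, key=lambda name: candidates[name](x0, x1))


def two_stage_p(x0, x1, pi1=0.3, select=smallest_stage1_p):
    """Return the selected test name and the Fisher-combined p-value."""
    candidates = {"mean": mean_test_p, "rank": rank_test_p}
    k0, k1 = round(pi1 * len(x0)), round(pi1 * len(x1))
    p1 = rank_test_p(x0[:k0], x1[:k1])             # statistic for p1 fixed in advance
    chosen = select(candidates, x0[:k0], x1[:k1])  # selection sees stage-1 data only
    p2 = candidates[chosen](x0[k0:], x1[k1:])      # p2 uses the remaining samples only
    product = max(p1 * p2, 1e-300)                 # guard against log(0)
    return chosen, product * (1.0 - math.log(product))  # chi-square (4 df) upper tail


def empirical_type1_error(n_datasets=500, alpha=0.05, seed=0):
    """Proportion of null datasets (no genotype effect) rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_datasets):
        x0 = [rng.expovariate(1.0) for _ in range(120)]  # placeholder skewed trait
        x1 = [rng.expovariate(1.0) for _ in range(80)]   # same distribution: H0 true
        rejections += two_stage_p(x0, x1)[1] < alpha
    return rejections / n_datasets


print(empirical_type1_error())
```

With 1,000 null tests at a nominal level of 0.05, the binomial standard error of the empirical rate is about 0.007, which gives a rough scale for judging whether an observed rate is consistent with the nominal level.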

Results
From Table 1, we can see that the two-stage analysis procedure controls the type I error rate as well as a one-stage analysis does. Table 2 shows the empirical power for the different analysis strategies. The first two rows of Tables 1 and 2 are the results from applying each of the two tests to the whole data set. Because the distribution of IgM clearly deviates from a normal distribution, the loss of power of Hotelling's $T^2$ turns out to be severe. The two-stage analysis obtains a substantial gain in power by choosing the right statistic for the second stage from "learning" at the exploratory stage. This analysis shows that using 30% of the samples at the first stage gives a good prediction of the better analytic method to use in terms of power. The results also show that the difference between the two methods of combining p-values is small.

Discussion
Two-stage designs have been applied to large-scale genetic association studies to substantially reduce genotyping cost while maintaining power. In addition to the knowledge of which markers are promising, we can obtain information about the distribution of the phenotype based on the data from the exploratory stage. This knowledge is useful for the choice of a statistic to use at the second stage and can therefore lead to a considerable gain in power. In our analysis, we evaluated this idea by considering just two statistics. Hotelling's $T^2$ has been shown to be a powerful statistic, even with sample selection. However, the advantage of $T^2$ depends on the trait distribution. On the other hand, although a nonparametric statistic is not the most powerful one when normality of the trait holds, it usually works well. It is therefore reasonable to consider combining the p-value of a nonparametric statistic from the exploratory stage with the p-value of the most powerful statistic for the second stage.
The idea of a two-stage analysis can be further generalized in genetic association studies. Because LD patterns vary greatly, it is often unclear whether a single-marker analysis or a multiple-marker analysis or a haplotype-based analysis is most powerful for a specific data set. Further work on developing a data-driven adaptive procedure to choose the type of analysis to perform on the second stage data would be potentially useful.

Conclusion
The adaptive two-stage procedure can lead to considerable gain in power by guiding the choice of a test based on the knowledge learned from an exploratory stage. At the same time, the type I error rate can be well controlled.

Competing interests
The author(s) declare that they have no competing interests.