For the ATS analysis of association, Zhang et al. [2] applied the HWDTT and the CATT to case-control studies. Song and Elston [6] and Zhang et al. [2] showed that these two statistics are asymptotically independent under the null hypothesis of no association; therefore, they used all samples in both stages of the analysis. In the first stage of the proposed ATS analysis, the HWDTT is applied to each SNP at a significance level α1 chosen on the basis of the conditional power of the HWDTT: α1 is the smallest level for which this power is at least 1 - β, where β is the type II error rate.
Denote by $\hat{p}_i$ and $\hat{q}_i$ the estimators of the genotype frequencies in cases and controls, respectively, for i = 0, 1, 2 (i counting the copies of allele A), so that $\hat{p}_2 + \hat{p}_1/2$ and $\hat{q}_2 + \hat{q}_1/2$ are estimators of the frequencies of allele A in cases and controls. Song and Elston [6] considered the difference between the disequilibrium coefficients in cases ($D_1$) and controls ($D_0$), where $D_1 = p_2 - (p_2 + p_1/2)^2$ and $D_0 = q_2 - (q_2 + q_1/2)^2$. The HWDTT statistic can be written as
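$$T_{\mathrm{HWD}} = \frac{\hat{D}_1 - \hat{D}_0}{\sqrt{n\, f(n_1/n,\; n_2/n)/(rs)}},$$

where $\hat{D}_1$ and $\hat{D}_0$ are obtained by substituting $\hat{p}_i$ and $\hat{q}_i$ into $D_1$ and $D_0$, $r_i$ and $s_i$ are the numbers of cases and controls with genotype i, $n_i = r_i + s_i$, $r = \sum_i r_i$, $s = \sum_i s_i$, and $n = r + s$. The asymptotic power of the HWDTT can then be written as

$$1 - \Phi\!\left(\frac{z_{1-\alpha_1/2}\,\sigma_0 - \sqrt{n}\,(D_1 - D_0)}{\sigma}\right) + \Phi\!\left(\frac{-z_{1-\alpha_1/2}\,\sigma_0 - \sqrt{n}\,(D_1 - D_0)}{\sigma}\right),$$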
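where

$$f(a, b) = (1 - 2b - a)^2\, b(1 - b) + 2ab\,(b + a/2)(1 - 2b - a) + (b + a/2)^2\, a(1 - a)$$

is n times the asymptotic variance of the estimated disequilibrium coefficient when a and b are the frequencies of genotypes 1 and 2, $\sigma_0^2 = f(r' p_1 + s' q_1,\; r' p_2 + s' q_2)/(r's')$ with $r' \approx r/n$ and $s' \approx s/n$, $\sigma^2 = n\,\mathrm{Var}(\hat{D}_1 - \hat{D}_0)$, Φ is the distribution function of the standard normal N(0, 1), and $z_{1-\alpha_1/2}$ is the 100(1 - α1/2)th percentile of N(0, 1).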
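The Python sketch below evaluates these quantities for a single SNP: the disequilibrium coefficient, the variance function f, the HWDTT statistic, its asymptotic power, and a bisection search for the smallest α1 whose power reaches 1 - β. It assumes that the HWDTT standardizes $\hat{D}_1 - \hat{D}_0$ by its pooled null variance $n f(n_1/n, n_2/n)/(rs)$, and that $\sigma^2 = f(p_1, p_2)/r' + f(q_1, q_2)/s'$ (the delta-method variance for independent case and control samples). All function names are hypothetical, and β must be supplied by the user because its value is not fixed here.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal N(0, 1)


def f(a, b):
    """n * asymptotic Var of the estimated D, with a = P(genotype 1), b = P(genotype 2)."""
    return ((1 - 2 * b - a) ** 2 * b * (1 - b)
            + 2 * a * b * (b + a / 2) * (1 - 2 * b - a)
            + (b + a / 2) ** 2 * a * (1 - a))


def hwd(g):
    """Disequilibrium coefficient D = g2 - (g2 + g1/2)^2 for frequencies (g0, g1, g2)."""
    _, g1, g2 = g
    return g2 - (g2 + g1 / 2) ** 2


def hwdtt(r_counts, s_counts):
    """HWDTT statistic from case counts (r0, r1, r2) and control counts (s0, s1, s2).

    Assumption: D1_hat - D0_hat is standardized by its null variance, estimated
    from the pooled genotype frequencies n_i / n.
    """
    r, s = sum(r_counts), sum(s_counts)
    n = r + s
    p_hat = [x / r for x in r_counts]
    q_hat = [x / s for x in s_counts]
    pooled = [(ri + si) / n for ri, si in zip(r_counts, s_counts)]
    null_var = n * f(pooled[1], pooled[2]) / (r * s)
    return (hwd(p_hat) - hwd(q_hat)) / sqrt(null_var)


def hwdtt_power(p, q, r, s, alpha1):
    """Asymptotic power of the two-sided HWDTT at level alpha1.

    p, q: true genotype frequencies (g0, g1, g2) in cases and controls.
    """
    n = r + s
    rp, sp = r / n, s / n
    sigma0 = sqrt(f(rp * p[1] + sp * q[1], rp * p[2] + sp * q[2]) / (rp * sp))
    # Assumption: n * Var(D1_hat - D0_hat) for independent case and control samples.
    sigma = sqrt(f(p[1], p[2]) / rp + f(q[1], q[2]) / sp)
    shift = sqrt(n) * (hwd(p) - hwd(q))
    z = N01.inv_cdf(1 - alpha1 / 2)
    return (1 - N01.cdf((z * sigma0 - shift) / sigma)
            + N01.cdf((-z * sigma0 - shift) / sigma))


def smallest_alpha1(p, q, r, s, beta, tol=1e-8):
    """Smallest alpha1 with power >= 1 - beta, found by bisection.

    Bisection is valid because the power increases monotonically with alpha1.
    """
    target = 1 - beta
    lo, hi = 1e-12, 1.0
    if hwdtt_power(p, q, r, s, lo) >= target:
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if hwdtt_power(p, q, r, s, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

A useful sanity check: when cases and controls share the same genotype frequencies, σ0 equals σ and the power at any α1 reduces to α1 itself, as expected under the null hypothesis.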
The SNPs for which the null hypothesis is rejected in the first stage are tested in the second stage by the CATT at level α2 = α'/(mα1), where m is the total number of SNPs (simultaneous hypothesis tests) and α' is obtained by a parametric bootstrap so that the overall type I error rate of the ATS analysis is controlled at α (taken to be 0.05). The rationale is that, because the two statistics are asymptotically independent under the null hypothesis, a null SNP is declared significant with probability approximately α1α2, so the expected number of false positives among the m SNPs is approximately mα1α2 = α'. As in Van Steen et al. [3], the overall p-value of the ATS analysis is the p-value of the second-stage test, which here is the CATT.
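A minimal Python sketch of the two-stage decision rule follows. It computes the CATT Z statistic with additive scores (0, 1/2, 1), which is an assumption because the scores are not stated in this section, applies stage 1 through a caller-supplied HWDTT function, and tests the surviving SNPs at α2 = α'/(mα1). The parametric bootstrap for α' is not implemented; α' is simply passed in as a number. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal N(0, 1)


def catt(r_counts, s_counts, x=(0.0, 0.5, 1.0)):
    """Cochran-Armitage trend test Z from case counts (r0, r1, r2) and control counts (s0, s1, s2).

    The default scores x = (0, 1/2, 1) correspond to an additive model (assumption).
    """
    r, s = sum(r_counts), sum(s_counts)
    n = r + s
    n_i = [ri + si for ri, si in zip(r_counts, s_counts)]
    num = sqrt(n) * sum(xi * (s * ri - r * si) for xi, ri, si in zip(x, r_counts, s_counts))
    sxx = sum(xi ** 2 * ni for xi, ni in zip(x, n_i))
    sx = sum(xi * ni for xi, ni in zip(x, n_i))
    return num / sqrt(r * s * (n * sxx - sx ** 2))


def two_sided_p(z):
    """Two-sided p-value of a statistic that is N(0, 1) under the null hypothesis."""
    return 2 * (1 - N01.cdf(abs(z)))


def ats_analysis(snps, stage1_stat, alpha1, alpha_prime):
    """Two-stage screening: stage 1 filters on stage1_stat, stage 2 applies the CATT.

    snps: list of (case_counts, control_counts), one entry per SNP.
    alpha_prime: bootstrap-calibrated level, supplied by the caller (not computed here).
    Returns alpha2 and, for each SNP passing stage 1, (index, ATS p-value, significant?).
    """
    m = len(snps)
    alpha2 = alpha_prime / (m * alpha1)
    passed = []
    for idx, (rc, sc) in enumerate(snps):
        if two_sided_p(stage1_stat(rc, sc)) >= alpha1:
            continue  # not rejected at stage 1, so not tested at stage 2
        p_ats = two_sided_p(catt(rc, sc))  # the ATS p-value is the stage-2 p-value
        passed.append((idx, p_ats, p_ats < alpha2))
    return alpha2, passed
```

Passing the stage-1 statistic as a function keeps the screening rule separate from the choice of first-stage test, so the same loop can be reused with any HWDTT implementation.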
Data
We used the GAW15 simulated Problem 3 data set for rheumatoid arthritis (RA), which includes 100 replicates. Each replicate contains 1500 families with an affected sibling pair and 2000 unaffected control subjects. To obtain a sample of cases and controls, we randomly chose one affected sibling from each pair. Thus, from each replicate we selected 1500 cases with RA and 2000 controls. To compare the performance of all methods with a small sample size, we randomly sampled 200 cases and 200 controls from each of the 100 replicate samples of 1500 cases and 2000 controls. To examine the type I error rate, we concentrated on 100 markers that were at least 20,000 kb away from the identified peaks on chromosome 6. Therefore, the total number of marker tests used to examine the type I error rate was 10,000 (100 markers × 100 replicates). To examine power, we considered all 674 markers on chromosome 6, using only one replicate of 200 cases and 200 controls. Five of these 674 markers are causative; they lie in the region between 32447.149 kb and 37363.880 kb on chromosome 6.