Notation and model
Consider a SNP with alleles D and d and frequencies p and q = 1 - p, respectively. In a case-control design, r cases and s controls are independently sampled from a population. The genotype counts of the three genotypes G0 = dd, G1 = Dd, and G2 = DD are denoted as (r0, r1, r2) in cases and (s0, s1, s2) in controls, which follow multinomial distributions mul(r: p0, p1, p2) and mul(s: q0, q1, q2), respectively. Denote the disease prevalence as k and the penetrances as f_i = P(case|G_i) for i = 0, 1, 2. By Bayes' theorem, p_i = g_i f_i/k and q_i = g_i(1 - f_i)/(1 - k), where g_i = P(G_i). Without loss of generality, assume that D is the high-risk allele. Then the null hypothesis of no association can be stated as H0: f0 = f1 = f2 = k. The alternative hypothesis is H1: f0 ≤ f1 ≤ f2 with at least one strict inequality. The genotype relative risks (GRRs) are defined as λ1 = f1/f0 and λ2 = f2/f0. The recessive, additive, and dominant models correspond to λ1 = 1, λ1 = (1 + λ2)/2, and λ1 = λ2, respectively [2–4].
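The relations p_i = g_i f_i/k and q_i = g_i(1 - f_i)/(1 - k) can be illustrated with a minimal Python sketch. The function name and arguments are hypothetical, and the genotype frequencies g_i are taken from Hardy-Weinberg proportions, which is an assumption made only for this illustration and is not stated in the text.

```python
def case_control_genotype_probs(p, f0, lam1, lam2):
    """Return (k, case probs p_i, control probs q_i) for genotypes dd, Dd, DD.

    p          : frequency of allele D
    f0         : penetrance of genotype dd
    lam1, lam2 : genotype relative risks f1/f0 and f2/f0
    """
    q = 1.0 - p
    # ASSUMPTION (illustration only): Hardy-Weinberg genotype frequencies g_i.
    g = (q * q, 2.0 * p * q, p * p)
    # Penetrances recovered from the GRRs: f1 = lam1 * f0, f2 = lam2 * f0.
    f = (f0, lam1 * f0, lam2 * f0)
    # Prevalence k = sum_i g_i f_i (law of total probability).
    k = sum(gi * fi for gi, fi in zip(g, f))
    # Bayes' theorem: P(G_i | case) and P(G_i | control).
    p_case = tuple(gi * fi / k for gi, fi in zip(g, f))
    q_ctrl = tuple(gi * (1.0 - fi) / (1.0 - k) for gi, fi in zip(g, f))
    return k, p_case, q_ctrl


# Additive model example: lam1 = (1 + lam2) / 2.
k, p_case, q_ctrl = case_control_genotype_probs(p=0.3, f0=0.1, lam1=1.5, lam2=2.0)
```

Under H0 (lam1 = lam2 = 1) the case and control genotype distributions coincide with g_i, which is why the two multinomial samples are indistinguishable when there is no association.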
Trend tests and robust tests
To test association using case-control data, the Cochran-Armitage trend test (CATT) has been proposed [2–4], which can be written as
\[
Z_x = \frac{n^{1/2} \sum_{i=0}^{2} x_i (s r_i - r s_i)}{\left[ r s \left\{ n \sum_{i=0}^{2} x_i^2 n_i - \left( \sum_{i=0}^{2} x_i n_i \right)^2 \right\} \right]^{1/2}} \tag{1}
\]
where (x0, x1, x2) = (0, x, 1) with 0 ≤ x ≤ 1, n_i = r_i + s_i for i = 0, 1, 2, and n = r + s. Given x, Z_x asymptotically follows N(0,1) under H0. The choice of x is 0, 1/2, and 1 for the recessive, additive/multiplicative, and dominant models, respectively [5]. In practice, however, the true genetic model is unknown. Hence robust tests, namely the maximin efficiency robust test (MERT) and the maximum test (MAX), can be applied. They are given by MERT = (Z0 + Z1)/{2(1 + ρ)}^{1/2} and MAX = max(|Z0|, |Z1/2|, |Z1|), where ρ = [n0 n2/{(n0 + n1)(n1 + n2)}]^{1/2} [4]. Pearson's chi-squared test of association can also be used. However, Zheng et al. [6] showed that MAX is often more powerful than the Pearson chi-squared test for a case-control design. A comparison of MERT and MAX can be found in Freidlin et al. [7]. MAX and MERT have also been applied to other designs for GAW14 [8, 9].
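As a rough illustration of how these statistics might be computed from a single 2 x 3 genotype table, the following Python sketch implements the standard form of the CATT together with MERT and MAX as defined above. The function names are hypothetical, n_i = r_i + s_i is the pooled genotype count, and degenerate tables (for example an empty DD class when x = 0) are not handled.

```python
import math


def catt(cases, controls, x):
    """Trend statistic Z_x for counts (dd, Dd, DD) with scores (0, x, 1)."""
    r, s = sum(cases), sum(controls)
    n = r + s
    scores = (0.0, x, 1.0)
    totals = [ri + si for ri, si in zip(cases, controls)]  # n_i = r_i + s_i
    numerator = math.sqrt(n) * sum(
        xi * (s * ri - r * si) for xi, ri, si in zip(scores, cases, controls)
    )
    variance = r * s * (
        n * sum(xi * xi * ni for xi, ni in zip(scores, totals))
        - sum(xi * ni for xi, ni in zip(scores, totals)) ** 2
    )
    # NOTE: variance is zero for degenerate tables; this sketch does not guard it.
    return numerator / math.sqrt(variance)


def mert(cases, controls):
    """MERT = (Z_0 + Z_1) / sqrt(2 (1 + rho)), rho from pooled genotype counts."""
    n0, n1, n2 = (ri + si for ri, si in zip(cases, controls))
    rho = math.sqrt(n0 * n2 / ((n0 + n1) * (n1 + n2)))
    z0 = catt(cases, controls, 0.0)
    z1 = catt(cases, controls, 1.0)
    return (z0 + z1) / math.sqrt(2.0 * (1.0 + rho))


def max_test(cases, controls):
    """MAX = max(|Z_0|, |Z_1/2|, |Z_1|)."""
    return max(abs(catt(cases, controls, x)) for x in (0.0, 0.5, 1.0))


# Hypothetical counts, ordered (dd, Dd, DD).
cases, controls = (20, 50, 30), (40, 45, 15)
z_additive = catt(cases, controls, 0.5)
robust = (mert(cases, controls), max_test(cases, controls))
```

A positive Z_x indicates that the scored genotypes are more frequent in cases than in controls, which matches the convention that D is the high-risk allele.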
Ranking markers with multiple statistics
When the genetic model is unknown, the three CATTs (Z0, Z1/2, Z1) are calculated for each of the M SNPs. The p-values of MERT and MAX can then be obtained for ranking. However, computing the p-value of MAX requires extensive simulation. As an alternative, the minimum of the p-values (min p) of the three CATTs can be used for ranking. Rather than ranking the M SNPs by any single CATT, we propose ranking them by MERT and by min p. We expect this approach to be more robust than ranking by a single CATT when the ranks given by the three CATTs differ substantially.
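The ranking procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names and the example counts are hypothetical, p-values are two-sided and based on the asymptotic N(0,1) distribution, and ties are broken by input order.

```python
import math


def catt(cases, controls, x):
    """Trend statistic Z_x for counts (dd, Dd, DD) with scores (0, x, 1)."""
    r, s = sum(cases), sum(controls)
    n = r + s
    scores = (0.0, x, 1.0)
    totals = [ri + si for ri, si in zip(cases, controls)]
    num = math.sqrt(n) * sum(
        xi * (s * ri - r * si) for xi, ri, si in zip(scores, cases, controls)
    )
    var = r * s * (
        n * sum(xi * xi * ni for xi, ni in zip(scores, totals))
        - sum(xi * ni for xi, ni in zip(scores, totals)) ** 2
    )
    return num / math.sqrt(var)


def two_sided_p(z):
    """Two-sided normal p-value, 2 * (1 - Phi(|z|)). ASSUMPTION: two-sided."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def min_p(cases, controls):
    """Minimum p-value over the recessive, additive and dominant CATTs."""
    return min(two_sided_p(catt(cases, controls, x)) for x in (0.0, 0.5, 1.0))


def mert_p(cases, controls):
    """p-value of MERT, treated as asymptotically N(0,1) under H0."""
    n0, n1, n2 = (ri + si for ri, si in zip(cases, controls))
    rho = math.sqrt(n0 * n2 / ((n0 + n1) * (n1 + n2)))
    z = (catt(cases, controls, 0.0) + catt(cases, controls, 1.0)) / math.sqrt(
        2.0 * (1.0 + rho)
    )
    return two_sided_p(z)


def ranks(pvalues):
    """Rank 1 goes to the smallest p-value; ties keep input order."""
    order = sorted(range(len(pvalues)), key=lambda j: pvalues[j])
    out = [0] * len(pvalues)
    for rank, j in enumerate(order, start=1):
        out[j] = rank
    return out


# Hypothetical data: one (cases, controls) pair of genotype counts per SNP.
snps = [
    ((10, 20, 30), (20, 40, 60)),
    ((10, 40, 50), (50, 40, 10)),
    ((30, 50, 20), (35, 50, 15)),
]
rank_by_min_p = ranks([min_p(c, d) for c, d in snps])
rank_by_mert = ranks([mert_p(c, d) for c, d in snps])
```

Note that min p is used here only as a ranking score. It is not a valid p-value on its own, since it takes the minimum over three correlated tests.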