### Model

Let *A* be the primary disease locus of interest, with penetrance value *w*_{ij} for genotype *A*_{i}*A*_{j}. Under the null hypothesis, the *B* locus is neutral with respect to disease and the *A* locus accounts for all disease risk in this genetic region. The genotype frequencies at the *A* locus are denoted *f*_{c}(*A*_{i}*A*_{j}) in controls and *f*_{p}(*A*_{i}*A*_{j}) in patients. The expected genotype frequencies among patients are given by:

$$f_{p}(A_{i}A_{j}) = \frac{w_{ij}\, f_{c}(A_{i}A_{j})}{T},$$

(1a)

where *T* is a normalizing factor. No assumption of Hardy-Weinberg proportions in controls or patients is needed. Rearranging Eq. (1a), the relative penetrance values are estimated as follows:

$$\frac{w_{ij}}{T} = \frac{f_{p}(A_{i}A_{j})}{f_{c}(A_{i}A_{j})}.$$

(1b)
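
As an illustration, the ratio of patient to control genotype frequencies above could be computed from raw genotype lists as in the following minimal sketch. The function names, the string encoding of genotypes, and the toy counts are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def genotype_freqs(genotypes):
    """Return the relative frequency of each genotype in a sample."""
    counts = Counter(genotypes)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}


def relative_penetrance(patients, controls):
    """Estimate w_ij / T = f_p(A_iA_j) / f_c(A_iA_j) per A-locus genotype.

    Only genotypes observed in controls are returned, because the ratio
    is undefined when the control frequency is zero.
    """
    f_p = genotype_freqs(patients)
    f_c = genotype_freqs(controls)
    return {g: f_p.get(g, 0.0) / f_c[g] for g in f_c}


# Hypothetical toy data: 100 patients and 100 controls.
patients = ["A1A1"] * 50 + ["A1A2"] * 40 + ["A2A2"] * 10
controls = ["A1A1"] * 25 + ["A1A2"] * 50 + ["A2A2"] * 25
ratios = relative_penetrance(patients, controls)
```

Because only the ratios are used, the normalizing factor *T* never has to be estimated separately.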

Adding the second neutral locus *B*, the expected two-locus genotype frequencies under the null are given by:

$$\mathrm{exp}\, f_{p}(A_{i}A_{j}B_{k}B_{l}) = \frac{w_{ij}\, f_{c}(A_{i}A_{j}B_{k}B_{l})}{T},$$

(2a)

and substituting from Eq. (1b) gives

$$\mathrm{exp}\, f_{p}(A_{i}A_{j}B_{k}B_{l}) = f_{c}(A_{i}A_{j}B_{k}B_{l}) \left[\frac{f_{p}(A_{i}A_{j})}{f_{c}(A_{i}A_{j})}\right].$$

(2b)
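
The substituted expression can be sketched in code as follows. This is only an illustration under assumed conventions: each individual is represented as a hypothetical `(A genotype, B genotype)` tuple, and the data are invented.

```python
from collections import Counter


def freqs(items):
    """Return the relative frequency of each distinct item."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


def expected_two_locus(patients, controls):
    """exp f_p(A B) = f_c(A B) * f_p(A) / f_c(A) for each control two-locus genotype."""
    f_p_a = freqs(a for a, _ in patients)   # patient A-locus frequencies
    f_c_a = freqs(a for a, _ in controls)   # control A-locus frequencies
    f_c_ab = freqs(controls)                # control two-locus frequencies
    return {
        (a, b): f * f_p_a.get(a, 0.0) / f_c_a[a]
        for (a, b), f in f_c_ab.items()
    }


# Hypothetical toy data.
controls = (
    [("A1A1", "B1B1")] * 20 + [("A1A1", "B1B2")] * 20
    + [("A1A2", "B1B1")] * 30 + [("A1A2", "B1B2")] * 30
)
patients = (
    [("A1A1", "B1B1")] * 40 + [("A1A1", "B1B2")] * 30
    + [("A1A2", "B1B1")] * 20 + [("A1A2", "B1B2")] * 10
)
expected = expected_two_locus(patients, controls)
```

When every *A* genotype seen in patients also occurs in controls, the expected frequencies sum to one, since summing the control two-locus frequencies over *B* recovers the control *A* genotype frequency.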

### Conditional genotype method (CGM)

For a given *A*_{i}*A*_{j} genotype, the term in square brackets in Eq. (2b) is constant. Hence, under the null hypothesis that the *A* locus is the only disease-predisposing gene in the region, the relative genotype frequencies at the *B* locus should show no heterogeneity between patients and controls. We term this the conditional genotype method (CGM), by analogy with the conditional haplotype method (CHM) discussed above. In both cases, statistical testing can be done via a chi-square test of heterogeneity (with all the associated caveats), Fisher's exact test, or a resampling approach. We have not pursued this approach in the current analyses.
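
Although the CGM is not pursued here, the chi-square heterogeneity option mentioned above could look like the following sketch for a single *A* genotype. It computes a standard Pearson statistic on the 2 x K table of *B* genotype counts; the function name, tuple encoding, and data are hypothetical, and small expected counts would call for Fisher's exact test or resampling instead.

```python
from collections import Counter


def cgm_chi_square(patients, controls, a_genotype):
    """Pearson chi-square for heterogeneity of B genotype counts between
    patients and controls, restricted to carriers of one A genotype.

    Returns (statistic, degrees_of_freedom).
    """
    p = Counter(b for a, b in patients if a == a_genotype)
    c = Counter(b for a, b in controls if a == a_genotype)
    n_p, n_c = sum(p.values()), sum(c.values())
    if n_p == 0 or n_c == 0:
        raise ValueError("A genotype must occur in both patients and controls")
    n = n_p + n_c
    b_genotypes = sorted(set(p) | set(c))
    stat = 0.0
    for b in b_genotypes:
        col_total = p[b] + c[b]
        for observed, row_total in ((p[b], n_p), (c[b], n_c)):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(b_genotypes) - 1


# Hypothetical toy data for carriers of genotype A1A1.
patients = [("A1A1", "B1B1")] * 30 + [("A1A1", "B1B2")] * 10
controls = [("A1A1", "B1B1")] * 20 + [("A1A1", "B1B2")] * 20
stat, df = cgm_chi_square(patients, controls, "A1A1")
```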

### Overall conditional genotype method (OCGM)

If the effect of an additional disease-predisposing gene in the region, e.g., the *B* gene or one in LD with it, is not specific to a particular haplotype or genotype at the primary disease locus *A*, then more power should be available by considering *B* locus genotypes combined over all *A* locus genotypes. In this case, the expected genotype frequencies at the *B* locus under the null hypothesis, assuming that *A* (disease locus) segregates for *m* alleles and *B* (putative neutral locus) for *n* alleles, are given by:

$$\mathrm{exp}\, f_{p}(B_{h}B_{k}) = \sum_{i=1}^{m} \sum_{j=i}^{m} \frac{f_{p}(A_{i}A_{j})}{f_{c}(A_{i}A_{j})}\, f_{c}(A_{i}A_{j}B_{h}B_{k}).$$

(3)
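
A minimal sketch of Eq. (3) follows, summing the ratio-weighted control two-locus frequencies over all *A* genotypes. The `(A genotype, B genotype)` tuple encoding, the function names, and the data are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def freqs(items):
    """Return the relative frequency of each distinct item."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


def expected_b_freqs(patients, controls):
    """exp f_p(B_hB_k) = sum over A genotypes of
    [f_p(A) / f_c(A)] * f_c(A, B_hB_k), as in Eq. (3)."""
    f_p_a = freqs(a for a, _ in patients)
    f_c_a = freqs(a for a, _ in controls)
    f_c_ab = freqs(controls)
    exp_b = defaultdict(float)
    for (a, b), f in f_c_ab.items():
        exp_b[b] += f * f_p_a.get(a, 0.0) / f_c_a[a]
    return dict(exp_b)


# Hypothetical toy data.
controls = (
    [("A1A1", "B1B1")] * 20 + [("A1A1", "B1B2")] * 20
    + [("A1A2", "B1B1")] * 30 + [("A1A2", "B1B2")] * 30
)
patients = (
    [("A1A1", "B1B1")] * 40 + [("A1A1", "B1B2")] * 30
    + [("A1A2", "B1B1")] * 20 + [("A1A2", "B1B2")] * 10
)
exp_b = expected_b_freqs(patients, controls)
```

In this toy example the *B* genotypes are independent of the *A* genotypes in controls, so the expected patient *B* frequencies equal the control *B* frequencies.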

The question of statistical testing then arises. Applying a standard test of homogeneity to the observed (obs) and expected (exp) numbers of *B* genotypes does not give a chi-square distribution; in fact, the distribution is exponential. This is due to the use of the ratio *f*_{p}(*A*_{i}*A*_{j})/*f*_{c}(*A*_{i}*A*_{j}) in estimating the expected values. This observation led to the test statistic suggested below [12]. Note also that low-frequency control genotypes could be problematic, since *f*_{c}(*A*_{i}*A*_{j}) appears in the denominator of this ratio, and these should be left out of any analyses, although this was not necessary in the current analysis of simulated data.

We propose the following test statistic for comparing exp *f*_{p}(*B*_{h}*B*_{k}) with the observed value *f*_{p}(*B*_{h}*B*_{k}). For consistency we will refer to the latter as obs *f*_{p}(*B*_{h}*B*_{k}):

$$\chi^{2}_{2\,\mathrm{df}} = \frac{N_{p} + N_{c}}{2} \sum_{h,k} c_{hk}$$

(4)

where:

$$c_{hk} = \frac{\left(\mathrm{exp}\, f_{p}(B_{h}B_{k}) - \mathrm{obs}\, f_{p}(B_{h}B_{k})\right)^{2}}{4\left(\mathrm{exp}\, f_{p}(B_{h}B_{k}) + \mathrm{obs}\, f_{p}(B_{h}B_{k})\right)},$$

and exp *f*_{p}(*B*_{h}*B*_{k}) is given by Eq. (3). Note that no LD or haplotype estimates are needed. With respect to analyses of the GAW15 Problem 3 data, we discuss power issues in the Results section below and show that this test statistic has the nominal *p*-value for markers that have no effect on disease and are not in LD with loci C and D.
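
The statistic in Eq. (4) can be sketched as below. Two points are assumptions rather than statements from the text: *N*_{p} and *N*_{c} are taken to be the numbers of patients and controls, and the *p*-value is computed from a chi-square distribution with 2 degrees of freedom, as the subscript in Eq. (4) suggests. For 2 degrees of freedom the upper tail probability is exactly exp(-x/2), so no statistics library is needed. The function names and data are hypothetical.

```python
import math


def ocgm_statistic(exp_b, obs_b, n_p, n_c):
    """Compute (N_p + N_c) / 2 * sum over B genotypes of c_hk, where
    c_hk = (exp - obs)^2 / (4 * (exp + obs)), following Eq. (4).

    exp_b, obs_b: dicts mapping B genotype -> frequency in patients.
    n_p, n_c: assumed to be the numbers of patients and controls.
    """
    total = 0.0
    for g in set(exp_b) | set(obs_b):
        e = exp_b.get(g, 0.0)
        o = obs_b.get(g, 0.0)
        if e + o > 0:  # skip genotypes absent from both, avoiding 0/0
            total += (e - o) ** 2 / (4 * (e + o))
    return (n_p + n_c) / 2 * total


def p_value_2df(stat):
    """Upper tail of a chi-square distribution with 2 df (assumed reference)."""
    return math.exp(-stat / 2)


# Hypothetical toy data: expected vs. observed patient B genotype frequencies.
exp_b = {"B1B1": 0.5, "B1B2": 0.5}
obs_b = {"B1B1": 0.6, "B1B2": 0.4}
stat = ocgm_statistic(exp_b, obs_b, n_p=100, n_c=100)
p = p_value_2df(stat)
```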