### Algorithm

This algorithm is an iterative process of selecting and analyzing subsets of the data. Each iteration starts by randomly selecting a quartet of haplotypes composed of two control haplotypes and two unrelated case haplotypes. The program *beagle* [8] is used to infer the minimum number and locations of the recombinations necessary for the quartet to conform to an infinite-sites model of evolution. *Beagle* scans the haplotypes for pairs of markers at which all four possible gametes are present. Barring recurrent mutation, this pattern indicates a recombination event somewhere between such marker pairs. *Beagle* assigns a minimum set of recombination locations that satisfies all such pairs, and these are used to divide the quartet into a set of genomic segments, each consistent with a single evolutionary tree.
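The four-gamete scan can be sketched as follows. This is an illustrative re-implementation of the idea (a greedy left-to-right minimal-breakpoint placement), not the authors' *beagle* program, and the function names are ours:

```python
def four_gamete_breakpoints(haplotypes):
    """haplotypes: four equal-length 0/1 strings.
    A returned value j marks a cut between sites j-1 and j."""
    n_sites = len(haplotypes[0])
    breakpoints = []
    last_break = -1
    for j in range(n_sites):
        for i in range(last_break + 1, j):
            gametes = {(h[i], h[j]) for h in haplotypes}
            if len(gametes) == 4:        # all four gametes present
                breakpoints.append(j)    # minimal cut just before site j
                last_break = j - 1
                break
    return breakpoints

def segments(haplotypes):
    """Split [0, n_sites) into tree-compatible segments at each breakpoint."""
    start, out = 0, []
    for cut in four_gamete_breakpoints(haplotypes) + [len(haplotypes[0])]:
        out.append((start, cut))
        start = cut
    return out
```

For example, the quartet `["00", "01", "10", "11"]` shows all four gametes at its two sites, so a single breakpoint is placed between them, yielding segments `(0, 1)` and `(1, 2)`.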

Next, each of these segments is passed to the *pars* program in the PHYLIP package [9] to generate a single most parsimonious tree reflecting the history of that segment: the inferred evolutionary tree that minimizes the number of mutation events needed to account for all of the differences among the sequences. Of the three possible topologies for an unrooted tree with four taxa and no ambiguities (Fig. 1), only one contains a branch that separates the two cases from the two controls. Segments generating a tree with this topology are considered "consistent"; either of the other two topologies is considered "inconsistent". Segments that do not unambiguously yield a maximum parsimony tree of one kind or the other are discarded.
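For a quartet this parsimony call can be read off directly: each parsimony-informative site supports exactly one of the three unrooted topologies, and the topology supported by the most sites is the most parsimonious. The sketch below (our code, not *pars*, which handles the general multi-taxon case) applies this shortcut to binary haplotypes ordered (case1, case2, control1, control2):

```python
def classify_segment(h1, h2, h3, h4):
    """Return 'consistent', 'inconsistent', or 'ambiguous' for one segment,
    given four 0/1 strings ordered (case1, case2, control1, control2)."""
    support = {"12|34": 0, "13|24": 0, "14|23": 0}
    for a, b, c, d in zip(h1, h2, h3, h4):
        if a == b and c == d and a != c:
            support["12|34"] += 1      # cases together, controls together
        elif a == c and b == d and a != b:
            support["13|24"] += 1
        elif a == d and b == c and a != b:
            support["14|23"] += 1
    best = max(support.values())
    winners = [s for s, n in support.items() if n == best]
    if best == 0 or len(winners) > 1:
        return "ambiguous"             # no single unambiguous tree: discard
    return "consistent" if winners == ["12|34"] else "inconsistent"
```

A segment whose informative sites all group the cases on one side of a branch is called consistent; a tie between topologies, or a segment with no informative sites, is ambiguous and discarded.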

After each segment is analyzed, the frequency with which each site falls within a consistent segment, taken across quartets, is calculated. This frequency, the consistency of the sample, is used as the test statistic. Elevated consistency marks sites better able to explain the differences between the case and control phenotypes. The amount of information learned from the data increases with each iteration, but with asymptotically diminishing returns once every possible quartet has been picked.
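The pool of quartets is finite: each quartet is an unordered case pair combined with an unordered control pair, so there are C(K1, 2) x C(K2, 2) of them, which bounds how long random sampling keeps adding information. A quick illustration (the sample sizes are ours):

```python
from math import comb

def n_quartets(K1, K2):
    """Number of distinct quartets: case pairs times control pairs."""
    return comb(K1, 2) * comb(K2, 2)

n_quartets(100, 100)   # 4950 * 4950 = 24_502_500 distinct quartets
```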

### Assessing significance

The significance of results is addressed through classical hypothesis testing. The data are modeled as being drawn at random from two populations, one of case haplotypes and one of control haplotypes. Every mutation in the history of the sample distinguishes between two alleles and creates two types of haplotypes, one with the ancestral allele at that site and one with the derived allele. If a particular mutation contributes to the probability of disease, then haplotypes containing that allele will be over-represented in the case population. The null hypothesis is that the two types are represented with equal frequency in both the case and control populations.

A given data set can be summarized as having *K*_{1} case haplotypes with *a*_{1} copies of one type of haplotype and (*K*_{1} - *a*_{1}) of the other, and *K*_{2} control haplotypes with *a*_{2} and (*K*_{2} - *a*_{2}) copies of the two types of haplotypes respectively. As all the trees are unrooted, it is not necessary to identify which type is ancestral and which is derived. Therefore, which type is labeled (*a*) is an arbitrary designation.

The probability of a randomly drawn quartet being counted as consistent, *p*(*a*_{1}, *a*_{2}, *K*_{1}, *K*_{2}), can be calculated as

$$
\begin{array}{l}
\dfrac{a_1^2-a_1}{K_1^2-K_1}\,\dfrac{(K_2-a_2)(K_2-a_2-1)}{K_2^2-K_2}+\dfrac{a_2^2-a_2}{K_2^2-K_2}\,\dfrac{(K_1-a_1)(K_1-a_1-1)}{K_1^2-K_1} \\[6pt]
{}+\dfrac{1}{3}\left[\dfrac{a_1^2-a_1}{K_1^2-K_1}\left(1-\dfrac{(K_2-a_2)(K_2-a_2-1)}{K_2^2-K_2}\right)+\dfrac{(K_1-a_1)(K_1-a_1-1)}{K_1^2-K_1}\left(1-\dfrac{a_2^2-a_2}{K_2^2-K_2}\right)\right] \\[6pt]
{}+\dfrac{1}{3}\left[\dfrac{(K_2-a_2)(K_2-a_2-1)}{K_2^2-K_2}\,\dfrac{2(K_1a_1-a_1^2)}{K_1^2-K_1}+\dfrac{a_2^2-a_2}{K_2^2-K_2}\,\dfrac{2(K_1a_1-a_1^2)}{K_1^2-K_1}\right].
\end{array}
$$

This is the probability of selecting a quartet composed of two haplotypes of one type from the control set and two haplotypes of the other type from the case set (which are the quartets that contribute to the signal in the data) plus 1/3 times the probability of selecting a quartet that contains three or four haplotypes of the same type and one or zero of the other (these quartets are uninformative and contribute only random noise).
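The quartet categories described in this paragraph translate directly into code. A minimal numeric sketch (function names are ours), built from the without-replacement pair probabilities a(a - 1)/(K(K - 1)):

```python
def pair_prob(a, K):
    """P(two haplotypes drawn without replacement are both of a type
    present in a copies out of K)."""
    return a * (a - 1) / (K * (K - 1))

def p_consistent(a1, a2, K1, K2):
    """Probability that a random quartet is counted as consistent."""
    case_aa = pair_prob(a1, K1)          # both cases the labeled type
    case_bb = pair_prob(K1 - a1, K1)     # both cases the other type
    ctrl_aa = pair_prob(a2, K2)
    ctrl_bb = pair_prob(K2 - a2, K2)
    case_mix = 1 - case_aa - case_bb     # one case of each type
    # Quartets splitting the two types cases-vs-controls count with prob. 1...
    signal = case_aa * ctrl_bb + case_bb * ctrl_aa
    # ...and quartets with three or four haplotypes of one type with prob. 1/3.
    noise = (case_aa * (1 - ctrl_bb) + case_bb * (1 - ctrl_aa)
             + case_mix * (ctrl_aa + ctrl_bb))
    return signal + noise / 3
```

At the extremes the behavior is as expected: if every case carries one type and every control the other, every quartet is consistent (p = 1), and if the sample is monomorphic, consistency is pure topological noise (p = 1/3).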

The probability of a site being counted as consistent *x* times after *N* quartets have been analyzed should be calculated as

$$
\int_0^1 \sum_{a_1=0}^{K_1} \sum_{a_2=0}^{K_2} f(\alpha)\, g(a_1, K_1, \alpha)\, g(a_2, K_2, \alpha) \binom{N}{x}\, p(a_1, a_2, K_1, K_2)^{x}\, \bigl(1 - p(a_1, a_2, K_1, K_2)\bigr)^{N-x}\, d\alpha.
$$

This is essentially a binomial sampling probability, with *p*(*a*_{1}, *a*_{2}, *K*_{1}, *K*_{2}) as the chance of success on a particular trial. It is summed over all possible values of the random variables *a*_{1} and *a*_{2}, weighted by their probability densities *g*(*a*_{1}, *K*_{1}, *α*) and *g*(*a*_{2}, *K*_{2}, *α*): the binomial probabilities of drawing *a* copies of the arbitrarily labeled haplotype in *K* samples from a total population in which that type is at frequency *α*.

Because the probability densities of *a*_{1} and *a*_{2} depend on the random variable *α*, the whole expression must be integrated over all possible values of *α* weighted by its probability density, *f*(*α*). Unfortunately, *f*(*α*) is unknown, so exact marginal probabilities cannot be calculated directly. However, for values of consistency above 1/3, which are the only ones of interest, the likelihood under the null hypothesis is maximized at *α* = 0.5, so assuming *α* = 0.5 and calculating *p*-values by summing the probabilities of finding the observed or higher number of consistent quartets is conservative.
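Under the assumption *α* = 0.5 the conservative *p*-value becomes directly computable. A sketch (names are ours; `p` below restates the quartet-consistency probability from the text, built from the pair-sampling categories it describes):

```python
from math import comb

def p(a1, a2, K1, K2):
    """Probability that a random quartet is counted as consistent."""
    pair = lambda a, K: a * (a - 1) / (K * (K - 1))
    c_aa, c_bb = pair(a1, K1), pair(K1 - a1, K1)
    t_aa, t_bb = pair(a2, K2), pair(K2 - a2, K2)
    signal = c_aa * t_bb + c_bb * t_aa                   # counted with prob. 1
    noise = (c_aa * (1 - t_bb) + c_bb * (1 - t_aa)       # counted with prob. 1/3
             + (1 - c_aa - c_bb) * (t_aa + t_bb))
    return signal + noise / 3

def p_value(x, N, K1, K2, alpha=0.5):
    """Conservative p-value: P(x or more consistent counts in N quartets),
    with a1 and a2 marginalized over Bin(K, alpha)."""
    total = 0.0
    for a1 in range(K1 + 1):
        g1 = comb(K1, a1) * alpha**a1 * (1 - alpha)**(K1 - a1)
        for a2 in range(K2 + 1):
            g2 = comb(K2, a2) * alpha**a2 * (1 - alpha)**(K2 - a2)
            q = p(a1, a2, K1, K2)
            tail = sum(comb(N, k) * q**k * (1 - q)**(N - k)
                       for k in range(x, N + 1))
            total += g1 * g2 * tail
    return total
```

This sums the upper binomial tail over every (*a*_{1}, *a*_{2}) pair weighted by its probability at *α* = 0.5; the result decreases as the observed count *x* grows, as a *p*-value should.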