### Comparing GAW17 genotype calls to HapMap III

Of the 2,095,632 genotype calls in the GAW17 data set for which a HapMap comparison can be made, only 1,371,479 (65.4%) are HapMap concordant. Per-individual concordance ranges from 51.79% to 73.63%. To put this in perspective, consider that the average HapMap minor allele frequency of these 3,402 SNPs is 20.96%. Assuming Hardy-Weinberg equilibrium, the expected frequency of the major-allele homozygote is (1 − 0.2096)² = 62.47%, so one can attain 62.47% HapMap concordance simply by guessing that every genotype is major-allele homozygous. The GAW17 calls improve on this trivial baseline by only about 3 percentage points, which clearly warrants further inquiry.
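The baseline comparison above can be reproduced with a few lines of arithmetic. The following is a minimal sketch using only the figures reported in the text; the variable names are illustrative, and plugging the mean minor allele frequency into the Hardy-Weinberg formula is an approximation to averaging the per-SNP homozygote frequencies.

```python
# Figures reported in the text; variable names are illustrative.
concordant_calls = 1_371_479
comparable_calls = 2_095_632
mean_maf = 0.2096  # average HapMap minor allele frequency of the compared SNPs

# Observed fraction of GAW17 calls that agree with HapMap.
observed = concordant_calls / comparable_calls

# Under Hardy-Weinberg equilibrium, the major-allele homozygote has
# frequency (1 - q)^2, where q is the minor allele frequency.
baseline = (1 - mean_maf) ** 2

print(f"observed concordance:          {observed:.2%}")             # 65.44%
print(f"all-major-homozygote baseline: {baseline:.2%}")             # 62.47%
print(f"margin over baseline:          {observed - baseline:.2%}")  # 2.97%
```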

### Generating quality scores for genotype calls

A total of 279,491 genotypes are called on chromosome 1 for CEU individuals in HapMap at SNPs that are also in HapMap. Looking only at the 44,650 calls made with 99% confidence or greater, I find 39,506/44,650 = 88.48% overall HapMap concordance. This immediately indicates that the quality scores are inaccurate. Assuming that the genotype calls are independent and that each is truly correct with probability of at least 99%, we can calculate the probability of seeing 88.48% or lower concordance on 44,650 genotypes using the binomial distribution:

$$
P\left(\le 39{,}506 \text{ successes in } 44{,}650 \text{ trials} \;\middle|\; P(\text{success}) \ge 99\%\right) < 2.23 \times 10^{-308} \tag{1}
$$
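The bound in equation (1) sits at the smallest normal double-precision value, so a direct floating-point evaluation of the tail underflows. As a hedged sketch (not necessarily how the figure above was obtained), the tail can be evaluated in log space with the Python standard library. The helper names `log_binom_pmf` and `log_binom_cdf` are illustrative. Taking the success probability to be exactly 0.99 gives an upper bound, since the lower tail only shrinks as the success probability increases.

```python
import math

def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability of exactly k successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)

def log_binom_cdf(k, n, p):
    """Natural log of P(X <= k), summed with the log-sum-exp trick to avoid underflow."""
    logs = [log_binom_pmf(i, n, p) for i in range(k + 1)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

# Figures from the text: 39,506 concordant calls out of 44,650 made at 99%+ confidence.
log10_p = log_binom_cdf(39_506, 44_650, 0.99) / math.log(10)

print(f"log10 of the tail probability: {log10_p:.1f}")
print("below the double-precision floor of 1e-308:", log10_p < -308)
```

Working with logarithms keeps the result representable even though the probability itself is far smaller than any nonzero double.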

Applying the same binomial test to the concordance rate of each sample, I find that 30 of 84 samples have concordance rates that are inconsistent with their quality scores (*α* = 10^{−4}). For example, individual NA12748 has 54.66% concordance on 2,534 calls, NA12842 has 46.15% concordance on 1,703 calls, and NA12889 has 42.10% concordance on 1,240 calls, with all calls having a quality score of 99%+. If the GAW17 data set was filtered only on these inaccurate quality scores, then the resulting HapMap discordance would seem inevitable.
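A per-sample version of this test can be sketched as follows for the three individuals named above. This is an illustration under stated assumptions rather than the analysis code behind the reported result: the concordant-call counts are reconstructed by rounding the reported percentages, the helper `log_binom_cdf` is a hypothetical name, and the success probability is fixed at 0.99.

```python
import math

def log_binom_cdf(k, n, p):
    """Natural log of P(X <= k) for X ~ Binomial(n, p), computed in log space."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k + 1)
    ]
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

ALPHA = 1e-4   # significance threshold used in the text
P_MIN = 0.99   # accuracy implied by a 99%+ quality score

# (reported concordance rate, number of calls) for the samples named in the text.
samples = {
    "NA12748": (0.5466, 2_534),
    "NA12842": (0.4615, 1_703),
    "NA12889": (0.4210, 1_240),
}

flagged = {}
for sample, (rate, n_calls) in samples.items():
    # Assumption: concordant-call count reconstructed from the rounded percentage.
    n_concordant = round(rate * n_calls)
    log_p = log_binom_cdf(n_concordant, n_calls, P_MIN)
    flagged[sample] = log_p < math.log(ALPHA)
    print(f"{sample}: {n_concordant}/{n_calls} concordant, "
          f"log10(p) = {log_p / math.log(10):.0f}, flagged = {flagged[sample]}")
```

Comparing log probabilities against the log of the threshold avoids converting vanishingly small tail probabilities back to ordinary floating-point values.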