Significant haplotype association with rheumatoid arthritis
The localized haplotype cluster analysis detected an association that remained significant after adjustment for multiple testing. The p-value for this cluster was 6.13 × 10⁻⁶ before adjustment and 0.012 after adjustment (10,000 permutations). The adjustment accounted for the single-marker tests as well as all the localized haplotype cluster tests [3]. None of the single-marker tests had an adjusted p-value below 0.2.
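A common way to implement a permutation-based adjustment like this is the single-step min-p procedure. The Haploview comparison later in this section is described the same way: 24 of 1000 permutations gave a smaller minimum p-value, for an adjusted value of 0.024. The sketch below assumes that procedure. The function name `minp_adjusted` and its input layout are hypothetical, and Beagle's own implementation [3] may differ in details such as how ties are counted.

```python
def minp_adjusted(observed_p, null_p):
    """Single-step min-p permutation adjustment (hypothetical helper).

    observed_p: unadjusted p-values, one per test; single-marker and
        haplotype-cluster tests can be pooled in one list.
    null_p: one list per permutation replicate, holding the p-values of
        the same tests recomputed after shuffling case/control labels.

    Returns, for each test, the fraction of replicates whose smallest
    p-value is at or below that test's observed p-value.
    """
    if not null_p:
        raise ValueError("need at least one permutation replicate")
    # The minimum over all tests in each replicate forms the null distribution
    null_min = [min(replicate) for replicate in null_p]
    n_rep = len(null_min)
    return [sum(m <= p for m in null_min) / n_rep for p in observed_p]


# Toy usage with 4 replicates of 2 tests (made-up numbers, not study data)
observed = [1e-5, 0.5]
null = [[0.2, 0.3], [1e-6, 0.9], [0.4, 0.05], [0.6, 0.7]]
print(minp_adjusted(observed, null))  # [0.25, 0.75]
```

Counting with `<=` is slightly conservative when ties occur. Another common convention reports (r + 1)/(B + 1) rather than r/B. The figures quoted in this section (for example 24/1000 = 0.024) correspond to r/B.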
The significant haplotype cluster consisted of haplotypes carrying the sequence 2,1 at SNPs 1631 (rs2195534) and 1632 (rs1791320). These SNPs lie less than 500 base pairs upstream of CCBE1 (collagen and calcium binding EGF domains 1) on chromosome 18q21.32 (NCBI build 36.2, dbSNP build 126); SNPs 1515 to 1630 lie within this gene. All haplotypes in the sample that carry the 2,1 sequence at SNPs 1631 and 1632 also share the sequence 1,1,2,1,1 at SNPs 1626 to 1630. Of the individuals included in the analysis, 32 carry this haplotype, and 29 of these are controls, so the haplotype is associated with reduced risk of rheumatoid arthritis. One further carrier (a case) was among the individuals excluded from the analysis on grounds of ethnicity.
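Counting carriers of the cluster can be sketched as a pattern match on phased haplotypes. This is a simplification. It treats the cluster as exactly the haplotypes with alleles 2,1 at SNPs 1631 and 1632, which matches the description above, whereas Beagle assigns cluster membership from its fitted haplotype cluster model [3]. The data layout and the names `matches`, `carrier_counts`, `CORE_PATTERN` and `EXTENDED_PATTERN` are hypothetical. In this function's return order, the counts reported above for the analyzed sample would read (32, 3, 29).

```python
# Hypothetical layout: each phased haplotype is a dict mapping SNP index
# (as numbered in this study) to an allele code, 1 or 2.
CORE_PATTERN = {1631: 2, 1632: 1}
EXTENDED_PATTERN = {1626: 1, 1627: 1, 1628: 2, 1629: 1, 1630: 1, 1631: 2, 1632: 1}


def matches(haplotype, pattern):
    """True if the haplotype has the required allele at every SNP in the pattern.
    A missing SNP (absent key) counts as a non-match."""
    return all(haplotype.get(snp) == allele for snp, allele in pattern.items())


def carrier_counts(individuals, pattern=CORE_PATTERN):
    """individuals: iterable of (is_case, (hap_a, hap_b)).
    An individual is a carrier if at least one of its two haplotypes matches;
    homozygous carriers are still counted once.
    Returns (carriers, case_carriers, control_carriers)."""
    cases = controls = 0
    for is_case, haplotypes in individuals:
        if any(matches(h, pattern) for h in haplotypes):
            if is_case:
                cases += 1
            else:
                controls += 1
    return cases + controls, cases, controls


# Toy usage (made-up individuals, not study data)
toy = [
    (False, ({1631: 2, 1632: 1}, {1631: 1, 1632: 1})),  # control, one copy
    (True, ({1631: 1, 1632: 1}, {1631: 1, 1632: 2})),   # case, no copy
    (True, ({1631: 2, 1632: 1}, {1631: 2, 1632: 1})),   # case, two copies
]
print(carrier_counts(toy))  # (2, 1, 1)
```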
Figures 1 and 2, generated with Haploview, show the linkage disequilibrium (LD) structure around the significant haplotype. SNPs 1631 and 1632 form a haplotype block that is in fairly strong LD with a block comprising SNPs 1621–1630 (Fig. 2).
Eight of the 32 carriers included in the analysis have some Eastern European ancestry, whereas only 72 of the 845 individuals included in the analysis have Eastern European ancestry, so the haplotype is enriched in this ethnic group. All other ethnic groups had similar frequencies in the overall sample and in the carriers. All 8 individuals with Eastern European ancestry carrying the haplotype were controls. The 72 individuals with Eastern European ancestry included 37 cases and 35 controls. If the individuals with Eastern European ancestry are removed from the analysis, the haplotype is no longer significant after adjustment for multiple testing.
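The enrichment statement is arithmetic on the counts given in this paragraph, and the sketch below makes it explicit. It uses only numbers stated above: 8 of 32 carriers and 72 of 845 analyzed individuals have Eastern European ancestry, that group has 37 cases and 35 controls, and 29 of the 32 carriers are controls. The helper name `enrichment` is hypothetical, and the fold ratio is a descriptive quantity, not a significance test.

```python
from fractions import Fraction


def enrichment(group_carriers, all_carriers, group_size, sample_size):
    """Share of carriers from one ancestry group vs. that group's share of the
    whole analyzed sample; fold > 1 means the haplotype is over-represented."""
    carrier_share = Fraction(group_carriers, all_carriers)
    sample_share = Fraction(group_size, sample_size)
    return carrier_share, sample_share, carrier_share / sample_share


carrier_share, sample_share, fold = enrichment(8, 32, 72, 845)
print(float(carrier_share), round(float(sample_share), 4), round(float(fold), 2))
# 0.25 0.0852 2.93

# Split of the 32 carriers (29 controls, 3 cases) by Eastern European (EE)
# ancestry, using the text's statement that all 8 EE carriers are controls.
ee_case_carriers, ee_control_carriers = 0, 8
other_case_carriers = 3 - ee_case_carriers         # 3
other_control_carriers = 29 - ee_control_carriers  # 21
# Carrier frequency within the EE group: 8 of 35 controls, 0 of 37 cases
print(Fraction(ee_control_carriers, 35), Fraction(ee_case_carriers, 37))  # 8/35 0
```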
Comparison with other analysis methods
Single-marker allelic tests did not detect any association that was significant after correction for multiple testing. The unadjusted p-values for the single-marker tests of SNPs 1631 and 1632 were 1.0 and 0.2, respectively. Within the region spanning SNPs 1620–1632, the smallest unadjusted p-value was 0.01 (SNP 1621); however, the multiple-testing adjusted permutation p-values for all markers in this region were 1.0.
We ran a multilocus score test [9] on SNPs 1631 and 1632, using the same individuals as in the localized haplotype clustering tests. For this test, the genotype at each SNP is coded as 0, 1, or 2 according to the number of copies of one of the alleles. Individuals with a missing genotype at either SNP (four individuals) were removed from the analysis. The test statistic was 15.0, yielding a p-value of 5.6 × 10⁻⁴ (upper tail of the chi-square distribution with 2 degrees of freedom). We did not attempt to adjust for multiple testing because it is unclear which set of tests should be considered. However, the score test p-value is almost 100 times larger than that obtained with localized haplotype clustering (6.1 × 10⁻⁶), and thus would probably not survive correction for multiple testing of all relevant marker sets in the data.
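A minimal sketch of a multilocus score test of this kind follows. It assumes the standard efficient score test for a logistic regression of case-control status on the allele-count codes, with the intercept estimated under the null of no genotype effect. The cited test [9] may differ in details such as the variance estimate, and all function names are hypothetical. The chi-square tail is computed from closed-form series so that no statistics library is needed. For 2 degrees of freedom it reduces to exp(-x/2), so a statistic of 15.0 gives exp(-7.5) ≈ 5.5 × 10⁻⁴. This is consistent with the reported 5.6 × 10⁻⁴ because the statistic is quoted to one decimal place.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), df >= 1,
    using closed forms of the regularized upper incomplete gamma Q(df/2, x/2)."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, z) = exp(-z) * sum_{i=0}^{m-1} z^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= z / i
            total += term
        return math.exp(-z) * total
    # Odd df = 2m + 1: Q(m + 1/2, z) = erfc(sqrt z) + sum_{i=1}^{m} exp(-z) z^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(z))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-z) * z ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def _solve(a, b):
    """Solve a @ x = b for a small dense matrix (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular information matrix (e.g. collinear SNPs)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def score_test(genotypes, status):
    """Score test of H0: b = 0 in logit P(case) = a + b . g.

    genotypes: one list per individual of allele counts (0, 1, 2), one per SNP;
        individuals with missing genotypes should be removed beforehand.
    status: 1 for a case, 0 for a control.
    Returns (statistic, p-value) with df = number of SNPs.
    """
    n, k = len(status), len(genotypes[0])
    ybar = sum(status) / n
    means = [sum(g[j] for g in genotypes) / n for j in range(k)]
    centered = [[g[j] - means[j] for j in range(k)] for g in genotypes]
    # Score vector U = sum_i (g_i - gbar)(y_i - ybar)
    u = [sum(c[j] * (y - ybar) for c, y in zip(centered, status)) for j in range(k)]
    # Information for b with the intercept profiled out:
    # ybar(1 - ybar) * sum_i (g_i - gbar)(g_i - gbar)^T
    info = [[ybar * (1 - ybar) * sum(c[i] * c[j] for c in centered) for j in range(k)]
            for i in range(k)]
    stat = sum(ui * vi for ui, vi in zip(u, _solve(info, u)))
    return stat, chi2_sf(stat, k)


# Reported statistic: with 2 df the tail is exactly exp(-x/2)
print(chi2_sf(15.0, 2))  # about 5.53e-4
```

With a single SNP the statistic equals n times the squared correlation between genotype code and case status. This is the Cochran-Armitage trend statistic in its n-denominator form, which makes a convenient check.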
We used Haploview [10] to test for association between case-control status and haplotypes defined by blocks (default settings). The test used the raw genotype data and the same individuals as in the other analyses. In total, 2300 single-marker tests and 1135 haplotype tests (from 264 blocks) were performed. The minimum unadjusted p-value was 1.7 × 10⁻⁵, obtained for haplotype "21" in the block made up of SNPs 1631 and 1632. In 24 of 1000 permutations, the minimum p-value was lower than the original minimum p-value. The multiple-testing adjusted p-value is therefore 0.024, which is higher than the 0.012 found with the localized haplotype cluster method. Haploview does not require phased input data, but its permutation testing took considerably longer: 1000 permutations ran overnight, compared with 4 minutes for 10,000 permutations of the haplotype cluster test in Beagle. Haploview takes a full likelihood-based EM approach to haplotype testing. Beagle, by contrast, uses inferred haplotypes and constructs haplotype clusters without regard to case-control status. As a result, permutation testing does not need to redefine the clusters or re-estimate the haplotypes, and is therefore very fast (for details, see Browning and Browning [3]).
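The speed argument rests on the clusters, and hence each haplotype's cluster label, being fixed before any permutation. Under that assumption, the sketch below shuffles only the case/control labels and recounts haplotypes per cluster. It assumes a Pearson chi-square on a 2×2 table of case versus control haplotypes inside versus outside each cluster. It uses the maximum statistic over clusters as the null summary, which is equivalent to the minimum p-value when every test has 1 degree of freedom. Beagle's actual statistic and bookkeeping may differ, and the function names are hypothetical.

```python
import random


def cluster_chi2(clusters, is_case):
    """Pearson chi-square (1 df) for each haplotype cluster.

    clusters: one (cluster_of_hap_a, cluster_of_hap_b) pair per individual,
        assigned once without using case/control status.
    is_case: one bool per individual.
    Each 2x2 table counts case vs. control haplotypes inside vs. outside the cluster.
    """
    case_haps = 2 * sum(is_case)
    control_haps = 2 * len(is_case) - case_haps
    total = case_haps + control_haps
    in_case, in_control = {}, {}
    for pair, case in zip(clusters, is_case):
        counts = in_case if case else in_control
        for cid in pair:
            counts[cid] = counts.get(cid, 0) + 1
    stats = {}
    for cid in set(in_case) | set(in_control):
        a, b = in_case.get(cid, 0), in_control.get(cid, 0)  # inside the cluster
        c, d = case_haps - a, control_haps - b              # outside the cluster
        denom = (a + b) * (c + d) * case_haps * control_haps
        stats[cid] = 0.0 if denom == 0 else total * (a * d - b * c) ** 2 / denom
    return stats


def max_stat_permutation(clusters, is_case, n_perm=1000, seed=0):
    """Adjusted p-value per cluster: fraction of label permutations whose
    largest cluster statistic is at least that cluster's observed statistic."""
    observed = cluster_chi2(clusters, is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # clusters stay fixed; only the labels move
        null_max.append(max(cluster_chi2(clusters, labels).values()))
    adjusted = {cid: sum(m >= s for m in null_max) / n_perm for cid, s in observed.items()}
    return observed, adjusted


# Toy usage: cluster "R" is carried (one copy) by every case and no control
pairs = [("R", "O")] * 10 + [("O", "O")] * 10
status = [True] * 10 + [False] * 10
observed, adjusted = max_stat_permutation(pairs, status, n_perm=200)
print(round(observed["R"], 2), adjusted["R"])
```

Each permutation costs one pass over the individuals, with no re-phasing or EM step. An approach that re-estimates haplotype frequencies by EM would have to repeat that estimation for every permuted data set.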