Candidate SNP selection
Detailed results of the linkage, association, and conditional logistic regression analyses are given by Charoen et al. [5]. Analysis of Replicates 1–5 revealed strong evidence of linkage on chromosome 6, with many SNPs in the linked region strongly associated with disease status, including non-dense SNPs 152–155 and 162 and many of the dense SNPs. Forward and backward stepwise conditional logistic regression applied to the non-dense SNPs in Replicates 1–5 always selected a model containing SNPs 153, 154, and 162. In most replicates, one or two additional SNPs were needed to model the association in this region, and which additional SNPs remained significantly associated with disease (after accounting for SNPs 153, 154, and 162) varied from replicate to replicate. A similar analysis of the dense SNPs suggested that association with SNPs d3437 and d3439 could account for most of the association observed with the remaining dense SNPs in the DR/C locus region. In addition, Charoen et al. [5] found that, after accounting for these two SNPs in the DR/C locus region, association with at least one SNP near locus D (either d3931 or d3933) remained significant in all of Replicates 1–5. These results guided our selection of the SNPs, haplotypes, and sets of SNPs listed in the first column of Table 1 for the analyses presented here.
Non-dense SNP analysis
Results of our analyses of all 100 replicates are summarized in Table 1. Although stepwise conditional logistic regression had already suggested that none of these SNPs is the sole causal variant in the HLA region [5], we used LAMP, Li-cpg, and Sun-cpg to test, for each of SNPs 152–155 and 162, whether that SNP alone could fully explain the observed linkage signal. The aim of analyzing all replicates was to estimate the power to reject each of these SNPs as the sole causal variant. This power depends on sample size and on the underlying model parameters, including the effects of the causal loci, allele frequencies, and the LD between the causal loci and the candidate SNPs. With LAMP, Li-cpg, and Sun-cpg, the power for SNPs 152, 154, 155, and 162 was 100%. For SNP 153, the power was 90% with Sun-cpg and 93% with LAMP and Li-cpg.
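Power here is an empirical rejection rate over the 100 simulated replicates, so each reported figure carries Monte Carlo error. The sketch below is a minimal illustration, assuming each method yields one p-value per replicate and a significance level `alpha` (defaulting to 0.05, which is an assumption; the level is not restated in this section). The function name `empirical_power` and the example p-values are hypothetical and are not output from LAMP, Li-cpg, or Sun-cpg.

```python
import math
from typing import Sequence, Tuple


def empirical_power(p_values: Sequence[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Return (power, Monte Carlo SE) from one p-value per simulated replicate.

    Power is estimated as the fraction of replicates with p <= alpha; its
    standard error is the binomial value sqrt(power * (1 - power) / n).
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one replicate")
    rejections = sum(1 for p in p_values if p <= alpha)
    power = rejections / n
    se = math.sqrt(power * (1.0 - power) / n)
    return power, se


# Hypothetical p-values: 93 of 100 replicates reject at alpha = 0.05
power, se = empirical_power([0.001] * 93 + [0.5] * 7)
print(f"power = {power:.2f}, SE = {se:.3f}")  # power = 0.93, SE = 0.026
```

With 100 replicates the standard error is about 0.03 near 90% power, so the gap between 90% (Sun-cpg) and 93% (LAMP and Li-cpg) for SNP 153 is of the same order as the simulation error. Because all methods were applied to the same replicates, a paired comparison would be more appropriate than treating the two estimates as independent.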
Overall, therefore, all the methods we applied had high power to reject single SNPs as the sole causal polymorphisms on chromosome 6. We then considered several combinations of SNPs and tested, using our extension of the Sun-cpg method, whether association with a particular set of SNPs could explain the observed linkage. For the set {153, 154}, Sun-cpg had 73% power to reject these SNPs as the sole causal polymorphisms in the region. The power was 100% for the set {153, 162} and 51% for the set {153, 154, 162}.
Dense SNP analysis
For the dense SNPs, the Sun-cpg approach had essentially no power (4%) to reject either of the sets {d3437, d3439, d3931} or {d3437, d3439, d3933} as being causal or in complete LD with the sole causal variant(s) in the region. The likely reason is that the d3437–d3439 haplotype is in very high LD with the DRB1-C haplotype, while SNPs d3931 and d3933 are in high LD with the D locus. It is therefore not surprising that these sets of dense SNPs captured the association of disease with the DRB1, C, and D loci very well.
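A 4% rejection rate is what one would expect if the null hypothesis were true and the tests were run at a 5% level. The sketch below, assuming a 5% nominal level (not stated in this section), computes a Wilson score interval for 4 rejections in 100 replicates. The function name `wilson_interval` is hypothetical, and the Wilson interval is one standard choice among several; the Clopper-Pearson exact interval is a common alternative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# 4 rejections in 100 replicates, as reported for the dense SNP sets
low, high = wilson_interval(4, 100)
print(f"{low:.3f} to {high:.3f}")  # 0.016 to 0.098
```

The interval spans roughly 1.6% to 9.8% and contains 5%, so under the assumed nominal level the observed rejection rate cannot be distinguished from the type I error rate.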
The haplotype extension of Sun-cpg had only 8% power to reject the haplotype composed of SNPs d3437 and d3439 as the sole cause of the observed linkage; with the haplotype extension of Li-cpg, the power was 9%. The same results were obtained for the haplotype formed by dense SNP d3437 and non-dense SNP 153. Thus our analysis had little power to reject the d3437–d3439 haplotype alone as being causal or in complete LD with the sole causal variant(s) in the region, even though association with this haplotype does not in fact account for the effect at locus D. This is because most of the strong linkage observed in this region is due to association with the DRB1 and C loci, so association with the d3437–d3439 haplotype, which is in high LD with the DRB1-C haplotype, can almost fully account for the observed linkage at the DRB1 locus. We also note that the haplotype methods we used are extensions of our Li-cpg and Sun-cpg methods, which generally have lower power than the Li and Sun methods because of the additional conditioning on haplotypes. Haplotype extensions of the Sun and Li methods would be of interest, because they are expected to be more powerful than our haplotype methods, but such extensions are difficult to construct. For instance, a haplotype extension of the Sun method would require haplotype frequencies to be pre-specified, and these usually cannot be estimated accurately.
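Estimating haplotype frequencies from unphased genotypes, which a haplotype extension of the Sun method would need, can be illustrated with the standard expectation-maximization (EM) algorithm for two biallelic SNPs. This is a minimal sketch and not the method used in this study: it assumes unrelated individuals (ignoring the family structure that the cpg methods exploit), Hardy-Weinberg equilibrium, and genotypes coded as the count (0, 1, or 2) of one allele at each SNP. Only double heterozygotes have ambiguous phase, and the EM splits them between the cis and trans configurations in proportion to the current frequency estimates, so estimates can be imprecise when such individuals are common or the sample is small. The helper `r_squared` computes the usual r² measure of LD from the estimated frequencies. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Haplotype = Tuple[int, int]  # (allele at SNP 1, allele at SNP 2); 1 = counted allele
HAPLOTYPES: List[Haplotype] = [(1, 1), (1, 0), (0, 1), (0, 0)]


def em_two_snp_haplotypes(genotypes: List[Tuple[int, int]],
                          tol: float = 1e-10,
                          max_iter: int = 1000) -> Dict[Haplotype, float]:
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes.

    Assumes unrelated individuals in Hardy-Weinberg equilibrium; only double
    heterozygotes (1, 1) have ambiguous phase.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    if any(g not in (0, 1, 2) for pair in genotypes for g in pair):
        raise ValueError("genotypes must be allele counts 0, 1 or 2")
    freq = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(max_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: posterior weight of the cis phase (11/00) vs trans (10/01)
                cis = freq[(1, 1)] * freq[(0, 0)]
                trans = freq[(1, 0)] * freq[(0, 1)]
                w = 0.5 if cis + trans == 0 else cis / (cis + trans)
                counts[(1, 1)] += w
                counts[(0, 0)] += w
                counts[(1, 0)] += 1 - w
                counts[(0, 1)] += 1 - w
            else:
                # At most one heterozygous SNP, so both haplotypes are determined
                h1 = (min(g1, 1), min(g2, 1))
                h2 = (g1 - h1[0], g2 - h1[1])
                counts[h1] += 1.0
                counts[h2] += 1.0
        # M-step: expected haplotype counts divided by 2n haplotypes
        total = 2.0 * len(genotypes)
        new = {h: c / total for h, c in counts.items()}
        if max(abs(new[h] - freq[h]) for h in HAPLOTYPES) < tol:
            return new
        freq = new
    return freq


def r_squared(freq: Dict[Haplotype, float]) -> float:
    """r^2 = D^2 / (pA(1-pA) pB(1-pB)); NaN if either SNP is monomorphic."""
    p_a = freq[(1, 1)] + freq[(1, 0)]
    p_b = freq[(1, 1)] + freq[(0, 1)]
    d = freq[(1, 1)] - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return float("nan") if denom == 0 else d * d / denom


# Hypothetical toy data with two perfectly correlated SNPs
freqs = em_two_snp_haplotypes([(2, 2)] * 10 + [(1, 1)] * 20 + [(0, 0)] * 10)
print(f"{r_squared(freqs):.3f}")  # 1.000
```

With perfectly correlated SNPs, as in the toy data, the estimates converge to haplotypes 11 and 00 with frequency 0.5 each and r² close to 1; with weaker LD, the split of double heterozygotes depends more heavily on the current estimates.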
DRB1 analysis
Using the Sun-cpg method for multi-allelic candidate markers [3], the power to reject the DRB1 locus as the sole causal site in the chromosome 6 region was 32%. Analysis with the program LAMP rejected each of the DRB1 alleles as the sole causal allele in the region with 100% power, and the Li-cpg approach rejected the DRB1 locus as the sole causal site with 99% power. However, as we explain in the Discussion, these results should be interpreted cautiously.