Genotype data and sample
We used all 100 replicates of the chromosome 6 sparse SNP data set. We first performed the transmission/disequilibrium test (TDT) [8] and the Hardy-Weinberg equilibrium test on the family data sets. We excluded markers with minor allele frequency < 0.01. We then selected unrelated individuals: one sib from each ASP family (1500 individuals) and 500 controls.
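The minor allele frequency filter can be sketched as follows. This is a minimal illustration only: it assumes genotypes are stored as allele counts (0, 1, or 2) per individual, and the function names and toy data are hypothetical rather than taken from the original analysis.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of one SNP.

    `genotypes` lists, per individual, the number of copies (0, 1, or 2)
    of one of the two alleles; this coding is an assumption of the sketch.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is rarer.
    return min(freq, 1.0 - freq)


def filter_markers(genotype_columns, maf_threshold=0.01):
    """Drop markers whose minor allele frequency is below the threshold."""
    return {
        name: column
        for name, column in genotype_columns.items()
        if minor_allele_frequency(column) >= maf_threshold
    }


# Hypothetical toy data: snp_b has one minor allele among 400 chromosomes.
toy = {
    "snp_a": [0, 1, 2, 1, 0, 0],
    "snp_b": [0] * 199 + [1],
}
kept = filter_markers(toy)
```

Here `snp_b` falls below the 0.01 threshold and is removed, while `snp_a` is retained.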
LD blocks
We considered SNP markers in LD, defined as D' > 0.7. We selected eight LD blocks: Blocks 1 to 4 are known not to be associated with RA, and Blocks 5 to 8 are known to be associated with RA. Each LD block contained two to six markers.
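For reference, the standard pairwise D' statistic can be computed from haplotype and allele frequencies as in the sketch below. The function name and the example frequencies are hypothetical, and the sketch reports the absolute value of D', which is an assumption about how the 0.7 threshold was applied.

```python
def d_prime(p_ab, p_a, p_b):
    """Return |D'| for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first marker
    p_b:  frequency of allele B at the second marker
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        # At least one marker is monomorphic, so D' is undefined; report 0.
        return 0.0
    return abs(d) / d_max


# Hypothetical pair of markers checked against the D' > 0.7 criterion.
in_ld = d_prime(0.45, 0.5, 0.5) > 0.7
```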
Haplotype-based association test
For the selected LD blocks, haplotypes and their frequencies were estimated by the expectation-maximization (EM) algorithm. We then performed haplotype association tests by fitting logistic regression models. In this association study, we pooled the minor haplotypes with frequencies less than 0.05. The effect of a haplotype can be assumed to be additive, dominant, or recessive. In our analysis, we assumed additive haplotype effects and performed the test using haplo.glm [5].
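To illustrate the idea behind EM haplotype frequency estimation and the pooling of rare haplotypes, the following sketch handles the simplest case of two biallelic SNPs, where only double heterozygotes have ambiguous phase. It is not the haplo.glm implementation, which handles general multi-marker blocks; the function names, the 0/1/2 genotype coding, and the toy data are assumptions made for illustration.

```python
def em_two_snp_haplotypes(genotypes, n_iter=100, tol=1e-10):
    """Estimate frequencies of haplotypes "00", "01", "10", "11".

    `genotypes` is a list of (g1, g2) pairs, each the count (0, 1, 2)
    of allele "1" at SNP 1 and SNP 2 for one individual.
    """
    n = len(genotypes)
    base = [0.0] * 4  # haplotype counts from phase-unambiguous individuals
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase is ambiguous: 00/11 or 01/10
            continue
        first = (1 if g1 == 2 else 0, 1 if g2 == 2 else 0)
        second = (1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)
        base[2 * first[0] + first[1]] += 1
        base[2 * second[0] + second[1]] += 1

    freqs = [0.25] * 4
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update frequencies from expected haplotype counts.
        counts = [
            base[0] + n_double_het * w,
            base[1] + n_double_het * (1 - w),
            base[2] + n_double_het * (1 - w),
            base[3] + n_double_het * w,
        ]
        new_freqs = [c / (2 * n) for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return dict(zip(["00", "01", "10", "11"], freqs))


def pool_rare_haplotypes(freqs, threshold=0.05):
    """Merge haplotypes with frequency below `threshold` into one group."""
    pooled = {h: f for h, f in freqs.items() if f >= threshold}
    rare_total = sum(f for f in freqs.values() if f < threshold)
    if rare_total > 0:
        pooled["pooled_rare"] = rare_total
    return pooled


# Hypothetical toy sample of six individuals.
toy = [(0, 0), (2, 2), (1, 1), (0, 0), (2, 2), (1, 0)]
estimated = pool_rare_haplotypes(em_two_snp_haplotypes(toy))
```

Pooling keeps the number of regression covariates small and avoids unstable estimates for haplotypes observed only a handful of times.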
PC score association test
We first determined whether the effect of each SNP in an LD block is additive, dominant, or recessive. If the effect is additive, the SNP is coded as 0, 1, or 2 according to the number of minor alleles; for a dominant or recessive effect, it is coded as 0 or 1. We then performed PC analysis within each LD block and calculated the PC scores. For a given LD block, suppose there are $k$ SNPs denoted by $s_1, s_2, \ldots, s_k$, where each $s_j$ is coded as 0, 1, or 2. Then the PC scores are defined as follows:
$$\mathrm{PC}_i = e_i' S,$$
where $\mathrm{PC}_i$ is the $i$th PC score, $e_i$ is its eigenvector, and $S = [s_1, s_2, \ldots, s_k]$ is the score vector of SNPs. In our analysis, we assumed only additive effects of SNPs. We determined the number of PC scores in each block so as to account for 70% of the total variation; this number ranged from one to three. We then fitted logistic regression with these PC scores as covariates.
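The PC score computation for a single LD block can be sketched with NumPy as follows. The text does not state whether the covariance or the correlation matrix was decomposed, or whether genotypes were centered; this sketch assumes the covariance matrix of column-centered genotypes and retains the smallest number of components whose cumulative share of variance reaches 70%. The function name and toy genotype matrix are hypothetical.

```python
import numpy as np


def pc_scores(genotypes, variance_target=0.70):
    """Return PC scores for one LD block and the cumulative variance kept.

    `genotypes` is an (individuals x SNPs) array coded 0, 1, 2
    (additive coding, as assumed in the text).
    """
    x = np.asarray(genotypes, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)  # returned in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of components reaching the variance target.
    n_pc = int(np.searchsorted(explained, variance_target)) + 1
    # Each score is e_i' S evaluated for every individual.
    scores = centered @ eigvecs[:, :n_pc]
    return scores, explained[:n_pc]


# Hypothetical block of three SNPs genotyped in six individuals.
block = [[0, 0, 2], [1, 0, 1], [2, 1, 0], [1, 2, 1], [0, 1, 2], [2, 2, 0]]
scores, explained = pc_scores(block)
```

The columns of `scores` would then enter the logistic regression as covariates. Centering shifts each score only by a constant, so it does not affect a model that includes an intercept.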
Comparison of PC score and haplotype-based association tests
The association tests were performed using logistic regression with PC scores or haplotypes as covariates. Using the Akaike information criterion (AIC), power, and type I error, we evaluated the performance of the PC test and the haplotype-based association test.
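The comparison criteria can be sketched as below. AIC follows its standard definition, 2k - 2 ln L. Estimating type I error from the unassociated blocks and power from the associated blocks as the proportion of replicates rejected is an assumption of this sketch, as are the significance level of 0.05, the function names, and the example p-values.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rejection_rate(p_values, alpha=0.05):
    """Proportion of replicates whose test rejects at level `alpha`.

    Applied to a block with no true association this estimates the
    type I error; applied to an associated block it estimates power.
    """
    if not p_values:
        raise ValueError("no p-values supplied")
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical p-values from four replicates of one LD block.
rate = rejection_rate([0.01, 0.20, 0.03, 0.50])
```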