Weighted composite likelihood ratio test using kinship coefficients
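Here, we briefly introduce the weighted composite likelihood ratio (WCLR) test developed by Browning et al. Denote the genotype of individual $i$ at a single marker as $(g_{i1}, g_{i2})$ for $i = 1, 2, \ldots, n$, where $g_{i1}$ and $g_{i2}$ are each one of the alleles $A_1, \ldots, A_l$ with corresponding frequencies $p_1, \ldots, p_l$. The weight of individual $i$ is denoted $w_i$. A weighted composite likelihood of a single marker is then $L(p) = \prod_{i=1}^{n} \left(p_{g_{i1}}\, p_{g_{i2}}\right)^{w_i}$, where $p = (p_1, \ldots, p_l)$ and $p_{g}$ is the frequency of allele $g$. Based on this composite likelihood, the allele frequencies can be estimated from cases only ($\hat{p}_{\mathrm{case}}$), from controls only ($\hat{p}_{\mathrm{control}}$), and from all cases and controls combined ($\hat{p}_{\mathrm{all}}$). A likelihood ratio test statistic can be constructed as $2\left[\log L_{\mathrm{case}}(\hat{p}_{\mathrm{case}}) + \log L_{\mathrm{control}}(\hat{p}_{\mathrm{control}}) - \log L_{\mathrm{all}}(\hat{p}_{\mathrm{all}})\right]$, where the subscript on $L$ indicates the set of individuals over which the product is taken, and it asymptotically follows a chi-square distribution with $l - 1$ degrees of freedom. A likelihood ratio test based on haplotypes and a given set of weights can be constructed in the same way. In general, the haplotypes of the sampled individuals are unknown. In this situation, the haplotype frequencies can be estimated via the expectation-maximization (EM) algorithm and incorporated into the test.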
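The single-marker statistic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CCREL implementation: the function names are hypothetical, genotypes are assumed to be given as pairs of allele labels, the weights are taken as already computed, and the statistic is the plain likelihood ratio without any further correction.

```python
import math
from collections import defaultdict

def weighted_allele_freqs(genotypes, weights):
    """Maximize the weighted composite likelihood: each estimated frequency
    is the weighted allele count divided by the total weighted count."""
    counts = defaultdict(float)
    for (a, b), w in zip(genotypes, weights):
        counts[a] += w
        counts[b] += w
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}

def log_composite_likelihood(genotypes, weights, freqs):
    """Log of prod_i (p_a * p_b)^(w_i) over the given individuals."""
    return sum(w * (math.log(freqs[a]) + math.log(freqs[b]))
               for (a, b), w in zip(genotypes, weights))

def wclr_statistic(case_g, case_w, ctrl_g, ctrl_w):
    """Return (statistic, degrees of freedom) for the single-marker test."""
    all_g = list(case_g) + list(ctrl_g)
    all_w = list(case_w) + list(ctrl_w)
    p_case = weighted_allele_freqs(case_g, case_w)
    p_ctrl = weighted_allele_freqs(ctrl_g, ctrl_w)
    p_all = weighted_allele_freqs(all_g, all_w)
    stat = 2.0 * (log_composite_likelihood(case_g, case_w, p_case)
                  + log_composite_likelihood(ctrl_g, ctrl_w, p_ctrl)
                  - log_composite_likelihood(all_g, all_w, p_all))
    df = len(p_all) - 1  # l - 1, with l the number of observed alleles
    return stat, df
```

With all weights equal to 1 this reduces to the ordinary allele-based likelihood ratio statistic for unrelated individuals.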
The weight $w_i$ of each individual is calculated from the kinship coefficients. Specifically, $w = (w_1, \ldots, w_n)$ satisfies $2(w_1, \ldots, w_n)K = (1, \ldots, 1)$, where $K$ is the $n \times n$ kinship matrix. The sum of the weights, $w_1 + \cdots + w_n$, can be considered the effective sample size, which is greater than the number of families but less than the total number of samples. Thus, we expect the power of this method to be greater than that of the method using one randomly selected case per family. Finally, it is worth noting that $w_i = 1$ for unrelated individuals. In this situation, the weighted composite likelihood becomes the ordinary likelihood.
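As a small worked illustration of the weight equation, the sketch below solves 2wK = (1,...,1) by Gauss-Jordan elimination. The function name `kinship_weights` is hypothetical. The example matrix assumes non-inbred individuals (self-kinship 1/2) and a full-sib kinship of 1/4. Because K is symmetric, solving (2K)w = 1 as a column system gives the same w.

```python
def kinship_weights(K):
    """Solve 2 * w * K = (1, ..., 1) for w, given a symmetric kinship matrix K
    supplied as a list of lists."""
    n = len(K)
    # Augmented matrix [2K | 1]; symmetry of K lets us solve the column form.
    A = [[2.0 * K[i][j] for j in range(n)] + [1.0] for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("kinship matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= f * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]

# One affected sib pair plus one unrelated individual.
K = [[0.50, 0.25, 0.00],
     [0.25, 0.50, 0.00],
     [0.00, 0.00, 0.50]]
w = kinship_weights(K)
effective_n = sum(w)
```

In this example each sib receives weight 2/3 and the unrelated individual receives weight 1, so the effective sample size is 7/3: more than the two families, fewer than the three individuals.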
Candidate gene analysis
We analyzed two candidate gene data sets from NARAC using the CCREL software, which implements the WCLR test of Browning et al. The first data set contains genotypes of 1256 cases from 665 nuclear families and 1519 unrelated controls at 14 SNP markers in the candidate gene PTPN22. The 14 SNPs are labeled SNP1 to SNP14 according to their positional order along chromosome 1. One family was excluded from the analysis because of genotyping errors. Most of the parents' genotypes were not available. The second data set contains genotypes of 816 cases from 461 nuclear families and 855 unrelated controls at 20 SNP markers in 14 candidate genes: PTPN22, CTLA4, TNFRSF1B, PADI4, HAVCR1, IBD5, SLC22A4, IL3, IL4, SUMO4, ILG5, CARD15, RUNX1, and MFL. We performed the case-control association analysis using two methods. In the first method, we used all cases and controls and applied the WCLR test. In the second method, we randomly selected one case from each family, included all unrelated controls, and applied the allelic chi-square test. For the first data set, we performed single-marker analysis, multiple-marker analysis (stepwise logistic regression), and haplotype analysis on two or three markers. We also obtained the linkage disequilibrium (LD) measure ($r^2$) between SNPs from this data set. For the second data set, we performed only single-marker analysis because this data set includes only a few SNPs for each candidate gene.
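The second method (one randomly selected case per family, followed by an allelic chi-square test) can be sketched as follows for a biallelic SNP. This is an illustrative sketch, not the analysis code used for the NARAC data: the function names and the input layout (a list of `(family_id, genotype)` pairs for cases, a list of genotypes for controls) are assumptions made for the example.

```python
import math
import random

def one_case_per_family(cases, rng):
    """cases: list of (family_id, genotype). Keep one random case per family."""
    by_family = {}
    for family_id, genotype in cases:
        by_family.setdefault(family_id, []).append(genotype)
    return [rng.choice(genotypes) for genotypes in by_family.values()]

def allelic_chi_square(case_genotypes, control_genotypes, allele):
    """Pearson chi-square on the 2 x 2 table of allele counts (1 df)."""
    def allele_counts(genotypes):
        k = sum((a == allele) + (b == allele) for a, b in genotypes)
        return k, 2 * len(genotypes) - k
    a, b = allele_counts(case_genotypes)      # cases: target allele, other allele
    c, d = allele_counts(control_genotypes)   # controls: target allele, other allele
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

A call such as `allelic_chi_square(one_case_per_family(cases, random.Random(0)), controls, "A")` then gives the test for allele `"A"`. The result depends on which case is drawn from each family, so the seed is fixed here only to make the example reproducible.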
Using simulated data, we evaluated and compared the power of three methods: two methods that use all related cases and all controls, namely the weighted composite likelihood ratio test and the method suggested by Slager and Schaid, and the method that uses only one randomly selected case per family together with all controls. One thousand data sets were generated, each consisting of 200 affected sib pairs (400 cases) and 200 unrelated controls. Only genotypes at the disease locus were simulated and analyzed. To calculate the power, the minor allele frequencies at the disease locus were set to 0.141 for cases and 0.095 for controls. To calculate the type I error rate, they were set to 0.10 for both cases and controls. The significance level was set to 0.05.
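For the one-case-per-family method only, the way power and type I error are estimated from replicated data sets can be sketched with a simplified Monte Carlo loop. The sketch makes assumptions that are not taken from the study: alleles are drawn independently at the stated frequencies (Hardy-Weinberg proportions within cases and within controls), the 200 selected cases are treated as unrelated, and the names `allelic_stat` and `rejection_rate` are hypothetical. It does not reproduce the sib-pair simulation itself or the two methods that use all related cases.

```python
import random

CHI2_CRIT_1DF_05 = 3.841458820694124  # 0.95 quantile of chi-square with 1 df

def allelic_stat(a, b, c, d):
    """Pearson chi-square for the 2 x 2 allele-count table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def rejection_rate(p_case, p_ctrl, n_case=200, n_ctrl=200, reps=1000, seed=1):
    """Fraction of simulated data sets in which the allelic test rejects at 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Minor-allele counts among 2 * n chromosomes in each group.
        a = sum(rng.random() < p_case for _ in range(2 * n_case))
        c = sum(rng.random() < p_ctrl for _ in range(2 * n_ctrl))
        stat = allelic_stat(a, 2 * n_case - a, c, 2 * n_ctrl - c)
        rejections += stat > CHI2_CRIT_1DF_05
    return rejections / reps
```

Calling `rejection_rate(0.141, 0.095)` estimates power under the alternative, and `rejection_rate(0.10, 0.10)` estimates the type I error rate, which should be close to the nominal 0.05 under these assumptions.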