- Open Access
Case-control association analysis of rheumatoid arthritis with candidate genes using related cases
© Yoo et al; licensee BioMed Central Ltd. 2007
- Published: 18 December 2007
We performed a case-control association analysis of rheumatoid arthritis (RA) for several candidate genes using the North American Rheumatoid Arthritis Consortium (NARAC) data provided in Genetic Analysis Workshop 15. We conducted the case-control association analysis using all related cases and unrelated controls and compared the results with those from the analysis of samples using only one randomly selected case from each family and all unrelated controls. For both analyses we used a weighted composite likelihood ratio test based on single-nucleotide polymorphism (SNP) markers or haplotypes accounting for the correlation among samples within a family. Several SNPs, including R620W in the candidate gene PTPN22, showed an association with RA status, which confirmed previously reported results. Several other SNPs in the candidate genes, such as CTLA4, HAVCR1, and SUMO4, also had rather small p-values (<0.05), suggesting the associations between them and RA. Our results showed that the p-values obtained from the analysis including all related cases were generally smaller than those obtained from the analysis including only one randomly selected case per family. These results, together with the results, based on simulated data, showed that higher power could be achieved using all related cases.
- Rheumatoid Arthritis
- Candidate Gene
- Haplotype Analysis
- Stepwise Logistic Regression
- Related Case
Case-control studies using unrelated case and control samples provide a powerful and efficient way to assess the association between candidate genes and diseases. However, case-control studies with related case samples are not uncommon, especially for follow-up studies from a linkage study using pedigrees and/or affected sib pairs. The association analysis based on related cases can be more effective than that based on unrelated cases randomly selected from the population  because having more affected members in families will increase the difference in risk allele frequencies between cases and controls . When related cases are available, a simple method for conducting a case-control analysis is to select one case from each family randomly. Although this method is analytically simple, it results in lower power compared with methods that include all related case samples. On the other hand, the use of all related cases without appropriately accounting for their correlations will result in inflated type I error rates. Therefore, many methods were developed to account for the correlation among related cases [1–4]. Browning et al.  proposed a weighted composite likelihood ratio test in which the weights of samples are calculated according to their kinship coefficients. One advantage of this method is that it is applicable to both single markers and haplotypes. In this study, we performed an association analysis of rheumatoid arthritis (RA) status with two candidate gene data sets of related cases and unrelated controls from the North American Rheumatoid Arthritis Consortium (NARAC) data provided in Genetic Analysis Workshop 15. We applied the method developed by Browning et al.  to these data sets and compared the p-values with those obtained from the analysis including only one randomly selected case from each family. We also assessed how the exclusion of candidate variants can affect the analysis and performed the analysis based on multiple markers and haplotypes to identify the effect of multiple single-nucleotide polymorphisms (SNPs).
Weighted composite likelihood ratio test using kinship coefficients
Here, we briefly introduce the weighted composite likelihood ratio (WCLR) test developed by Browning et al. . Denote the individual i's genotype at a single marker as for i = 1, 2,...,n, where and are one of the alleles A1,...,A l with the corresponding frequency . The weight of each individual is denoted as w i . A weighted composite likelihood of a single marker is then , where p = (). Based on this composite likelihood, the allele frequency can be estimated for only cases (), only controls (), and all cases and controls (). A likelihood ratio test can be constructed as and it asymptotically follows a chi-square distribution with l - 1 degrees of freedom. Similarly, the likelihood ratio test based on haplotypes and a given set of weights can also be constructed. In general, haplotypes for samples are unknown. In this situation, the haplotype frequencies can be estimated via the expectation-maximization (EM) algorithm and incorporated into the test.
The weight w i of each individual is calculated based on their kinship coefficients. Specifically, w = (w1,...,w n ) satisfies 2(w1,...,w n )K = (1,...,1), where K is the n × n kinship matrix. The sum of weights, w1 + ... + w n , can be considered as the effective sample size, which is greater than the number of families but less than the total number of samples. Thus, we expect the power of this method to be greater than that of the method using one randomly selected case per family. Finally, it is worth noting that w i = 1 for unrelated individuals. In this situation, the weighted composite likelihood becomes the ordinary likelihood.
Candidate gene analysis
We analyzed two candidate gene data sets from NARAC using CCREL software that implements the WCLR test of Browning et al. . The first data set contains genotypes of 1256 cases from 665 nuclear families and 1519 unrelated controls at 14 SNP markers in the candidate gene PTPN22 . The 14 SNPs are labeled as SNP1 to SNP14 corresponding to their positional order along chromosome 1. One family was excluded from the analysis because of genotyping errors. Most of the parents' genotypes were not available. The second data set contains genotypes of 816 cases from 461 nuclear families and 855 unrelated controls at 20 SNP markers in 14 candidate genes, including PTPN22, CTLA4, TNFRSF1B, PADI4, HAVCR1, IBD5, SLC22A4, IL3, IL4, SUMO4, ILG5, CARD15, RUNX1, and MFL . We performed the case-control association analysis using two methods. In the first method we used all cases and controls and applied the WCLR test. In the second method, we randomly selected one case from each family and included all unrelated controls and applied the allelic chi-square test. For the first data set, we performed the single marker analysis, the multiple marker analysis (the stepwise logistic regression), and the haplotype analysis on two or three markers. We also obtained the linkage disequilibrium (LD) measure (r2) between SNPs from this data set. For the second data set, we performed the single-marker analysis because this data set only includes a few SNPs for each candidate gene.
The power of the two methods based on all related cases and all controls, the weighted composite likelihood ratio test  and the method suggested by Slager and Schaid , and the power of the method using only one randomly selected case per family and all controls were evaluated and compared based on simulated data. One thousand data sets were generated and each of them consisted of 200 affected sib pairs (400 cases) and 200 unrelated controls. Only genotypes at the disease locus were simulated and analyzed. The minor allele frequencies at the disease locus for cases and controls were set as 0.141 and 0.095 to calculate the power, and set as 0.10 and 0.10 to calculate the type I error rate. The significance level was set as 0.05.
Association of RA status with PTPN22
Single-marker analysis for SNPs in PTPN22
Minor Allele Frequencies
Affected Siblingb (n = 1256)
One Sibling per Familyc (n = 664)
(n = 1519)
One Sibling per Familyc
7 × 10-5
5 × 10-5
7 × 10-5
4.32 × 10-13
8.53 × 10-13
Using the data of one randomly selected case per family, we performed a stepwise logistic regression to see whether the association with SNP9 in PTPN22 accounted for the associations were observed with other SNPs. Six SNPs, SNP2, SNP4, SNP8, SNP9, SNP11, and SNP14 were included in the final model according to the Akaike Information Criterion (AIC). Among them, SNP2, SNP4, SNP8, and SNP11 did not show the associations with RA status in the single-marker analysis, suggesting the existence of the epistatic effect between them. SNP3, SNP5, SNP7, and SNP13 were in moderate LD with SNP9 and were not included in the final model, indicating that the associations observed with these SNPs were accounted for by SNP9.
Haplotype analysis for SNPs in PTPN22
SNPs selected in the stepwise regression (SNP2, SNP4, SNP8, SNP11, SNP14)
SNPs in LD with SNP9 (SNP3, SNP5, SNP13)
2 & 11
3 & 5
2 & 4 & 11
3 & 13
2 & 8 & 11
5 & 13
2 & 11 & 14
3 & 5 & 13
Association of RA status with other candidate genes
Single-SNP case-control analysis for SNPs in 14 candidate genes
Minor Allele Frequencies
Affected Siblingb (n = 816)
One Sibling per Familyc (n = 461)
(n = 855)
One Sibling per Familyc
2.13 × 10-10
4.96 × 10-8
Our simulation results showed that the weighted composite likelihood ratio test  had a power of 89.4%, which was very similar to the power of the method developed by Slager and Schaid  (89.2%). Both of them had higher power than the method using one randomly selected case per affected sib pair and all unrelated controls (79.0%). All three methods showed appropriate type I error rates close to the nominal level of 0.05 (0.047, 0.048, and 0.053, respectively).
Carlton et al.  conducted the case-control association analysis using a subset of the first data set of 14 PTPN22 SNPs by randomly selecting one affected case from each family. Their reported p-values based on single markers are slightly different from our results for SNP10, SNP11, and SNP12, which might be due to the random selection of the set of case samples. Most of SNPs in the second data set were analyzed by Plenge et al. , and our results overall are consistent with their findings. Our results, obtained by randomly selecting one case from each family, varied from one selection to another. In practice, there are no guidelines for choosing the best result because the results generally reflect the selection process.
The results from the stepwise logistic regression analysis showed that the effect of several SNPs disappeared when SNP9 was included in the analysis. These SNP were in moderate LD with SNP9 and showed the associations with RA status in the single marker analysis, indicating their associations with RA status may be mainly due to the association of SNP9 with RA status. Several SNPs that were included in the final regression model did not show the associations in the single marker analysis but showed the associations in the haplotype analysis (e.g., SNP2 and SNP11). SNP2 and SNP1 (strong LD with SNP2, r2 = 0.997) have been suggested as having the associations with RA status by Carlton et al. .
We also performed haplotype analysis based on two or three SNPs after excluding SNP9 from the PTPN22 SNP data set. We wanted to see whether the indirect association between RA and these SNPs could be better captured using haplotype analysis. The two SNPs adjacent to SNP9, SNP8 and SNP10, did not show an association with RA in the haplotype analysis; SNP8 showed no association with RA, and SNP10 showed only marginal association with RA in the single-marker analysis. However, haplotypes based on SNPs in moderate LD with SNP9 showed associations with RA status. Thus, the sliding-window approach may not be sufficient to capture the associations for untyped SNPs. In this situation, one can analyze haplotypes based on all combinations of several SNPs but with rapidly increased number of tests. One can also only analyze haplotypes in high LD with untyped SNPs with reduced number of tests, if the LD patterns can be obtained from publicly available data resources (e.g., HapMap project).
We analyzed the association between several candidate genes and RA status using all related cases and all unrelated controls. Our analysis showed that several SNPs in the candidate gene PTPN22 were significantly associated with RA status with possible epistatic effects. Also, SNPs in CTLA4, HAVCR1, and SUMO4 were significantly associated with RA status.
We compared the results of the analysis including all related cases with those of the analysis including only one randomly selected case per family. The p-values from the analysis including all cases were generally smaller than those from the analysis including only one randomly selected case per family, suggesting the higher power of the method using all cases. This was confirmed by the power comparison based on simulated data. Thus, we suggest the use of methods that can use all related cases and can correctly account for the correlations among them, such as the weighted composite likelihood method of Browning et al. , rather than the methods that use one randomly selected case per family, in order to yield more power and eliminate inconsistency.
This work is supported by grant R01 GM074913 from the National Institute of Health.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
- Slager SL, Schaid DJ: Evaluation of candidate genes in case-control studies: a statistical method to account for related subjects. Am J Hum Genet. 2001, 68: 1457-1462. 10.1086/320608.View ArticlePubMed CentralPubMedGoogle Scholar
- Risch N, Teng J: The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human disease. I. DNA pooling. Genome Res. 1998, 8: 1273-1288.PubMedGoogle Scholar
- Browning SR, Briley JD, Chandra G, Charnecki JH, Ehm MG, Johansson KA, Jones BJ, Karter AJ, Yarnall DP, Wagner MJ: Case-control single-marker and haplotypic association analysis of pedigree data. Genet Epidemiol. 2005, 28: 110-122. 10.1002/gepi.20051.View ArticlePubMedGoogle Scholar
- Slager SL, Schaid DJ, Wang L, Thibodeau SN: Candidate-gene association studies with pedigree data: controlling for environmental covariates. Genet Epidemiol. 2003, 24: 273-283. 10.1002/gepi.10228.View ArticlePubMedGoogle Scholar
- Carlton VEH, Hu X, Chokkalingam AP, Schrodi SJ, Brandon R, Alexander H, Chang M, Catanese JJ, Leong DU, Ardlie KG, Kastner DL, Seldin MF, Criswell LA, Gregersen PK, Beasley E, Thomson G, Amos C, Begovich AB: PTPN22 genetic variation: evidence for multiple variants associated with RA. Am J Hum Genet. 2005, 77: 567-581. 10.1086/468189.View ArticlePubMed CentralPubMedGoogle Scholar
- Plenge RM, Padyukov L, Remmers EF, Purcell S, Lee AT, Karlson EW, Wolfe F, Kastner DL, Alfredsson L, Altshuler D, Gregersen PK, Klareskog L, Rioux JD: Replication of putative candidate-gene associations with RA in >4,000 samples from North America and Sweden: association of susceptibility with PTPN22, CTLA4, and PAD14. Am J Hum Genet. 2005, 77: 1044-1060. 10.1086/498651.View ArticlePubMed CentralPubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.