- Open Access
Genome-wide linkage and association analysis of rheumatoid arthritis in a Canadian population
© Wei and Li; licensee BioMed Central Ltd. 2007
- Published: 18 December 2007
Rheumatoid arthritis (RA) is an autoimmune disease with a moderately strong genetic component. Previous linkage and candidate gene studies have identified several loci that predispose to RA, including HLA-DRB1 and PTPN22. We conducted a genome-wide linkage analysis of 128 affected individuals from 60 families in a Canadian cohort, genotyped using the Illumina linkage panel, and a genome-wide association analysis of 158 affected individuals from the same cohort, genotyped using the Affymetrix 100 K platform. A multipoint nonparametric linkage scan revealed three linkage peaks with LOD scores greater than 1.5. We also identified 13 SNPs that were significantly associated at a genome-wide level of 0.05 after Bonferroni adjustment for multiple testing. Several of the significantly associated SNPs are located close to previously identified linkage regions, but not in the linkage peaks identified in the same cohort. We could not replicate the association with HLA-DRB1 or PTPN22. Our results indicate that high coverage and sufficient sample size are crucial for the success of genome-wide association studies.
- Rheumatoid Arthritis
- Association Analysis
- Hairy Cell Leukemia
- Linkage Peak
- North American Rheumatoid Arthritis Consortium
Rheumatoid arthritis (RA) is a complex autoimmune genetic disorder in which the immune system attacks normal tissues as if they were invading pathogens. Twin and family studies have suggested that the heritability of RA is ~60%. A well-established RA susceptibility locus is the HLA region on chromosome 6p, which is estimated to account for one-third of the genetic component of RA etiology. Apart from the HLA region, a number of other chromosomal regions have been replicated across genome-wide linkage scans; the leading regions include chromosomes 1p13, 1q41–43, 6q16, 16p, and 18q [1].
Linkage analysis has low power to detect genetic variants that confer modest disease risks. For complex diseases such as RA, tests of genetic association with the disease may be more effective. Genetic association analyses have led to the identification of PTPN22 [2], an association that has been replicated in many subsequent studies. Additional susceptibility loci for RA that have been implicated by association analyses include PADI4, SLC22A4, RUNX1, and CTLA4.
In this investigation, we performed genome-wide linkage and association analyses of the Canadian Rheumatoid Arthritis Genetic Study (CRAGS) data made available to Genetic Analysis Workshop 15 participants. We seek to identify genetic variants that predispose to RA and to characterize their genetic contributions.
Data sets and initial data quality checking
The CRAGS provided two data sets. The first data set includes 60 families (128 affected individuals) that were genotyped using the Illumina linkage panel on 5429 SNPs across the 22 autosomes. The second data set includes 158 affected individuals (78 affected sib pairs (ASPs) and one affected avuncular pair) that were genotyped using the Affymetrix 100 K platform on 113,237 SNPs across the 22 autosomes. Of the 113,237 SNPs, 87,181 had genotypes completed for >85% of individuals and a minor allele frequency (MAF) of >0.05. These 87,181 SNPs that passed the initial quality control had an average MAF of 0.247 and an average genotyping success rate of 96.8%.
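The two quality-control thresholds above (genotype completion >85%, MAF >0.05) can be illustrated with a minimal sketch. The function names and the 0/1/2 allele-count coding are assumptions made for this example, not part of the CRAGS pipeline, and the naive allele count ignores the relatedness among affected relatives.

```python
def call_rate_and_maf(genotypes):
    """Return (call rate, minor allele frequency) for one SNP.

    genotypes: list with one entry per individual, coded as the number of
    copies of one allele (0, 1, or 2), or None when the genotype is missing.
    This coding is an assumption for illustration only.
    """
    if not genotypes:
        raise ValueError("no individuals supplied")
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0
    freq = sum(called) / (2 * len(called))  # naive allele-count estimate
    return call_rate, min(freq, 1.0 - freq)


def passes_qc(genotypes, min_call_rate=0.85, min_maf=0.05):
    """Apply the two thresholds described in the text (both strict)."""
    call_rate, maf = call_rate_and_maf(genotypes)
    return call_rate > min_call_rate and maf > min_maf
```

For example, a SNP with two missing genotypes out of ten has a call rate of 0.8 and would be dropped regardless of its allele frequency.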
Test of Hardy-Weinberg equilibrium in the presence of disease association
Assessing Hardy-Weinberg equilibrium (HWE) is often an important step in checking the quality of genotype data. The standard test of HWE assumes that the genotypes are randomly sampled from the general population. However, in the CRAGS, all individuals are affected. As a result, when a marker is associated with the disease, the corresponding genotypes may no longer be a random sample. Assessing departure from HWE in the presence of disease association is particularly important for genome-wide association studies, in which the disease variants are either directly genotyped or in linkage disequilibrium (LD) with the genotyped markers. The standard HWE test might therefore reject many markers, some of which may be rejected because they are in LD with the disease variants rather than because of genotyping error. Here we develop a likelihood framework that allows the assessment of departure from HWE while taking into account potential association with the disease.
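For reference, the standard test that this paragraph contrasts against can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This is the textbook test for a random sample of unrelated individuals, not the likelihood framework developed here; the function name is an assumption for the example.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Standard chi-square test of HWE from the three genotype counts.

    Assumes a random sample of unrelated individuals, which is exactly the
    assumption that fails when all sampled individuals are affected.
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele a
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df:
    # P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Counts in exact HWE proportions, such as (25, 50, 25), give a statistic of 0, while a heterozygote deficit such as (40, 20, 40) gives a large statistic.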
Joint genotype probability for a sib pair (genotypes are unordered)
[Table only partially recovered. Surviving column heading: Allow departure from HWE. Surviving entries: pq²(1 + q); pq(1 + pq); p²q(1 + p).]
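The three surviving expressions can be checked by brute-force enumeration. The sketch below assumes a biallelic marker with alleles labelled A (frequency p) and a (frequency q = 1 − p), parents drawn under HWE and random mating, and no conditioning on disease status; the allele labels and function name are assumptions. Under those assumptions the expressions match the unordered sib-pair genotypes (Aa, aa), (Aa, Aa), and (AA, Aa), respectively.

```python
from fractions import Fraction
from itertools import product


def sib_pair_joint_probs(p):
    """Joint probabilities of unordered sib-pair genotypes under HWE.

    p is the frequency of allele 'A'; 'a' has frequency 1 - p.
    Keys are sorted tuples such as ('AA', 'Aa').
    """
    q = 1 - p
    freq = {"A": p, "a": q}
    probs = {}
    # Enumerate the four parental alleles (two paternal, two maternal).
    for f1, f2, m1, m2 in product("Aa", repeat=4):
        weight = freq[f1] * freq[f2] * freq[m1] * freq[m2]
        # Each sib receives one paternal and one maternal allele,
        # so there are 16 equally likely transmission patterns.
        for s1f, s1m, s2f, s2m in product((f1, f2), (m1, m2), (f1, f2), (m1, m2)):
            g1 = "".join(sorted((s1f, s1m)))
            g2 = "".join(sorted((s2f, s2m)))
            key = tuple(sorted((g1, g2)))
            probs[key] = probs.get(key, 0) + weight / 16
    return probs


p = Fraction(3, 10)  # arbitrary illustrative allele frequency
q = 1 - p
probs = sib_pair_joint_probs(p)
```

Using `Fraction` keeps the arithmetic exact, so the enumerated probabilities can be compared with the closed forms by equality rather than within a tolerance.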
Linkage and association analysis
We performed genome-wide, nonparametric multipoint linkage analysis using the SPAIR statistic [3] as implemented in MERLIN [4] on the 60 families that were genotyped using the Illumina linkage panel. The SPAIR statistic combines information from pairs of affected individuals and can detect regions of excess identity-by-descent (IBD) sharing.
We performed single-marker association analysis using LAMP [5, 6], which uses a maximum-likelihood model to extract information on genetic association from samples of unrelated individuals, sibships, and larger pedigrees. Briefly, the program estimates the disease-SNP haplotype frequencies and three penetrances using all available data by maximizing the likelihood of the marker data conditional on the disease phenotypes. A likelihood ratio test with approximately two degrees of freedom is constructed by comparing the likelihood maximized under the alternative model, which allows for LD between the disease and SNP loci, with the likelihood maximized under the null model, which assumes linkage equilibrium. We assumed a fixed disease prevalence of 0.8%. Assuming other values of disease prevalence changed the parameter estimates slightly, but did not appear to affect the overall ranking of SNPs.
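The likelihood ratio test described above can be sketched generically. The code below is not LAMP's implementation; it only shows how a statistic is formed from two maximized log-likelihoods and referred to a chi-square distribution with two degrees of freedom, for which the tail probability has the closed form exp(−x/2). The function names and the log-likelihood values in the example are hypothetical.

```python
import math


def lrt_statistic(loglik_null, loglik_alt):
    """Twice the gain in maximized log-likelihood (alternative vs. null)."""
    return 2.0 * (loglik_alt - loglik_null)


def chi2_sf_2df(x):
    """P(X > x) for a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0) if x > 0 else 1.0


def statistic_for_p(p_value):
    """Inverse of chi2_sf_2df: the statistic that yields a given p-value."""
    return -2.0 * math.log(p_value)


# Hypothetical maximized log-likelihoods, for illustration only.
stat = lrt_statistic(loglik_null=-105.0, loglik_alt=-100.0)
p_value = chi2_sf_2df(stat)
```

Because the test has only approximately two degrees of freedom, p-values obtained from the exact two-degree-of-freedom reference distribution should be read as approximations.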
Significantly associated SNPs after Bonferroni correction with αgenome = 0.05 using LAMP
[Table only partially recovered; the p-value column survives.]
p-values: 2.39 × 10⁻⁷, 1.35 × 10⁻⁸, 2.51 × 10⁻⁸, 2.82 × 10⁻⁸, 3.24 × 10⁻⁸, 3.09 × 10⁻⁷, 5.05 × 10⁻⁸, 2.82 × 10⁻⁷, 3.47 × 10⁻⁷, 3.31 × 10⁻¹¹, 1.78 × 10⁻⁸, 3.40 × 10⁻⁷, 1.13 × 10⁻⁷
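The Bonferroni criterion can be checked against the 13 listed p-values with a short sketch. It assumes the number of tests equals the 87,181 SNPs that passed quality control; the text does not state the exact denominator used, so this is an assumption.

```python
ALPHA_GENOME = 0.05
N_TESTS = 87_181  # assumption: SNPs passing initial quality control

# The 13 p-values listed in the table, in the order given.
P_VALUES = [
    2.39e-7, 1.35e-8, 2.51e-8, 2.82e-8, 3.24e-8, 3.09e-7, 5.05e-8,
    2.82e-7, 3.47e-7, 3.31e-11, 1.78e-8, 3.40e-7, 1.13e-7,
]

# Per-SNP significance threshold under Bonferroni correction.
threshold = ALPHA_GENOME / N_TESTS

significant = [p for p in P_VALUES if p < threshold]
smallest = min(P_VALUES)
```

Under this assumption the per-SNP threshold is about 5.7 × 10⁻⁷, and all 13 listed p-values fall below it. They would also fall below the stricter threshold obtained by dividing by all 113,237 genotyped SNPs.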
The most strongly associated SNP is rs10492477, located at 13q21. This SNP maps to the PCDH9 gene, which belongs to the protocadherin gene family, a subfamily of the cadherin superfamily. PCDH9 is predominantly expressed in brain, but is also expressed in hairy cell leukemia cells. Hairy cell leukemia can cause polyarthritis through immunity-driven inflammation, which can precede or follow the clinical onset of leukemic symptoms and usually presents as RA [7]. PCDH9 has not previously been reported as an RA susceptibility locus, suggesting that it is a new candidate gene.
The next most strongly associated SNP is SNP_A-1732768, located at 142.8 Mb on chromosome 2. This SNP is ~15 Mb away from the linkage region identified through linkage analysis of Caucasian families in the North American Rheumatoid Arthritis Consortium [8]. In addition, rs4834009 (chromosome 4, 126.3 Mb), rs10520893 (chromosome 5, 23.7 Mb), and rs10509272 (chromosome 10, 67.8 Mb) are all within ~15 Mb of the linkage regions identified by Amos et al. [8]. Although they did not reach genome-wide significance, several other SNPs showed a trend toward association, including SNPs on chromosomes 6, 8, 11, 12, 16, 17, and 20.
Unexpectedly, we did not observe significant association between RA and PTPN22, even though this association has been replicated extensively. Further examination of the data revealed that, of the 42 SNPs examined by the HapMap, only four were included in the Affymetrix 100 K array set. Surprisingly, we did not observe evidence of association between RA and the HLA complex either. Of the 102 SNPs genotyped in the HLA region, 85 passed our data quality checking, and the most strongly associated SNP had a p-value of 0.05. A recent study of the extended MHC region identified 6338 SNPs [9], of which only 1.6% are included in the Affymetrix 100 K array set. Because association analysis depends critically on the degree of LD between the tested marker and the unobserved disease locus, it is not surprising that, given the limited coverage of the HLA region, the current data did not provide evidence of association.
We conducted genome-wide linkage analysis using SNPs genotyped by the Illumina linkage panel and genome-wide association analysis using SNPs genotyped by the Affymetrix 100 K platform on a set of affected relative pairs with RA in the CRAGS. Multipoint nonparametric linkage analysis identified three linkage peaks with a maximum LOD score greater than 1.5. Our single-marker association analysis showed strong evidence of association on chromosomes 1, 2, 4, 5, 7, 9, 10, 11, 13, 15, and 18. Several significantly associated SNPs are located at or close to previously detected RA linkage regions, but not in the linkage peaks identified in the CRAGS.
For the well-known RA susceptibility loci HLA-DRB1 and PTPN22, we did not find evidence of association. Further examination of the data revealed that neither region is well covered by the Affymetrix 100 K platform. Another possible reason is that the sample size available to this investigation is limited. Although genome-wide association is a promising approach to searching for susceptibility genes for complex diseases, its success depends critically on several factors, including the effect size of the disease genes, the LD around the disease loci, and the sample size of the study. Our results indicate that future genome-wide association studies should employ a platform that has better coverage across the genome.
This study was supported by the University Research Foundation grant and the McCabe Pilot Award from the University of Pennsylvania to ML.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
1. Gregersen PK: Pathways to gene identification in rheumatoid arthritis: PTPN22 and beyond. Immunol Rev. 2005, 204: 74-86. 10.1111/j.0105-2896.2005.00243.x.
2. Begovich AB, Carlton VEH, Honigberg LA, Schrodi SJ, Chokkalingam AP, Alexander HC, Ardlie KG, Huang Q, Smith AM, Spoerke JM, Conn MT, Chang M, Chang SY, Saiki RK, Catanese JJ, Leong DU, Garcia VE, McAllister LB, Jeffery DA, Lee AT, Batliwalla F, Remmers E, Criswell LA, Seldin MF, Kastner DL, Amos CI, Sninsky JJ, Gregersen PK: A missense single-nucleotide polymorphism in a gene encoding a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis. Am J Hum Genet. 2004, 75: 330-337. 10.1086/422827.
3. Kong A, Cox NJ: Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet. 1997, 61: 1179-1188. 10.1086/301592.
4. Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin: rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
5. Li M, Boehnke M, Abecasis GR: Joint modeling of linkage and association: identifying SNPs responsible for a linkage signal. Am J Hum Genet. 2005, 76: 934-949. 10.1086/430277.
6. Li M, Boehnke M, Abecasis GR: Efficient study designs for test of genetic association using sibship data and unrelated cases and controls. Am J Hum Genet. 2006, 78: 778-792. 10.1086/503711.
7. Vernhes JP, Schaeverbeke T, Fach J, Lequen L, Bannawarth B, Dehais J: Chronic immunity-driven polyarthritis in hairy cell leukemia: report of a case and review of the literature. Rev Rhum Engl Ed. 1997, 64: 578-581.
8. Amos CI, Chen WV, Lee A, Li W, Kern M, Lundsten R, Batliwalla F, Wener M, Remmers E, Kastner DA, Criswell LA, Seldin MF, Gregersen PK: High density SNP analysis of 642 Caucasian families with rheumatoid arthritis identifies two new linkage regions on 11p12 and 2q33. Genes Immun. 2006, 7: 277-286. 10.1038/sj.gene.6364295.
9. de Bakker PIW, McVean G, Sabeti PC, Miretti MM, Green T, Marchini J, Ke X, Monsuur AJ, Whittaker P, Delgado M, Morrison J, Richardson A, Walsh EC, Gao X, Galver L, Hart J, Hafler DA, Pericak-Vance M, Todd JA, Daly MJ, Trowsdale J, Wijmenga C, Vyse TJ, Beck S, Murray SS, Carrington M, Gregory S, Deloukas P, Rioux JD: A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat Genet. 2006, 38: 1166-1172. 10.1038/ng1885.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.