A genome-wide ordered-subset linkage analysis for rheumatoid arthritis


Rheumatoid arthritis (RA) is a chronic, complex autoimmune inflammatory disorder with poorly understood etiology. Approximately 1% of the adult population is affected by RA. Linkage analysis of RA can be complicated by the presence of phenotypic and genetic heterogeneity. The ordered-subset analysis (OSA) technique has been shown to reduce heterogeneity, increase statistical power for detecting linkage, and help define the most informative data set for follow-up analysis. We applied OSA to the family data from the North American Rheumatoid Arthritis Consortium study as part of the Genetic Analysis Workshop 15 (GAW15). We incorporated two continuous covariates, 'age of onset' and 'anti-CCP level' (anti-cyclic citrullinated peptide), into our genome-wide ordered-subset linkage analysis using 809 Illumina SNP markers in 5713 individuals from 606 Caucasian RA families. A statistically significant increase in nonparametric linkage (NPL) scores was observed with the covariate 'age of onset' on chromosomes 4 (p = 0.000003) and 9 (p = 0.002). With the covariate 'anti-CCP level', statistically significant increases in NPL scores were observed on chromosomes 2 (p = 0.0001), 18 (p = 0.00007), and 19 (p = 0.0003). Once we identified a linked genomic region, we attempted to identify the most plausible parametric model at that locus. Our results show significant improvement in evidence for linkage and demonstrate that OSA is a useful technique to detect linkage under heterogeneity.


Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disorder of unknown etiology. Although environmental influences may trigger a response leading to the development of this autoimmune disease, both genetic and environmental factors are implicated in its pathogenesis [1]. RA affects approximately 1% of the adult population, with a female:male ratio ranging from 2:1 to 4:1 [2]. It typically presents with symmetric joint swelling and reaches a peak incidence in the fourth and fifth decades of life [2]. The RA-induced inflammatory response in the synovial membrane is typically chronic and destructive [3]. The main presenting symptoms of RA are pain, marked morning stiffness, impaired physical function, swelling, and tenderness of the joints. Constitutional symptoms include fever, weight loss, and fatigue.

RA is a clinically heterogeneous disease and most likely has complex genetic involvement. Underlying genetic heterogeneity of a trait often masks the effect of genetic markers linked to disease-predisposing variants; there may be no linkage in families in which the marker is not involved in the disease etiology [4]. One method used to address genetic heterogeneity and strengthen linkage findings is to incorporate phenotypic subsetting of the data [5]. Most phenotypic stratification approaches require that subsets be identified before linkage studies. We have applied this technique to detect linkage in another autoimmune disease, systemic lupus erythematosus (SLE) [6]. Alternatively, one can account for disease heterogeneity by incorporating trait-related covariate data. Therefore, to map genes for a complex trait, genetic analysis methods should acknowledge the presence of genetic heterogeneity when appropriate. In the present analysis, we used ordered-subset analysis (OSA), a powerful technique for linkage analysis of traits characterized by genetic heterogeneity [7]. In OSA, covariates based on clinical features of the phenotype or on environmental exposures are used to identify more homogeneous subsets of families. Linkage that would otherwise be missed may then become apparent. The goal of OSA is therefore to identify regions with increased evidence for linkage in a subset of families. Additionally, by increasing genetic homogeneity, OSA can reduce the linkage interval, as exemplified by other complex diseases, including Alzheimer disease [8].

The aims of our present analysis are to 1) identify homogeneous subsets of families and assess linkage and its location, and 2) rigorously analyze the homogeneous subsets of families with statistically significant chromosomal locations to find a parsimonious genetic model.

Data and methods

We analyzed data from the North American Rheumatoid Arthritis Consortium (NARAC) study as part of the Genetic Analysis Workshop 15 (GAW15). Only Caucasian families were used for these analyses. Of the original 637 families, 31 were removed because of mixed ethnicity or because they were uninformative for linkage analysis (a single affected member per family). In larger families, ungenotyped individuals were trimmed to make computation feasible within the time and memory constraints of the computer hardware used. We performed genome-wide linkage analysis of 809 Illumina SNP markers in 5713 individuals from 606 Caucasian rheumatoid arthritis families. Analyses were performed using FLOSS (Flexible Ordered Subset Analysis), MERLIN, GeneHunter, GeneHunter-Modscore, and GeneHunter-Plus with the ASM (allele sharing model) module. We used several complementary programs to compare the accuracy of our results.

To date, several clinical and epidemiological factors have been identified as potential trait-related covariates for RA. Among them, increasing age of onset has been associated with worse outcome in RA, with evidence that there has recently been a shift towards an older age of onset [9]. There are also age differences in the strength of the association with risk factors like HLA, which might suggest that age has an effect on disease phenotype [10]. Recently, anti-CCP antibodies have been identified as highly specific for RA. These antibodies have also demonstrated prognostic utility with regard to radiographic outcomes [11, 12]. Therefore, we selected the covariates 'age of onset' and 'anti-CCP level' (anti-cyclic citrullinated peptide) and used them in OSA to identify homogeneous subgroups of families for linkage analysis. These covariates were used to assign linkage scores to each family using MERLIN. The mean covariate value of the family members was specified for each family, and the families were ordered according to their covariate score. Multipoint linkage analysis was performed on all subsets consisting of the families with the k smallest or k largest covariate scores; thus, the subset type used here was 'extreme'. The FLOSS program was used to create a covariate file of family covariate scores for all families and all covariates, and to calculate nonparametric linkage (NPL) scores. Permutation tests were used to assess the null hypothesis of independence of family linkage scores at each locus and family covariate scores. Each subset of homogeneous families that generated a statistically significant linkage was analyzed with GeneHunter to further confirm the NPL score.
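The ordering, subsetting, and permutation steps described above can be illustrated with a minimal Python sketch. It is not the FLOSS implementation: the function names are hypothetical, the per-family scores are assumed to be Z-type linkage scores at a single locus, and the rule for combining them across a subset (sum divided by the square root of the number of families) is an assumption made for illustration.

```python
import math
import random


def family_covariate(member_values):
    """Family covariate score: mean of the members' covariate values."""
    return sum(member_values) / len(member_values)


def max_ordered_subset(z_scores, covariates, descending=False):
    """Scan subsets made of the k most extreme families (k = 1..N).

    Families are ordered by covariate score; the subset statistic is
    assumed to be sum(Z) / sqrt(k). Returns (best statistic, best k).
    """
    order = sorted(range(len(z_scores)),
                   key=lambda i: covariates[i], reverse=descending)
    best_stat, best_k = -math.inf, 0
    running = 0.0
    for k, idx in enumerate(order, start=1):
        running += z_scores[idx]
        stat = running / math.sqrt(k)
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k


def osa_permutation_p(z_scores, covariates, n_perm=1000,
                      descending=False, seed=0):
    """Permutation p-value for the maximum subset statistic.

    Shuffling the covariates breaks any link between family covariate
    scores and family linkage scores (the null hypothesis).
    """
    rng = random.Random(seed)
    observed, _ = max_ordered_subset(z_scores, covariates, descending)
    shuffled = list(covariates)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat, _ = max_ordered_subset(z_scores, shuffled, descending)
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (invented): six families, linkage concentrated in early onset.
z = [2.0, 1.5, 1.8, -0.5, -1.0, 0.1]
onset = [family_covariate(v) for v in
         [[28, 32], [35], [31, 35], [60], [68, 72], [65]]]
stat, k = max_ordered_subset(z, onset)
p = osa_permutation_p(z, onset)
print(k, stat, p)
```

Because the statistic for the full set of families is the same under every permutation, comparing the maximum subset statistic is equivalent to comparing its increase over the full-sample value.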

Once we identified a linked genomic region, we attempted to identify the most plausible parametric model (allele frequency, penetrance, and mode of inheritance) at that location. For each subset, parametric LOD scores were maximized using GeneHunter-Modscore. The resulting allele frequencies and penetrance values were then used in GeneHunter/GeneHunter-Plus with the ASM module, which provides NPL, nonparametric LOD, parametric LOD, and heterogeneity LOD (HLOD) scores. In addition, information content was provided, which gave an index of the inheritance information extracted at each point in the genome by the markers genotyped. The LIN function (the linear model for evaluating the evidence for linkage, as defined by Kong and Cox [18]) of the allele-sharing method was used to calculate nonparametric LOD scores.
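For readers unfamiliar with the HLOD, the sketch below uses the standard admixture form, in which a proportion alpha of families is assumed linked: HLOD = max over alpha of the sum over families of log10(alpha * 10^LOD_i + 1 - alpha). The text does not describe how GeneHunter computes it internally, so the grid search and the function name `hlod` are illustrative assumptions.

```python
import math


def hlod(family_lods, grid=1000):
    """Heterogeneity LOD under the admixture model.

    Maximizes sum_i log10(alpha * 10**LOD_i + 1 - alpha) over a grid of
    alpha values in [0, 1]. Returns (HLOD, alpha at the maximum).
    """
    best_value, best_alpha = 0.0, 0.0  # alpha = 0 always gives exactly 0
    for step in range(1, grid + 1):
        alpha = step / grid
        value = sum(math.log10(alpha * 10 ** lod + 1 - alpha)
                    for lod in family_lods)
        if value > best_value:
            best_value, best_alpha = value, alpha
    return best_value, best_alpha


# Invented per-family LOD scores at one position.
print(hlod([0.8, 0.6, -0.4, 0.5, -0.9]))
```

When every family supports linkage, the maximum is reached at alpha = 1 and the HLOD equals the ordinary summed LOD, which is one reason MOD and HLOD scores can be close in a homogeneous subset.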


The results of OSA, along with the other relevant statistics, are provided in Table 1. A significant increase in the evidence of linkage was observed at five chromosomal regions (Fig. 1). Using the covariate 'age of onset', statistically significant evidence of linkage was observed at chromosome 4 (NPL = 4.5, p = 0.000003, peak at 102.03 cM, 472 families) and suggestive evidence was observed at chromosome 9 (NPL = 2.85, p = 0.002, peak at 0.59 cM, 27 families). With the covariate 'anti-CCP level', statistically significant evidence of linkage was identified at chromosome 18 (NPL = 3.81, p = 0.00007, peak at 26.29 cM, 40 families), and suggestive evidence of linkage was observed at chromosome 2 (NPL = 3.66, p = 0.0001, peak at 154.11 cM, 219 families) and chromosome 19 (NPL = 3.28, p = 0.0003, peak at 52.21 cM, 10 families). The information content extracted at the linked regions ranged from 46% to 80%.
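As a rough cross-check of the scale of these values, an NPL score can be converted to a one-sided p-value under the assumption that it is approximately standard normal when there is no linkage. This is a large-sample approximation that is not stated in the text and may be poor for small subsets; the reported p-values come from the authors' analysis and need not coincide with it. The function name is hypothetical.

```python
import math


def npl_to_p(npl_score):
    """One-sided upper-tail p-value, assuming NPL ~ N(0, 1) under no linkage."""
    return 0.5 * math.erfc(npl_score / math.sqrt(2))


# NPL scores quoted in the paragraph above.
for chromosome, npl in [(4, 4.5), (9, 2.85), (18, 3.81), (2, 3.66), (19, 3.28)]:
    print(chromosome, npl, npl_to_p(npl))
```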

Table 1 Summary of ordered subset linkage analysis
Figure 1 Results of NPL analysis across the SNP marker positions on chromosomes 2, 4, 9, 18, and 19, showing the evidence of linkage in the ordered subset of families (solid line) based on covariate scores compared to all 606 families (dashed line) using GeneHunter.

For each linkage region, the NPL score was significantly higher (p < 0.05) in the ordered subset of families than in all families. Further, analysis of the ordered-subset families with GeneHunter produced statistically significant linkage, with NPL scores nearly identical to those obtained with FLOSS. Table 2 shows the parametric and nonparametric LOD scores obtained by incorporating the allele frequencies and penetrances of the best-fitting model into GeneHunter-Plus with the ASM module. Interestingly, these LOD scores are very similar at each linkage peak.

Table 2 Parametric and nonparametric linkage analysis under the best-fitting model

With the covariate 'age of onset', the age range in the ordered subset of families on chromosome 4 is 31.5 to 83 years, whereas on chromosome 9 it is shifted toward older ages (59.5 to 83 years). The optimal range of 'anti-CCP level' in the ordered subset of families was high for chromosomes 2 (133 to 413) and 18 (234 to 413), but low for chromosome 19 (0.800 to 3.50). The peak maximized LOD (MOD) scores and HLOD scores are nearly equal, as expected. Under the allele-sharing model, the nonparametric LOD scores produced by ASM differed little from the peak MOD scores, except on chromosome 19.


We have identified five linked chromosomal regions (on chromosomes 2, 4, 9, 18, and 19) that may harbor susceptibility genes for RA. Previous studies [13, 14] had identified linkage at chromosomes 2, 4, and 18. Our results also support the possibility of RA susceptibility genes on chromosomes 4 and 9 using the covariate 'age of onset' and on chromosomes 2, 18, and 19 using the covariate 'anti-CCP level'. It is interesting to note that the optimal range of 'anti-CCP level' changes very little between the chromosome 2 and 18 linkages, but is quite different for chromosome 19, with no overlap at all. This would suggest an easily identifiable subset of families. However, this group contains only 10 families, so independent replication is required to assess the validity of this finding.

We considered both nonparametric and parametric linkage analysis in this study. The parametric and nonparametric results are very similar in terms of detecting the peak linkage locations. Using the nonparametric LOD score, we have evidence for three statistically significant linkages, at chromosomes 2, 4, and 18, that exceed the Lander and Kruglyak criterion (LOD score of 3.3) [15]. However, this threshold is not corrected for multiple testing (at least four different tests were performed: two different covariates and two different linkage methods, nonparametric as well as parametric). To maintain the overall genome-wide significance level (5%), we used an ad hoc correction procedure that raised the LOD score threshold to 3.9. [This is calculated as LOD(corrected) = LOD(conv) + log10(number of tests) [16, 17], where LOD(conv) = 3.3 is the conventional LOD score threshold for significance.] Interestingly, all three linkages remain significant after correcting for multiple testing.
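The corrected threshold quoted above can be reproduced with a few lines of Python. The function name is hypothetical; the inputs (a conventional threshold of 3.3 and four tests) are taken from the paragraph.

```python
import math


def corrected_lod_threshold(conventional_lod, n_tests):
    """Ad hoc multiple-testing correction: add log10(number of tests)."""
    return conventional_lod + math.log10(n_tests)


threshold = corrected_lod_threshold(3.3, 4)
print(round(threshold, 1))  # 3.9
```

Since log10(4) is about 0.6, each additional order of magnitude in the number of tests would raise the threshold by one LOD unit under this rule.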

For a complex trait like RA, successful identification of genetic risk loci relies on the ability to minimize disease and genetic heterogeneity in order to increase the power to detect linkage. One way to account for disease heterogeneity is by incorporating covariate data. Phenotypically similar families may be genetically more homogeneous as well, in which case OSA can greatly improve the power of linkage analysis. Our results show that 'age of onset' and 'anti-CCP level' are two clinical markers that can potentially be useful for detecting linkage for RA, and that OSA is an important technique for identifying linkage in the presence of heterogeneity. Such linkage findings can now be used in candidate gene and fine-mapping studies to identify the actual RA susceptibility genes.


A genome-wide OSA was performed to identify linkage for RA. We used two continuous covariates, 'age of onset' and 'anti-CCP level', to identify more homogeneous groups of families. We identified two regions with statistically significant evidence of linkage, at chromosomes 4 and 18, and three regions with suggestive evidence of linkage, at chromosomes 2, 9, and 19. Our results demonstrate that OSA is a useful technique to detect linkage under heterogeneity.


  1. Lekarski PM: Genetics in rheumatoid arthritis (RA). 2006, 20: 468-470.

  2. Grossman JM, Brahn E: Rheumatoid arthritis: current clinical and research directions. J Womens Health. 1997, 6: 627-638.

  3. Weyand CM, Goronzy JJ: Pathogenesis of rheumatoid arthritis. Med Clin North Am. 1997, 81: 29-55. 10.1016/S0025-7125(05)70504-6.

  4. Browning BL: FLOSS: flexible ordered subset analysis for linkage mapping of complex traits. Bioinformatics. 2006, 22: 512-513. 10.1093/bioinformatics/btk012.

  5. Leal SM, Ott J: Effects of stratification in the analysis of affected-sib-pair data: benefits and costs. Am J Hum Genet. 2000, 66: 567-575. 10.1086/302748.

  6. Namjou B, Nath SK, Kilpatrick J, Kelly JA, Reid J, James JA, Harley JB: Stratification of pedigrees multiplex for systemic lupus erythematosus and for self-reported rheumatoid arthritis detects a systemic lupus erythematosus susceptibility gene (SLER1) at 5p15.3. Arthritis Rheum. 2002, 46: 2937-2945. 10.1002/art.10588.

  7. Hauser ER, Watanabe RM, Duren L, Bass MP, Langfield CD, Boehnke M: Ordered subset analysis in genetic linkage mapping of complex traits. Genet Epidemiol. 2004, 27: 53-63. 10.1002/gepi.20000.

  8. Scott WK, Hauser ER, Schmechel DE, Welsh-Bohmer KA, Small GW, Roses AD, Saunders AM, Gilbert JR, Vance JM, Haines JL, Pericak-Vance MA: Ordered-subsets linkage analysis detects novel Alzheimer disease loci on chromosomes 2q34 and 15q22. Am J Hum Genet. 2003, 73: 1041-1051. 10.1086/379083.

  9. Wiles N, Symmons DP, Harrison B, Barrett E, Barrett JH, Scott DG, Silman AJ: Estimating the incidence of rheumatoid arthritis: trying to hit a moving target?. Arthr Rheumatol. 1999, 42: 1339-1346. 10.1002/1529-0131(199907)42:7<1339::AID-ANR6>3.0.CO;2-Y.

  10. MacGregor A, Ollier W, Thomson W, Jawaheer D, Silman A: HLA-DRB1*0401/0404 genotype and rheumatoid arthritis: increased association in men, young age at onset, and disease severity. J Rheumatol. 1995, 22: 1032-1036.

  11. Bizzaro N, Mazzanti G, Tonutti E, Villalta D, Tozzoli R: Diagnostic accuracy of the anti-citrulline antibody assay for rheumatoid arthritis. Clin Chem. 2001, 47: 1089-1093.

  12. Meyer O, Labarre C, Dougados M, Goupille P, Cantagrel A, Dubois A, Nicaise-Roland P, Sibilia J, Combe B: Anticitrullinated protein/peptide antibody assays in early rheumatoid arthritis for predicting five year radiographic damage. Ann Rheum Dis. 2003, 62: 120-126. 10.1136/ard.62.2.120.

  13. Amos CI, Chen WV, Lee A, Li W, Kern M, Lundsten R, Batliwalla F, Wener M, Remmers, Kastner DA, Criswell LA, Seldin MF, Gregersen PK: High-density SNP analysis of 642 Caucasian families with rheumatoid arthritis identifies two new linkage regions on 11p12 and 2q33. Genes Immun. 2006, 7: 277-286. 10.1038/sj.gene.6364295.

  14. Criswell LA, Gregersen PK: Current understanding of the genetic aetiology of rheumatoid arthritis and likely future developments. Rheumatology. 2005, 44 (Suppl 4): iv9-iv13. 10.1093/rheumatology/kei054.

  15. Lander E, Kruglyak L: Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet. 1995, 11: 241-247. 10.1038/ng1195-241.

  16. Kidd KK, Ott J: Power and sample size in linkage studies. Cytogenet Cell Genet. 1984, 37: 510-511.

  17. Ott J: Analysis of Human Genetic Linkage. 1999, Baltimore: Johns Hopkins University Press, 78-79. 3rd edition.

  18. Kong A, Cox NJ: Allele sharing models: LOD scores and accurate linkage tests. Am J Hum Genet. 1997, 61: 1179-1188. 10.1086/301592.


This study was supported by National Institutes of Health grant R01AI063622 and Oklahoma Medical Research Foundation institutional grant 9124.

This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at

Author information


Corresponding author

Correspondence to Swapan K Nath.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Mandhyan, D.D., Kim-Howard, X., Gaines, M. et al. A genome-wide ordered-subset linkage analysis for rheumatoid arthritis. BMC Proc 1 (Suppl 1), S101 (2007).
