Incorporating quantitative variables into linkage analysis using affected sib pairs.

Rheumatoid arthritis is a complex disease in which environmental factors interact with genetic factors that influence susceptibility. Incorporating information about related quantitative traits or environmental factors into linkage mapping could therefore greatly improve the efficiency and precision of identifying the disease locus. Using a multipoint linkage approach for affected sib pairs that allows quantitative variables to be incorporated, we added data on anti-cyclic citrullinated peptide antibodies, immunoglobulin M rheumatoid factor, and age at onset to genome-wide linkage scans. The strongest evidence of linkage was observed on chromosome 6p, with a p-value of 3.8 × 10^-15 for the genetic effect. The trait locus was estimated at approximately 45.51-45.82 cM, with standard errors ranging from 0.82 to 1.26 cM, depending on whether and which quantitative variable was incorporated. The standard error of the trait locus estimate decreased by about 28% to 35% after incorporating the additional information from the quantitative variables. This mapping technique helps to narrow down the regions of interest when searching for a susceptibility locus and to elucidate underlying disease mechanisms.


Background
Several biomarkers, including anti-cyclic citrullinated peptide (anti-CCP) antibodies and immunoglobulin M rheumatoid factor (IgM RF), are used to characterize rheumatoid arthritis (RA). Anti-CCP antibodies and IgM RF are important surrogate markers for diagnosis and prognosis in RA. The genetic mechanism of these biomarkers might directly underlie disease status. If not, the quantitative trait loci (QTL) might be linked to the loci responsible for RA, or the quantitative trait might interact phenotypically with RA. Hence, incorporating the quantitative trait into analyses should increase our ability to map the genes that predispose to RA [1]. In addition, age at onset is associated with an increased risk of RA [2,3]. Therefore, we also incorporated age at onset into our analysis to increase the power to localize the disease locus.
Recently, Liang et al. [4] proposed a robust multipoint linkage analysis approach using affected sib pairs. It provides estimates of the genetic effect and of the location of the disease locus, τ, along with their sampling uncertainty, to help investigators narrow down chromosomal regions that putatively harbor a disease locus. The genetic effect is denoted by "C", and the value (1 + C)/2 characterizes the probability that an affected sib pair shares the same allele at τ from a given parent. Chiou et al. [5] extended this method to estimate C nonparametrically by incorporating information from either a quantitative trait or a covariate, with the aim of estimating τ more efficiently. Hence, in the present study, we incorporate information from several quantitative variables associated with RA, including anti-CCP, IgM RF, and age at onset, into our linkage mapping to enhance the efficiency of identifying the locus responsible for RA.
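For orientation, the relation between C and allele sharing can be written out explicitly. The following is a sketch of the mean model as it is usually stated for the approach of Liang et al. [4], not a restatement of their derivation. The symbols S(t) (number of alleles shared IBD by an affected sib pair at map position t, taking values 0, 1, or 2), θ_{t,τ} (recombination fraction between t and τ), and Φ (the event that both sibs are affected) are notation assumed here for illustration.

```latex
\begin{align}
  C &= E\left[ S(\tau) \mid \Phi \right] - 1, \\
  E\left[ S(t) \mid \Phi \right] &= 1 + \left(1 - 2\theta_{t,\tau}\right)^{2} C .
\end{align}
```

Under no linkage, C = 0 and the expected sharing equals 1 at every position. Dividing the expected sharing at τ, 1 + C, by the two parents gives the per-parent sharing probability (1 + C)/2 quoted above.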

Materials
Data from a total of 1096 affected sib pairs from 757 multiplex families in the North American Rheumatoid Arthritis Consortium (NARAC) study were included. Only 615 or 627 sib pairs (depending on the chromosomal region) had genotype information, and these were used for our analysis. The NARAC multiplex families contain 8017 individuals, about 90.6% of whom are Caucasian; the rest are Hispanic (5.42%), African-American (2.92%), Native American (0.57%), and Asian (0.51%). Analyses of the Caucasian subset and of the whole data set gave quite similar results; hence, we report only the results from the whole data set here.
A total of 375 microsatellite markers were used in the analyses. The genotype data for 12 affected sib pairs were missing on chromosomes 1 to 11, 13 to 16, and 19 to 22; therefore, 615 affected sib pairs were available for those chromosomes and 627 affected sib pairs were available for chromosomes 12, 17, and 18. Because some values of the incorporated quantitative variables were missing, the total number of affected sib pairs included in the analysis varied from 588 to 622, depending on the quantitative variable and on the genotype data available in the chromosomal region.

Linkage approaches
The GeneHunter program was used to calculate identity by descent (IBD) sharing of affected sib pairs. The GeneFinder program was used to perform the linkage mapping with estimates of C and τ based on the phenotype of disease status only. GeneFinder provides an estimate of τ and its 95% confidence interval, as well as the p-value for testing the null hypothesis C = 0 (that is, whether linkage is present). A Fortran program developed by Chiou et al. [5] was applied to estimate τ and its 95% confidence interval when C was a non-linear function of the quantitative variable and was estimated nonparametrically. Specifically, C as a function of a quantitative covariate was estimated by the minimizer of a kernel-weighted least squares function in which the response was the imputed IBD sharing of the i-th affected sib pair and the covariate was that pair's value of the quantitative variable; the explicit criterion is given by Chiou et al. [5]. In that criterion, K_2 was a kernel function and H was a non-singular square bandwidth matrix.
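The kernel-weighted least squares idea can be illustrated with a minimal one-dimensional sketch. It is not the Fortran program of Chiou et al. [5]. It uses a scalar bandwidth and a Gaussian kernel in place of the kernel K_2 and bandwidth matrix H. It also evaluates sharing at a single fixed position taken to be the locus, and relies on the assumed relation E[S(τ) | x] = 1 + C(x). All function names and the toy data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as a smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_constant_fit(x0, xs, ys, bandwidth):
    """Minimise sum_i K((x_i - x0) / h) * (y_i - b)^2 over the constant b.

    The minimiser has the closed form sum(w_i * y_i) / sum(w_i).
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x - x0) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observation carries weight at x0")
    return sum(w * y for w, y in zip(weights, ys)) / total


def estimate_c(x0, covariate, ibd_at_locus, bandwidth):
    """Crude estimate of C(x0): smoothed excess IBD sharing (S - 1) at the locus."""
    excess = [s - 1.0 for s in ibd_at_locus]
    return local_constant_fit(x0, covariate, excess, bandwidth)


# Made-up toy data: one covariate value and one imputed IBD sharing value
# (between 0 and 2) per affected sib pair. Not taken from the NARAC data.
toy_covariate = [10.0, 20.0, 30.0, 40.0, 50.0]
toy_ibd = [1.0, 1.1, 1.3, 1.4, 1.6]
print(estimate_c(30.0, toy_covariate, toy_ibd, bandwidth=10.0))
```

A local-constant fit is the simplest member of this family; a local-linear fit or a multivariate bandwidth would follow the same weighting pattern with a larger design matrix.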
Because this method is an extension to the approach proposed by Liang et al. [4], it is also a robust approach in that no assumption about the genetic mechanism is required other than that the region contains no more than one susceptibility locus for the qualitative trait. No assumption about the underlying genetic mechanism of an incorporated quantitative trait is required.
We compared the results from incorporating the quantitative variable with those obtained from the GeneFinder search to evaluate the efficiency gained by the additional information from a given quantitative variable.

Results
We plotted the average estimated IBD sharing of affected sib pairs across the autosomes, as shown in Figure 1. The peak on chromosome 6 was clearly distinguishable from the others. The GeneFinder search results showed that the disease locus (τ) was estimated to be at 45.87 cM on chromosome 6, with a 95% confidence interval of [43.39, 48.34] (Table 1).
By incorporating anti-CCP into the linkage mapping (Table 2), the disease locus was again estimated on chromosome 6. The standard error decreased from 1.26 to 0.82 cM; thus, the corresponding 95% confidence interval narrowed by 1.73 cM. Similarly, when IgM RF (Table 3) or age at onset (Table 4) was incorporated, the estimates of the disease locus fell in the same region, and the corresponding 95% confidence intervals narrowed by 1.39 cM for IgM RF and 1.59 cM for age at onset. In addition, after incorporating anti-CCP, IgM RF, or age at onset, the standard errors for the estimates of τ were reduced by roughly 34% to 77% on chromosomes 9, 10, and 18, respectively. The numbers of affected sib pairs with anti-CCP, IgM RF, and age at onset available were 588, 611, and 605, respectively, on chromosomes 1 to 11, 13 to 16, and 19 to 22, and 600, 622, and 617, respectively, on chromosomes 12, 17, and 18. These are fewer than the 615 and 627 affected sib pairs with genotype data on the two sets of chromosomes, because some values of the quantitative variables were missing. The reductions in the standard errors of the estimates of τ were therefore achieved with fewer sib pairs.
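The link between a smaller standard error and a narrower confidence interval can be checked with a short calculation. The sketch below assumes a normal-theory interval of the form estimate ± 1.96 × SE, which is an assumption here but is consistent with the reported interval [43.39, 48.34] around 45.87 cM. The standard errors 0.91 and 0.86 are those reported for IgM RF and age at onset in the Conclusion; the function names are hypothetical.

```python
Z_95 = 1.96  # normal quantile for a two-sided 95% interval (assumed interval type)


def ci_width(se, z=Z_95):
    """Full width of a symmetric interval estimate +/- z * se."""
    return 2.0 * z * se


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


se_baseline = 1.26  # disease status only, chromosome 6
se_with_covariate = {"anti-CCP": 0.82, "IgM RF": 0.91, "age at onset": 0.86}

for name, se in se_with_covariate.items():
    shrink = ci_width(se_baseline) - ci_width(se)
    print(f"{name}: SE reduced {percent_reduction(se_baseline, se):.1f}%, "
          f"95% CI shorter by about {shrink:.2f} cM")
```

Because the published standard errors are rounded to two decimals, the widths computed this way (about 1.72, 1.37, and 1.57 cM) differ slightly from the reported 1.73, 1.39, and 1.59 cM.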

Conclusion
The diagnosis of RA is generally based on the presence and titer of specific autoantibodies (IgM RF and anti-CCP); joint involvement, according to the joint alignment and motion (JAM) score; the presence and extent of erosive disease on hand/wrist radiographs; functional status, according to Health Assessment Questionnaire (HAQ) scores; age and calendar year of RA onset; the presence of nodules or other extra-articular manifestations; and the presence of other autoimmune diseases [3]. Hence, searching for disease susceptibility loci based only on a disease status defined by a threshold process may discard much of the information contained in the quantitative variables.
We demonstrated that the efficiency of disease locus localization was greatly improved by the incorporation of quantitative variables related to RA. By applying this approach, investigators should be able to narrow down the regions of interest when searching for disease susceptibility loci. The significance of the improvement in the location estimate could be assessed by a bootstrap method. We are currently conducting systematic simulation studies to assess the improvement; the results will be reported elsewhere.
Figure 1. Autosomal-wide estimated IBD sharing for affected sib pairs.
The estimated standard error of τ̂ after incorporating anti-CCP (0.82) was smaller than that after incorporating IgM RF (0.91), suggesting that anti-CCP provides slightly more information about RA than IgM RF, consistent with the results from other studies [3]. The estimated standard error of τ̂ after incorporating age at onset was 0.86, similar to that observed for anti-CCP, indicating that age at onset contains about as much information about RA as anti-CCP and is also related to the underlying genetic mechanism of RA. The confidence regions on all the chromosomes with linkage evidence were narrowed by incorporating one of the three quantitative covariates. Among them, the confidence region on chromosome 5 showed the greatest reduction when anti-CCP was incorporated, while incorporating IgM RF or age at onset had the largest effect on shortening the confidence region on chromosome 18. For a specific chromosomal region, the reduction in the standard error of the trait locus estimate varied with the quantitative covariate incorporated, suggesting genetic heterogeneity among these RA-related quantitative variables. By examining the efficiency gain from incorporating a quantitative covariate, we are able not only to locate the susceptibility locus more precisely but also to identify genes related to distinct pathways associated with different quantitative variables. These findings suggest that incorporating quantitative RA phenotypes or RA-related covariates increased the power to identify genes with this approach and can therefore help elucidate the underlying disease mechanisms of RA.

Competing interests
The author(s) declare that they have no competing interests.