Volume 3 Supplement 7
Assessment of gene-covariate interactions by incorporating covariates into association mapping
© Chiu et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
The HLA region is considered to be the main genetic risk factor for rheumatoid arthritis. Previous research demonstrated that HLA-DRB1 alleles encoding the shared epitope are specific for disease that is characterized by antibodies to cyclic citrullinated peptides (anti-CCP). In the present study, we incorporated the shared epitope and either anti-CCP antibodies or rheumatoid factor into linkage disequilibrium mapping, to assess the association between the shared epitope or antibodies with the disease gene identified. Incorporating the covariates into the association mapping provides a mechanism 1) to evaluate gene-gene and gene-environment interactions and 2) to dissect the pathways underlying disease induction/progress in quantitative antibodies.
Rheumatoid arthritis (RA) represents a complex disease in which genes and environmental factors interplay to manifest the symptoms of this condition. Despite the phenotypic heterogeneity involved in this disease, the genetic contribution to RA is estimated to be 50-60% , and the HLA region has the largest influence on genetic risk. Gregersen et al.  showed that HLA-DRB1 alleles encoding a common amino-acid sequence (the shared epitope (SE)) in the third hypervariable region of the DRB1 molecule have been identified as risk alleles for RA. The functional significance implied by the location of this SE sequence has stimulated efforts to search for the putative RA antigen . In addition, because most RA patients have autoantibody responses, including rheumatoid factor (RF, a measurement of the reactive IgM antibodies) and cyclic citrullinated peptides (anti-CCPs), Huizinga et al.  compared the HLA profile in a healthy population and in RA patients who did or did not produce anti-CCP antibodies, demonstrating strong interactions between the SE and anti-CCP antibodies. Based on these findings, we incorporated the number of SE alleles (categorized by NN, SN, and SS for 0, 1, and 2 SE alleles, respectively) and anti-CCP or RF into our association mapping to search for the disease locus and to assess the associations between SE, antibodies, and the disease gene in a region of 6p21 (previously identified in our linkage analyses). Two dummy variables, NN and SN, were created; SS was been treated as the reference group (NN = 1 if NN, 0 otherwise; SN = 1 if SN, 0 otherwise). Incorporating this association profile allows us to investigate associations between the disease locus and the covariates, and will help to elucidate etiopathologies between different phenotypes and the disease genes.
Recently, Liang and Chiu  proposed a robust multipoint association mapping approach using case-control data. This approach provides an estimate of the genetic effect and the location of the disease locus τ, along with sampling uncertainty to help investigators narrow down chromosomal regions that putatively harbor a disease gene (τ). The genetic effect denoted by "C" characterizes the excess disease allele frequency among cases compared to controls. Chiu et al.  extended this method to estimate C by incorporating covariates into the mapping, in order to estimate τ more efficiently. The covariates could be either quantitative or qualitative. Hence, we incorporated anti-CCP or IgM RF as well as the number of SE alleles into our association mapping to estimate the disease locus for RA and to assess interactions between the disease gene and the covariates simultaneously .
A total of 868 cases and 1194 controls in the North American Rheumatoid Arthritis Consortium study (NARAC) were included in analyses. We used PLINK [7, 8] to clean the data, excluding two cases and one control from the data set due to their genome-wide heterozygosity <30%. We also checked the call rate, cryptic relatedness, and multidimensional scaling. No additional subjects were excluded.
From the total 35,574 single-nucleotide polymorphisms (SNPs) available originally, we excluded the SNPs with a call rate <90% or MAF (minor allele frequency) <0.05, and those that failed the Hardy-Weinberg equilibrium test (p < 0.05) as measured by the PLINK software. A total of 29,616 SNPs remained for further gene mapping.
The 29,616 SNPs were used to make plots for the average excess target allele frequency in cases compared to controls, divided by the estimated magnitude of linkage disequilibrium (LD) (as shown on page 146 in Chiu et al.) . A region harboring 2732 SNPs with higher peaks between 28 and 40 cM was identified. A total of 1561 SNPs ranging from about 28,000 to 40,000 kb, with blocks defined by r2 > 0.7, was selected as tag SNPs using Haploview software. We converted base pair (bp) into centimorgan (cM) approximately by dividing bp by 106. Hence, our analyses focused on 706 cases with RF, anti-CCP, and number of SEs and 1191 controls with number of SEs.
including the following notations:
t An arbitrary location in the study region
τ Disease locus
: A vector of functions of covariates for cases (s D ) and controls ( )
C Genetic effect at τ
d t LD between t and τ (assuming it is independent of covariates)
If the case carries the high-risk allele from his/her father
If the case does not carry the high-risk allele from his/her father
If the case carries the high-risk allele from his/her mother
If the case does not carry the high-risk allele from his/her mother
Similarly, one can define the indicator variables, and , for controls accordingly.
In an unmatched case-control design, all possible combinations of case-control pairs were used to localize the disease locus in this approach, a covariate for each pair being defined by a function (g(•)) of the case's covariate (S D ) and the control's covariate ( ) [for example, or , etc.]. In this study, the covariates of antibodies were available in cases only; hence, we defined . We can show that the approach remains legitimate and the property in the Appendix in Chiu et al.  still holds true (unpublished data). For the covariate available in both cases and controls, , in the present study.
The regression coefficients β T characterize the associations between the genetic effect at τ and the covariates. For covariates available for cases and controls, the approach from Chiu et al.  was directly applicable. On the other hand, incorporating covariates into LD mapping can improve the efficiency in estimating τ when the covariate carries additional information on the association under study. Because this method represents an extension of the approach proposed by Liang and Chiu  that is based on the generalized estimating equation approach, it is also robust, given that no assumption about the genetic mechanism is required, other than that the region contains no more than one susceptibility locus for the qualitative trait while incorporating multiple markers into the analysis simultaneously. No assumption about the underlying genetic mechanism of an incorporated quantitative covariate is required either.
The LD mapping was conducted through the sliding window approach, in which sequential analyses were performed for every 100 SNPs. We examined the associations between the estimated disease locus and the covariates in the case-control study. In addition, we compared the results from incorporating the covariates with those from the original search, where no covariates were incorporated. This allowed us to evaluate the improvement in efficiency provided by the additional information from the given covariates. To make a comparison with other association analyses, we also performed individual SNPs analysis using the chi-square test in PLINK.
Association mapping without a covariate based on 706 cases and 1191 controls
<1.0 × 10-15
Association mapping with covariates RF, SN, and NN based on 706 cases and 1191 controls
Testing H0: β2 = β3
<1.0 × 10-15
<1.0 × 10-15
<1.0 × 10-15
Association mapping with covariates anti-CCP, SN, and NN based on 706 cases and 1191 controls
Testing H0: β2 = β3
<1.0 × 10-15
<1.0 × 10-15
<1.0 × 10-15
Association analysis of individual SNPs with a p-value < 1.0 × 10-30
MAF in cases
MAF in controls
1.00 × 10-96
5.33 × 10-39
2.20 × 10-31
9.41 × 10-56
1.23 × 10-31
1.08 × 10-59
3.33 × 10-56
6.56 × 10-38
3.23 × 10-42
1.43 × 10-47
1.00 × 10-48
8.99 × 10-50
8.02 × 10-63
1.42 × 10-107
1.10 × 10-41
1.03 × 10-76
1.97 × 10-40
3.64 × 10-56
7.19 × 10-47
5.02 × 10-38
Based on Figure 1 and the analysis results, the one-disease-locus assumption seems to be reasonable for the data in this region. Because results from previous studies suggested the presence of an association between anti-CCP and SE , we examined the linear relationship between anti-CCP and the number of SE alleles in the 706 cases using a linear regression model. Our results suggested that the anti-CCP levels were significantly different when comparing cases with 1 and 2 SE alleles (p = 0.0050), but were not significantly different when comparing cases with 0 and 2 SE alleles (p = 0.85). Further, when we excluded the two dummy variables for SE from the association mapping, anti-CCP remained insignificant (p = 0.98). Similarly, excluding the two dummy variables for SE from the association mapping yielded an insignificant association with RF (p = 0.63). These results suggested that the association between anti-CCP or RF and the estimated disease locus was not significantly dependent on the number of SE alleles. Additionally, the estimate of the disease locus was quite consistent with the result from the association analysis of individual SNPs, based on a chi-square test. The Marshfield genetic map for this region (6p22.1-6p21.2) is 44.41-54.76 cM, spanning 10.35 cM, while by dividing by 106, our map is 28 cM-44 cM, spanning 12 cM. The distances between markers will affect the results; however, because the SNP data are very dense, after converting the distances into recombination fractions, the difference resulting from two maps is likely to be very limited. The exact impact of this consideration warrants future exploration.
We applied a robust multipoint LD mapping approach to locate disease genes for RA with incorporation of covariates as well as to assess whether the disease genes were associated with the covariates. Our results suggest that the disease loci in the region of 6p21 were strongly associated with the SE alleles. The efficiency in estimating the disease genes remained similar when incorporating RF or anti-CCP into the mapping, revealing that these two quantitative covariates did not provide additional information on the disease locus localization. Through this application, we demonstrated that while performing multipoint fine mapping, this approach not only facilitates examination of gene-gene interactions and gene-covariate interactions, but also helps to elucidate the pathways of complex diseases.
List of abbreviations used
Anti-cyclic citrullinated peptide
Minor allele frequency
The authors are very grateful to the reviewers' for their constructive and insightful comments, which have greatly improved the quality of this work. This project is supported, in part, by a grant from the National Science Council (NSC97-2118-M-400-003) and by a grant from the National Health Research Institutes (BS-097-PP-04). We thank Ms. Susan Margitić for her editorial contribution to this manuscript.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
- MacGregor AJ, Snieder H, Rigby AS, Koskenvuo M, Kaprio J, Aho K, Silman AJ: Characterizing the quantitative genetic contribution to rheumatoid arthritis using data from twins. Arthritis Rheum. 2000, 43: 30-37. 10.1002/1529-0131(200001)43:1<30::AID-ANR5>3.0.CO;2-B.View ArticlePubMedGoogle Scholar
- Gregersen PK, Silver J, Winchester RJ: The shared epitope hypothesis. An approach to understanding the molecular genetics of susceptibility to rheumatoid arthritis. Arthritis Rheum. 1987, 30: 1205-1213. 10.1002/art.1780301102.View ArticlePubMedGoogle Scholar
- Huizinga TW, Amos CI, Helm-van Mil van der AH, Chen W, van Gaalen FA, Jawaheer D, Schreuder GM, Wener M, Breedveld FC, Ahmad N, Lum RF, de Vries RR, Gregersen PK, Toes RE, Criswell LA: Refining the complex rheumatoid arthritis phenotype based on specificity of the HLA-DRB1 shared epitope for antibodies to citrullinated proteins. Arthritis Rheum. 2005, 52: 3433-3438. 10.1002/art.21385.View ArticlePubMedGoogle Scholar
- Liang K-Y, Chiu Y-F: Multipoint linkage disequilibrium mapping using case-control designs. Genet Epidemiol. 2005, 29: 365-376. 10.1002/gepi.20104.View ArticlePubMedGoogle Scholar
- Chiu Y-F, Liang K-Y, Beaty TH, Chuang L-M: Incorporation of covariates into multipoint linkage disequilibrium mapping in case-control studies. Genet Epidemiol. 2008, 32: 143-151. 10.1002/gepi.20271.View ArticlePubMedGoogle Scholar
- Glidden DV, Liang KY, Chiu YF, Pulver AE: Multipoint affected sibpair linkage methods for localizing susceptibility genes of complex diseases. Genet Epidemiol. 2003, 24: 107-117. 10.1002/gepi.10215.View ArticlePubMedGoogle Scholar
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC: PLINK: A toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.PubMed CentralView ArticlePubMedGoogle Scholar
- PLINK...Whole-genome Association Toolset. [http://pngu.mgh.harvard.edu/purcell/plink/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.