Evaluating gene × gene and gene × smoking interaction in rheumatoid arthritis using candidate genes in GAW15
© Mei et al; licensee BioMed Central Ltd. 2007
Published: 18 December 2007
We examined potential gene × gene interactions and gene × smoking interactions in rheumatoid arthritis (RA) using the candidate gene data sets provided by Genetic Analysis Workshop 15 Problem 2. The multifactor dimensionality reduction (MDR) method was used to test gene × gene interactions among candidate genes. The case-only sample was used to test gene × smoking interactions. The best predictive model was the single-locus model with single-nucleotide polymorphism (SNP) rs2476601 in gene PTPN22. However, no clear gene × gene interaction was identified. Substantial departure from multiplicativity was observed between smoking and SNPs in genes CTLA4, PADI4, and MIF, SNPs on chromosome 5, and one haplotype of PTPN22. The strongest evidence of association was identified between the PTPN22 gene and RA status, which was consistently detected in single SNP association, gene × gene interaction, and gene × smoking interaction analyses.
Rheumatoid arthritis (RA) is a complex autoimmune disease. The etiology of the disease is not clearly understood. Risk factors for RA include genetic factors, race (Native American), female gender, obesity, old age, and smoking [1, 2]. However, as with most complex diseases, few studies of gene × gene interaction and gene × environmental interaction have been performed because a large sample size is required to identify such effects in traditional statistical paradigms. Logistic regression is commonly used to detect interactive effects between genes or environmental factors in epidemiologic studies. However, its parameters cannot be accurately estimated when there are many independent variables and the sample size is not large enough. Recently, Ritchie et al. [4] introduced a multifactor dimensionality reduction (MDR) method for identifying gene × gene interaction or gene × environmental interaction to overcome this limitation of traditional logistic regression [3–5]. This approach enumerates all possible combinations of genotypes or environmental factors and classifies each as associated with high or low risk of disease, and it may enable us to find interactions between genes in the absence of main effects [3–5].
To detect potential epistasis in RA, we evaluated 1) disease associations using single SNPs (single-nucleotide polymorphisms) from 15 candidate genes and haplotypes of the PTPN22 gene, 2) gene × gene interactions among the candidate genes using the MDR method and logistic regression, and 3) gene × environmental (smoking) interactions using a case-only study design.
The data sets for the candidate gene studies of RA were provided by Genetic Analysis Workshop 15 (GAW15) Problem 2. There were two case-control data sets. The first included 855 unrelated controls and 839 cases, with genotype data on 20 SNPs from 15 candidate genes, which were selected by Plenge et al. [6] from previously published associations with RA or other autoimmune disorders. The second data set included 1519 unrelated controls and 1393 cases, with genotype data on 14 SNPs from the PTPN22 gene. Additional phenotype data, including smoking history, age of onset, sex, and body mass index, were available for cases only in both data sets. There were 408 and 720 affected sibling pairs among cases in the two data sets, respectively.
Single SNP and haplotype (PTPN22 only) associations with disease status were evaluated first. To account for the dependency among family members, the generalized estimating equations method (GEE1) [7], as implemented in the GENMOD procedure of SAS 9.0, was used in the association analysis with family as the cluster factor; i.e., members of the same family were assumed to be correlated and members of different families were assumed to be independent. The haplotype block structure of PTPN22 was evaluated with Haploview [8]. Individual haplotypes were reconstructed using PHASE 2.0 [9], with each individual assigned the haplotype configuration of maximum probability. Seventy-four percent of haplotype assignments had probabilities of 100% and 93% had probabilities of 80% or better. Individuals whose haplotype assignment had a probability below 80% were excluded from subsequent analysis. Association analysis was carried out for each common haplotype in turn. For each haplotype, a dominant model was assumed; i.e., carriers of the particular haplotype were compared with non-carriers for their RA status.
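The dominant (carrier versus non-carrier) coding and the 80% phase-probability filter described above can be illustrated with a minimal sketch. The record layout, the haplotype labels (`H1`, `H2`, `H3`), and the function name are hypothetical, and the sketch computes only raw carrier frequencies; it does not reproduce the GEE adjustment for family clustering.

```python
from collections import Counter

# Hypothetical records: (haplotype_a, haplotype_b, phase_probability, is_case)
people = [
    ("H1", "H2", 1.00, True),
    ("H2", "H2", 0.95, True),
    ("H1", "H3", 0.60, True),   # dropped: phase probability below 0.80
    ("H2", "H3", 0.85, False),
    ("H1", "H1", 1.00, False),
    ("H3", "H3", 0.90, False),
]

MIN_PHASE_PROB = 0.80  # exclusion threshold stated in the text


def carrier_frequencies(records, haplotype, min_prob=MIN_PHASE_PROB):
    """Return (case_carrier_freq, control_carrier_freq) under a dominant coding."""
    carriers = Counter()
    totals = Counter()
    for hap_a, hap_b, prob, is_case in records:
        if prob < min_prob:
            continue  # exclude uncertain haplotype assignments
        totals[is_case] += 1
        if haplotype in (hap_a, hap_b):  # one or two copies both count as carrier
            carriers[is_case] += 1
    return carriers[True] / totals[True], carriers[False] / totals[False]


case_freq, control_freq = carrier_frequencies(people, "H1")
```

In this toy input, two cases and three controls pass the filter, and one subject in each group carries `H1`.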
To test gene × gene interactions, MDR was used to determine the genetic model that most successfully predicted disease status from several loci. SNP rs2240340 in the PADI4 gene was excluded from the analysis because of its large amount of missing data. A total of 1330 case-control samples with complete marker data on 19 SNPs from 14 candidate genes were used in the MDR analysis. Cross-validation (CV) consistency and balanced accuracy estimates were calculated for each combination of genetic polymorphisms. The model with the highest accuracy and maximal CV consistency was considered to be the best. We determined statistical significance by comparing the accuracy of the observed data with the distribution of accuracy under the null hypothesis of no association, derived empirically from 1000 permutation replicates. The null hypothesis was rejected when the p-value derived from the permutation test was 0.05 or less. As a follow-up, logistic regression analysis was conducted if there was a suggestive interaction.
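The core MDR step, pooling multilocus genotype cells into high-risk and low-risk groups and scoring the resulting rule by balanced accuracy, can be sketched as follows. This is a simplified illustration and not the MDR software used in the study: it scores a single candidate set of loci on the full sample without cross-validation, and the permutation function shuffles labels for that one model only, whereas a full analysis would repeat the entire model search on each permuted data set. All function and variable names are assumptions.

```python
import random
from collections import defaultdict


def mdr_balanced_accuracy(genotypes, status, loci):
    """Pool multilocus genotype cells into high/low risk and score the rule.

    genotypes: list of tuples of genotype codes (0, 1, 2), one tuple per subject
    status:    list of 1 (case) / 0 (control)
    loci:      indices of the loci that make up the candidate model
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    cells = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for g, y in zip(genotypes, status):
        cell = tuple(g[i] for i in loci)
        cells[cell][0 if y == 1 else 1] += 1

    # A cell is high risk when its case:control ratio exceeds the threshold.
    true_pos = true_neg = 0
    for cases, controls in cells.values():
        if cases > threshold * controls:
            true_pos += cases      # cases correctly called high risk
        else:
            true_neg += controls   # controls correctly called low risk
    sensitivity = true_pos / n_cases
    specificity = true_neg / n_controls
    return (sensitivity + specificity) / 2


def permutation_p_value(genotypes, status, loci, n_perm=1000, seed=0):
    """Empirical p-value: share of label shuffles scoring at least as well."""
    rng = random.Random(seed)
    observed = mdr_balanced_accuracy(genotypes, status, loci)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mdr_balanced_accuracy(genotypes, shuffled, loci) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: cases have discordant genotypes at the two loci, controls concordant.
toy_genotypes = [(0, 1), (1, 0), (0, 1), (1, 0), (0, 0), (1, 1), (0, 0), (1, 1)]
toy_status = [1, 1, 1, 1, 0, 0, 0, 0]

two_locus = mdr_balanced_accuracy(toy_genotypes, toy_status, (0, 1))
one_locus = mdr_balanced_accuracy(toy_genotypes, toy_status, (0,))
```

The toy data are built so that neither locus is informative on its own while the pair separates cases from controls, which is the "interaction in the absence of main effects" situation that MDR is designed to pick up.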
We also examined the interaction between SNPs and smoking history in RA cases. The logistic function in the GENMOD procedure was used to quantify departure from multiplicativity. Odds ratios and 95% confidence intervals (CIs) were estimated. To adjust for multiple tests, empirical p-values were obtained from 1000 permutations. For the PTPN22 gene, interactions between PTPN22 haplotypes and smoking were likewise evaluated among cases.
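For a single binary genotype indicator and ever/never smoking, the case-only odds ratio reduces to the cross-product ratio of a 2 × 2 table of cases. The sketch below computes it with a Woolf (log-based) confidence interval. It is a simplification under stated assumptions: the counts are hypothetical, the function name is invented for illustration, and the interval treats cases as independent, so it does not reflect the correlation among affected siblings that a GENMOD analysis with a cluster factor would account for.

```python
import math


def case_only_interaction(a, b, c, d, z=1.96):
    """Case-only odds ratio with a Woolf (log-based) 95% confidence interval.

    Counts among cases only:
      a = carrier & ever smoker      b = carrier & never smoker
      c = non-carrier & ever smoker  d = non-carrier & never smoker
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the GAW15 data.
or_hat, ci_low, ci_high = case_only_interaction(a=60, b=40, c=90, d=110)
```

An odds ratio whose interval excludes 1 would suggest departure from multiplicativity, provided genotype and smoking are independent in the source population.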
1. Single SNP and PTPN22 haplotype association
[Table: Association between SNPs and RA. Recoverable column headers: p-value (11 vs. 12 vs. 22); p-value (11/12 vs. 22). Table body not preserved.]
Five common haplotypes of the PTPN22 with frequency >10% were constructed. Of the two haplotypes with significant associations with RA, one was a risk haplotype (11222221122221; 1: minor allele, 2: major allele; frequency: 11.6%), with a higher carrier frequency in cases than in controls (30.0% vs. 14.9%, p < 0.0001); whereas the other was protective (22122222222222; frequency: 10.9%), with a lower carrier frequency in cases than in controls (16.4% vs. 24.7%, p < 0.0001).
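As a rough check on the direction and size of these effects, the carrier frequencies quoted above imply the following crude odds ratios. This is a back-of-envelope calculation that ignores the family clustering handled by GEE, so it need not match any model-based estimate.

```latex
\[
\mathrm{OR}_{\text{risk}}
  = \frac{0.300/(1-0.300)}{0.149/(1-0.149)}
  \approx \frac{0.429}{0.175} \approx 2.45,
\qquad
\mathrm{OR}_{\text{protective}}
  = \frac{0.164/(1-0.164)}{0.247/(1-0.247)}
  \approx \frac{0.196}{0.328} \approx 0.60
\]
```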
2. Gene × gene interaction
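[Table: Multilocus interaction models for RA selected by MDR. Only the marker combinations are preserved; accuracy and CV consistency values are not.]
Two-locus model: rs2476601 (PTPN22), rs1248696 (DLG5)
Three-locus model: rs2476601 (PTPN22), rs6149307 (HAVCR1), rs2243250 (IL4)
Four-locus model: rs2476601 (PTPN22), IGR2096ms1 (chr 5), rs237025 (SUMO4), rs2268277 (RUNX1)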
3. Gene × smoking interaction
[Table: Gene × smoking interactions. Only two marker labels are preserved: IGR3084ms1 (chr 5) and IGR3138ms1 (chr 5). Odds ratios and p-values are not preserved.]
One of the common haplotypes of PTPN22 (22222222211221, frequency: 18%) was found to interact with ever smoking at a borderline significance level (OR = 0.78, 95% CI: 0.60–1.01, p = 0.06); however, the risk and protective haplotypes identified previously in the case-control sample did not show any departure from multiplicativity with smoking in the case-only study.
We explored gene × gene and gene × smoking interactions using the candidate gene data set provided by GAW15. The best predictive model for RA status is the single-locus model containing rs2476601 in gene PTPN22. SNP rs2476601 is a well-known functional SNP that is associated with increased risk of RA. The best combination model selected by MDR consisted of rs2476601 in PTPN22 and rs1248696 in DLG5. However, this interaction was not confirmed in the subsequent logistic regression analysis. A possible reason for the inconsistency is that MDR does not test statistical interaction in the sense used in logistic regression, where it is defined as 'deviation from multiplicativity'. A significant MDR result implies only that the combination of markers contributes to an increased or decreased risk of disease; the joint effect of the markers could be either multiplicative or a departure from multiplicativity.
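The distinction can be made concrete with a two-locus logistic model. The notation below is introduced here for illustration and does not appear in the original analysis: $G_1$ and $G_2$ are binary genotype indicators and $D$ is disease status.

```latex
\[
\operatorname{logit} P(D = 1 \mid G_1, G_2)
  = \beta_0 + \beta_1 G_1 + \beta_2 G_2 + \beta_3 G_1 G_2,
\qquad
e^{\beta_3} = \frac{\mathrm{OR}_{11}}{\mathrm{OR}_{10}\,\mathrm{OR}_{01}}
\]
```

Here $\mathrm{OR}_{jk}$ is the odds ratio for $G_1 = j$, $G_2 = k$ relative to $G_1 = G_2 = 0$. Logistic regression tests $\beta_3 = 0$, i.e., exact multiplicativity of the two odds ratios. A two-locus combination can classify cases and controls well even when $\beta_3 = 0$, for example when one locus has a strong main effect, so a high MDR accuracy for a pair of markers does not by itself imply $\beta_3 \neq 0$.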
The case-only study has a particular advantage in testing gene × environmental interaction in that it requires a smaller sample size. It allows us to test interactive effects in the absence of information from controls, under the assumption that the two risk factors are independently distributed in the population at risk. In GAW15 Problem 2, we used this design to identify gene × smoking interactions in RA because no smoking information was available from controls. We assumed that genetic polymorphism and smoking exposure are independent of one another in controls. Substantial departure from multiplicativity was observed between ever smoking and markers from CTLA4, PADI4, MIF, and chromosome 5. Among these markers, only SNP CT60 from gene CTLA4 showed a main effect with RA in the single SNP analysis. One possible explanation for this phenomenon is that gene × smoking interactions could mask the true genetic effect when only the marginal association is tested, especially when genotype modifies the smoking effect in opposite directions within the total sample. Another possible explanation is the difference in the tested samples: only cases were used in the gene × smoking interaction studies, whereas the single SNP association was evaluated in the case-control sample.
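The role of the independence assumption can be seen from a standard identity, written here in notation introduced for illustration: $G$ and $E$ are binary genotype and smoking indicators, $D$ is disease status, and $\mathrm{OR}_{ge}$ is the disease odds ratio for $G = g$, $E = e$ relative to $G = E = 0$.

```latex
\[
\mathrm{OR}_{GE \mid D=1}
  = \frac{\mathrm{OR}_{11}}{\mathrm{OR}_{10}\,\mathrm{OR}_{01}}
    \times \mathrm{OR}_{GE \mid D=0}
\]
```

The genotype-smoking odds ratio among cases equals the multiplicative interaction ratio times the genotype-smoking odds ratio among controls. When genotype and smoking are independent in controls, the last factor is 1 and the case-only odds ratio estimates the interaction ratio directly. If that independence fails, the case-only estimate absorbs the genotype-smoking association in controls and can indicate an interaction where none exists.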
PTPN22 has been reported to be associated with RA [6, 12]. In this study, we tested single gene association, gene × gene interactions, and gene × smoking interactions using three different methods. In the single SNP analysis, PTPN22 showed the strongest association with RA status (p < 0.0001). In the subsequent gene × gene interaction analyses by MDR, both the best single-locus model and the best combined model included the PTPN22 gene. Furthermore, haplotype analysis using the second data set identified two haplotypes of PTPN22 associated with RA and, more importantly, there was a trend toward interaction between this gene and smoking. The consistent findings here therefore provide further evidence of the genetic involvement of PTPN22 in the etiology of RA.
In conclusion, our analyses confirmed the role of genetic and environmental factors in rheumatoid arthritis. Strong evidence of association was identified for the PTPN22 gene, which was observed in all three analyses. Other genes (HAVCR1, CTLA4, SUMO4, MAP3K7IP2, PADI4, MIF) and the chromosome 5 locus may also contribute to the development of rheumatoid arthritis, either directly or within the context of smoking.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
- Voigt LF, Koepsell TD, Nelson JL, Dugowson CE, Daling JR: Smoking, obesity, alcohol consumption, and the risk of rheumatoid arthritis. Epidemiology. 1994, 5: 525-532.
- Klareskog L, Stolt P, Lundberg K, Källberg H, Bengtsson C, Grunewald J, Rönnelid J, Harris HE, Ulfgren AK, Rantapää-Dahlqvist S, Eklund A, Padyukov L, Alfredsson L: A new model for an etiology of rheumatoid arthritis: smoking may trigger HLA-DR (shared epitope)-restricted immune reactions to autoantigens modified by citrullination. Arthritis Rheum. 2006, 54: 38-46. 10.1002/art.21575.
- Moore JH, Williams SM: New strategies for identifying gene-gene interactions in hypertension. Ann Med. 2002, 34: 88-95. 10.1080/07853890252953473.
- Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH: Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet. 2001, 69: 138-147. 10.1086/321276.
- Moore JH: Computational analysis of gene-gene interactions using multifactor dimensionality reduction. Expert Rev Mol Diagn. 2004, 4: 795-803. 10.1586/14737184.108.40.2065.
- Plenge RM, Padyukov L, Remmers EF, Purcell S, Lee AT, Karlson EW, Wolfe F, Kastner DL, Alfredsson L, Altshuler D, Gregersen PK, Klareskog L, Rioux JD: Replication of putative candidate-gene associations with rheumatoid arthritis in >4,000 samples from North America and Sweden: association of susceptibility with PTPN22, CTLA4, and PADI4. Am J Hum Genet. 2005, 77: 1044-1060. 10.1086/498651.
- Zeger SL, Liang KY: Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986, 42: 121-130. 10.2307/2531248.
- Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21: 263-265. 10.1093/bioinformatics/bth457.
- Stephens M, Smith NJ, Donnelly P: A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001, 68: 978-989. 10.1086/319501.
- Coffey CS, Hebert PR, Krumholz HM, Morgan TM, Williams SM, Moore JH: Reporting of model validation procedures in human studies of genetic interactions. Nutrition. 2004, 20: 69-73. 10.1016/j.nut.2003.09.012.
- Thomas DC: Statistical Methods in Genetic Epidemiology. 2004, New York: Oxford
- Carlton VE, Hu X, Chokkalingam AP, Schrodi SJ, Brandon R, Alexander HC, Chang M, Catanese JJ, Leong DU, Ardlie KG, Kastner DL, Seldin MF, Criswell LA, Gregersen PK, Beasley E, Thomson G, Amos CI, Begovich AB: PTPN22 genetic variation: evidence for multiple variants associated with rheumatoid arthritis. Am J Hum Genet. 2005, 77: 567-581. 10.1086/468189.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.