Constructing gene association networks for rheumatoid arthritis using the backward genotype-trait association (BGTA) algorithm
© Ding et al; licensee BioMed Central Ltd. 2007
Published: 18 December 2007
Rheumatoid arthritis (RA, MIM 180300) is a common and complex inflammatory disorder. The North American Rheumatoid Arthritis Consortium (NARAC) data, as part of the Genetic Analysis Workshop 15 data, consists of both genome scan and candidate gene studies on RA patients.
We applied the backward genotype-trait association (BGTA) algorithm to capture marginal and gene × gene interaction effects of multiple susceptibility loci on RA disease status. A two-stage screening approach was used for the genome scan, whereas a comprehensive study of all possible subsets was conducted for the candidate genes. For the genome scan, we constructed an association network among 39 genetic loci that demonstrated strong signals, 19 of which have been reported in the RA literature. For the candidate genes, we found strong signals for PTPN22 and SUMO4. Based on significant association evidence, we built an association network among the loci of PTPN22, PADI4, DLG5, SLC22A4, SUMO4, and CARD15. To control for false positives, we used permutation tests to constrain the family-wise type I error rate to 1%.
Using the BGTA algorithm, we identified genetic loci and candidate genes that were associated with RA susceptibility and association networks among them. For the first time, we report possible interactions between single-nucleotide polymorphisms/genes, which may be useful for biological interpretation.
Rheumatoid arthritis (RA) is a heterogeneous disease with a complex genetic component. Previous studies identified multiple genetic regions that might be associated with RA. Amos et al. [1] identified strong linkage evidence in the major histocompatibility complex (MHC) region, 2q33 (CTLA4), and 11p12 in a genome scan. Plenge et al. [2] selected 14 genes that may be associated with RA or related autoimmune disorders and carried out a case-control study on these candidate genes, with significant results for PTPN22, CTLA4, and PADI4.
A common approach in association studies is to search, marker by marker, for loci associated with the disease. This approach precludes consideration of interactions between genetic loci, which loses information that is important for understanding complex traits. On the other hand, considering high-dimensional interactions makes the computational cost unrealistically high for large-scale studies. To address these difficulties, Lo and Zheng [3, 4] showed the backward haplotype transmission association (BHTA) method for case-parent trios to be powerful for studying complex human disorders because it efficiently uses multilocus information. The method was extended to case-control designs in the backward genotype-trait association (BGTA) algorithm, which evaluates association information on unphased multilocus genotypes [5]. In this paper, we applied BGTA to the Illumina genome scan (studied by Amos et al. [1]) and the candidate gene data (studied by Plenge et al. [2]) provided by NARAC as part of Problem 2 of Genetic Analysis Workshop 15.
20 SNPs genotyped on the loci of 14 putative RA candidate genes
In the GTD score, $n_d$ and $n_u$ are the total numbers of cases and controls, and $n_{d,i}$ and $n_{u,i}$ are the counts of genotype $i$ (on the $k$ markers under study) among cases and controls, respectively. With the constant $(1/n_d + 1/n_u)^{-2}$, GTD has expectation 1 asymptotically under the null hypothesis of no association. If a marker is removed from the studied set, the GTD score may decrease or increase, thereby reflecting the contribution of that marker. The genotype-trait association (GTA) score for marker M, given a current set of markers, is built from ΔGTD, the GTD score without M minus the GTD score with M, together with an adjusting term defined in [5] that makes GTA have expectation 0 when none of the markers in the subset is associated with the trait. When M is not associated with the disease but some of the selected markers are, GTA is positive, indicating the information gained when M (i.e., noise) is removed. If M is associated with the trait, GTA is negative, indicating an information loss, and its magnitude reflects the importance of M.
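As a minimal sketch of the quantities described above, the following Python code computes an unscaled genotype distortion (the sum over joint genotypes of the squared difference between case and control genotype frequencies) and the change in that score when a marker is dropped. The scaling constant and the adjusting term of the GTA score are deliberately omitted, so this is an illustration under stated assumptions rather than the published statistic; the function names and the toy data are hypothetical.

```python
from collections import Counter

def gtd_unscaled(cases, controls, markers):
    """Sum over joint genotypes of (case frequency - control frequency)^2.

    cases, controls: lists of genotype tuples, one tuple per individual.
    markers: indices of the markers that define the joint genotype.
    Assumption: no scaling constant is applied.
    """
    n_d, n_u = len(cases), len(controls)
    count_d = Counter(tuple(g[m] for m in markers) for g in cases)
    count_u = Counter(tuple(g[m] for m in markers) for g in controls)
    genotypes = set(count_d) | set(count_u)
    return sum((count_d[g] / n_d - count_u[g] / n_u) ** 2 for g in genotypes)

def delta_gtd(cases, controls, markers, m):
    """Score without marker m minus score with marker m (no adjusting term)."""
    reduced = [k for k in markers if k != m]
    return (gtd_unscaled(cases, controls, reduced)
            - gtd_unscaled(cases, controls, markers))

# Toy data (hypothetical): marker 0 differs between groups, marker 1 does not.
cases = [(0, 0), (0, 1), (0, 0), (1, 1)]
controls = [(1, 0), (1, 1), (1, 0), (0, 1)]

joint_score = gtd_unscaled(cases, controls, [0, 1])
drop_signal = delta_gtd(cases, controls, [0, 1], 0)  # negative: information lost
drop_noise = delta_gtd(cases, controls, [0, 1], 1)   # not negative here
```

In this toy example, removing the informative marker lowers the score, whereas removing the uninformative marker does not, which mirrors the sign convention described for GTA.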
Two-stage SNP selection
To overcome the computational complexity of analyzing 5407 SNPs while also considering interactions, we developed a two-stage SNP selection process (see Figure 1 for a flowchart). To reduce the computational burden, we assume that SNPs carrying high-dimensional interaction information will show some signal in pairwise GTD scores. In Stage 1, we selected the 1000 BGTA-irreducible pairs of SNPs with the top GTD scores, which also included 22 SNPs with top marginal GTDs. This yielded 707 unique SNPs for the second stage, where we performed a regular BGTA screening on 700,000 random subsets of 8 SNPs. For a discussion of the size of the subsets and the number of repeats, see Zheng et al. [5].
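The two-stage scheme can be sketched as follows, under simplifying assumptions: Stage 1 ranks all SNP pairs by an unscaled distortion score and keeps the SNPs in the top pairs (the irreducibility check is omitted), and Stage 2 applies a greedy backward elimination to random subsets, using the raw change in score in place of the adjusted GTA score. All function names are hypothetical, and in the study the parameters would be 1000 pairs and 700,000 subsets of 8 SNPs.

```python
import random
from collections import Counter
from itertools import combinations

def gtd(cases, controls, markers):
    """Unscaled distortion: sum over joint genotypes of squared frequency differences."""
    n_d, n_u = len(cases), len(controls)
    c_d = Counter(tuple(g[m] for m in markers) for g in cases)
    c_u = Counter(tuple(g[m] for m in markers) for g in controls)
    return sum((c_d[g] / n_d - c_u[g] / n_u) ** 2 for g in set(c_d) | set(c_u))

def stage1(cases, controls, n_snps, top_pairs):
    """Rank all SNP pairs by score and keep the SNPs in the top-scoring pairs."""
    ranked = sorted(combinations(range(n_snps), 2),
                    key=lambda pair: gtd(cases, controls, pair), reverse=True)
    return sorted({snp for pair in ranked[:top_pairs] for snp in pair})

def backward_screen(cases, controls, subset):
    """Greedy backward elimination: drop the SNP whose removal raises the score most."""
    current = list(subset)
    while len(current) > 1:
        base = gtd(cases, controls, current)
        gains = {m: gtd(cases, controls, [k for k in current if k != m]) - base
                 for m in current}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:  # no removal increases the score: stop
            break
        current.remove(best)
    return tuple(sorted(current))

def stage2(cases, controls, kept, subset_size, repeats, seed=0):
    """Screen random subsets of the Stage 1 SNPs and count the returned clusters."""
    rng = random.Random(seed)
    found = Counter()
    for _ in range(repeats):
        subset = rng.sample(kept, subset_size)
        found[backward_screen(cases, controls, subset)] += 1
    return found

# Toy data (hypothetical): SNP 0 separates the groups, SNPs 1 and 2 are balanced noise.
cases = [(0, a, b) for a in (0, 1) for b in (0, 1)]
controls = [(1, a, b) for a in (0, 1) for b in (0, 1)]
kept = stage1(cases, controls, n_snps=3, top_pairs=2)
clusters = stage2(cases, controls, kept, subset_size=3, repeats=5)
```

Clusters that are returned repeatedly across random subsets are the natural candidates for further significance evaluation.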
Candidate gene study
For the candidate gene set, we evaluated a total of $2^{20} - 1$ GTD scores on all possible subsets of 20 SNPs (except for the empty set) to enumerate the distribution of GTD for each subset size. Then we performed 30,000 greedy BGTA screenings on subsets of 8 SNPs to identify locally optimal BGTA-irreducible SNP clusters.
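The exhaustive enumeration is feasible because the number of non-empty subsets of 20 SNPs is only about one million. The sketch below checks that count and shows one way to group scores by subset size; the helper names are hypothetical and the scoring function is a placeholder standing in for a GTD computation.

```python
from itertools import combinations
from math import comb

def nonempty_subsets(items):
    """Yield every non-empty subset of items, in order of increasing size."""
    items = list(items)
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)

def scores_by_size(items, score):
    """Group score(subset) by subset size; `score` is any callable on a subset."""
    grouped = {}
    for subset in nonempty_subsets(items):
        grouped.setdefault(len(subset), []).append(score(subset))
    return grouped

# Number of non-empty subsets of 20 SNPs: sum of C(20, k) for k = 1..20.
n_snps = 20
total_subsets = sum(comb(n_snps, k) for k in range(1, n_snps + 1))

# Small demonstration on 4 items with a placeholder score (the subset size).
demo = scores_by_size(range(4), score=len)
```

Here `total_subsets` equals $2^{20} - 1 = 1{,}048{,}575$, and each entry of `demo` holds the scores for one subset size, which is the form needed to compare a subset against others of the same size.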
Selection threshold and evaluation of significance
Association network construction
Results and discussion
Genome scan results
Loci with a high joint GTD score identified in the genome scan
Candidate genes results
For the candidate gene data, we studied all possible subsets of SNPs and identified eight significant BGTA-irreducible subsets on seven SNPs after controlling for family-wise type I error. The marginally significant (one-marker subset) SNPs are rs2476601 (PTPN22), and rs237025 and rs577001 (both at the locus of SUMO4). The significantly associated subsets with two or more SNPs represent possible interactions between these candidate genes in affecting the risk of developing RA. In Figure 4, red edges indicate interactions involving more than two SNPs, whereas blue edges indicate pairwise interactions between two SNPs. Consistent with the findings of Plenge et al. [2], we identified PTPN22 and PADI4. However, the SNP at the PADI4 locus has a high level of missing values, so results at this locus should be interpreted with caution.
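The edge convention described above can be illustrated with a short sketch. It assumes that every pair of SNPs within a significant subset is joined by an edge, coloured blue for two-SNP subsets and red for larger ones; this pairing rule, the function name, and the placeholder SNP labels are assumptions for illustration and are not results from the study.

```python
from itertools import combinations

def build_network(significant_subsets):
    """Return (nodes, edges); edges maps a sorted SNP pair to its set of colours."""
    nodes, edges = set(), {}
    for subset in significant_subsets:
        nodes.update(subset)
        if len(subset) < 2:
            continue  # one-marker subsets are marginal signals and add no edge
        colour = "blue" if len(subset) == 2 else "red"
        for pair in combinations(sorted(subset), 2):
            edges.setdefault(pair, set()).add(colour)
    return nodes, edges

# Hypothetical input with placeholder labels.
subsets = [("snp_a",), ("snp_a", "snp_b"), ("snp_b", "snp_c", "snp_d")]
nodes, edges = build_network(subsets)
```

Storing a set of colours per edge keeps both kinds of evidence when the same pair appears in a pairwise subset and in a larger one.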
Genotype distributions of identified SNPs on SLC22A4, SUMO4, and CARD15 among cases and controls
Percentage among cases or controls
1/1, 2/2, 2/3
1/1, 2/2, 3/3
1/1, 2/4, 2/3
1/1, 2/4, 3/3
1/1, 4/4, 2/3
1/1, 4/4, 3/3
1/3, 2/2, 2/3
1/3, 2/2, 3/3
1/3, 2/4, 2/3
1/3, 2/4, 3/3
1/3, 4/4, 2/3
1/3, 4/4, 3/3
3/3, 2/2, 2/3
3/3, 2/2, 3/3
3/3, 2/4, 2/3
3/3, 2/4, 3/3
3/3, 4/4, 2/3
3/3, 4/4, 3/3
Logistic regression deviance table and likelihood ratio test results on data shown in Table 3
p-Value from LR test
SLC22A4 × SUMO4
SLC22A4 × CARD15
SUMO4 × CARD15
SLC22A4 × SUMO4 × CARD15
Combining the results from the genome scan and the candidate gene study, PTPN22 had the strongest evidence as an RA susceptibility gene. SLC22A4 also appeared in the results of both studies.
In this paper, the BGTA approach was applied to identify important genetic loci and gene × gene interactions affecting susceptibility to RA. Different analytical strategies were tailored to the two data sets, which differ in nature, illustrating the applicability of BGTA and the GTD statistic to different kinds of studies. Using the BGTA method, both marginal and gene × gene interaction information was extracted and reflected in the GTD scores. Under a general analytical framework, both analyses result in association networks constructed from gene clusters with significant association to RA. To overcome the dimensionality problems of a genome scan, we imposed a two-stage scheme based on BGTA screenings. For a small number of candidate genes, we used GTD directly on subsets of genes to identify clusters that were significantly associated with RA disease status. We addressed the multiple comparisons issue using the most direct permutation-based evaluation and controlled the FDR and the family-wise type I error rate. Both association networks identified in this paper provided evidence of gene × gene interactions affecting the risk of developing RA. Visualization of these networks displays interesting structures that could be used to generate testable biological hypotheses.
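One common way to obtain permutation-based control of the family-wise type I error rate is the max-statistic approach, sketched below: case-control labels are shuffled, the largest score over all tested subsets is recorded for each shuffle, and an upper quantile of those maxima serves as the threshold. This is an illustration of the general technique and not necessarily the exact procedure used in the study; it does not cover FDR control, and the function names, the scoring function, and the toy data are hypothetical.

```python
import random

def fwer_threshold(labels, score_all, n_perm=1000, alpha=0.01, seed=0):
    """Threshold exceeded by at most int(alpha * n_perm) permutation maxima."""
    rng = random.Random(seed)
    shuffled = list(labels)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                    # break any genotype-trait link
        maxima.append(max(score_all(shuffled)))  # largest score in this permutation
    maxima.sort()
    allowed = int(alpha * n_perm)
    return maxima[n_perm - 1 - allowed]

# Toy data (hypothetical): marker 0 tracks disease status, marker 1 does not.
genotypes = [[1] * 10 + [0] * 10,
             [0, 1] * 10]
labels = [1] * 10 + [0] * 10

def score_all(lab):
    """Placeholder score per marker: absolute difference in mean genotype."""
    n_case = sum(lab)
    n_ctrl = len(lab) - n_case
    scores = []
    for marker in genotypes:
        case_mean = sum(g for g, y in zip(marker, lab) if y == 1) / n_case
        ctrl_mean = sum(g for g, y in zip(marker, lab) if y == 0) / n_ctrl
        scores.append(abs(case_mean - ctrl_mean))
    return scores

observed = score_all(labels)
threshold = fwer_threshold(labels, score_all, n_perm=200, alpha=0.05)
significant = [s > threshold for s in observed]
```

Because the threshold is taken from the distribution of the maximum over all tests, any score that exceeds it is significant after accounting for every comparison made.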
This research was supported in part by NIH grant R01 GM070789.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
- Amos CI, Chen WV, Lee A, Li W, Kern M, Lundsten R, Batliwalla F, Wener M, Remmers E, Kastner DA, Criswell LA, Seldin MF, Gregersen PK: High-density SNP analysis of 642 Caucasian families with rheumatoid arthritis identifies two new linkage regions on 11p12 and 2q33. Genes Immun. 2006, 7: 277-286.
- Plenge RM, Padyukov L, Remmers EF, Purcell S, Lee AT, Karlson EW, Wolfe F, Kastner DL, Alfredsson L, Altshuler D, Gregersen PK, Klareskog L, Rioux JD: Replication of putative candidate-gene associations with rheumatoid arthritis in >4,000 samples from North America and Sweden: association of susceptibility with PTPN22, CTLA4, and PADI4. Am J Hum Genet. 2005, 77: 1044-1060.
- Lo SH, Zheng T: Backward haplotype transmission association (BHTA) algorithm – a fast multiple-marker screening method. Hum Hered. 2002, 53: 197-215.
- Lo SH, Zheng T: A demonstration and findings of a statistical approach through reanalysis of inflammatory bowel disease data. Proc Natl Acad Sci USA. 2004, 101: 10386-10391.
- Zheng T, Wang H, Lo SH: Backward genotype-trait association (BGTA)-based dissection of complex traits in case-control designs. Hum Hered. 2006, 62: 196-212.
- Scheet P, Stephens M: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006, 78: 629-644.
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100: 9440-9445.
- Efron B: Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J Am Stat Assoc. 2004, 99: 96-104.
- Storey JD: A direct approach to false discovery rates. J R Statist Soc B. 2002, 64: 479-498.
- Adar E: GUESS: a language and interface for graph exploration. CHI'06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2006, New York: ACM Press, 791-800.
- John S, Shephard N, Liu G, Zeggini E, Cao M, Chen W, Vasavda N, Mills T, Barton A, Hinks A, Eyre S, Jones KW, Ollier W, Silman A, Gibson N, Worthington J, Kennedy GC: Whole-genome scan, in a complex disease, using 11,245 single-nucleotide polymorphisms: comparison with microsatellites. Am J Hum Genet. 2004, 75: 54-64.
- Osorio YFJ, Bukulmez H, Petit-Teixeira E, Michou L, Pierlot C, Cailleau-Moindrault S, Lemaire I, Lasbleiz S, Alibert O, Quillet P, Bardin T, Prum B, Olson JM, Cornélis F: Dense genome-wide linkage analysis of rheumatoid arthritis, including covariates. Arthritis Rheum. 2004, 50: 2757-2765.
- MacKay K, Eyre S, Myerscough A, Milicic A, Barton A, Laval S, Barrett J, Lee D, White S, John S, Brown MA, Bell J, Silman A, Ollier W, Wordsworth P, Worthington J: Whole-genome linkage analysis of rheumatoid arthritis susceptibility loci in 252 affected sibling pairs in the United Kingdom. Arthritis Rheum. 2002, 46: 632-639.
- Fisher SA, Lanchbury JS, Lewis CM: Meta-analysis of four rheumatoid arthritis genome-wide linkage studies: confirmation of a susceptibility locus on chromosome 16. Arthritis Rheum. 2003, 48: 1200-1206.
- Wise CA, Bennett LB, Pascual V, Gillum JD, Bowcock AM: Localization of a gene for familial recurrent arthritis. Arthritis Rheum. 2000, 43: 2041-2045.
- Kenealy SJ, Herrel LA, Bradford Y, Schnetz-Boutaud N, Oksenberg JR, Hauser SL, Barcellos LF, Schmidt S, Gregory SG, Pericak-Vance MA, Haines JL: Examination of seven candidate regions for multiple sclerosis: strong evidence of linkage to chromosome 1q44. Genes Immun. 2006, 7: 73-76.
- Agresti A: Categorical Data Analysis. 2002, Hoboken, NJ: John Wiley & Sons, Inc.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.