Volume 3 Supplement 7
Identification of gene-gene interaction using principal components
© Li et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
More than 200 genome-wide association studies have been conducted, with some successful identifications of single novel loci. The identification of single-nucleotide polymorphisms (SNPs) with interaction effects is therefore of interest. Using the Genetic Analysis Workshop 16 data from the North American Rheumatoid Arthritis Consortium, we propose an approach to screen for SNP-SNP interaction using a two-stage method and an approach for detecting gene-gene interactions using principal components. We selected a set of 17 rheumatoid arthritis candidate genes to assess both approaches. Our approach using principal components holds promise in detecting gene-gene interactions. However, further study is needed to evaluate the power of the principal components approach and its feasibility for a genome-wide association analysis.
It is common in candidate-gene or genome-wide association studies to perform single-gene association analysis. However, after more than 200 genome-wide association studies (GWAS), fewer novel loci have been identified than expected [1], possibly because the effects of individual genetic variants are small. By supplementing GWAS data with information from previous candidate-gene or functional studies, and by considering genetic interaction effects, we may be able to identify groups of genes that contribute to a complex disease. Approaches for studying gene-environment and gene-gene interactions have been proposed for the analysis of candidate genes [2, 3] and genome-wide data [4]. We extend two approaches proposed for single-gene and gene-environment interaction analyses, a principal component (PC) approach [5] and a two-step approach [6], to gene-gene interaction analysis. We compare these two approaches with the traditional approach of testing all pairwise single-nucleotide polymorphism (SNP) interactions to assess gene-gene interaction effects on rheumatoid arthritis (RA) in the North American Rheumatoid Arthritis Consortium (NARAC) data.
List of candidate genes
The PC approach was proposed by Gauderman et al. [5] to test for association between disease and multiple SNPs in a candidate gene. We extend this approach to test for gene-gene interaction. The procedure involves the following steps.
1) Let $g_{lk}$ be the number of minor alleles at SNP $k$ for the $l$th subject, $l = 1, \ldots, N$, $k = 1, \ldots, K$.
2) Calculate the correlation matrix $R$, where $R_{ij} = \mathrm{cor}(g_i, g_j)$ and $g_i$ and $g_j$ represent the genotypes of all subjects for SNP $i$ and SNP $j$, respectively.
3) Decompose $R$ by singular value decomposition: $R = A \Lambda A^T$.
4) Determine the factor loadings by $A \Lambda^{1/2}$.
5) Determine the PCs by $PC = GA$, where $G$ is the standardized $N \times K$ matrix of genotypes. The standardized genotypes are calculated as $G_{lk} = (g_{lk} - \bar{g}_k)/s_k$, where $\bar{g}_k$ is the mean genotype of SNP $k$ across subjects and $s_k$ is its standard deviation.
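The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used in the study: the function name `gene_pcs` is hypothetical, the loadings are taken to be the eigenvectors scaled by the square roots of the eigenvalues, and every SNP is assumed to be polymorphic so that its standard deviation is nonzero.

```python
import numpy as np

def gene_pcs(genotypes):
    """Eigenvalues, SNP loadings, and PC scores for one gene.

    genotypes: N x K array of minor-allele counts (0, 1, or 2).
    Assumes no SNP is monomorphic (hypothetical helper, not from the paper).
    """
    g = np.asarray(genotypes, dtype=float)
    # Standardize each SNP: subtract its mean, divide by its standard deviation.
    G = (g - g.mean(axis=0)) / g.std(axis=0, ddof=1)
    # K x K correlation matrix between SNPs.
    R = np.corrcoef(g, rowvar=False)
    # R is symmetric, so eigh returns R = A diag(lam) A^T (ascending order).
    lam, A = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]      # reorder: largest eigenvalue first
    lam = np.clip(lam[order], 0.0, None)  # guard against tiny negative values
    A = A[:, order]
    # Loadings: column j of A scaled by sqrt(lam_j).
    loadings = A * np.sqrt(lam)
    # PC scores: N x K matrix, one column per component.
    pcs = G @ A
    return lam, loadings, pcs
```

Under these assumptions, the variance of each PC equals its eigenvalue, and entry (i, j) of `loadings` equals the correlation between SNP i and PC j, which is the property used later to interpret significant PCs.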
We then use the PCs that together explain at least 80% of the variation as the representation of each gene and perform a gene-gene interaction analysis by applying logistic regression to test for interaction between every combination of two PCs. Once significant PC interactions are identified, the PC loadings may be used to determine the influence of a specific SNP on the PCs, because a loading represents the correlation of a SNP with a component. To visualize the PCs of each gene and the positions of their SNPs alongside the linkage disequilibrium (LD) block plots, we created a graphical display using our own function in the statistical package R and the computer program Haploview [8].
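The 80% rule and the enumeration of PC pairs can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the eigenvalues are those of a gene's SNP correlation matrix, and pairs are formed between the retained PCs of two different genes.

```python
import numpy as np
from itertools import product

def n_pcs_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of leading PCs whose cumulative share of the
    total variance reaches the threshold (hypothetical helper)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    # Small tolerance so a share equal to the threshold is not missed
    # because of floating-point rounding.
    return int(np.argmax(cumulative >= threshold - 1e-12) + 1)

def cross_gene_pc_pairs(n_pcs_a, n_pcs_b):
    """All (PC index in gene A, PC index in gene B) pairs to be tested."""
    return list(product(range(n_pcs_a), range(n_pcs_b)))
```

For example, a gene with eigenvalues 2.5, 1.0, 0.3, and 0.2 has cumulative shares 0.625, 0.875, 0.95, and 1.0, so two PCs are retained. Pairing it with a gene that retains three PCs gives six interaction tests, compared with one test per SNP pair in the traditional approach.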
Murcray et al. [6] proposed a two-step approach for selecting SNPs involved in significant gene-environment interactions. Step 1 consisted of a modified version of the case-only analysis [9, 10], and in Step 2 the significant SNP-environment interactions identified in Step 1 were tested using logistic regression. We modified their method to detect gene-gene interactions as follows:
Step 1. For each pair of SNPs, we test for association between the two SNPs ($g_1$, $g_2$) using the approximate method to screen for epistasis implemented in PLINK [11], combining cases and controls and coding $g_1$ and $g_2$ as 0, 1, or 2, representing the number of minor alleles. A $\chi^2$ test with 1 degree of freedom is used to test the association between each pair of SNPs. A pair of SNPs is selected for analysis in Step 2 if its p-value falls below a given significance threshold, $p < \alpha^*$. In our case, we selected $\alpha^* = 0.05$.
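A screening step of this kind can be sketched as below. This is not PLINK's implementation: as a simple stand-in, the sketch uses the correlation-based statistic N·r², which is approximately χ² with 1 degree of freedom when the two SNPs are unassociated. The function names are hypothetical, and for brevity all pairs of columns in one genotype matrix are screened.

```python
import math
from itertools import combinations
import numpy as np

def snp_pair_screen(g1, g2):
    """1-df chi-square test of association between two SNPs coded 0/1/2.

    Uses N * r^2 as an illustrative stand-in statistic (an assumption,
    not the PLINK method). Cases and controls are pooled by the caller.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    r = np.corrcoef(g1, g2)[0, 1]
    stat = len(g1) * r ** 2
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def select_pairs(genotypes, alpha_star=0.05):
    """Return the SNP index pairs whose screening p-value is below alpha_star."""
    genotypes = np.asarray(genotypes)
    selected = []
    for i, j in combinations(range(genotypes.shape[1]), 2):
        _, p_value = snp_pair_screen(genotypes[:, i], genotypes[:, j])
        if p_value < alpha_star:
            selected.append((i, j))
    return selected
```

Only the pairs returned by `select_pairs` would go forward to the second step, so the choice of the threshold directly controls how many interaction tests are later corrected for.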
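Step 2. Each of the $M$ pairs of SNPs selected in Step 1 is tested for interaction using the logistic regression model
$$\operatorname{logit} P(D = 1 \mid g_1, g_2) = \beta_0 + \beta_1 g_1 + \beta_2 g_2 + \beta_3 g_1 g_2,$$
where $D$ represents the cases ($D = 1$) and controls ($D = 0$). An interaction is considered significant when the p-value of interaction (i.e., the p-value for testing $H_0$: $\beta_3 = 0$) is less than or equal to $\alpha/M$, where $\alpha = 0.05$.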
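The interaction test can be sketched as follows, assuming the model has the form logit P(D = 1) = β0 + β1·g1 + β2·g2 + β3·g1·g2 and that H0: β3 = 0 is assessed with a Wald test. The text does not state which test was used, so the Wald test, the Newton-Raphson fitting loop, and the function names are all assumptions made for illustration.

```python
import math
import numpy as np

def interaction_pvalue(d, x1, x2, n_iter=25):
    """Fit the logistic interaction model by Newton-Raphson and return
    (beta3 estimate, two-sided Wald p-value for H0: beta3 = 0)."""
    d = np.asarray(d, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Design matrix: intercept, two main effects, and their product.
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = prob * (1.0 - prob)
        information = X.T @ (X * weights[:, None])
        score = X.T @ (d - prob)
        beta = beta + np.linalg.solve(information, score)
    # Covariance of the estimates: inverse information at the final fit.
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    weights = prob * (1.0 - prob)
    cov = np.linalg.inv(X.T @ (X * weights[:, None]))
    z = beta[3] / math.sqrt(cov[3, 3])
    # Two-sided normal tail probability: erfc(|z| / sqrt(2)).
    return beta[3], math.erfc(abs(z) / math.sqrt(2.0))

def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests with p <= alpha / M, where M is the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]
```

The same function applies to a pair of PCs in place of a pair of SNPs, since the model only requires two numeric predictors. Because M counts only the tests actually carried out in this step, a stricter screening threshold yields a less severe per-test correction.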
We extended two approaches previously used for gene-level tests or gene-environment interaction analysis to screen for gene-gene interactions in 17 candidate genes for RA using the GAW16 NARAC data. In the PC approach, we calculated the SNP loadings for each PC and viewed them in the context of the gene LD structure generated using Haploview (Figure 2). This comparison helps identify the contribution of each SNP to the PCs and its position in the gene. For the PC gene-gene interaction analysis, we limited the number of PCs by using those that explained at least 80% of the variation. Using this method, we identified several gene-gene interactions. Further study is needed to investigate the power of this PC approach, which has potential as a screening tool for detecting gene-gene interaction. Subsequently, a more detailed interaction analysis should be performed using the SNPs with higher loadings [13].
We could not identify any significant interactions using the two-step approach. There are several possible explanations, including the elimination of SNPs with low allele frequency and the choice of α* in Step 1. Recently, a similar two-step method was proposed and shown to be more powerful than a one-step approach [14]. Further evaluation of this approach is warranted.
Using PCs is a promising approach to screen for potential interactions. As shown in our results, it can detect interactions that are not observed when SNP-SNP interactions are assessed using either a single-step or a two-step approach. The method used to correct for multiple comparisons also plays an important role.
List of abbreviations used
GAW16: Genetic Analysis Workshop 16
GWAS: Genome-wide association studies
NARAC: North American Rheumatoid Arthritis Consortium
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. We thank Jason Sinnwell and David Rider for their assistance with MACH (JS) and with SNP selection for the candidate genes (DR), and the two reviewers for their insightful comments. Partial funding for this study was provided by NIH grant HL87660 (MdA and JL) and the S.C. Johnson Genomics of Addiction Program at Mayo Clinic (JMB and RT).
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
1. Office of Population Genomics: Overview: A catalogue of genome-wide association studies. [http://www.genome.gov/gwastudies/]
2. Kallberg H, Padyukov L, Plenge RM, Ronnelid J, Gregersen PK, van der Helm-van Mil AH, Toes RE, Huizinga TW, Klareskog L, Alfredsson L, Epidemiological Investigation of Rheumatoid Arthritis study group: Gene-gene and gene-environment interactions involving HLA-DRB1, PTPN22, and smoking in two subsets of rheumatoid arthritis. Am J Hum Genet. 2007, 80: 867-875. 10.1086/516736.
3. Zhao J, Jin L, Xiong M: Test for interaction between two unlinked loci. Am J Hum Genet. 2006, 79: 831-845. 10.1086/508571.
4. Marchini J, Donnelly P, Cardon LR: Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet. 2005, 37: 413-417. 10.1038/ng1537.
5. Gauderman JW, Murcray C, Gilliland F, Conti DV: Testing association between disease and multiple SNPs in a candidate gene. Genet Epidemiol. 2007, 31: 383-395. 10.1002/gepi.20219.
6. Murcray C, Lewinger JP, Gauderman JW: Gene-environment interaction in genome wide association study. Am J Epidemiol. 2009, 169: 219-226. 10.1093/aje/kwn353.
7. Li Y, Abecasis GR: Mach 1.0: rapid haplotype reconstruction and missing genotype inference [abstract 2290/C]. Proceedings of the American Society of Human Genetics: 2006 October 9-13; New Orleans. 2005, Rockville, MD: American Society of Human Genetics, Abstracts, 6-per-page.pdf, [http://www.ashg.org/genetics/ashg/annmeet/2006/call/pdf/2390]
8. Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21: 263-265. 10.1093/bioinformatics/bth457.
9. Piegorsch WW, Weinberg CR, Taylor JA: Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med. 1994, 13: 153-162. 10.1002/sim.4780130206.
10. Clayton D, McKeigue PM: Epidemiologic methods for studying genes and environmental factors in complex diseases. Lancet. 2001, 358: 1356-1360. 10.1016/S0140-6736(01)06418-2.
11. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
12. Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100: 9440-9445. 10.1073/pnas.1530509100.
13. Chatterjee N, Kalayioglu Z, Moslehi R, Peters U, Wacholder S: Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. Am J Hum Genet. 2006, 79: 1002-1016. 10.1086/509704.
14. Kooperberg C, LeBlanc M: Increasing the power of identifying interactions in genome-wide association studies. Genet Epidemiol. 2008, 32: 255-263. 10.1002/gepi.20300.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.