Volume 3 Supplement 7
Genetic Analysis Workshop 16
Application of seventeen twolocus models in genomewide association studies by twostage strategy
 Adan Niu^{1},
 Zhaogong Zhang^{1, 2} and
 Qiuying Sha^{1}Email author
DOI: 10.1186/175365613S7S26
© Niu et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Abstract
The goal of this paper is to search for twolocus combinations that are jointly associated with rheumatoid arthritis using the data set of Genetic Analysis Workshop 16 Problem 1. We use a twostage strategy to reduce the computational burden associated with performing an exhaustive twolocus search across the genome. In the first stage, the full set of 531,689 singlenucleotide polymorphisms was screened using univariate testing. In the second stage, all pairs made from the 500 singlenucleotide polymorphisms with the lowest pvalues from the first stage were evaluated under each of 17 twolocus models. Our analyses identified a twolocus combination  rs6939589 and rs11634386  that proved to be significantly associated with rheumatoid arthritis under a Rec × Rec model (pvalue = 0.045 after adjusting for multiple tests and multiple models).
Background
Although the primary interest of genomewide association studies (GWAS) is to identify singlenucleotide polymorphisms (SNPs) with detectable marginal effects, it is also of interest to identify SNPs that have an interaction effect. Testing all possible pairs of loci to identify interactions creates many practical difficulties. In this article, we use a twostage strategy to search for twolocus joint effects, which effectively reduces the computation time. In the first stage, we only selected loci that met an initial threshold. In the second stage, any locus that met the firststage threshold was tested under each of 17 twolocus joint effect models. By applying this twostage analysis to Genetic Analysis Workshop 16 (GAW16) Problem 1, we successfully identified two SNPs that are jointly associated with rheumatoid arthritis (RA) and could not be identified by singlelocus analysis.
Methods
Twolocus analysis based on 17 twolocus models
where N is the sample size, is the average of X_{1}, ..., X_{ N }, and is the mean of y_{1}, ..., y_{ N }. Under null hypothesis of no association, T_{ score }asymptotically follows a chisquare distribution with one degree of freedom (df). Though we use a unified test statistic T_{ score }for all of the models, the 17 models correspond to 17 different tests. Different models correspond to different genotype coding, and thus different values of X_{ i }in T_{ score }. For example, under Rec ∩ Rec epistatic model, X = 1 if the twolocus genotype is AABB; X = 0 otherwise; under Rec× Rec multiplicative model, X = x_{1} + x_{2}, where x_{1} = 1 if the genotype at the first SNP is AA and x_{1} = 0 otherwise; x_{2} is similarly defined for the second SNP.
The method to search for significant twolocus combinations for each of the 17 models has the following two steps. In the first step, three onedf chisquare tests (corresponding to dominant, recessive, and additive models) are used to test for association at each SNP. For a SNP, let P denote the smallest pvalue of the three tests. We select M SNPs with the smallest P (M = 500 was used in this study). In the second step, under each of the 17 twolocus models, we apply a twolocus association test T_{ score }to each of the L twolocus combinations among the M selected SNPs, where L = M(M  1)/2. In this step, we get a pvalue (called raw pvalue) for each of the L twolocus combinations and each of the 17 twolocus models.
Let p_{ ij }denote the raw pvalue of the l^{th}twolocus combination under the j^{th} twolocus model. A permutation procedure is then used to adjust for multiple tests and multiple models. For each permutation, we randomly shuffle the case and control status. Based on the permuted data, we redo the singlemarker test for each SNP, select M SNPs with the smallest pvalues, and then test each of the twolocus combinations among the M selected SNPs under each of the 17 models. For the k^{th} permutation, let denote the pvalue of the twolocus test for the l^{th} twolocus combinations under the j^{th} model and let . Suppose that we perform the permutation K times (we use K = 1000 in this study). Then, the overall pvalue of the twolocus test for the l^{th} twolocus combination under the j^{th} model is given by . In this way, we can adjust for multiple testing for L twolocus tests and 17 models and we also can adjust for the fact that the M SNPs to do twolocus analysis were chosen as the "top" signals from a much larger set.
Control for population stratification
 1.
Genome Control (GC) method: the GC method [1, 2] rescales the chisquare test statistic. Suppose that we have N onedf chisquare tests (singlemarker test or twolocus test) with test statistics T_{1}, T_{2}, ..., T_{ N }. The GC method rescales T_{ i }as (i = 1, 2, ..., N), where b = median {T_{1}, T_{2}, ..., T_{ N }}/0.465 and considers S_{ i }following a chisquare distribution with one df.
 2.
Permutation GC method: Suppose that we have N onedf chisquare tests and rescale the test statistics as S_{1}, S_{2}, ..., S_{ N }by the GC method. Then, we carry out permutation K times. For each permutation, we perform N chisquare tests based on the permuted data and rescale the test statistics by GC method. For the k^{th} permutation, let denote the chisquare test statistics. We rescale them as by GC method. Then, the pvalue of the i^{th} test is given by .
To test whether the two methods can control for population stratification, we randomly chose 1000 SNPs from the data set of GAW16 Problem 1. The GC and permutation GC methods were used to test association for the 1000 SNPs. The histograms of the pvalues from the two methods are shown in Figure 2, b and c. These two panels show that the distributions of the pvalues are very similar to a uniform distribution, which indicates that both the GC and permutation GC methods can control for population stratification, at least for the data set of GAW16 Problem 1.
Results
The GAW16 Problem 1 data set contains genotypes at 531,690 SNPs on chromosomes 122 (868 cases and 1194 controls). We first performed singlemarker analysis. Three onedf chisquare tests (corresponding to dominant, recessive, and additive models) rescaled by GC method were applied to each SNP. The smallest pvalue of the three tests at each SNP was adjusted by Bonferroni correction for the three models and all the SNPs. From the singlemarker analysis, we found one SNP (rs2476601) on chromosome 1 and 183 SNPs on chromosome 6 with adjusted pvalues less than 0.05 after Bonferroni correction. The 183 SNPs that we found on chromosome 6 are in a region (29930240 to 33149012 bp) in high linkage disequilibrium with HLADRB1, a factor known to have a strong association with RA [3–6]. The SNP (rs2476601) on chromosome 1 is in the hematopoieticspecific protein tyrosine phosphatase gene, PTPN22, which has been identified to be associated with RA [6, 7].
We used the permutation GC method with 1000 permutations to carry out the twolocus analysis. In the analysis, we excluded SNPs within 10 Mbp of any of the SNPs surviving Bonferroni correction from the singlemarker analysis. Then, we performed the twostage, twolocus analysis among the remaining SNPs. For each pair of SNPs, individuals with missing genotypes at either locus were not included in that analysis. After adjusting for multiple tests and multiple SNPs, we found one twolocus combination  rs6939589 on chromosome 6 and rs11634386 on chromosome 15  significantly associated with RA. The twolocus combination followed the Rec× Rec multiplicative model with a pvalue of 0.045.
Discussion
Although there is growing appreciation that searching for epistatic interactions in humans may be a fruitful endeavor, there are still a number of practical difficulties associated with testing all possible pairwise comparisons in the case of GWAS, such as data storage requirements, computation time, and multiple testing. In this report we used a computationally efficient twostage analysis to search for joint effects of genes for GAW16 Problem 1. Applying this strategy to GAW16 Problem 1, we have successfully identified two loci that are significantly associated with RA. These two loci would not have been identified as significant by singlemarker analysis after correction for multiple tests.
Computational efficiency has imposed limits on our twostage strategy. Because our approach only searches twolocus interaction among the top M SNPs according to the marginal effects, the loci with strong interaction effects but weak marginal effects may not be detected. Furthermore, our approach only considers 17 interaction patterns (corresponding to the 17 twolocus models). The proposed twostage strategy may have low power to detect interactions whose patterns depart from the 17 twolocus models.
Conclusion
Our twolocus analysis showed that a twolocus combination, rs6939589 and rs11634386, is significantly associated with RA.
List of abbreviations used
 GAW16:

Genetic Analysis Workshop 16
 GC:

Genome control
 GWAS:

Genomewide association study
 RA:

Rheumatoid arthritis
 SNP:

Singlenucleotide polymorphism
Declarations
Acknowledgements
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. This work was supported by NIH grants R01 GM069940 and the OverseasReturned Scholars Foundation of Department of Education of Heilongjiang Province (1152HZ01).
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/17536561/3?issue=S7.
Authors’ Affiliations
References
 Devlin B, Roeder K: Genomic control for association studies. Biometrics. 1999, 55: 9971004. 10.1111/j.0006341X.1999.00997.x.View ArticlePubMedGoogle Scholar
 Devlin B, Roeder K, Wasserman L: Genomic control, a new approach to geneticbased association studies. Theor Pop Biol. 2001, 60: 155166. 10.1006/tpbi.2001.1542.View ArticleGoogle Scholar
 du Montcel ST, Michou L, PetitTeixeira E, Osorio J, Lemaire I, Lasbleiz S, Pierlot C, Quillet P, Bardin T, Prum B, Cornelis F, ClergetDarpoux F: New classification of HLADRB1 alleles supports the shared epitope hypothesis of rheumatoid arthritis susceptibility. Arthritis Rheum. 2005, 52: 10631068. 10.1002/art.20989.View ArticlePubMedGoogle Scholar
 Stastny P: Mixed lymphocyte cultures in the rheumatoid arthritis. J Chin Invest. 1976, 57: 11481157. 10.1172/JCI108382.View ArticleGoogle Scholar
 Stastny P: Association of the Bcell alloantigen DRw4 with rheumatoid arthritis. N Engl J Med. 1978, 298: 869871.View ArticlePubMedGoogle Scholar
 Bowes J, Barton A: Recent advances in the genetics of RA susceptibility. Rheumatology. 2008, 47: 399402. 10.1093/rheumatology/ken005.View ArticlePubMedGoogle Scholar
 Carlton V, Hu X, Chokkalingam A: PTPN22 genetic variation: evidence for multiple variants associated with rheumatoid arthritis. Am J Hum Genet. 2005, 77: 567581. 10.1086/468189.PubMed CentralView ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.