Volume 3 Supplement 7
Genetic Analysis Workshop 16
Genomewide association study of rheumatoid arthritis by a score test based on wavelet transformation
 Renfang Jiang^{1},
 Jianping Dong^{1}Email author and
 Yilin Dai^{1}
DOI: 10.1186/175365613S7S8
© Jiang et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Abstract
Background
We have conducted a genomewide association study on the Genetic Analysis Workshop (GAW) 16 rheumatoid arthritis data using a multilocus score test based on wavelet transform proposed recently by the authors. The waveletbased test automatically adjusts for the amount of noise suppressed from the data. The power of the test is also increased by using the genetic information contained in the spatial ordering of singlenucleotide polymorphisms on a chromosome.
Results
After adjusting for the effect of population stratification, the test identified some previously discovered rheumatoid arthritis susceptibility loci (HLADRB1 and rs3761847) as well as some loci (rs2076530 and rs3130340) known to have association with sarcoidosis and bone mineral density. It was previously reported that patients with rheumatoid arthritis have elevated prevalence of sarcoidosis and have reduced bone mass.
Conclusion
This new test provides a useful tool in genomewide association studies.
Background
In genomewide association studies, the first choice of tests is usually a singlemarker test. If a singlenucleotide polymorphism (SNP) has a strong association with a disease, singlemarker tests should have higher power than multilocus tests. Multilocus tests can achieve higher power if several SNPs are associated with the disease. However, the potential high power of multilocus tests could be diminished as an increased number of markers results in an increased number of degrees of freedom. Therefore, reducing the number of degrees of freedom is essential to increasing the power of multilocus tests. Different strategies were introduced to reduce the number of degrees of freedom. Tests based on haplotype sharing (the longest continuous interval of matching alleles between haplotypes) effectively reduce the number of degrees of freedom [1, 2]. Another popular approach is to apply principal components analysis (PCReg) [3]. A score test based on the Fourier transform [4] is another attempt to reduce the degree of freedom, thereby increasing power. Instead of using the genotype data, weighted Fourier transform coefficients of the genotype data are used to form a score test.
Many multilocus association tests are not affected by permuting spatial order of SNPs; thus, they do not use the information contained in the ordering of SNPs. For example, the results of logistic regression will not change if the order of SNPs are permuted. The same is true for the test obtained by fitting a regression function with one SNP followed by Bonferroni correction to find the global pvalue. PCReg is also invariant under the permutation of SNPs. We provide a proof of the above claims as follows (See Wang and Abbott [3] for notations in the following proof.). Let G be a matrix of coded genotypes and suppose it is centered. The variancecovariance matrix of genotypes is A = G^{ T }G/(n  1) = VDV^{1}, where D is a diagonal matrix with eigenvalues of A as diagonal entries, and the columns of V are the eigenvectors. The regression model is y = GV_{1}b + ε, where V_{1} is the first several columns of V. After permutation of SNPs, the columns of G are also permuted, and G becomes GP, where P is a permutation matrix. Since (GP)^{ T }(GP) = (P^{ T }V) D(P^{ T }V)^{1}, the eigenvalues are not changed by the permutation, and the matrix of eigenvectors becomes P^{ T }V. The regression model remains unchanged. For the tests based on shared haplotype length, changing the ordering of SNPs could mean a shortened shared length, and the association between the disease and the SNPs could disappear. Therefore, the ordering of SNPs contains important genetic information, and ignoring it could lower the power of association tests.
We recently proposed a score test based on wavelet transform [5, 6], which is used in this report. This test has three advantages. First, it uses the wavelet transformation of genotype data instead of the genotype data. The wavelet transform is designed to handle unsmooth noisy signals. Genetic data are usually unsmooth and can be dealt with by the wavelet transform naturally. Second, it uses an empirical Bayes thresholding [7]. It was proved by Johnston and Silverman that this thresholding effectively suppresses noise from data [7]. Therefore, it increases the power of the test. The exact amount of noise being reduced depends on the specific problem. Third, because our method views multilocus genotypes of an individual as a discretized function, the spatial ordering of the SNPs on a chromosome is taken into consideration, which increases the power of the test.
Methods
There is a total of 2,062 individuals consisting of 868 cases and 1,194 controls in the North American Rheumatoid Arthritis Consortium (NARAC) data for Genetic Analysis Workshop (GAW) 16. These individuals were genotyped on the 550 k Illumina SNP chip. We analyzed 22 autosomal chromosomes in this report. SNPs satisfying one of the following criterion were excluded: missing genotype rate >0.05, or minor allele frequency < 0.05, or having pvalues < 0.00001 in the HardyWeinberg equilibrium test. About 12.8% of SNPs were removed. Missing genotypes were imputed by fastPHASE [8]. We adopted a strategy of a moving window of eight SNPs without overlapping (the first window contains SNPs 18, the next window contains SNPs 916, etc.) on the chromosome. We have tried windows of sizes 16 and 32 SNPs and there was not much difference. Overlapping windows were also tried and again, the results were similar. Computational burden increases when large windows or overlapping windows were used. The k^{th} window is from the (8(k 1)+ 1)^{th} SNP to the 8k^{th} SNPs. Let G_{ k }be the corresponding eight columns of the genotype matrix G. Different ways of coding SNPs do make a difference. The SNPs in G_{ k }were recoded to maximize the number of positive pairwise correlations. This removes any ambiguity in the coding of genotypes. In our simulation studies, the recoding increased the power of the test while keeping the type I error rate in check.
We applied EIGENSTRAT [9] to remove the effects of population stratification. The top three eigenvectors obtained by EIGENSTRAT were used to adjust genotypes and phenotypes as suggested by Price et al. [9]. Let G be the genotype matrix, C be the matrix of the principal component vectors, and Y be the vector of phenotypes. The adjusted genotype matrix is = G  CC^{ T }G and the adjusted phenotype vector is = Y  CC^{ T }Y. Before the adjustment, the inflation factor [10] λ_{GC} = 1.399, which reduced to 1.025 after the adjustment. To avoid a possible bias, all chromosomes except 6 and 8 were included to produce the eigenvectors, which were used to adjust genotypes on chromosomes other than 6. Chromosome 6 was used to produce eigenvectors to adjust genotypes on chromosome 6.
Consider the k^{th} window and the corresponding matrix of adjusted genotype . Applying wavelet transform, thresholding, and inverse wavelet transform consecutively on the i^{th} row of resulted in the modified genotype (x_{i1}, x_{i2}, ..., x_{ im }) of the i^{th} individual. Subtract the mean of the j^{th} column of X = (x_{ ij }) from x_{ ij }such that the mean of each column of X = 0. Let = (y_{1}, y_{2}, ..., y_{ n }) be the adjusted phenotype of n individuals. Assume a generalized linear model [11] f(E(y_{ i })) = α + X_{ i }β between X and Y, where f is a link function. Let be the score statistic for the markers. The variance of U_{ j }under the null hypothesis can be estimated by The global score statistic of this window is
The global pvalues of the waveletbased test were obtained as follows. We randomly permuted the phenotypes (casecontrol status) 5,000 times, and received 5,000 sets of permuted phenotypes. Adjust each set with the principal component vectors, and they are still called permuted phenotypes. At each window of eight SNPs, the value of T was calculated 5,001 times, using the adjusted phenotypes and 5,000 sets of permuted phenotypes, respectively. In M windows along the chromosomes, we have 5000 M values of T with permuted phenotypes, denoted by a 5,000 × M matrix (T_{ ij }), where T_{ ij }is the absolute value of T at the j^{th} window using the i^{th} set of permuted phenotypes. Let m_{ i }= max_{ j }T_{ ij }be the maximum absolute value of T at M windows using the i^{th} set of permuted phenotypes. Let T_{ j }be the absolute value of T at the j^{th} window using the adjusted phenotypes. The global pvalue of the waveletbased test at the j^{th} window is the proportion of m_{ i }> T_{ j }: global pvalue of T_{ j }= #{m_{ i }m_{ i }> T_{ j }}/5,000.
Results and discussion
After correcting for population stratification, significant signals were only found on chromosomes 6 and 9. Four windows on chromosome 6 attracted our attention. The first window (rs9268005, rs3130340, rs3115553, rs9268132, rs926070, rs6935269, rs7775397, rs17422797) contains rs3130340, which was identified to have association with bone mineral density and fractures [12]. It has been reported that a large proportion of men with rheumatoid arthritis (RA) had reduced bone mass [13]. The pvalue at this window is 0.0074. The second window (rs4424066, rs12529049, rs3117099, rs3117098, rs3817973, rs1980493, rs2076530, rs4248166) contains rs2076530, which is associated with sarcoidosis [14]. Increased prevalence of sarcoidosis among RA patients has been reported [15]. The association between sarcoidosis, RA, and rs2076530 is an interesting phenomenon. The association between rs2076530 and RA was investigated before, but whether it is a causal SNP for RA or its effect is merely a carryover effect of nearby haplotypes [16] merits further investigation. The third window (rs2395182, rs3129890, rs9268832, rs6903608, rs2395185, rs477515, rs2516049, rs2858870) contains HLADRB1, which has long been identified as a major genetic risk contributor to RA [17]. The last window (rs9275224, rs5000634, rs6457617, rs2647012, rs9357152, rs10484561, rs9275313, rs1794282) contains rs6457617, which has also been reported as being associated with RA [18]. The pvalues of the test at the above three windows are less than 0.0002 (it is 0 after 5,000 permutations). The most significant window on chromosome 9 (rs1953126, rs10985073, rs3761847, rs10985095, rs10985097, rs2900180, rs12235400, rs10985112) which contains rs3761847 and it was identified as a risk locus for RA [19], has a pvalue 0.53.
Conclusion
A waveletbased multilocus score test was applied in a genomewide association study on RA data followed by fine mapping of regions identified in our genomewide association study. Several statistically significant risk loci for RA were identified after adjustment for population stratification. Some windows contain genes and/or SNPs (HLADRB1, rs3761847, and rs6457617) previously associated with RA. Some windows contain SNPs (rs2076530, rs3130340) previously associated with sarcoidosis [14] or bone mineral density and fractures [12], all of which have been reported as being related with RA [13, 15].
List of abbreviations used
 NARAC:

North American Rheumatoid Arthritis Consortium
 PCReg:

Principal components analysis
 RA:

Rheumatoid arthritis
 SNP:

Singlenucleotide polymorphism
Declarations
Acknowledgements
This research was partially supported by a National Institutes of Health grant GM06994001A2. The authors thank the reviewers for their helpful suggestions which greatly improved the paper.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/17536561/3?issue=S7.
Authors’ Affiliations
References
 Tzeng JY, Devlin B, Roeder K, Wasserman I: On the identification of disease mutations by the analysis of haplotype similarity and goodness of fit. Am J Hum Genet. 2003, 72: 891902. 10.1086/373881.PubMed CentralView ArticlePubMedGoogle Scholar
 Zhang S, Sha Q, Chen HS, Dong J, Jiang R: Transmission/disequilibrium test based on haplotype sharing for tightly linked markers. Am J Hum Genet. 2003, 73: 566579. 10.1086/378205.PubMed CentralView ArticlePubMedGoogle Scholar
 Wang K, Abbott D: A principal components regression approach to multilocus genetic association studies. Genet Epidemiol. 2008, 32: 108118. 10.1002/gepi.20266.View ArticlePubMedGoogle Scholar
 Wang T, Elston RC: Improved power by use of a weighted score test for linkage disequilibrium mapping. Am J Hum Genet. 2007, 80: 353360. 10.1086/511312.PubMed CentralView ArticlePubMedGoogle Scholar
 Jiang R, Dong J, Dai Y: A wavelet based method in association [abstract 96]. Genet Epidemiol. 2008, 32: 697Google Scholar
 Improving Power in Geneticassociation Studies Via Wavelet Transformation. BMC Genetics. 2009, 10: 5310.1186/147121561053.
 Johnston IM, Silverman BW: Empirical Bayes selection of wavelet thresholds. Ann Stat. 2005, 33: 17001752. 10.1214/009053605000000345.View ArticleGoogle Scholar
 Scheet P, Stephens M: A fast and flexible statistical model for largescale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006, 78: 629644. 10.1086/502802.PubMed CentralView ArticlePubMedGoogle Scholar
 Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genomewide association studies. Nat Genet. 2006, 38: 904909. 10.1038/ng1847.View ArticlePubMedGoogle Scholar
 Devlin B, Roeder K: Genomic control for association studies. Biometrics. 1999, 55: 9971004. 10.1111/j.0006341X.1999.00997.x.View ArticlePubMedGoogle Scholar
 Schaid DJ, Rowland CM, Tines DE, Jacbson RM, Poland GA: Score tests for association between traits and haplotypes when linkage is ambiguous. Am J Hum Genet. 2002, 70: 425434. 10.1086/338688.PubMed CentralView ArticlePubMedGoogle Scholar
 Styrkarsdottir U, Halldorsson BV, Gretarsdottir S, Gudbjartsson DF, Walters GB, Ingvarsson T, Jonsdottir T, Saemundsdottir J, Center JR, Nguyen TV, Bagger Y, Gulcher JR, Eisman JA, Christiansen C, Sigurdsson G, Kong A, Thorsteinsdottir U, Stefansson K: Multiple genetic loci for bone mineral density and fractures. N Engl J Med. 2008, 358: 23552365. 10.1056/NEJMoa0801197.View ArticlePubMedGoogle Scholar
 Tengstrand B, Hafstrom I: Bone mineral density in men with rheumatoid arthritis is associated with erosive disease and sulfasalazine treatment but not with sex hormones. J Rheumatol. 2002, 29: 22992305.PubMedGoogle Scholar
 Valentonyte R, Hampe J, Huse K, Rosenstiel P, Albrecht M, Stenzel A, Nagy M, Gaede KI, Franke A, Haesler R, Koch A, Lengauer T, Seegert D, Reiling N, Ehlers S, Schwinger E, Platzer M, Krawczak M, MullerQuernheim J, Schurmann M, Schreiber S: Sarcoidosis is associated with a truncating spliice site mutation in BTNL2. Nat Genet. 2005, 37: 357364. 10.1038/ng1519.View ArticlePubMedGoogle Scholar
 Kucera RF: A possible association of rheumatoid arthritis and sarcoidosis. Chest. 1989, 95: 604606. 10.1378/chest.95.3.604.View ArticlePubMedGoogle Scholar
 Orozco G, Eerligh P, Sánchez E, Zhernakova S, Roep BO, GonzálezGay MA, LópezNevot MA, Callejas JL, Hidalgo C, PascualSalcedo D, Balsa A, GonzálezEscribano MF, Koeleman BP, Martín J: Analysis of a functional BTNL2 polymorphism in type 1 diabetes, rheumatoid arthritis, and systemic lupus erythematosus. Hum Immunol. 2005, 66: 12351241. 10.1016/j.humimm.2006.02.003.View ArticlePubMedGoogle Scholar
 Stastny P: Association of the Bcell alloantigen DRw4 with rheumatoid arthritis. N Engl J Med. 1978, 298: 869871.View ArticlePubMedGoogle Scholar
 Julià A, Ballina J, Cañete JD, Balsa A, TorneroMolina J, Naranjo A, AlperiLópez M, Erra A, PascualSalcedo D, Barceló P, Camps J, Marsal S: Genomewide association study rheumatoid arthritis in the Spanish population: KLF12 as a risk locus for rheumatoid arthritis susceptibility. Arthritis Rheum. 2008, 58: 22752286. 10.1002/art.23623.View ArticlePubMedGoogle Scholar
 Plenge RM, Seielstad M, Padyukov L, Lee AT, Remmers EF, Ding B, Liew A, Khalili H, Chandrasekaran A, Davies LR, Li W, Tan AK, Bonnard C, Ong RT, Thalamuthu A, Pettersson S, Liu C, Tian C, Chen WV, Carulli JP, Beckman EM, Altshuler D, Alfredsson L, Criswell LA, Amos CI, Seldin MF, Kastner DL, Klareskog L, Gregersen PK: TRAF1C5 as a risk locus for rheumatoid arthritisa genomewide study. N Engl J Med. 2007, 357: 11991209. 10.1056/NEJMoa073491.PubMed CentralView ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.