Genome-wide analysis of single-locus and epistasis single-nucleotide polymorphism effects on anti-cyclic citrullinated peptide as a measure of rheumatoid arthritis
© Ma et al; licensee BioMed Central Ltd. 2007
Published: 18 December 2007
The goal of this study was to identify single-locus and epistasis effects of single-nucleotide polymorphism (SNP) markers on anti-cyclic citrullinated peptide (anti-CCP), which is associated with rheumatoid arthritis, using the North American Rheumatoid Arthritis Consortium data. A square root transformation of the phenotypic values of anti-CCP, with sex, smoking status, and a selected subset of 20 SNP markers in the model, achieved residual normality (p > 0.05). Three single-locus effects of two SNPs were significant (p < 10^-4). The epistasis analysis tested five effects of each pair of SNPs: the two-locus interaction and the additive × additive, additive × dominance, dominance × additive, and dominance × dominance effects. A total of ten epistasis effects of eight pairs of SNPs on 11 autosomes and the X chromosome were significant (p < 10^-7). Three of these epistasis effects reached significance levels of p < 10^-8, p < 10^-9, and p < 10^-10, respectively. Two potential SNP epistasis networks were identified. The results indicate that the genetic factors underlying anti-CCP may include single-gene action and gene interactions, and that the gene-interaction mechanism underlying anti-CCP could be complex, involving pairwise epistasis effects and multiple SNPs.
The data set of the North American Rheumatoid Arthritis Consortium (NARAC) for Genetic Analysis Workshop 15 (GAW15) contains genotypes of 5700 SNPs covering all 23 human chromosomes, the affected status for rheumatoid arthritis (RA), and a number of quantitative traits including anti-cyclic citrullinated peptide (anti-CCP). Anti-CCP is associated with RA and is used by some as a measure of RA. Linkage analysis of RA status in the NARAC data using the affected sib-pair method has been reported.
The NARAC data set was edited by requiring each individual to have genotypes for the 5700 SNPs and an anti-CCP record; 1466 individuals satisfied this criterion. The anti-CCP values deviated significantly from a normal distribution with p < 0.01 (Fig. 1A), according to the Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling tests offered by the SAS UNIVARIATE procedure. Because the statistical tests in this article require normality of the residuals, not of the phenotypic values, a statistical model that achieves residual normality was sought using the procedure leading to Figures 1B–1F. The untransformed anti-CCP did not achieve residual normality (p < 0.01; Fig. 1B–1C). The Box-Cox transformation for a range of λ values and the square root transformation of anti-CCP were evaluated to find an optimal transformation that has the minimal residual sum of squares under the model for Figure 1B and improves the residual normality. None of the transformations achieved residual normality (p < 0.01), but the square root transformation was found to have the minimal residual sum of squares. The residual distribution under this transformation is shown in Figure 1D. For the model used in Figure 1D, a total of 41 SNPs were found to have significant single-locus and epistasis effects at the same significance level as the SNPs for Figure 1C. Adding all 41 SNPs to the model for Figure 1D achieved near-perfect residual normality (p > 0.15; Fig. 1E). To reduce the model degrees of freedom, and thereby increase the residual degrees of freedom, step-wise elimination of SNPs from the full model for Figure 1E was conducted to find a minimal set of 20 SNPs (Table 1) that still achieved residual normality for the transformed data (p > 0.05; Fig. 1F).
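The comparison of transformations by residual sum of squares can be illustrated with a minimal sketch. It assumes strictly positive trait values and, for brevity, an intercept-only model instead of the covariate model used for Figure 1B; the data, the λ grid, and the function names are hypothetical. The Box-Cox transform is written in its scale-normalized form (divided by a power of the geometric mean) so that sums of squares are comparable across λ. Note that the Box-Cox transform with λ = 0.5 is a linear function of the square root of y.

```python
import math

def boxcox_normalized(values, lam):
    """Scale-normalized Box-Cox transform of strictly positive values."""
    n = len(values)
    gm = math.exp(sum(math.log(v) for v in values) / n)  # geometric mean
    if lam == 0:
        return [gm * math.log(v) for v in values]
    return [(v ** lam - 1.0) / (lam * gm ** (lam - 1.0)) for v in values]

def residual_ss(values):
    """Residual sum of squares under an intercept-only model."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Hypothetical right-skewed trait values (not NARAC data).
y = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 8.0, 12.0, 20.0, 40.0]

# Hypothetical grid of lambda values; 0.5 corresponds to the square root.
lambdas = [-1.0, -0.5, 0.0, 0.5, 1.0]
rss = {lam: residual_ss(boxcox_normalized(y, lam)) for lam in lambdas}
best = min(rss, key=rss.get)

for lam in lambdas:
    print(f"lambda = {lam:5.2f}  residual SS = {rss[lam]:.4f}")
print("lambda with minimal residual SS:", best)
```

In a fuller analysis the residuals would come from a regression on sex, smoking status, and the SNP terms, and a normality test such as Shapiro-Wilk would then be applied to those residuals.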
Table 1. Minimal set of SNPs to achieve residual normality of the square root transformed anti-CCP values in Figure 1F
In the model for Figure 1F, each SNP with a single-locus effect was fitted in the model as a locus with three genotypes, while each pair of SNPs was fitted in the model as a genetic factor with nine (3 × 3) genotypes. Each SNP or SNP pair in this subset was re-tested by treating the other SNPs in the set as fixed effects (in addition to sex and smoking status). For all SNPs not in this subset, the SNP effects were tested based on the model for Figure 1F. The model for testing single-locus effects was
(y)^(1/2) = sex + CigE + (normality SNPs) + SNP + e, (1)
where (y)^(1/2) is the square root transformed anti-CCP, sex is the gender of the individual, CigE is an indicator of whether the individual ever smoked, (normality SNPs) denotes the 20 SNPs in Table 1 needed to achieve the residual normality shown in Figure 1F, SNP is the SNP being tested for three single-locus effects (the SNP marker effect, and the additive and dominance effects), and e is the random residual. The significance test of the SNP marker effect used an F-test, and t-tests were used to test the additive and dominance effects. The epistasis analysis tested five effects of each pair of SNPs: two-locus interaction (I), additive × additive (A × A), additive × dominance (A × D), dominance × additive (D × A), and dominance × dominance (D × D) epistasis effects. The genetic interpretations of the A × A, A × D, D × A, and D × D epistasis effects are allele × allele, allele × genotype, genotype × allele, and genotype × genotype interactions, respectively. The model for testing epistasis effects was
(y)^(1/2) = sex + CigE + (normality SNPs) + SNP1 + SNP2 + SNP1*SNP2 + e, (2)
where SNP1*SNP2 is the interaction effect between the two SNPs, denoted by "I". The significance test of the I-effect used an F-test, and t-tests were used to test the four individual epistasis effects, A × A, A × D, D × A, and D × D, using an extended Kempthorne model that allows Hardy-Weinberg and linkage disequilibria. The I-effect answers the question of whether the two loci interact, whereas an individual epistasis effect identifies the exact mode of the interaction. For testing epistasis effects involving the X chromosome, only females were included in the analysis. The single-locus and epistasis tests using Models (1–2) were implemented using the epiSNP computer package developed by the authors.
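The decomposition of a two-locus interaction into A × A, A × D, D × A, and D × D components can be sketched as contrasts on the nine (3 × 3) genotype means. This is a simplified illustration, not the extended Kempthorne model used in the article: it uses unweighted textbook contrasts that ignore genotype frequencies and disequilibria, and the genotype means, coefficients, and function names are hypothetical.

```python
# Unweighted single-locus contrasts over genotypes ordered (aa, Aa, AA).
ADD = (-0.5, 0.0, 0.5)   # additive: half the difference between homozygotes
DOM = (-0.5, 1.0, -0.5)  # dominance: heterozygote minus homozygote average

def contrast(cell_means, c1, c2):
    """Apply the outer product of a locus-1 and a locus-2 contrast to a 3x3 table."""
    return sum(c1[i] * c2[j] * cell_means[i][j]
               for i in range(3) for j in range(3))

def epistasis_effects(cell_means):
    """Return the four unweighted epistasis contrasts of a 3x3 table of means."""
    return {
        "AxA": contrast(cell_means, ADD, ADD),
        "AxD": contrast(cell_means, ADD, DOM),
        "DxA": contrast(cell_means, DOM, ADD),
        "DxD": contrast(cell_means, DOM, DOM),
    }

def make_table(mu, a1, a2, aa):
    """Hypothetical genotype means: rows = locus 1, columns = locus 2."""
    x = (-1.0, 0.0, 1.0)  # allele-count coding centred on the heterozygote
    return [[mu + a1 * x[i] + a2 * x[j] + aa * x[i] * x[j] for j in range(3)]
            for i in range(3)]

additive_only = make_table(mu=10.0, a1=2.0, a2=1.0, aa=0.0)
with_axa = make_table(mu=10.0, a1=2.0, a2=1.0, aa=3.0)

print(epistasis_effects(additive_only))  # all four contrasts are zero
print(epistasis_effects(with_axa))       # only AxA is non-zero (equals 3.0)
```

A table built from purely additive single-locus effects gives zero for all four contrasts, while adding an allele × allele term shows up only in the A × A contrast, which is the sense in which an individual epistasis effect identifies the mode of interaction.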
Significant single-locus SNP effects
Significant single-locus SNP effects on square root transformed anti-CCP with p < 10^-4 (columns include chromosome location in bp and p-value)
p-values: 0.525 × 10^-4, 0.403 × 10^-4, 0.628 × 10^-4
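The F-test of the SNP marker effect in Model (1) treats the SNP as a factor with three genotype levels. A minimal sketch of the corresponding F statistic follows, under the simplifying assumption that there are no covariates (Model (1) also includes sex, CigE, and the normality SNPs, which this sketch omits); the trait values and the function name are hypothetical.

```python
def marker_f_statistic(groups):
    """One-way ANOVA F statistic for a factor whose levels are genotype groups.

    groups: list of lists of trait values, one list per observed genotype.
    Returns (F, numerator df, denominator df).
    """
    groups = [g for g in groups if g]  # drop unobserved genotypes
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    df1, df2 = k - 1, n - k
    return (ss_between / df1) / (ss_within / df2), df1, df2

# Hypothetical transformed trait values grouped by genotype (aa, Aa, AA).
by_genotype = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df1, df2 = marker_f_statistic(by_genotype)
print(f_value, df1, df2)  # 13.0 2 6
```

With three observed genotypes the numerator has 2 degrees of freedom, matching the additive and dominance effects that are then tested individually by t-tests.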
Significant epistasis effects
Significant SNP epistasis effects on square root transformed anti-CCP with p < 10^-7
Effect    p-value
A × A     0.827 × 10^-7
D × A     0.976 × 10^-7
          0.563 × 10^-7
D × A     0.765 × 10^-7
A × A     0.754 × 10^-7
          0.583 × 10^-7
D × A     0.583 × 10^-8
          0.612 × 10^-9
A × A     0.247 × 10^-10
D × D     0.437 × 10^-7
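For a sense of scale, the number of tests implied by an exhaustive pairwise scan can be computed directly. This is a rough sketch that assumes every pair of the 5700 SNPs is tested for all five epistasis effects; the article does not state the exact number of tests performed, and the variable names are illustrative.

```python
import math

n_snps = 5700          # SNPs in the NARAC data set
effects_per_pair = 5   # I, AxA, AxD, DxA, DxD

n_pairs = math.comb(n_snps, 2)              # unordered SNP pairs
n_effect_tests = n_pairs * effects_per_pair

print(n_pairs)         # 16242150
print(n_effect_tests)  # 81210750
```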
Complex gene interaction mechanism and epistasis network
The results showed that the evidence for epistasis effects on anti-CCP was stronger than that for single-locus effects. This implies that gene interactions could be an important genetic factor underlying rheumatoid arthritis. The phenotypic distribution of anti-CCP might merit further study to identify the factors that caused the multi-peaked distribution curve shown in Figure 1A. Including such factors in the statistical model could achieve residual normality without using as many 'normality SNPs' and hence increase the residual degrees of freedom.
The genetic factors underlying anti-CCP may include single-gene action and gene interactions, but gene interaction effects could be more important than single-gene effects. The gene interaction mechanism could be complex, involving a number of SNPs and three types of pairwise epistasis effects, and the epistasis results could be used to identify allelic and genotypic combinations with the highest and lowest anti-CCP levels.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
- Firestein GS: Evolving concepts of rheumatoid arthritis. Nature. 2003, 423: 356-361. 10.1038/nature01661.
- Amos CI, Chen WV, Lee A, Li W, Kern M, Lundsten R, Batliwalla F, Wener M, Remmers E, Kastner DA, Criswell LA, Seldin MF, Gregersen PK: High-density SNP analysis of 642 Caucasian families with rheumatoid arthritis identifies two new linkage regions on 11p12 and 2q33. Genes Immunol. 2006, 7: 277-286. 10.1038/sj.gene.6364295.
- SAS Institute Inc: SAS User's Guide. 1990, Cary, North Carolina: SAS Institute, Inc.
- Box GEP, Cox DR: An analysis of transformations. J Roy Stat Soc Ser B. 1964, 26: 211-246.
- Mao Y, London NR, Ma L, Dvorkin D, Da Y: Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model. Physiol Genomics. 2006, 28: 46-52. 10.1152/physiolgenomics.00096.2006.
- Ma L, Dvorkin D, Garbe JR, Da Y: epiSNP User Manual Version 1.1w. Department of Animal Science, University of Minnesota, [http://animalgene.umn.edu/episnp/index.html]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.