Evaluating epistatic interaction signals in complex traits using quantitative traits
© Mukherjee et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Volume 3 Supplement 7
Rheumatoid arthritis (RA) is a complex, chronic inflammatory disease for which several plausible candidate loci have been implicated; however, these may not account for all the genetic variation underlying RA. Common disorders are hypothesized to be highly complex, with interaction among genes and other risk factors playing a major role in the disease process. This complexity is further magnified because such interactions may occur with or without a strong independent effect and are thus difficult to detect using traditional statistical methodologies. The main challenge in analyzing such gene × gene and gene × environment interactions is attributed to a phenomenon referred to as the "curse of dimensionality." Several combinatorial methodologies have been proposed to tackle this analytical challenge. Because quantitative traits underlie complex phenotypes and contain more information on the trait variation within genotypes than a qualitative dichotomy, analyzing quantitative traits correlated with the affection status is a more powerful tool for mapping such trait genes. Recently, a generalized multifactor dimensionality reduction method was proposed that allows for adjustment for discrete and quantitative traits and can be used to analyze qualitative and quantitative phenotypes in a population-based study design.
In this report, we evaluate the efficiency of the generalized multifactor dimensionality reduction statistical suite to decipher small interacting factors that contribute to RA disease pathogenesis.
Rheumatoid arthritis (RA) is a complex, chronic inflammatory disease for which several plausible candidate loci have been implicated. Many genetic studies have been undertaken, but only two genes, HLA-DRB1 and PTPN22, have been reported to be associated with the disease [1–4]. Although these findings are encouraging, they may not account for all the genetic variation in RA because no direct pathogenic role of these molecules has been established in the development of the disease. Common disorders like RA are hypothesized to be highly complex, with interaction among genes and other risk factors playing a major role in the disease process. This complexity is further magnified because such interactions may occur with or without a strong independent main effect and are thus difficult to detect using traditional statistical methodologies. The main challenge in analyzing epistatic interactions is attributed to the "curse of dimensionality," a problem caused by the exponential increase in volume associated with adding extra dimensions to a mathematical space. Thus, when interactions among several loci are analyzed for a complex phenotype, contingency tables in higher dimensions suffer from sparse data, leading to unreliable risk estimates. Several combinatorial methodologies have been proposed to overcome this analytical challenge: multifactor dimensionality reduction (MDR), the combinatorial partitioning method (CPM), and the restricted partition method (RPM). Although these methods have been used by several research groups, they have some limitations in their current form: a) inability to adjust for covariates (MDR), b) inability to use quantitative phenotypes, and c) computationally intense algorithms.
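To make the sparse-data problem concrete, the following minimal Python sketch counts the multilocus genotype cells for a given number of biallelic SNPs and the average number of samples that would fall in each cell. The function names are hypothetical, and the assumption of exactly three genotypes per locus and an even spread of samples across cells is a simplification for illustration only; real genotype distributions are uneven, so many cells would be even sparser.

```python
def genotype_cells(n_loci: int, genotypes_per_locus: int = 3) -> int:
    """Number of multilocus genotype combinations (cells) for n_loci SNPs.

    Assumes each biallelic SNP has three genotypes (e.g. AA, Aa, aa).
    """
    return genotypes_per_locus ** n_loci


def mean_samples_per_cell(n_samples: int, n_loci: int) -> float:
    """Average cell occupancy if samples were spread evenly (an idealization)."""
    return n_samples / genotype_cells(n_loci)


# Illustration with the total sample size reported for the GAW16 data set.
for k in range(1, 7):
    print(k, genotype_cells(k), round(mean_samples_per_cell(2062, k), 2))
```

The number of cells grows as 3 to the power of the number of loci, so the average occupancy falls below one sample per cell well before ten loci are considered jointly.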
Thus, there is a need to develop and evaluate more powerful statistical methodology to decipher small interacting factors that contribute to disease pathogenesis. Because quantitative traits underlie complex phenotypes and contain more information on the trait variation within genotypes than a qualitative dichotomy, analyzing quantitative traits correlated with the affection status is a more powerful tool for mapping complex trait genes. Recently, a generalized MDR (GMDR) method was proposed that allows for adjustment for discrete and quantitative traits and can be used to analyze qualitative and quantitative phenotypes in a population-based study design.
In this report, we use the GMDR statistical suite and evaluate its efficiency in deciphering small interacting factors that contribute to RA disease pathogenesis, using two quantitative traits [anti-CCP (anti-cyclic citrullinated peptide) and RFUW (rheumatoid factor)] as covariates for classifying the data into high-risk and low-risk groups.
In the current study, we used the Genetic Analysis Workshop 16 (GAW16) RA case-control data set (Problem 1), comprising a total of 2062 samples (case = 868, control = 1194) typed on the 550 k Illumina chip. To evaluate the efficiency of the GMDR algorithm in detecting small epistatic interactions involved in RA pathogenesis, analysis was performed on chromosomes 1, 2, 5, and 6, which have previously shown strong positive association with the phenotype [1–4, 10]. Because quantitative trait information was available only for cases, interaction analysis using GMDR was performed on the RA cases (n = 867).
The GMDR scoring method: The GMDR method uses the original MDR data reduction method, with the ratio of cases to controls replaced by scores in each cell to discriminate between high risk and low risk, followed by determination of classification accuracy and prediction error. A detailed description of the methodology can be found elsewhere. This generalization of the original MDR algorithm a) allows increased flexibility to use covariates, b) is able to handle both dichotomous and continuous phenotypes, and c) can be applied to a variety of population-based study designs (e.g., unbalanced case-control samples).
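The cell-labeling idea described above can be sketched in a few lines of Python. This is an illustrative simplification and not the GMDR software itself: the function name `label_cells` is hypothetical, and the choice of the overall mean score as the default threshold separating high-risk from low-risk cells is an assumption made here, since the text does not state the threshold used.

```python
from collections import defaultdict


def label_cells(genotypes, scores, threshold=None):
    """Label each multilocus genotype cell as 'high' or 'low' risk.

    genotypes: one tuple of genotype codes per individual, e.g. (0, 2).
    scores:    one score per individual, in the same order.
    threshold: cutoff on the mean cell score; defaults to the overall
               mean score (an assumption for this sketch).
    """
    if len(genotypes) != len(scores):
        raise ValueError("genotypes and scores must have the same length")
    if not scores:
        raise ValueError("at least one individual is required")
    if threshold is None:
        threshold = sum(scores) / len(scores)

    # Pool the scores of all individuals sharing the same genotype combination.
    cells = defaultdict(list)
    for genotype, score in zip(genotypes, scores):
        cells[tuple(genotype)].append(score)

    # A cell is high risk when its average score reaches the threshold.
    return {
        genotype: "high" if sum(vals) / len(vals) >= threshold else "low"
        for genotype, vals in cells.items()
    }


example = label_cells(
    [(0, 0), (0, 0), (1, 2), (1, 2)],
    [0.9, 0.8, 0.1, 0.2],
)
print(example)
```

Pooling the multilocus genotypes into just two groups is what reduces the high-dimensional contingency table to a single binary attribute, which can then be evaluated by cross-validation.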
We formulated a detailed scoring methodology using the expression S = exp(y)/(1 + exp(y)), where y is the standardized quantitative trait. In brief, we computed the mean and standard deviation (SD) of the quantitative trait; the standardized value y for each individual was then obtained by subtracting the mean from the individual's quantitative trait value and dividing the result by the SD.
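As a minimal sketch of this scoring step, the Python below standardizes a list of trait values and maps each standardized value y to a score through the logistic function exp(y)/(1 + exp(y)). The function names are hypothetical, the input values are made up for illustration, and the use of the sample SD (n - 1 denominator) is an assumption because the text does not say which SD estimator was used.

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Return (value - mean) / SD for each value.

    Uses the sample SD (n - 1 denominator); this choice is an assumption.
    """
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


def logistic_score(y):
    """Compute S = exp(y) / (1 + exp(y)) in a numerically stable way."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


# Made-up trait values, for illustration only.
trait = [1.0, 2.0, 3.0]
scores = [logistic_score(y) for y in standardize(trait)]
print(scores)
```

Under this expression, an individual whose trait equals the sample mean receives a score of 0.5, and scores approach 1 or 0 as the trait value moves far above or below the mean.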
In the current study we used the GMDR algorithm to evaluate its efficiency in detecting gene-gene interactions in the complex RA phenotype. For this we used marker information from the GAW16 data set from regions that have been previously implicated in RA. Additional file 1 lists the markers used in this analysis and their chromosomal positions. All the markers selected were in Hardy-Weinberg equilibrium (HWE; data not shown). None of the regions selected showed extensive linkage disequilibrium (LD) between the markers (Figure 1).
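The text does not state how HWE was assessed. One common approach, shown here only as a hedged sketch and not as the authors' procedure, is a one-degree-of-freedom chi-square goodness-of-fit test on the three genotype counts of a biallelic marker. The function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Allele frequency of A estimated from the observed genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up genotype counts, for illustration only.
print(hwe_chi_square(25, 50, 25))
```

For markers with small expected counts, an exact test is generally preferred over the chi-square approximation.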
Table 1. Summary of the best models obtained using the GMDR algorithm for the quantitative trait RFUW (IgM). Columns: SNPs in best model; sign test p-value.
Rheumatoid factor (RFUW) has been widely used as a screening test for patients with RA. RFUW is prognostically useful because it correlates with functional and radiographic outcomes in RA. More recently, the anti-cyclic citrullinated peptide (anti-CCP) antibody test has been developed, with a sensitivity of ~68% and a specificity of 97% [13, 14]. Together, these clinical values serve as important indicators of disease status and are routinely used in clinical settings to aid in diagnosis. Common disorders like RA are hypothesized to be highly complex, with interaction among genes and other risk factors playing a major role in the disease process. Powerful statistical methodology has been developed to overcome these challenges and to decipher the small epistatic interactions that are characteristic of such phenotypes. Because quantitative traits underlie complex phenotypes and contain more information on the trait variation within genotypes than a qualitative dichotomy, we used the anti-CCP and RFUW values provided in the GAW16 Problem 1 data set to evaluate the ability of the recently developed GMDR algorithm to detect small interacting markers for RA disease status.
In this study we used the GMDR methodology to evaluate its efficiency in detecting gene-gene interactions in putative regions for RA using the anti-CCP and RFUW (IgM) values as covariates. Three of the four models predicted reached statistical significance (Table 1). None of the high-order interactions were between correlated markers, suggesting that there might be more than one signal in these genes. For this study we used both the anti-CCP and the RFUW values to generate scores for the GMDR analysis. Scoring based on the anti-CCP value did not result in significant interaction models. Our results show that RFUW values are better predictors of high-risk and low-risk classes and further strengthen the role of the RFUW (IgM) antibody as a strong prognostic factor. Detailed biological characterization of this quantitative trait is warranted.
Anti-CCP: Anti-cyclic citrullinated peptide
CPM: Combinatorial partitioning method
GAW16: Genetic Analysis Workshop 16
GMDR: Generalized multifactor dimensionality reduction
MDR: Multifactor dimensionality reduction
RPM: Restricted partition method
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.