Volume 3 Supplement 7
Different models and single-nucleotide polymorphisms signal the simulated weak gene-gene interaction for a quantitative trait using haplotype-based and mixed models testing
© Kovac and Dubé; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Knowledge of simulated genetic effects facilitates interpretation of methodological studies. Genetic interactions for common disorders are likely numerous and weak. Using the 200 replicates of the Genetic Analysis Workshop 16 (GAW16) Problem 3 simulated data, we compared the statistical power to detect weak gene-gene interactions of a haplotype-based test in the UNPHASED software with that of a genotypic mixed model (GMM) and an additive mixed model (AMM), both fitted as mixed linear regression models in SAS. We assumed a candidate-gene approach in which a single-nucleotide polymorphism (SNP) in one gene is fixed and multiple SNPs are tested at the second gene. We analyzed the quantitative low-density lipoprotein trait (heritability 0.7%), modulated by a simulated interaction of rs4648068 from 4q24 with another gene on 8p22, where we analyzed seven SNPs. We generally observed low power calculated per SNP (≤ 37% at the 0.05 level), with the haplotype-based test being inferior. Over all tests, the haplotype-based test performed within chance, while GMM and AMM had low power (~10%). The haplotype-based test and the mixed models detected signals at different SNPs. The haplotype-based test detected a signal in 50 unique replicates; GMM and AMM featured both shared and distinct SNPs and replicates (65 replicates shared, 41 GMM only, 27 AMM only). Overall, the statistical signal for the weak gene-gene interaction appears sensitive to the sample structure of the replicates. We conclude that using more than one statistical approach may increase power to detect such signals in studies with a limited number of loci, such as replication studies. No results were significant at the conservative 10⁻⁷ genome-wide level.
With efforts to uncover more genetic variation for common polygenic disorders, there is growing interest in the analysis of genetic interactions. It is plausible to expect many statistical genetic interactions, each with a small effect, rather than a few strong interactions. Here, we compare the statistical power of several methods to detect a weak gene-gene interaction for a quantitative trait in a combined data set that includes both familial and unrelated samples, in a hypothetical candidate-gene design in which the single-nucleotide polymorphism (SNP) in one gene is fixed (for example, as replicated by marginal association analysis and/or for a known functional factor) and multiple SNPs are tested at the second, interacting candidate gene. We used the Genetic Analysis Workshop 16 (GAW16) Problem 3 simulated data set with knowledge of the modeling scheme. Using 200 replicates, we compared the power of the haplotype-based test in the computer program UNPHASED to that of genotypic and additive mixed models (GMM and AMM) as implemented with a mixed linear regression model in SAS.
Subjects and phenotype
The subjects consist of 6,476 related and unrelated individuals from the Framingham Heart Study, as made available for the GAW16 Problem 3 simulated data by dbGaP. We used two phenotype definitions in separate analyses. The first quantitative phenotype, used for haplotype-based analyses only, was the covariate-adjusted multivariate linear regression residual of the low-density lipoprotein (LDL) variable, averaged over the three visits and standardized. The covariates used for the adjustment were sex, age, diet, medication use, high-density lipoprotein (HDL), triglyceride level, and smoking. The main quantitative phenotype, used for both haplotype-based and mixed model analyses, was the LDL measurement at the first visit, regression-adjusted for age and sex only (excluding subjects on lipid-lowering medication). The simulated LDL measurement at the first visit did not show notable departure from the normal distribution in the first replicate, suggesting that non-normality would not be an important factor in this study.
Haplotype-based analysis with UNPHASED
The UNPHASED software implements a likelihood approach for primarily haplotype-based analysis of data, which can include both familial and unrelated subjects. The test for gene-gene interaction for a quantitative trait that we used compares the null hypothesis (H0) of equal contributions for all gene combinations (in haplotype form) sharing the same alleles at the conditioning marker, versus the alternative hypothesis (H1) of differential multiplicative contributions from the test marker. The test uses a likelihood-ratio chi-square statistic to compare models with and without the interaction terms. A frequency threshold of 0.01 was applied to gene combinations. Gene-gene interaction using SNP-SNP testing (in haplotype form) was evaluated between rs4648068 and each SNP in the LPL gene region, one at a time. Gene-gene interaction was also evaluated using SNP-haplotype testing between rs4648068 and each consecutive pair of SNPs in the LPL gene region, one pair at a time as haplotypes, e.g., rs263-rs271, rs271-rs11570892, etc. The SNP-haplotype testing was conducted only in the analysis of the first phenotype.
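The likelihood-ratio comparison described above can be illustrated with a minimal sketch. It is not the UNPHASED implementation: the function names, the log-likelihood inputs, and the number of extra interaction parameters are all hypothetical, and the chi-square tail probability is computed with a standard recurrence so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed forms for df = 1 (erfc) and df = 2 (exponential); higher df are
    # reached in steps of 2 via Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_null, loglik_alt, extra_params):
    """Compare nested models fitted without (null) and with (alt) interaction terms.

    extra_params is the number of additional free parameters in the
    alternative model (an input here, not something taken from UNPHASED).
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods, for illustration only.
stat, p_value = likelihood_ratio_test(-1000.0, -998.0, extra_params=1)
print(round(stat, 3), round(p_value, 4))
```

A replicate would count as a detection at the 0.05 level when the returned p-value falls below 0.05.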
Mixed model analysis with SAS
In the mixed linear regression model, LDL is the measurement at the first visit (excluding subjects on lipid-lowering medication), SNP0 is the fixed rs4648068, SNPi is a SNP from the LPL gene region (i = 1, ..., 7), and pedno is the pedigree ID, included as a random effect to control for familial dependence. For the GMM test, SNPs were represented by three genotype classes (two homozygotes and a heterozygote). For the AMM test, the number of minor alleles in a genotype was coded as 0, 1, or 2. We used Type 3 tests of fixed effects to evaluate statistical significance.
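The two genotype codings can be sketched as follows. This is an illustration only, not the SAS code used in the study: the function names, the two-character genotype strings, and the choice of "G" as the minor allele are hypothetical, and SAS parameterizes class variables differently internally. The sketch shows why the additive coding yields a single interaction column while the three-class genotypic coding yields four.

```python
def additive_code(genotype, minor_allele):
    """AMM-style coding: number of minor alleles in the genotype (0, 1, or 2)."""
    return sum(1 for allele in genotype if allele == minor_allele)


def genotypic_code(genotype, minor_allele):
    """GMM-style coding: indicators for the heterozygote and the minor homozygote.

    The major homozygote is the reference class (both indicators are 0).
    """
    count = additive_code(genotype, minor_allele)
    return [1 if count == 1 else 0, 1 if count == 2 else 0]


def interaction_columns(cols0, colsi):
    """All pairwise products of the columns coding SNP0 and SNPi."""
    return [a * b for a in cols0 for b in colsi]


# Hypothetical genotypes of one subject at SNP0 and SNPi.
g0, gi = "AG", "GG"
amm = interaction_columns([additive_code(g0, "G")], [additive_code(gi, "G")])
gmm = interaction_columns(genotypic_code(g0, "G"), genotypic_code(gi, "G"))
print(len(amm), len(gmm))  # 1 4
```

With all nine genotype combinations observed, the interaction test therefore has 1 degree of freedom under the additive coding and 4 under the genotypic coding, which is one reason the two models can respond to different interaction patterns.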
Haplotype-based SNP-SNP versus SNP-haplotype testing
In our first haplotype-based analyses of 200 replicates with the first LDL definition, there was low power (≤ 15%) at the 0.05 significance level with markers rs3200218 and rs2898493 in either SNP-SNP tests or SNP-haplotype tests involving these markers (not shown). The SNP-SNP testing was more powerful than SNP-haplotype testing (maximal power 15% at rs3200218 vs. 9.5% at haplotype rs11570892-rs3200218).
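Power in this setting is the proportion of replicates in which a given test is significant at the chosen level. A minimal sketch, with a hypothetical function name and made-up p-values, is:

```python
def empirical_power(p_values, alpha=0.05):
    """Fraction of replicates whose p-value for a given test is below alpha."""
    if not p_values:
        raise ValueError("at least one replicate is required")
    hits = sum(1 for p in p_values if p < alpha)
    return hits / len(p_values)


# Made-up p-values for 200 replicates: 30 below 0.05 and 170 above.
p_values = [0.01] * 30 + [0.50] * 170
print(empirical_power(p_values))
```

With 200 replicates, a power of 15% corresponds to 30 significant replicates and 9.5% to 19.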
Haplotype-based SNP-SNP testing versus genotypic and additive mixed model testing
Table: Statistical power (%) for haplotype-based and mixed model analyses of the covariate-adjusted LDL measurement in 200 replicates (columns: rs4648068 × SNP tests, including additive genotype recoding; α = 5%).
Signal identification by replicates with different models
We generally observed low statistical power to detect a weak simulated candidate gene-gene interaction for the LDL trait with heritability of 0.7%, where one SNP was fixed and multiple SNPs were tested at the second gene. There was a small increase in power per SNP with the haplotype-based tests when LDL was not adjusted for HDL and triglyceride level, potentially retaining useful variation (the LPL candidate gene is simulated to contribute to HDL with a heritability of 0.003). In further analyses the haplotype-based test was inferior to GMM or AMM, particularly when power was calculated over all SNP tests.
The haplotype-based test detected gene-gene interaction with a single SNP in the region, while GMM and AMM had shared and specific SNPs. We also examined the relationship between the specific methods used and the replicates that produced any gene-gene interaction signals. While there was some overlap in successful replicates between the GMM and AMM models, each of the three models identified distinct replicates, for a total of 183/200 replicates producing a detectable gene-gene interaction signal. Notably, the haplotype and mixed model tests did not share a single successful replicate. The sole SNP identified with low power in the haplotype-based interaction test was also the only one with a robust marginal effect in both the GMM and AMM (without the interaction signal). Accordingly, the interaction signal from this particular SNP could be obscured in the mixed models by the strong marginal effect. The inability of the haplotype-based test to detect the weak interaction at other SNPs compared with mixed models suggests overall inferiority of the former approach for this simulated genetic interaction. In this data set, the weak gene-gene interaction followed different genetic patterns in different replicates, and its detection was sensitive to the particular structure of the replicates.
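The replicate bookkeeping behind these counts amounts to simple set operations. In the sketch below the replicate indices are hypothetical placeholders chosen only to reproduce the reported counts (50 haplotype-based, 65 shared by GMM and AMM, 41 GMM only, 27 AMM only); the function name is likewise an assumption.

```python
def overlap_summary(hap, gmm, amm):
    """Summarize which replicates gave a signal under which method."""
    mixed = gmm | amm
    return {
        "hap_and_mixed": len(hap & mixed),
        "gmm_and_amm": len(gmm & amm),
        "gmm_only": len(gmm - amm),
        "amm_only": len(amm - gmm),
        "any_method": len(hap | mixed),
    }


# Placeholder replicate indices matching the reported counts.
hap = set(range(0, 50))
shared = set(range(50, 115))
gmm = shared | set(range(115, 156))
amm = shared | set(range(156, 183))

summary = overlap_summary(hap, gmm, amm)
print(summary)
```

The union of the three sets contains 50 + 65 + 41 + 27 = 183 replicates, which matches the 183/200 reported, and the intersection of the haplotype-based set with the mixed-model sets is empty.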
The systematic use of more than one analytical approach may help to increase power to identify a weak gene-gene interaction in a limited number of test loci, such as in a replication study. The replication sample may not necessarily present an identical genetic pattern to that of the original study. It might be expected that sensitivity to sample structure would be less of an issue for stronger gene-gene interactions. Factors beyond statistical interaction patterns, such as the presence of marginal genetic effects, may decrease the power to detect genetic interaction in mixed models. Finally, we note that not a single result from any analysis was significant at the conservative 10⁻⁷ genome-wide level for a 500k chip, indicating that these or similarly powered approaches would not detect such a weak gene-gene interaction from a genome-wide scan.
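The genome-wide level quoted for a 500k chip is consistent with a Bonferroni correction of a 0.05 family-wise level over roughly 500,000 tests. That the threshold was derived this way is an assumption; the symbols α and m below are introduced only for illustration.

```latex
\alpha_{\text{genome-wide}} = \frac{\alpha}{m} = \frac{0.05}{500{,}000} = 10^{-7}
```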
List of abbreviations used
AMM: Additive mixed model
GAW16: Genetic Analysis Workshop 16
GMM: Genotypic mixed model
LPL: Lipoprotein lipase precursor
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
- Musani SK, Shriner D, Liu N, Feng R, Coffey CS, Yi N, Tiwari HK, Allison DB: Detection of gene × gene interactions in genome-wide association studies of human population data. Hum Hered. 2007, 63: 67-84. 10.1159/000099179.
- Kraja AT, Culverhouse R, Daw EW, Wu J, van Brunt A, Province MA, Borecki IB: Data Description. Problem 3: Several phenotypic longitudinal data simulated in association with the genome-wide single-nucleotide polymorphisms from Framingham. BMC Proc. 2008, 3 (suppl 7): S4. 10.1186/1753-6561-3-s7-s4.
- Dudbridge F: Likelihood-based association analysis for nuclear families and unrelated subjects with missing genotype data. Hum Hered. 2008, 66: 87-98. 10.1159/000119108.
- Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21: 263-265. 10.1093/bioinformatics/bth457.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.