Genetic signal maximization using environmental regression
© Melton et al; licensee BioMed Central Ltd. 2011
Published: 29 November 2011
Joint analyses of correlated phenotypes in genetic epidemiology studies are common. However, these analyses primarily focus on genetic correlation between traits and do not take into account environmental correlation. We describe a method that optimizes the genetic signal by accounting for stochastic environmental noise through joint analysis of a discrete trait and a correlated quantitative marker. We conducted bivariate analyses where heritability and the environmental correlation between the discrete and quantitative traits were calculated using Genetic Analysis Workshop 17 (GAW17) family data. The resulting inverse value of the environmental correlation between these traits was then used to determine a new β coefficient for each quantitative trait and was constrained in a univariate model. We conducted genetic association tests on 7,087 nonsynonymous SNPs in three GAW17 family replicates for Affected status with the β coefficient fixed for three quantitative phenotypes and compared these to an association model where the β coefficient was allowed to vary. Bivariate environmental correlations were 0.64 (± 0.09) for Q1, 0.798 (± 0.076) for Q2, and −0.169 (± 0.18) for Q4. Heritability of Affected status improved in each univariate model where a constrained β coefficient was used to account for stochastic environmental effects. No genome-wide significant associations were identified for either method but we demonstrated that constraining β for covariates slightly improved the genetic signal for Affected status. This environmental regression approach allows for increased heritability when the β coefficient for a highly correlated quantitative covariate is constrained and increases the genetic signal for the discrete trait.
The current availability of groups of correlated phenotypes for several common complex chronic diseases can aid in the study of these traits by providing data beyond what is contained in the phenotypes individually. Several earlier statistical genetic studies have shown that when correlations between phenotypes are explicitly modeled, they provide greater power than that provided by univariate analyses of individual traits [1–6]. Joint analysis of traits can also improve the detection of quantitative trait loci, where the effect sizes are too small to be found in single-trait analyses, and it can also inform the investigation of pleiotropy and co-incident linkage. A commonly encountered situation, in which the potential benefits of bivariate analysis are appreciated, is that of a discrete disease trait and a correlated quantitative phenotype. A number of multifactorial diseases, for example, diabetes or hypertension, studied as discrete traits, are also highly correlated with quantitative traits, physiological risk factors, or other continuously distributed biological characteristics. Although quantitative traits may not be used explicitly in the definition of disease status, both classes of information are useful and mutually supportive. However, these analyses primarily focus on the genetic correlation between the traits and do not take into account how the environmental correlation between the phenotypes may be used. This is important because often in epidemiological genetic studies of complex phenotypes, quantitative traits that are significantly correlated with the disease phenotype are included as covariates in the analysis, and not accounting for the environmental correlation leads to a less than optimal genetic signal.
Toward this end, we present a novel statistical genetic regression method that accounts for a portion of the environmental component shared between a discrete trait (Affected) and the quantitative phenotypes Q1, Q2, and Q4. We applied the method to the Genetic Analysis Workshop 17 (GAW17) simulated family data, which are based on the 1000 Genomes Project, over all 200 replicates. We conducted bivariate polygenic analyses of the discrete trait with each of the three quantitative traits. We then used the results of these analyses to calculate a new β coefficient for the quantitative trait, which was constrained to this value in a univariate analysis of affection status with the quantitative phenotype included as a covariate. Next, we compared the resulting model to a univariate analysis of affection status in which the quantitative trait was included as an unmodified covariate. Finally, we conducted genetic association tests using measured genotype analysis for affection status with each quantitative phenotype included as a covariate in two models: (1) a model in which the β coefficient was allowed to fluctuate and (2) a model in which the β coefficient was constrained. For the association tests, we used only the first three replicates of the GAW17 family data set to determine whether this method maximized the genetic signal for the true known genetic variants.
The GAW17 family data set contains 697 individuals divided into 194 nuclear families in 8 pedigrees with 202 founders from the 1000 Genomes Project. These family data include 13,875 autosomal single-nucleotide polymorphisms (SNPs) from 3,205 genes, with 7,087 of these genotyped SNPs in 1,890 genes being nonsynonymous. Each of the 200 simulated data sets includes the following information for each individual: affection status, three continuous quantitative traits (Q1, Q2, and Q4), age, smoking status, and sex. These analyses were done with knowledge of the GAW17 answers.
Analytical methods: environmental regression and association analysis
where β_e is the constrained β coefficient, ρ_g is the genetic correlation between traits, h_Q is the square root of the heritability of the quantitative trait, and h_D is the square root of the heritability of the discrete trait. We then compared the resulting heritability with that from a univariate polygenic model in which the quantitative trait was included as a covariate and the β coefficient was allowed to fluctuate.
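The difference between a constrained and a freely estimated β coefficient can be illustrated with a minimal sketch. It assumes an ordinary linear regression on simulated data rather than the variance-components polygenic model used in this study, and the fixed value 0.3, the function names, and the simulated effect size are all hypothetical choices made for illustration. Holding β fixed is equivalent to subtracting β times the covariate from the trait (an offset) and estimating only the remaining parameters.

```python
import random
import statistics


def fit_free(y, q):
    """Least squares of y on q with an intercept; beta is estimated from the data."""
    q_mean, y_mean = statistics.fmean(q), statistics.fmean(y)
    sxy = sum((qi - q_mean) * (yi - y_mean) for qi, yi in zip(q, y))
    sxx = sum((qi - q_mean) ** 2 for qi in q)
    beta = sxy / sxx
    resid = [yi - y_mean - beta * (qi - q_mean) for qi, yi in zip(q, y)]
    return beta, statistics.pvariance(resid)


def fit_constrained(y, q, beta_fixed):
    """Hold beta at beta_fixed (an offset); only the intercept is estimated."""
    adjusted = [yi - beta_fixed * qi for qi, yi in zip(q, y)]
    return beta_fixed, statistics.pvariance(adjusted)


# Simulated stand-in data (hypothetical): a trait y that depends on a covariate q.
random.seed(17)
q = [random.gauss(0, 1) for _ in range(500)]
y = [0.5 * qi + random.gauss(0, 1) for qi in q]

beta_free, var_free = fit_free(y, q)
# 0.3 is an arbitrary illustrative value, not one derived from a bivariate analysis.
beta_fixed, var_fixed = fit_constrained(y, q, beta_fixed=0.3)

print(f"free beta = {beta_free:.3f}, residual variance = {var_free:.3f}")
print(f"fixed beta = {beta_fixed:.3f}, residual variance = {var_fixed:.3f}")
```

In a plain regression the free fit always has the smaller residual variance. The interest of the constrained model in a pedigree setting lies in how the residual variance is then partitioned into genetic and environmental parts, which this simplified sketch does not attempt to model.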
For the association analysis we conducted a measured genotype analysis on each of the 7,087 nonsynonymous SNPs to calculate a nominal p-value for association using Affected status as the trait and Age, Sex, and Smoking as the covariates. To control for potential population stratification, we performed principal components analysis on genotype scores for 6,178 polymorphic synonymous SNPs in the 202 founders using the prcomp routine available in the R statistical package (http://www.r-project.org). We included the first four principal components (PC1–PC4) as covariates.
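The principal components step can be sketched as follows. This is a Python analogue of what R's prcomp does with its default settings (column centering, no scaling, decomposition by singular value decomposition), not the code used in this study. The genotype matrix is simulated, and the function name, the number of SNPs, and the allele frequencies are hypothetical.

```python
import numpy as np


def principal_components(genotypes, n_components=4):
    """PCA of an individuals-by-SNPs matrix via SVD of the column-centred data."""
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # per-individual PC scores
    explained = s**2 / np.sum(s**2)                  # proportion of variance per PC
    return scores, explained[:n_components]


# Simulated stand-in data (hypothetical): 202 individuals, 300 SNPs coded 0/1/2.
rng = np.random.default_rng(17)
allele_freq = rng.uniform(0.05, 0.5, size=300)
genotypes = rng.binomial(2, allele_freq, size=(202, 300)).astype(float)

# Keep polymorphic SNPs only, mirroring the restriction described in the text.
genotypes = genotypes[:, genotypes.std(axis=0) > 0]

scores, explained = principal_components(genotypes, n_components=4)
print(scores.shape)
print(np.round(explained, 4))
```

The four columns of `scores` play the role of PC1–PC4 and would enter the association model as additional covariates.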
where “corrected” is the corrected p-value, “nominal” is the uncorrected p-value, and “effective” is the effective number of SNPs. This approach allows for nonindependence among family members and accounts for effects of other potential covariates.
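One standard correction that uses exactly these three quantities is the Šidák-type adjustment, corrected = 1 − (1 − nominal)^effective. The sketch below assumes that form. It is an illustration of how an effective number of SNPs enters a multiple-testing correction, not a confirmed statement of the formula applied in this study, and the function name and example values are hypothetical.

```python
import math


def sidak_corrected(p_nominal, n_effective):
    """Sidak-type correction: 1 - (1 - p_nominal) ** n_effective (assumed form)."""
    if not 0.0 <= p_nominal <= 1.0:
        raise ValueError("p_nominal must lie in [0, 1]")
    if n_effective < 1:
        raise ValueError("n_effective must be at least 1")
    if p_nominal == 1.0:
        return 1.0
    # log1p/expm1 keep precision when p_nominal is very small.
    return -math.expm1(n_effective * math.log1p(-p_nominal))


# Hypothetical example: a nominal p-value of 1e-6 with 5,000 effective tests.
print(sidak_corrected(1e-6, 5000))
```

Because the effective number of SNPs is smaller than the number of SNPs tested when markers are correlated, this kind of correction is less conservative than one that treats all tests as independent.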
Results and discussion
Bivariate models used for calculating the β coefficient for environmental regression
Columns: Heritability (Affected) (SD); Heritability (quantitative trait) (SD)
Rows: Affected * Q1; Affected * Q2; Affected * Q4
We conducted the genetic association analysis on affection status, adjusting for all three quantitative phenotypes, in the first three replicates of the GAW17 data set. We compared a measured genotype association analysis in which the β coefficient for each quantitative phenotype was allowed to fluctuate with one in which the β coefficient was constrained to a fixed value. Of the 162 “true” genetic variants for the four GAW17 phenotypes, 58 (12 for Affected, 17 for Q1, 29 for Q2, and none for Q4) were found in these family data.
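The comparison of the two association models can be sketched for a single SNP. The example assumes a simple linear model on simulated unrelated individuals, with a continuous trait standing in for affection status, so it ignores both the pedigree structure and the discrete nature of the trait handled by the measured genotype analysis. The sign convention for the difference (constrained minus free), the fixed β of 0.3, and all function names are hypothetical.

```python
import math

import numpy as np


def rss(y, columns):
    """Residual sum of squares from least squares of y on an intercept plus columns."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def snp_chi_square(y, snp, covariates):
    """1-df likelihood-ratio chi-square for adding the SNP to the covariate model."""
    return len(y) * math.log(rss(y, covariates) / rss(y, covariates + [snp]))


def chi_square_p_value(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(chi_square, 0.0) / 2.0))


# Simulated stand-in data (hypothetical effect sizes and allele frequency).
rng = np.random.default_rng(17)
n = 697
snp = rng.binomial(2, 0.1, size=n).astype(float)
q = rng.normal(size=n)
y = 0.3 * snp + 0.5 * q + rng.normal(size=n)

# Model 1: beta for the quantitative covariate q is estimated with the SNP effect.
free = snp_chi_square(y, snp, [q])
# Model 2: beta for q is held at an arbitrary illustrative value via an offset.
constrained = snp_chi_square(y - 0.3 * q, snp, [])

delta = constrained - free
p_free = chi_square_p_value(free)
p_constrained = chi_square_p_value(constrained)
print(f"free: chi2 = {free:.2f}, p = {p_free:.3g}")
print(f"constrained: chi2 = {constrained:.2f}, p = {p_constrained:.3g}")
print(f"delta chi2 = {delta:.2f}")
```

Averaging such differences over all tested SNPs, over the true variants, and over the variants for Affected would give summaries of the kind tabulated for each quantitative trait.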
Differences (Δ) in heritability and χ² for the first three replicates in the GAW17 data
Rows (for each of Q1, Q2, and Q4): Δ chi-square, all (n = 7,087); Δ chi-square, true (n = 58); Δ chi-square, Affected (n = 12)
As next-generation sequencing data become more available, an important consideration will be to maximize the ability to detect rare variants that have a large effect on chronic disease. The easiest way to detect these rare variants will be through large pedigrees, because rare variants will be amplified in families. Our results suggest that by controlling for some of the stochastic environmental noise between two highly correlated traits, we can improve the ability to identify genetic variants in pedigrees through an increase in heritability. For the current study, we proceeded with a two-step process. The first step was to conduct a bivariate polygenic analysis between a discrete trait and a quantitative trait to calculate a β coefficient for the quantitative trait. Constraining the β coefficient for the quantitative trait to this value increased the average heritability of affection status. In the second step, we compared two measured genotype association analyses, one in which the β coefficient for the quantitative trait was constrained to account for environmental correlations and the other in which the β coefficient was allowed to vary. Neither method identified any true associated variants in the GAW17 family data set when a strict correction for multiple testing was used, but the novel environmental regression method did allow for an increase in the chi-square value for SNPs known to be associated with affection status, particularly in replicates in which the heritability was improved.
The Genetic Analysis Workshops are supported by National Institutes of Health grant R01 GM031575 from the National Institute of General Medical Sciences. The SOLAR statistical genetics computer package is supported by US National Institute of Mental Health grant MH059490. The supercomputing facilities used for this work at the AT&T Genetics Computing Center were supported in part by a gift from the SBC Foundation.
This article has been published as part of BMC Proceedings Volume 5 Supplement 9, 2011: Genetic Analysis Workshop 17. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/5?issue=S9.
1. Amos CI, Laing AE: A comparison of univariate and multivariate tests for genetic linkage. Genet Epidemiol. 1993, 10: 671-676. 10.1002/gepi.1370100657.
2. Jiang C, Zeng ZB: Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 1995, 140: 1111-1127.
3. Mangin B, Thoquet P, Grimsley N: Pleiotropic QTL analysis. Biometrics. 1998, 54: 88-99. 10.2307/2533998.
4. Almasy L, Dyer TD, Blangero J: Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet Epidemiol. 1997, 14: 953-958. 10.1002/(SICI)1098-2272(1997)14:6<953::AID-GEPI65>3.0.CO;2-K.
5. Blangero J, Almasy L, Williams JT, Porjesz B, Reich T, Begleiter H, COGA Collaborators: Incorporating quantitative traits in genomic scans of psychiatric diseases: alcoholism and event-related potentials. Am J Med Genet. 1997, 74: 602-
6. Williams JT, Van Eerdewegh P, Almasy L, Blangero J: Joint multipoint linkage analysis of multivariate qualitative and quantitative traits. I. Likelihood formulation and simulation results. Am J Hum Genet. 1999, 65: 1134-1147. 10.1086/302570.
7. Almasy LA, Dyer TD, Peralta JM, Kent JW, Charlesworth JC, Curran JE, Blangero J: Genetic Analysis Workshop 17 mini-exome simulation. BMC Proc. 2011, 5 (suppl 9): S2. 10.1186/1753-6561-5-S9-S2.
8. Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 1198-1211. 10.1086/301844.
9. Foulkes AS: Applied Statistical Genetics with R: For Population-Based Association Studies. 2009, New York, Springer.
10. Moskvina V, Schmidt KM: Individual SNP allele reconstruction from informative markers selected by a non-linear Gauss-type algorithm. Hum Hered. 2006, 62: 97-106. 10.1159/000096097.
11. Charlesworth JC, Peralta JM, Drigalenko E, Goring HH, Almasy L, Dyer TD, Blangero J: Toward the identification of causal genes in complex diseases: a gene-centric joint test of significance combining genomic and transcriptomic data. BMC Proc. 2009, 3 (suppl 7): S92. 10.1186/1753-6561-3-s7-s92.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.