Open Access
Co-regulation and multilocus determinants of gene expression in humans
© Kerner et al; licensee BioMed Central Ltd. 2007
Published: 18 December 2007
The regulation of gene expression is an emerging area of investigation. Increased knowledge can deepen our understanding of the genetic contributions to variations in complex traits. The purpose of this study is to explore the feasibility of detecting regulatory elements of gene expression with multivariate analyses.
Peripheral blood lymphocyte expression levels of 30 genes on chromosome 5 and a single gene, DEAD box polypeptide 17 (DDX17), on chromosome 22 were analyzed in single-point variance-component linkage analyses in multiplex families to identify putative regulatory regions. To explore whether individual regulatory regions differ in their relationship with the expression level of a single gene, we used stepwise regression. To explore the possibility of pleiotropy of a single regulatory locus for multiple genes, we applied bivariate linkage analysis.
Twenty-one loci were linked to five expression levels. The two most significant signals were for the known region on chromosome 22 (LOD = 4.62) and, on chromosome 5, for the gene leukocyte-derived arginine aminopeptidase (LRAP), which showed a LOD of 4.57 with a single-nucleotide polymorphism (SNP) within 5 Mb of the gene. Both genes showed evidence of linkage to multiple SNPs. When the 194 family members were treated as independent, stepwise regression retained only a subset of the linked SNPs as significant predictors (p < 0.05), providing evidence for multiple regulatory regions of unequal effect. However, when corrections for non-independence were applied, these SNPs were no longer significant.
The complex nature of gene regulation can be explored by linkage analysis with single-nucleotide polymorphisms followed by multivariate methods to explore co-regulation.
Regulation of gene expression in human peripheral blood lymphocytes is not well understood. Most knowledge comes from the study of individual genes and their nearby regulatory elements. Taking this approach, a relatively small number of differentially expressed genes has been identified, and their regulatory elements found to have a major effect on gene expression (>50%).
However, emerging evidence indicates that gene expression is regulated by a network of elements located both close to the gene and at large distances from it. Studies in flies, yeast, and mice suggest that most genes are regulated by many regulatory elements with small effects, and these complicated interactions may be difficult to detect with current methods of analysis. High-throughput methods for single-nucleotide polymorphism (SNP) genotyping and detection of expression levels allow us to investigate these mechanisms further.
We report a study designed to explore the feasibility of detecting regulatory elements of gene expression with multivariate analyses.
The Genetic Analysis Workshop 15 (GAW15) sample
Gene expression data from lymphoblastoid cell lines of 194 individuals from 14 three-generation Centre d'Etude du Polymorphisme Humain (CEPH) families, obtained using an Affymetrix Human Focus Array, were provided. This study was approved by the Institutional Review Board at UCLA.
Selection of genes and SNPs
To apply our method, we focused our analysis on 90 gene expression traits for genes residing on chromosome 5q. One trait on chromosome 22 with a known significant linkage signal to a SNP in close proximity to the gene was also selected. For these genes, we tested the normality of expression levels and estimated their heritability using the SOLAR software. We also included sex as a covariate in the genetic model to test its significance for the expression levels of the genes. Only traits with a heritability ≥ 0.2 and skewness and kurtosis estimates within the normal range were retained. Thirty traits on chromosome 5q and the trait on chromosome 22 met these criteria. Four SNPs on chromosome 5 had Mendelian errors in two families, and eight SNPs on chromosome 22 had Mendelian errors in four families. Families with Mendelian inconsistencies at a given SNP were excluded from the analyses of that SNP.
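The retention rule described above can be sketched as a small filter. This is an illustration only: heritability is taken as an input (in the study it came from SOLAR), and the cut-offs `max_abs_skew` and `max_abs_kurt` are hypothetical, because the text does not state what "within the normal range" meant numerically. The function names are likewise illustrative.

```python
from math import fsum


def central_moment(values, k):
    """k-th central moment using the population (divide-by-n) convention."""
    n = len(values)
    mean = fsum(values) / n
    return fsum((v - mean) ** k for v in values) / n


def skewness(values):
    """Moment-based skewness; 0 for a symmetric sample."""
    m2 = central_moment(values, 2)  # a constant sample (m2 == 0) is not handled
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution gives about 0."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0


def retain_trait(values, h2, min_h2=0.2, max_abs_skew=1.0, max_abs_kurt=2.0):
    """Apply the heritability and shape criteria to one expression trait.

    min_h2 = 0.2 follows the text; the two shape cut-offs are assumptions.
    """
    return (
        h2 >= min_h2
        and abs(skewness(values)) <= max_abs_skew
        and abs(excess_kurtosis(values)) <= max_abs_kurt
    )


if __name__ == "__main__":
    # Toy expression values and heritabilities, not data from the study.
    traits = {
        "trait_a": ([1, 2, 3, 4, 5], 0.5),
        "trait_b": ([0, 0, 0, 0, 10], 0.5),  # strongly right-skewed
        "trait_c": ([1, 2, 3, 4, 5], 0.1),   # heritability too low
    }
    kept = [name for name, (vals, h2) in traits.items() if retain_trait(vals, h2)]
    print(kept)
```

Passing the thresholds as parameters keeps the assumed cut-offs visible and easy to change.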
Single-trait and bivariate single-point variance-component (VC) quantitative trait linkage (QTL) analyses of gene expression levels were performed for the 31 selected genes using the SOLAR software. Given the relatively small number of pedigrees and the exploratory nature of these studies, a LOD of 2.0 was used as the threshold for linkage.
When multiple SNPs were linked to an expression trait, stepwise multivariate regression analysis was used to identify the loci that were the strongest predictors of the expression levels and to determine whether all loci contributed equally. In the stepwise regression, the SNP genotypes of each individual were used as explanatory variables after recoding them in a co-dominant fashion. Sex was included in the analysis to determine its possible significance. Analyses were conducted using SAS software version 9.1. A significance level of 0.05 was set as the criterion both for entry into the model and for remaining in it. Because our data were collected in families whose members share genetic information, the observations were correlated. We therefore accounted for non-independence by taking the final model from the stepwise regression and calculating robust standard errors using Proc Surveyreg. This procedure estimates the regression coefficients by generalized least squares using element-wise regression, and uses Taylor expansion theory to estimate the sampling errors of estimators based on complex sample designs.
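To illustrate why predictors can lose significance once family structure is taken into account, the following sketch compares a naive standard error with a cluster-robust (sandwich) standard error for a single-predictor regression. It is not a reimplementation of Proc Surveyreg: it assumes that families are the clusters, it omits the finite-sample adjustments that survey software applies, and the function name and toy data are hypothetical. The predictor `x` stands for one 0/1 genotype indicator of the kind produced by co-dominant coding.

```python
from math import sqrt


def ols_with_cluster_se(x, y, cluster):
    """Fit y = a + b*x by least squares; return (b, naive SE, cluster-robust SE)."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))

    # Inverse of X'X for the design matrix with columns [1, x].
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    intercept = inv[0][0] * sy + inv[0][1] * sxy
    slope = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Naive SE: treats all observations as independent.
    s2 = sum(e * e for e in resid) / (n - 2)
    naive_se = sqrt(s2 * inv[1][1])

    # Cluster-robust SE: sum the score contributions within each family first,
    # so that correlated residuals within a family do not count as
    # independent pieces of information.
    scores = {}
    for xi, ei, g in zip(x, resid, cluster):
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += ei
        s[1] += ei * xi
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(2)]
            for i in range(2)]
    robust_var = sum(inv[1][i] * meat[i][j] * inv[1][j]
                     for i in range(2) for j in range(2))
    return slope, naive_se, sqrt(robust_var)


if __name__ == "__main__":
    # Toy data: four families of three members; genotype and expression are
    # identical within a family, an extreme case of within-family correlation.
    x = [0] * 6 + [1] * 6
    y = [1, 1, 1, 3, 3, 3, 2, 2, 2, 6, 6, 6]
    fam = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
    slope, naive_se, robust_se = ols_with_cluster_se(x, y, fam)
    print(slope, naive_se, robust_se)
```

In this toy example the slope is the same under both treatments, but the robust standard error is larger than the naive one because the twelve observations carry only about four families' worth of independent information.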
Test for pleiotropy
To test for the possibility that a single gene (allele) influences the expression levels of multiple traits, we extended the univariate genetic analyses to a bivariate analysis in which the two phenotypes are modeled jointly as the outcome and the genetic and environmental correlations between them can be estimated, as described in detail by Almasy et al. To test for pleiotropic effects, a model in which all parameters are estimated is compared with a model in which the genetic correlation is constrained to zero. A significant result implies that a common set of genes contributes to the variance in the two traits. This model can be extended to incorporate SNP data. We can then test for linkage by comparing the likelihoods of the models with and without the genetic data. Twice the difference of the log likelihoods of the two models is distributed asymptotically as a chi-square with 1 degree of freedom. A bivariate analysis was conducted in SOLAR for those genes whose expression levels showed evidence of co-regulation by a single common SNP, to test whether such a joint analysis would increase the power to detect linkage.
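The relationship between the likelihood-ratio statistic described above and the LOD scores reported in this study can be sketched as follows. The sketch assumes that the log likelihoods are natural logarithms and uses the 1-degree-of-freedom chi-square reference distribution stated in the text; the function names are illustrative and are not part of SOLAR.

```python
from math import erfc, log, sqrt

LN10 = log(10.0)


def lrt_statistic(loglik_full, loglik_null):
    """Twice the difference in natural-log likelihoods of the two models."""
    return 2.0 * (loglik_full - loglik_null)


def lod_from_lrt(lrt):
    """A LOD score is the log10 likelihood ratio, i.e. LRT / (2 ln 10)."""
    return lrt / (2.0 * LN10)


def lrt_from_lod(lod):
    """Inverse of lod_from_lrt."""
    return 2.0 * LN10 * lod


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return erfc(sqrt(max(stat, 0.0) / 2.0))


if __name__ == "__main__":
    # The exploratory threshold used in this study and the two top signals.
    for lod in (2.0, 4.57, 4.62):
        stat = lrt_from_lod(lod)
        print(lod, round(stat, 2), chi2_1df_pvalue(stat))
```

Under these assumptions, the LOD threshold of 2.0 corresponds to a likelihood-ratio statistic of 4 ln 10, or about 9.21.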
Linkage and multivariate analysis
Table: Expression traits with linkage to SNPs on chromosomes 5 and 22. Traits listed: leukocyte-derived arginine aminopeptidase; SAR1 gene homolog B (S. cerevisiae); Treacher Collins-Franceschetti syndrome 1; G protein-coupled receptor kinase 6; DEAD (Asp-Glu-Ala-Asp) box polypeptide 17.
Table: Results of stepwise regression (significance reported as Pr > F).
In an exploratory linkage study of 31 expression traits on chromosomes 5 and 22, we found significant linkage signals for two traits, LRAP and DDX17, with SNPs in close proximity to the gene loci (cis). Sex as a covariate was not significant in our analysis. Other GAW15 groups who analyzed these data reported similar results [7, 8]. Those two expression traits also showed evidence of linkage to multiple SNPs at considerable distances away from the gene locations (trans). In an evaluation of co-regulation, the SNPs did not remain significant in a stepwise regression when the non-independence of the data due to the family structure was taken into account. Evidence for potential pleiotropy was not supported by bivariate QTL analysis. Unfortunately, this study is limited by the number of families we could include in the analyses.
We demonstrate here that the complex nature of gene regulation can be explored by linkage analysis with SNPs followed by multivariate methods to explore co-regulation.
This work was supported in part by the Statistical Core (RMC) of NIH Program Project HL-2848, and NIMH grant K08 MH074057-01 to BK.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
1. Stamatoyannopoulos JA: The genomics of gene expression. Genomics. 2004, 84: 449-457. 10.1016/j.ygeno.2004.05.002.
2. Morley M, Molony CM, Weber T, Devlin JL, Ewens KG, Spielman RS, Cheung VG: Genetic analysis of genome-wide variation in human gene expression. Nature. 2004, 430: 743-747. 10.1038/nature02797.
3. Almasy L, Blangero J: Multipoint quantitative trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 1198-1211. 10.1086/301844.
4. Hocking RR: The analysis and selection of variables in linear regression. Biometrics. 1976, 32: 1-50. 10.2307/2529336.
5. SAS Institute Inc: SAS Online Documentation, Version 9.1. 2004, Cary, NC: SAS Institute Inc.
6. Almasy L, Dyer TD, Blangero J: Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet Epidemiol. 1997, 14: 953-958. 10.1002/(SICI)1098-2272(1997)14:6<953::AID-GEPI65>3.0.CO;2-K.
7. Sung YJ, Di Y, Fu AQ, Rothstein JH, Sieh W, Tong L, Thompson EA, Wijsman EM: Comparison of multipoint linkage analyses for quantitative traits in the CEPH data: parametric LOD scores, variance components LOD scores, and Bayes factors. BMC Proc. 2007, 1 (Suppl 1): S93.
8. Rangrej J, Beyene J, Hu P, Paterson AD: Sex, age, and generation effects on genome-wide linkage analysis of gene expression in transformed lymphoblasts. BMC Proc. 2007, 1 (Suppl 1): S92.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.