- Open Access
Transcription activity hot spot, is it real or an artifact?
© Wang et al; licensee BioMed Central Ltd. 2007
- Published: 18 December 2007
Transcription activity 'hot spots', defined as chromosome regions that contain more expression quantitative trait loci than would be expected by chance, have frequently been detected both in humans and in model organisms. The existence of hot spots is commonly taken as evidence for master regulation of gene expression. However, hot spots could also arise simply from highly correlated gene expression levels or from linkage disequilibrium, in which case they do not represent true master regulators. A recent simulation study using real human gene expression data but simulated random single-nucleotide polymorphism genotypes showed patterns of clustering of expression quantitative trait loci that resemble those in actual studies [Perez-Enciso: Genetics 2004, 166: 547–554.]. In this study, to assess the credibility of transcription activity hot spots, we conducted genetic analyses on the gene expression data provided by Genetic Analysis Workshop 15 Problem 1.
- Expression Phenotype
- Genetic Analysis Workshop
- Expression Quantitative Trait Locus
- Permute Data
- Autosomal SNPs
First pinpointed by Schadt et al. [1], expression quantitative trait loci (eQTL) 'hot spots', i.e., transcription activity hot spots, defined as chromosome regions that contain more eQTL than would be expected by chance, have been a point of research interest in almost all studies that search for genetic regulators of gene expression. Hot spots of gene regulation are most prominent in yeast [1, 2], where eight have been detected. Hot spots have also been reported in differentiating xylem of a eucalyptus hybrid [3], mice, humans, and other organisms. Zheng et al. [5] observed hot spots harboring important breast cancer genes.
There are several interpretations of the existence of eQTL hot spots. The most common one is that hot spots reflect common regulatory elements that control the transcription levels of a group of genes. Other interpretations are that eQTL hot spots represent gene-rich regions, or simply reflect the clustering of spurious QTLs arising from highly correlated expression levels or from linkage disequilibrium (LD). A more recent study with expression data from two human genes with simulated single-nucleotide polymorphism (SNP) genotypes that are independent of the expression levels showed patterns of clustering of eQTL that resemble those published in human studies [6]. The observed enrichment was not random, but neither was it caused by a putative mutation with a regulatory effect, because by design all eQTL detected were false positives. The author concluded that the evidence for eQTL hot spots should be carefully evaluated and cautiously interpreted, and that statistical analysis usually cannot distinguish between correlation and causation.
In this study, we aimed to assess and better understand features of transcription activity hot spots. We conducted a total of 3554 genome-wide linkage scans with 2819 autosomal SNPs on 3554 gene expression profiles. We found that high correlation between expression phenotypes might be a major contributor to the existence of hot spots. However, if a group of expression phenotypes are not correlated but are still detected as a transcription hot spot, the result might be more reliable and might represent a group of truly commonly regulated genes.
Centre d'Etude du Polymorphisme Humain (CEPH) samples
Based on 14 CEPH Utah families with 194 individuals, Genetic Analysis Workshop 15 (GAW15) Problem 1 provided 3554 gene expression profiles and 2882 SNPs across the genome (we used 2819 autosomal SNPs in the analyses), together with the physical map. Sex-specific genetic maps were provided by Sung et al. [7] and were used in the analyses.
Genome-wide regression-based multipoint linkage analysis with quantitative traits was conducted with merlin-regress in MERLIN [8]. Merlin-regress determines evidence for linkage at each SNP based on a regression of estimated identity-by-descent (IBD) sharing between relative pairs on the squared sums and squared differences of trait values of the relative pairs [9]. Narrow-sense trait heritability was first estimated in MERLIN. The error-checking algorithm implemented in MERLIN was applied, and erroneous genotypes were excluded with the pedwipe command before the linkage analysis.
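The squared sums and squared differences that enter the regression can be illustrated with a minimal sketch. It is not MERLIN's implementation: the function name `pair_statistics`, the dictionary layout, and standardizing the trait with a supplied mean and standard deviation are assumptions made for illustration only.

```python
def pair_statistics(trait, pairs, mean, sd):
    """Squared sum and squared difference of standardized trait values.

    trait : dict mapping individual id -> trait value (hypothetical layout)
    pairs : iterable of (id_i, id_j) relative pairs
    mean, sd : assumed population mean and standard deviation of the trait
    """
    stats = {}
    for i, j in pairs:
        zi = (trait[i] - mean) / sd
        zj = (trait[j] - mean) / sd
        # (squared sum, squared difference) for this relative pair
        stats[(i, j)] = ((zi + zj) ** 2, (zi - zj) ** 2)
    return stats


if __name__ == "__main__":
    example = pair_statistics({"a": 12.0, "b": 8.0}, [("a", "b")], mean=10.0, sd=2.0)
    print(example)
```

In the regression approach, these two quantities serve as predictors of the estimated IBD sharing of each pair; the linkage statistic itself is left to the software.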
eQTL hotspots detection
To assess the clustering pattern of eQTL, we divided the autosomal genome into $N_B$ bins, each containing a fixed number of consecutive SNPs, with a smaller bin at the end of each chromosome. We then counted the number of genes with significant eQTLs in each bin. One 'hit' was counted for an expression phenotype if one or more SNPs within the bin were significant for that phenotype. The total number of hits along the autosomal genome, $N_H$, is defined in this way. We hypothesized that if there was no enrichment in eQTL clustering, the $N_H$ hits would be distributed randomly across the $N_B$ bins, so that the number of hits per bin would follow a Poisson distribution with mean $N_H/N_B$. The significance of eQTL enrichment within each bin was therefore assessed using the Poisson distribution, and a Bonferroni correction was applied to account for the $N_B$ tests conducted.
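The binning, hit counting, and Poisson test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the input layouts, and the family-wise level `alpha=0.05` are assumptions, as the source does not state the level to which the Bonferroni correction was applied.

```python
import math
from collections import Counter


def assign_bins(snps_per_chromosome, bin_size):
    """Map (chromosome index, SNP index) -> bin id, using consecutive SNPs.

    The last bin on a chromosome may hold fewer than bin_size SNPs.
    """
    bin_of, n_bins = {}, 0
    for chrom, n_snps in enumerate(snps_per_chromosome):
        for snp in range(n_snps):
            bin_of[(chrom, snp)] = n_bins + snp // bin_size
        n_bins += math.ceil(n_snps / bin_size)
    return bin_of, n_bins


def count_hits(significant, bin_of):
    """Count one hit per phenotype per bin, however many SNPs are significant."""
    hits = Counter()
    for snps in significant.values():
        for b in {bin_of[s] for s in snps}:
            hits[b] += 1
    return hits


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def hot_spots(hits, n_bins, alpha=0.05):
    """Bins whose hit count is significant after Bonferroni correction.

    alpha is an assumed family-wise level, not taken from the source.
    """
    mean = sum(hits.values()) / n_bins  # N_H / N_B
    threshold = alpha / n_bins          # Bonferroni over N_B tests
    return sorted(b for b, h in hits.items()
                  if poisson_upper_tail(h, mean) < threshold)
```

Here `significant` is assumed to map each expression phenotype to the set of `(chromosome index, SNP index)` pairs that pass the linkage threshold; passing the output of `count_hits` and the bin count from `assign_bins` to `hot_spots` returns the ids of the enriched bins.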
To assess the reliability and credibility of the detected transcription activity hot spots, we conducted two analyses. First, we randomly removed one expression phenotype from a pair that has pair-wise correlation greater than a fixed value ρ, forming a subset of the gene expression that has pair-wise correlation smaller than ρ. More specifically, we first calculated all pair-wise correlations from the 3554 phenotypes and then randomly dropped one phenotype from the pairs that had pair-wise correlations greater than ρ. We then applied the same linkage analysis and hot spot detection procedure to the subset of the data with less correlated expression phenotypes. Second, we permuted the expression phenotypes within a family to generate a new data set that has no association between expression phenotypes and SNP genotypes and then applied the same linkage analysis and hot spot detection procedure.
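The within-family permutation can be sketched as below. The sketch shuffles whole expression profiles among members of the same family, which keeps the correlations between phenotypes intact while breaking their link to the genotypes; whether the original analysis permuted whole profiles or each phenotype separately is not stated, so this is an assumption, as are the function name and the dictionary layouts.

```python
import random


def permute_within_families(family_of, expression, seed=None):
    """Shuffle expression profiles among individuals of the same family.

    family_of  : dict individual id -> family id (hypothetical layout)
    expression : dict individual id -> expression profile (any object)
    Returns a new dict; genotypes stay attached to the original ids.
    """
    rng = random.Random(seed)
    members = {}
    for individual, family in family_of.items():
        members.setdefault(family, []).append(individual)

    permuted = {}
    for individuals in members.values():
        donors = individuals[:]
        rng.shuffle(donors)
        # Each individual receives the profile of a randomly chosen relative.
        for target, donor in zip(individuals, donors):
            permuted[target] = expression[donor]
    return permuted
```

The permuted data set can then be passed through the same linkage scan and hot spot detection as the observed data.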
We applied a stringent significance level in defining a linkage signal, using a threshold of LOD > 5.3, which corresponds to a point-wise p-value < 3.9 × 10^-7. An eQTL detected with this criterion corresponds to a genome-wide significance level of approximately 0.001. With this threshold applied to the 3554 genome-wide scans, we observed 244 expression phenotypes with evidence for linkage. Examination of the regulators of these 244 expression phenotypes shows that gene-expression QTL are clustered, i.e., there are transcription activity hot spots that contain more significant eQTL than would be expected by chance across the bins created along the autosomal genome.
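The relation between the LOD threshold and the point-wise p-value can be checked with a short calculation. The sketch assumes the usual asymptotic result that 2 ln(10) × LOD behaves as a 50:50 mixture of zero and a chi-square with one degree of freedom, which gives a one-sided p-value; the source does not spell out the conversion, and the function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod):
    """One-sided asymptotic p-value for a LOD score (assumed 1-df test)."""
    chi2 = 2.0 * math.log(10.0) * lod
    # P(chi2_1 > x) = erfc(sqrt(x / 2)); halve it for the one-sided test.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    p = lod_to_pointwise_p(5.3)
    print(f"point-wise p = {p:.2e}")
    # Rough Bonferroni bound over 2819 autosomal SNPs (an illustration only).
    print(f"p x 2819     = {p * 2819:.4f}")
```

Under this assumption a LOD of 5.3 gives a point-wise p-value of about 3.9 × 10^-7, and multiplying by the 2819 autosomal SNPs gives roughly 0.001, in line with the genome-wide level quoted above, although the source does not say how that level was derived.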
To examine whether the hot spot is partially due to high correlation among expression phenotypes, we chose two thresholds and created two subsets by randomly removing one expression phenotype within a pair that has pair-wise correlation greater than 0.8, or randomly removing one expression phenotype within a pair that has pair-wise correlation greater than 0.6.
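A minimal sketch of the correlation-based pruning follows. It assumes Pearson correlation on the signed scale, visits the highly correlated pairs in random order, and drops one randomly chosen member of each pair whose two members are both still retained; the source does not specify the correlation measure or the order, so these choices and the function names are assumptions.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(profiles, rho, seed=None):
    """Return phenotype names such that no retained pair has correlation > rho.

    profiles : dict phenotype name -> list of expression values
    """
    rng = random.Random(seed)
    pairs = [(a, b) for a, b in itertools.combinations(profiles, 2)
             if pearson(profiles[a], profiles[b]) > rho]
    rng.shuffle(pairs)
    kept = set(profiles)
    for a, b in pairs:
        if a in kept and b in kept:
            kept.discard(rng.choice((a, b)))  # drop one member at random
    return kept
```

Calling `prune_correlated(profiles, 0.8)` and `prune_correlated(profiles, 0.6)` would produce subsets analogous to the two used here. For thousands of phenotypes, a vectorized correlation matrix would be far faster than this pure-Python loop.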
Table: Summary of results from different bin sizes and different correlation thresholds
- Number of bins defined
- Bin length (cM)
- Full data (n = 3554; no. of significant phenotypes = 244): number of hits defined; no. of significant hot spots
- corr < 0.8 (n = 3326; no. of significant phenotypes = 227): number of hits defined; no. of significant hot spots
- corr < 0.6 (n = 1754; no. of significant phenotypes = 131): number of hits defined; no. of significant hot spots
Biological properties of the clustered expression phenotypes within the hotspot on chromosome 14
Gene ontology molecular function
Gene ontology biological process
intracellular protein transport
protein disulfide isomerase activity
calcium ion binding
glutathione transferase activity
protein phosphatase type 2A regulator activity
response to biotic stimulus
nuclear mRNA splicing, via spliceosome
inositol phosphatase activity
intracellular protein transport
electron transporter activity
NADH dehydrogenase activity
generation of precursor metabolites and energy
Although it has been common to consider the existence of hot spots as evidence for master regulation of gene expression, such results should be interpreted with caution, because the findings might simply be due to highly correlated gene expression levels or linkage disequilibrium and might not represent true master regulation. In this study, to assess the reliability and credibility of frequently detected transcription activity hot spots, we conducted two analyses on all 3554 gene expression phenotypes in the GAW15 Problem 1 data. Note that no screening steps were applied to select a subset of gene expression profiles. Although this may add noise to the analysis, Huang et al. [10] suggested that gene expression phenotypes with very low heritability may still show very high linkage signals. Further research and more careful selection procedures are needed. We first created a subset of the data with pair-wise correlations smaller than a fixed value, and then examined whether eQTL hot spots were still present. The results suggest two possible explanations. First, if genes are indeed commonly regulated and are also highly correlated, removing a subset of highly correlated genes might weaken the hot spot signal. Second, for genes that are not commonly regulated but are nevertheless highly correlated, the hot spots that remain after a subset of highly correlated genes has been removed might truly represent master regulation. Results from permuted data, both with and without the highly correlated expression phenotypes, confirm these findings. Experimental results should always be interpreted with caution, and more thorough analyses need to be conducted before reaching any firm conclusions.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
- Schadt E, Monks S, Drake T, Lusis A, Che N, Colinayo V, Ruff T, Milligan S, Lamb J, Cavet G, Linsley PS, Mao M, Stoughton RB, Friend SH: Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003, 422: 297-302. 10.1038/nature01434.
- Brem R: Genetic dissection of transcriptional regulation in budding yeast. Science. 2002, 296: 752-755. 10.1126/science.1069516.
- Kirst M, Basten C, Myburg A, Zeng Z, Sederoff R: Genetic architecture of transcript-level variation in differentiating xylem of a eucalyptus hybrid. Genetics. 2005, 169: 2295-2303. 10.1534/genetics.104.039198.
- Morley M, Molony C, Weber T, Devlin J, Ewens K, Spielman R, Cheung V: Genetic analysis of genome-wide variation in human gene expression. Nature. 2004, 430: 743-747. 10.1038/nature02797.
- Zheng T, Wang S, Cong L, Ding Y, Ionita-Laza I, Lo S-H: Joint study of genetic regulators for expression traits related to breast cancer. BMC Proc. 2007, 1 (Suppl 1): S10.
- Perez-Enciso M: In silico study of transcriptome genetic variation in outbred populations. Genetics. 2004, 166: 547-554. 10.1534/genetics.166.1.547.
- Sung YJ, Di Y, Fu AQ, Rothstein JH, Sieh W, Tong L, Thompson EA, Wijsman EM: Comparison of multipoint linkage analyses for quantitative traits in the CEPH data: parametric LOD scores, variance components LOD scores, and Bayes factors. BMC Proc. 2007, 1 (Suppl 1): S93.
- Abecasis GR, Cherny SS, Cookson WO, Cardon L: Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
- Sham P, Purcell S, Cherny S, Abecasis G: Powerful regression-based quantitative-trait linkage analysis of general pedigrees. Am J Hum Genet. 2002, 71: 238-253. 10.1086/341560.
- Huang S, Ballard D, Zhao H: The role of heritability in mapping expression quantitative trait loci. BMC Proc. 2007, 1 (Suppl 1): S86.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.