- Open Access
Familial aggregation analysis of gene expressions
- Shao-Qi Rao†1, 2, 3,
- Liang-De Xu†1,
- Guang-Mei Zhang4,
- Xia Li1, 3, 5,
- Lin Li2,
- Gong-Qing Shen2,
- Yang Jiang1,
- Yue-Ying Yang1,
- Bin-Sheng Gong1,
- Wei Jiang1,
- Fan Zhang1,
- Yun Xiao1 and
- Qing K Wang2
© Rao et al; licensee BioMed Central Ltd. 2007
Published: 18 December 2007
Traditional studies of familial aggregation aim to define the genetic (and non-genetic) causes of a disease from physiological or clinical traits. However, there has been little attempt to use genome-wide gene expressions, the direct phenotypic measures of genes, as traits for investigating how familially aggregated genes are distributed across chromosomes and functional categories. In this study we conducted a genome-wide familial aggregation analysis using the in vitro cell gene expressions of 3300 human autosomal genes (Problem 1 data provided to Genetic Analysis Workshop 15) to answer three basic genetics questions. First, we investigated how gene expressions aggregate among different types (degrees) of relative pairs. Second, we conducted a bioinformatics analysis of highly familially aggregated genes to see how they are distributed on chromosomes. Third, we performed a gene ontology enrichment test of familially aggregated genes to look for evidence of functional consensus. The results indicated that 1) gene expressions did aggregate in families, especially between sibs: of the 3300 human genes analyzed, 1105 had one or more significant (empirical p < 0.05) familial correlations; 2) there were several genomic hot spots where highly familially aggregated genes were clustered (e.g., the chromosome 6 HLA gene cluster); 3) as expected, gene ontology enrichment tests revealed that the 1105 genes aggregated not only in families but also in functional categories.
Familial aggregation is the more frequent occurrence of a trait among members of a family than among unrelated individuals. Familial aggregation analysis is therefore commonly used to determine the genetic contribution to a complex human disease. Technically, this type of analysis is a more detailed version of the mixed linear model approach in that the correlation for each type of relative pair is estimated separately instead of modeling all of them as a function of a few parameters in a single covariance matrix. Historically, familial aggregation analysis has been the most popular method for determining genetic causes of disease. In essence, the method estimates the correlations between various biological relatives and then assumes that they can be parsimoniously explained by an additive genetic contribution and a common household contribution, without making the other assumptions of the mixed linear model. Although familial aggregation has been well studied for many diseases, genome-wide gene expressions typically have not been used as the traits. Problem 1 data for Genetic Analysis Workshop 15 (GAW15), initially used for mapping expression quantitative trait loci, provided expression levels of 3554 genes in lymphoblastoid cells for 14 three-generation CEPH (Centre d'Etude du Polymorphisme Humain) Utah families. Because of their inborn nature, expression of these genes might be less affected by the many environmental factors that influence complex human diseases. Therefore, the specific aims of the present study were to answer three genetics questions: 1) how gene expressions aggregate among different types (degrees) of relative pairs; 2) how familially aggregated genes are distributed on chromosomes; and 3) what functional implications they have.
Description of the data set
Expression levels of genes in lymphoblastoid cells of each individual of 14 three-generation CEPH Utah families (~8 offspring per sibship, ~14 individuals per family, total of 194 individuals) were provided for GAW15 Problem 1. For 3554 of the 8500 genes tested, Morley et al. found greater variation among individuals than between replicate determinations on the same individual. We further reduced this set to 3300 genes by removing genes with uncertain chromosomal locations or located on chromosomes X and Y.
Calculating familial correlations
S.A.G.E. FCOR can be used to calculate familial correlations for a variety of biological relative types. Here, this module was used to calculate the familial correlation (R) for 15 relative types: father-son (FS), mother-son (MS), father-daughter (FD), mother-daughter (MD), brother-brother (BB), sister-brother (SB), sister-sister (SS), grandfather-father-grandson (FFS), grandmother-father-grandson (MFS), grandfather-mother-grandson (FMS), grandmother-mother-grandson (MMS), grandfather-father-granddaughter (FFD), grandmother-father-granddaughter (MFD), grandfather-mother-granddaughter (FMD) and grandmother-mother-granddaughter (MMD). As reported by S.A.G.E. PEDINFO, the CEPH Utah families provided 220 parent-offspring pairs, 378 sibling pairs, and 440 extended relative pairs. To test the statistical significance of a correlation estimate and to correct for multiple tests across the 15 relative types, we also performed 100,000 permutations on the 3300 × 15 (genes × number of relative types) matrix. The empirical thresholds were R = 0.4609 and R = 0.6532 for significance levels of 0.05 and 0.01, respectively.
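As a rough illustration of the permutation step, the sketch below computes a plain Pearson correlation for each relative type and derives an empirical threshold by shuffling pair partners. It is not the S.A.G.E. FCOR estimator, and the exact permutation scheme used in the study is not described in this text. The function names, the toy data, and the choice to record the largest R across relative types in each permutation are all assumptions made for illustration.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Plain Pearson correlation between paired values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def empirical_threshold(pairs_by_type, n_perm=1000, alpha=0.05, seed=0):
    """One-sided permutation threshold for the largest R across relative types.

    pairs_by_type maps a relative-type label (e.g. "SS") to (x, y) lists.
    Each permutation shuffles y within every type, which breaks the pairing,
    and records the maximum R over types (a family-wise style correction).
    """
    rng = random.Random(seed)
    null_max = []
    for _ in range(n_perm):
        best = -1.0
        for x, y in pairs_by_type.values():
            y_perm = y[:]
            rng.shuffle(y_perm)
            best = max(best, pearson_r(x, y_perm))
        null_max.append(best)
    null_max.sort()
    # Value exceeded by roughly a fraction alpha of the permuted maxima.
    idx = min(len(null_max) - 1, int((1 - alpha) * len(null_max)))
    return null_max[idx]

# Toy data (hypothetical): 30 sister-sister pairs sharing a common component.
toy_rng = random.Random(1)
shared = [toy_rng.gauss(0, 1) for _ in range(30)]
sib1 = [s + toy_rng.gauss(0, 1) for s in shared]
sib2 = [s + toy_rng.gauss(0, 1) for s in shared]
pairs = {"SS": (sib1, sib2)}

r_obs = pearson_r(sib1, sib2)
threshold = empirical_threshold(pairs, n_perm=500)
is_significant = r_obs > threshold
```

With many genes and 15 relative types, the same idea would be applied to the whole genes × relative-types matrix, so that a single threshold controls the error rate across all tests.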
Gene ontology (GO) enrichment analysis of familially aggregated genes
The p-value calculated above corresponds to a one-sided test; a smaller p-value indicates a higher likelihood that a GO category is enriched with genes that aggregate significantly in families. In this study, to avoid possible loss of true positives, we identified significant GO categories using a nominal significance criterion of p ≤ 0.01. Therefore, the p-value quoted should be considered a heuristic measure, useful as an indicator that roughly rates the relative enrichment of significantly familially aggregated genes for each GO category.
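The test statistic itself is not given in this text. One common choice for a one-sided enrichment test is the upper tail of the hypergeometric distribution, and the sketch below is written under that assumption. The function name and the category counts in the example are hypothetical; only the totals of 3300 genes and 1105 significant genes come from the study.

```python
from math import comb

def enrichment_p(n_total, n_hits, n_category, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_total    : genes analyzed
    n_hits     : genes with significant familial aggregation
    n_category : genes annotated to the GO category
    n_overlap  : genes that are both significant and in the category
    """
    upper = min(n_hits, n_category)
    denom = comb(n_total, n_hits)
    tail = sum(
        comb(n_category, k) * comb(n_total - n_category, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom

# Hypothetical category: 40 annotated genes, 25 of them familially aggregated.
p_value = enrichment_p(3300, 1105, 40, 25)
is_enriched = p_value <= 0.01  # nominal criterion used in the text
```

Because many categories are tested, a nominal cutoff like this one ranks categories rather than controlling the overall error rate, which matches the heuristic reading described above.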
How do gene expressions aggregate among different types of relative pairs?
Table: Numbers of significant familial correlations (R). Columns: observed number of pairs; R > 0.4609 (P < 0.05); R > 0.6532 (P < 0.01).
How are familial correlations distributed (aggregated) over chromosomes?
How are significantly familially aggregated genes aggregating in function categories?
Table: GO molecular function categories highly significantly (p ≤ 0.01) enriched with genes of high familial aggregation. Categories listed: structural constituent of ribosome; calcium ion binding; unfolded protein binding; ATP-dependent RNA helicase activity; transcription factor activity; ubiquitin-protein ligase activity; nucleic acid binding; iron ion binding. P-values listed (pairing with categories not recoverable): 6.15 × 10⁻⁷, 2.5 × 10⁻⁵, 4.64 × 10⁻⁵, 9.95 × 10⁻⁵.
To the best of our knowledge, this study is the first attempt to relate the familial aggregation patterns of genes to their genomic locations and functions. Familial aggregation analysis of a large number of genes using different relative types suggests that some non-Mendelian genetic factors or environmental factors, such as age-dependent genetic imprinting or antagonistic environments for family members in different generations, may also affect these gene expressions, possibly leading to biased estimates of some familial correlations. Regarding the use of different relative types for estimating additive genetic effects in gene expressions, it appears that no single relative type stands out as the best for all scenarios. The results of this analysis of genome-wide gene expression traits raise a paradoxical challenge regarding the use of familial aggregation analysis to determine the genetic contribution to a quantitative trait. On one hand, sibling pairs are favored because they are unlikely to produce a negative estimate of heritability, but they tend to overestimate it because of the larger shared non-genetic and dominance components. On the other hand, other relative pairs are unlikely to overestimate heritability, but can be problematic if some factor (e.g., antagonistic environments) causes relatives in different generations to be negatively correlated environmentally.
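The contrast between relative types can be made concrete with the standard expectations of quantitative genetics. Under a purely additive model with no shared environment and no dominance (an assumption, and a simplification of what the discussion above describes), the expected correlation is h²/2 for parent-offspring and full-sib pairs and h²/4 for grandparent-grandchild pairs. The sketch below simply inverts these relations; the function name and the input correlations are hypothetical.

```python
# Expected share of h^2 in the familial correlation under a purely
# additive model (no shared environment, no dominance).
ADDITIVE_COEFFICIENT = {
    "parent_offspring": 0.5,
    "full_sib": 0.5,
    "grandparent_grandchild": 0.25,
}

def naive_heritability(r, relative_type):
    """Divide a familial correlation by its additive coefficient.

    The result is deliberately not truncated to [0, 1], so that biased
    inputs show up as implausible estimates.
    """
    return r / ADDITIVE_COEFFICIENT[relative_type]

# Hypothetical correlations for one gene.
h2_from_sibs = naive_heritability(0.30, "full_sib")
h2_from_grandparents = naive_heritability(-0.05, "grandparent_grandchild")
```

A sib correlation inflated by shared environment or dominance gives an estimate that is too high, while a negative cross-generation correlation gives a negative estimate, which illustrates both sides of the trade-off.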
Further bioinformatics analysis of familially aggregated genes suggests some consistency among familial, chromosomal, and functional aggregation. However, we feel that these exploratory results warrant further investigation because of the limited sample size used in the study. In addition, traditional quantitative genetics approaches, which assume a polygenic basis for the studied traits and a normal distribution of the underlying genetic effects, might not be the most appropriate for analyzing expression phenotypes whose genetic models could be monogenic or oligogenic. Although the familial aggregation analysis approach as implemented in S.A.G.E. is robust to non-normality of traits, further study is needed of the behavior and properties of the method when applied to traits whose genetic basis deviates substantially from what is expected for truly quantitative traits.
Most of our results from the genetic epidemiological analysis were consistent with quantitative genetics theory. Further bioinformatics analysis revealed that familially aggregated genes tended to cluster in some genomic regions and to be enriched in particular functional categories.
This work was supported in part by the National Natural Science Foundation of China (Grants 30170515, 30370798, 30571034, 30570424, 30600367, and 30670484) and a U.S. National Institutes of Health SCCOR grant (P50 HL077101-01). Some of the results were obtained using the program package S.A.G.E., which is supported by a U.S. Public Health Service Resource Grant (RR03655) from the National Center for Research Resources.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
- Khoury MJ, Beaty TH, Liang KY: Can familial aggregation of disease be explained by familial aggregation of environmental risk factors? Am J Epidemiol. 1988, 127: 674-683.
- Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, Cheung VG: Genetic analysis of genome-wide variation in human gene expression. Nature. 2004, 430: 743-747. 10.1038/nature02797.
- Statistical Analysis for Genetic Epidemiology, Release 5.3. [http://genepi.cwru.edu/]
- Guo Z, Zhang T, Li X, Wang Q, Xu J, Yu H, Zhu J, Wang H, Wang C, Topol EJ, Wang Q, Rao S: Towards precise classification of cancers based on robust gene functional expression profiles. BMC Bioinformatics. 2005, 6: 58. 10.1186/1471-2105-6-58.
- Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. 1996, London: Longman, 4th edition.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- Bjornsson HT, Fallin MD, Feinberg AP: An integrated epigenetic and genetic approach to common human disease. Trends Genet. 2004, 20: 350-358. 10.1016/j.tig.2004.06.009.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.