- Open Access
Sex, age and generation effects on genome-wide linkage analysis of gene expression in transformed lymphoblasts
© Rangrej et al; licensee BioMed Central Ltd. 2007
- Published: 18 December 2007
Many traits differ by age and sex in humans, but genetic analyses of gene expression have typically not included age and sex as covariates.
We used Genetic Analysis Workshop 15 Problem 1 data to determine whether gene expression in lymphoblasts differed by age and/or sex using generalized estimating equations (GEE). We then performed quantitative trait linkage analysis of these genes with and without age and sex as covariates to determine whether the linkage results changed. Because the families included in the study all contain three generations, we also determined what effect inclusion of generation in the model had on the age effects.
When controlling the false-discovery rate at 1%, using GEE we identified 30 transcripts that showed significant differences in expression by sex, while 1950 transcripts showed differences in expression associated with age. In the linkage analysis, 37 linkages disappeared and 17 appeared when sex was included as a covariate. All these genes were, as expected, on the sex chromosomes. In contrast, when age was included in the linkage analysis, 462 linkage signals were no longer significant, while 223 became significant. When generation was included in the model with age, all but 6 of the GEE age effects were no longer significant. However, there were minimal changes in the linkage results.
The effect of age on linkage analyses was apparent for the expression of many genes and appears to be mostly due to differences between the generations.
- Linkage Analysis
- Generalized Estimating Equation
- Influence Gene Expression
- Linkage Result
Many traits in humans differ by sex and age, but analyses of gene expression typically do not include them as covariates [1]. Simulation studies have shown that incorporating appropriate covariates in linkage analysis improves power without compromising type I error [2]. We wanted to determine whether there are loci that influence gene expression whose detection is conditional on inclusion or exclusion of age and/or sex from the analysis. In addition, because the data were from three-generation families, we determined to what extent the age effects are accounted for by generation effects.
Association between age and sex, and gene expression
We used the 3554 transcripts that were reported to show greater variation between than within 94 grandparents from CEPH (Centre d'Etude du Polymorphisme Humain) pedigrees [1]. Age data were obtained from http://www.coriell.org; they were not available for any member of pedigree 1454 or for three individuals from three other families. We used generalized estimating equations (GEE) [3] in the computer program R to test whether the expression of each gene differed by age and sex, using family as a clustering variable and an exchangeable correlation structure. In addition, indicators for generation were also added to the model. We calculated q-values to estimate the false-discovery rate for covariate effects [4].
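The q-value step described above can be illustrated with a minimal Python sketch. It is not the authors' code: it uses a simplified estimator in which the proportion of true null hypotheses is estimated at a single cut-off (the hypothetical parameter `lam`), rather than the smoothing approach of the q-value reference cited above. The function name and the example p-values are illustrative assumptions.

```python
def q_values(p_values, lam=0.5):
    """Estimate q-values from p-values with a simplified Storey-type method.

    Assumption: pi0 (the proportion of true nulls) is estimated at a single
    cut-off `lam`, not by smoothing over a grid of cut-offs.
    """
    m = len(p_values)
    if m == 0:
        return []
    # p-values above lam are assumed to come mostly from true nulls.
    pi0 = sum(1 for p in p_values if p > lam) / (m * (1.0 - lam))
    pi0 = min(1.0, pi0)
    if pi0 == 0.0:
        # Conservative fallback: treat every test as a potential null.
        pi0 = 1.0
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


# Illustrative p-values only; these are not values from the study.
example_p = [0.001, 0.01, 0.02, 0.6, 0.9]
example_q = q_values(example_p)
significant = [i for i, qv in enumerate(example_q) if qv <= 0.01]
```

Transcripts whose q-value falls at or below the chosen level (1% in the abstract) would be reported as showing a covariate effect.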
Expression quantitative trait linkage analysis
Expression quantitative trait linkage (eQTL) analysis was performed using MERLIN-REGRESS v 1.0.1 with a bug fix for missing covariates [5, 6]. Mendelian inconsistencies between grandparents and children were removed. Marker allele frequencies for 2871 single-nucleotide polymorphisms (SNPs) were estimated from the data, and single-point linkage analysis was used (10,203,534 tests). Variance-components linkage analysis in MERLIN was used to analyze the X chromosome because MERLIN-REGRESS cannot analyze X chromosome data. We considered linkage results to differ between the analyses with and without covariates when the LOD score was >3 and increased or decreased by at least 3 LOD units when sex or age was included as a covariate. Linkage analyses including age alone and including both age and generation were compared to determine what effect including generation had on the linkage results.
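The test count and the change criterion above can be checked with a short Python sketch. The function names are hypothetical, and one reading of the criterion is assumed: the LOD > 3 condition is applied to the larger of the two scores (with or without the covariate), and a change of exactly 3 LOD units counts as marked.

```python
N_TRANSCRIPTS = 3554  # expression traits analyzed
N_SNPS = 2871         # autosomal and X-linked SNP markers


def n_single_point_tests(n_transcripts=N_TRANSCRIPTS, n_snps=N_SNPS):
    """Number of single-point linkage tests: one per transcript-marker pair."""
    return n_transcripts * n_snps


def marked_change(lod_without, lod_with, lod_threshold=3.0, min_change=3.0):
    """Classify the change in a LOD score when a covariate is added.

    Returns "increase", "decrease", or None when the criterion is not met.
    Assumption: the LOD threshold applies to the larger of the two scores.
    """
    change = lod_with - lod_without
    if max(lod_without, lod_with) <= lod_threshold:
        return None
    if abs(change) < min_change:
        return None
    return "increase" if change > 0 else "decrease"


total_tests = n_single_point_tests()  # 3554 * 2871 = 10,203,534
```

For example, `marked_change(0.5, 4.2)` is classified as an increase, whereas `marked_change(3.5, 3.9)` does not meet the criterion because the change is smaller than 3 LOD units.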
Association between age, sex, generation and gene expression
Table 1. Descriptive information about age and sex
Column headers: age in years, mean ± SD (range); N missing age
73 ± 9 (61–92)
46 ± 7 (39–66)
17 ± 7 (5–34)
70 ± 7 (61–85)
46 ± 5 (39–59)
18 ± 8 (5–37)
Expression quantitative trait linkage analysis
Table 2. Marked changes in linkage when sex was included as a covariate (column header: change when sex included)
Table 3. Marked changes in linkage when age was included as a covariate (column header: change when age included)
We found that the majority of genes show significant differences in expression by age, while only a subset show significant sex differences. More linkage signals lost significance when sex or age was included as a covariate than appeared as a consequence of including these covariates. When generation was also included in the linkage analysis with age, few linkage results changed. Limitations of our analysis include the fact that age data were missing for all individuals in one family and for three individuals in other families.
In addition, we took a blanket approach to all traits because performing detailed diagnostics for numerous traits is not straightforward. To investigate the potential risks of this approach, we examined the trait distributions for the linkage results that changed dramatically when either sex or age was included as a covariate (Tables 2 and 3). As expected, many of the traits that had marked sex effects on linkage results were bimodally distributed, which may result in false-positive linkage results [7]. Interestingly, when we repeated linkage analyses for loci where age and sex had marked effects using the variance-components methodology (as opposed to MERLIN-REGRESS), there was little difference between inclusion and exclusion of the covariate, raising concern about the validity of the regression results.
Morley et al. [1] selected genes for linkage analysis based on greater variance between, compared to within, individuals. They performed this analysis on the grandparents: of those available in this data set, the mean age was 72 years (SD = 8, range = 61–92). However, the grandparents were excluded from the linkage analysis. Only the traits of children, whose mean age was 18 years (SD = 8, range = 3–37), were used for the linkage analysis. The reasons for this may be related to limitations of the available methods or to a decision to reduce age effects. However, if the variance is not the same for grandparents and children, then such an approach may result in genes being falsely included in or excluded from the genetic linkage analysis. Furthermore, although we used age in our analysis, the age effects were mostly removed when generation was included. In such a situation, age will be highly correlated with birth order within a sibship, and therefore we cannot exclude the possibility that this has resulted in confounding.
We used a simple exchangeable covariance structure in our GEE analyses. This takes into account the family dependence to some extent but may not be the most appropriate covariance structure for the data. It would be interesting to investigate other covariance structures and assess the impact that they have on the overall findings.
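For reference, the exchangeable structure can be written out as follows. The notation is introduced here for illustration and is not taken from the original analysis: $Y_{ij}$ is the expression level of a transcript in individual $j$ of family $i$, $n_i$ is the family size, and $\alpha$ is the common within-family correlation. The coding of the covariates in the mean model is likewise an assumption.

```latex
% Marginal mean model with age and sex (covariate coding is assumed)
\mathrm{E}[Y_{ij}] = \beta_0 + \beta_1\,\mathrm{age}_{ij} + \beta_2\,\mathrm{sex}_{ij}

% Exchangeable working correlation: one common correlation within a family
\mathrm{Corr}(Y_{ij}, Y_{ik}) = \alpha \quad (j \neq k),
\qquad
R_i(\alpha) = (1-\alpha)\, I_{n_i} + \alpha\, \mathbf{1}_{n_i}\mathbf{1}_{n_i}^{\top}

% Example for a family with three members
R_i(\alpha) =
\begin{pmatrix}
1 & \alpha & \alpha \\
\alpha & 1 & \alpha \\
\alpha & \alpha & 1
\end{pmatrix}
```

Under this structure every pair of relatives in a family, for example a sib pair and a grandparent-grandchild pair, is assigned the same working correlation, which is why it may not be the most appropriate choice for three-generation pedigrees.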
Age, and to a lesser degree sex, influence gene expression in transformed B lymphocytes. Although including sex as a covariate did not result in many changes in the linkage results, the results changed more markedly when age was included: specifically, there were fewer significant linkage results when age was included as a covariate. It appears that many of these age effects can be accounted for by generational differences in gene expression. Inclusion of covariates in quantitative trait linkage analysis may improve power and reduce false positives.
This work is supported by: Canada Research Chairs program (ADP), Hospital for Sick Children Foundation (ADP), Genome Canada (ADP and JB), CIHR (ADP and JB), NSERC (JB).
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
- Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, Cheung VG: Genetic analysis of genome-wide variation in human gene expression. Nature. 2004, 430: 743-747. 10.1038/nature02797.
- Pugh EW, Jaquish CE, Sorant AJ, Doetsch JP, Bailey-Wilson JE, Wilson AF: Comparison of sib-pair and variance-components methods for genomic screening. Genet Epidemiol. 1997, 14: 867-872. 10.1002/(SICI)1098-2272(1997)14:6<867::AID-GEPI51>3.0.CO;2-K.
- Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. 10.1093/biomet/73.1.13.
- Storey JD, Tibshirani R: Statistical significance for genome-wide experiments. Proc Natl Acad Sci USA. 2003, 100: 9440-9445. 10.1073/pnas.1530509100.
- Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
- Sham PC, Purcell S, Cherny SS, Abecasis GR: Powerful regression-based quantitative-trait linkage analysis of general pedigrees. Am J Hum Genet. 2002, 71: 238-253. 10.1086/341560.
- Etzel CJ, Shete S, Beasley TM, Fernandez JR, Allison DB, Amos CI: Effect of Box-Cox transformation on power of Haseman-Elston and maximum-likelihood variance components tests to detect quantitative trait loci. Hum Hered. 2003, 55: 108-116. 10.1159/000072315.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.