Sex, age and generation effects on genome-wide linkage analysis of gene expression in transformed lymphoblasts.

BACKGROUND
Many traits differ by age and sex in humans, but genetic analyses of gene expression have typically not included them as covariates.


METHODS
We used Genetic Analysis Workshop 15 Problem 1 data and generalized estimating equations (GEE) to determine whether gene expression in lymphoblasts differed by age and/or sex. We then performed quantitative trait linkage analysis of these genes with and without age and sex as covariates to determine whether the linkage results changed. Because the families included in the study all contain three generations, we also determined what effect inclusion of generation in the model had on the age effects.


RESULTS
Controlling the false-discovery rate at 1%, GEE identified 30 transcripts that showed significant differences in expression by sex, while 1950 transcripts showed differences in expression associated with age. In the linkage analysis, 37 linkage signals disappeared and 17 appeared when sex was included as a covariate. All these genes were, as expected, on the sex chromosomes. In contrast, when age was included in the linkage analysis, 462 linkage signals were no longer significant, while 223 became significant. When generation was included in the model with age, all but 6 of the GEE age effects were no longer significant. However, there were minimal changes in the linkage results.


CONCLUSION
Age affected the linkage results for the expression of many genes; these effects appear to be mostly due to differences between the generations.


Background
Many traits in humans differ by sex and age, but analyses of gene expression typically do not include them as covariates [1]. Simulation studies have shown that incorporating appropriate covariates in linkage analysis improves power without compromising type I error [2]. We sought to determine whether there are loci that influence gene expression whose detection is conditional on inclusion or exclusion of age and/or sex from the analysis. In addition, because the data were from three-generation families, we determined to what extent the age effects are accounted for by generation effects.

Association between age and sex, and gene expression
We used the 3554 transcripts that were reported to show greater variation between than within 94 grandparents from CEPH (Centre d'Etude du Polymorphisme Humain) pedigrees [1]. Age data were obtained from http://www.coriell.org; they were not available for any member of pedigree 1454 or for three individuals from three other families. We used generalized estimating equations (GEE) [3] in R to test whether the expression of each gene differed by age and sex, using family as the clustering variable and an exchangeable correlation structure. In a second model, indicators for generation were also added. We calculated q-values to estimate the false-discovery rate for covariate effects [4].
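To make the false-discovery step concrete, the following is a minimal sketch of how per-transcript p-values for one covariate could be converted to q-values and thresholded at 0.01. The estimator used here is an assumption: it is the Benjamini-Hochberg step-up adjustment, which corresponds to a q-value with the proportion of true nulls fixed at 1 and may be more conservative than the method of [4]. The function name `bh_qvalues` and the example p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values with the null proportion fixed at 1)."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values for the age coefficient of five transcripts.
age_pvalues = [0.0001, 0.003, 0.03, 0.2, 0.8]
age_qvalues = bh_qvalues(age_pvalues)

# Transcripts declared significant at a false-discovery rate of 1%.
significant = [q <= 0.01 for q in age_qvalues]
```

In the analysis described above, one such vector of p-values would exist per covariate (sex, age, and each generation indicator), each with one entry per transcript.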

Expression quantitative trait linkage analysis
Expression quantitative trait linkage (eQTL) analysis was performed using MERLIN-REGRESS v 1.0.1 with a bug fix for missing covariates [5,6]. Mendelian inconsistencies between grandparents and children were removed. Marker allele frequencies for 2871 single-nucleotide polymorphisms (SNPs) were estimated from the data and single-point linkage analysis was used (10,203,534 tests). Variance-components linkage analysis, using MERLIN, was used to analyze the X chromosome because MERLIN-REGRESS cannot analyze X chromosome data. We considered linkage results to differ between the analyses with and without covariates when a result had a LOD score >3 and showed either a 3-LOD-unit increase or decrease when sex or age was included as a covariate. Linkage analyses including age alone and including both age and generation were compared to determine what effect including generation had on linkage results.
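The comparison criterion can be sketched as follows. This is one reading of the rule stated above, not a description of the software actually used: a signal is taken to "appear" when the LOD score with the covariate exceeds 3 and is at least 3 units higher than without it, and to "disappear" in the mirror-image case. The function name `classify_change` and the example LOD scores are hypothetical.

```python
# Number of single-point tests: one per transcript-SNP pair.
N_TRANSCRIPTS = 3554
N_SNPS = 2871
N_TESTS = N_TRANSCRIPTS * N_SNPS  # 10,203,534

LOD_THRESHOLD = 3.0  # a result must exceed this LOD score
MIN_CHANGE = 3.0     # and change by at least this many LOD units


def classify_change(lod_without, lod_with):
    """Classify one transcript-SNP pair as 'appeared', 'disappeared', or 'unchanged'."""
    change = lod_with - lod_without
    if lod_with > LOD_THRESHOLD and change >= MIN_CHANGE:
        return "appeared"
    if lod_without > LOD_THRESHOLD and -change >= MIN_CHANGE:
        return "disappeared"
    return "unchanged"


# Hypothetical LOD scores (without covariate, with covariate).
examples = [(0.4, 4.1), (5.2, 1.0), (3.5, 4.0)]
labels = [classify_change(a, b) for a, b in examples]
```

Counting the "appeared" and "disappeared" labels over all transcript-SNP pairs would give totals of the kind reported for sex and age.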

Association between age, sex, generation and gene expression
Descriptive information regarding age and sex is provided in Table 1. For the GEE analysis of gene expression data, Figure 1 shows the number of significant tests at different q-value thresholds for the model with age and sex (Fig. 1A) and for the model that also includes generation (Fig. 1B). After adjustment for familial correlation there were 30 genes that showed significant differences by sex, compared to 1950 genes that were significantly associated with age (an FDR threshold of 0.01 was used). The respective p-values for the tests of significance were 0.000097 for sex and 0.023 for age. When generation was included in the model (Fig. 1B) only 6 of the age effects remained significant, while generation effects were significant for 277 and 862 genes for the indicators for the grandparental (g1) and parental (g2) generations, respectively; the number of genes with sex effects (29) remained similar. The p-values for this model were: sex = 8.1 × 10^-5, age = 1.6 × 10^-5, g1 = 0.0014, and g2 = 0.0053.
Figure 2 shows the LOD scores with and without sex and age, respectively. Note the greater change in LOD scores when age (Fig. 2B) was included as a covariate than when sex (Fig. 2A) was included. Specifically, 37 linkage signals disappeared and 17 appeared when sex was included. As expected, all of the traits with linkage results that changed when sex was included map to the sex chromosomes, and the loci showing changes in linkage were on the autosomes (Table 2). Of particular interest were the four genes that are on the X chromosome and for which linkage signals appeared on the autosomes when sex was included. This suggests that autosomal loci influence the expression of some genes that escape X-inactivation. For the age analysis, 462 significant linkage results disappeared while 223 appeared when age was included in the analysis. Table 3 provides a list of the 10 top SNPs and traits that show evidence for linkage when age is either included or excluded as a covariate.
When linkage analysis was performed with both age and generation as covariates, the change in the linkage results (Fig. 2C) was not as marked as when only age was included as a covariate (Fig. 2B). According to our criteria, only four linkage results disappeared when generation was added to age, and none appeared.

Discussion
We found that a majority of the transcripts analyzed (1950 of 3554) show significant differences in expression by age, while only 30 show significant sex differences. More linkage signals were lost than gained when sex or age was included as a covariate. When generation was also included in the linkage analysis with age, few linkage results changed. Limitations of our analysis include the fact that age data were missing for all individuals in one family and for three individuals in other families.
In addition, we took a blanket approach to all traits because performing detailed diagnostics for numerous traits is not straightforward. To investigate the potential risks of this approach, we examined the trait distributions for the linkage results that changed dramatically when either sex or age was included as a covariate (Tables 2 and 3). As expected, many of the traits that had marked sex effects on linkage results were bimodally distributed, which may result in false-positive linkage results [7]. Interestingly, when we repeated linkage analyses for loci where age and sex had marked effects using the variance-components methodology (as opposed to MERLIN-REGRESS), there was little difference between inclusion and exclusion of the covariate, raising concern about the validity of the regression results.
Morley et al. [1] selected genes for linkage analysis based on greater variance between, compared to within, individuals. They performed this analysis on the grandparents: of those available in this data set, the mean age was 72 years (SD = 8, range = 61-92). However, the grandparents were excluded from the linkage analysis. Only the traits of children, whose mean age was 18 years (SD = 8, range = 3-37), were used for the linkage analysis. Reasons for this may be related to limitations of available methods or a decision to attempt to reduce age effects. However, if the variance is not the same for grandparents and children, then such an approach may result in genes that are falsely included or excluded from the genetic linkage analysis. Furthermore, although in our analysis we used age, the age effects were mostly removed when generation was included. In such a situation age will be highly correlated with birth order within a sibship and therefore we cannot exclude that this has resulted in confounding.
Figure 1 Distribution of the number of gene expressions that are significant in the GEE model for the range of q-values. A, Covariates are sex and age; B, sex, age, and grandparental (gen1) and parental (gen2) generation.

Figure 2 Distribution of LOD scores when covariates are included in linkage analysis.
We used a simple exchangeable covariance structure in our GEE analyses. This takes into account the family dependence to some extent but may not be the most appropriate covariance structure for the data. It would be interesting to investigate other covariance structures and assess the impact that they have on the overall findings.
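As an illustration of what the exchangeable assumption implies, the sketch below builds the working correlation matrix for a single family. Every pair of relatives receives the same correlation, whether siblings or a grandparent and grandchild, which is why the structure may not suit three-generation pedigrees. The family size, the correlation value of 0.3, and the function name `exchangeable_correlation` are hypothetical.

```python
def exchangeable_correlation(n_members, alpha):
    """Working correlation matrix with 1 on the diagonal and alpha everywhere else."""
    # A valid (positive definite) matrix requires -1/(n-1) < alpha < 1 when n > 1.
    if n_members > 1 and not (-1.0 / (n_members - 1) < alpha < 1.0):
        raise ValueError("alpha outside the range that gives a valid correlation matrix")
    return [[1.0 if i == j else alpha for j in range(n_members)]
            for i in range(n_members)]


# Hypothetical family of four with an assumed within-family correlation of 0.3.
R = exchangeable_correlation(4, 0.3)
```

An alternative structure would let the off-diagonal entries depend on the degree of relationship between the two family members.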

Conclusion
Age, and to a lesser degree sex, influences gene expression in transformed B lymphocytes. Although including sex as a covariate did not result in many changes in the linkage results, when age was included the results changed more markedly; specifically, there were fewer significant linkage results when age was included as a covariate. It appears that many of these age effects can be accounted for by generational differences in gene expression. Inclusion of covariates in quantitative trait linkage analysis may improve power and reduce false positives.