Incorporating prior biological information in linkage studies increases power and limits multiple testing.

We used the Genetic Analysis Workshop 15 Problem 1 data set to search for expression phenotype quantitative trait loci in a highly selected group of genes with a supposedly correlated role in the development of the enteric nervous system. Our strategy was to reduce the level of multiple testing by analyzing at the genome-wide level a limited number of genes considered to be the most promising enteric nervous system candidates on the basis of mouse expression data, and then extend the analysis to a larger number of traits only for a small number of candidate linked regions. Such a study design allowed us to identify a "master regulator" locus for several genes involved in the enteric nervous system, located in 9q31. In particular, one of four traits included in the genome-wide analysis and 2 of 57 from the follow-up single-chromosome analysis showed LOD scores above 2 around position 109 on chromosome 9 by univariate variance-component linkage analysis. Bivariate linkage analysis further supported the presence of a common regulatory locus, with a maximum multipoint LOD score of 5.17 and five additional LOD scores > 3 in the same region. This region is particularly interesting because a susceptibility locus for Hirschsprung disease, a disease characterized by enteric malformation, was previously mapped to 9q31. The proposed strategy of limiting the genome-wide analysis to a small number of well characterized candidate expression phenotypes and following up the most promising results in a larger number of correlated traits may prove successful for other groups of genes involved in a common pathway.


Background
The Genetic Analysis Workshop 15 (GAW15) Problem 1 data set is based on the proposed approach of considering natural gene expression variation as a quantitative trait influenced by genetic determinants. The resulting phenotypes can thus be used in linkage and association analyses to map loci involved in expression regulation in the genome [1]. This experimental plan proved to be successful on a genome-wide level, with the finding of extremely significant LOD score values for several genes and the presence of both cis and trans regulators as well as socalled "master regulators", or genomic regions where regulators for several genes are mapped. However, the extremely large number of phenotypes tested resulted in a severe multiple testing problem, and thus necessitated a rigorous correction with extremely small p-values to avoid inflation of type I error [1].
Here we carried out a genome-wide linkage analysis on a small number of traits, chosen a priori on the basis of a hypothesized common biological role. Prompted by our interest in Hirschsprung disease (HSCR), a congenital gut malformation characterized by absence of ganglia in the colon [2], we chose genes potentially involved in the enteric nervous system (ENS) development. The RET proto-oncogene, expressed throughout enteric neurogenesis, is required for normal ENS development and is the major HSCR gene. We based our trait selection on a recent article that described a microarray comparison between RNA from normal (wild types, WT) and Ret mutant (aganglionic) gut tissue in embryonic mouse and that has identified hundreds of candidate ENS-expressed genes [3].
The GAW15 data provided the expression phenotypes for the human orthologues of several of these genes. We conducted genome-wide variance-component linkage analysis on a highly selected group of four traits to search for candidate linked regions. We then extended the analysis to a larger number of traits focusing only on the region identified as most significant in the genome-wide screen, thus limiting the problem of multiple test significance. Because bivariate analysis has been shown to increase power to detect linkage of related traits to a common QTL, we used bivariate linkage analysis to test for the presence of genes with pleiotropic effects on the selected traits.

Traits selection
By using RNA from WT and aganglionic gut tissue and DNA microarrays, Heanue and Pachnis [3] conducted a differential screen for ENS-expressed genes and identified 327 overexpressed and 63 underexpressed genes in WT versus aganglionic intestinal samples. They independently verified the microarray results by RNA in situ hybridization for 47 genes selected for further analysis, representing diverse functional classes and either uncharacterized or partially characterized or known ENS marker genes [3]. The GAW15 data provided the expression phenotypes for the human orthologues of 67 of the whole set of 390 genes, including 5 that were part of the 47 selected as particularly promising candidate genes (corresponding to the expression traits 201387_s_at, 202154_x_at, 203440_at, 218501_at, and 209842_at). We elected to perform a genome-wide linkage analysis on these 5 genes only, and extend the analysis of any candidate regions thus identified to the remaining group of 62 genes.

Statistical analyses
Data from 194 individuals belonging to 14 three-generation CEPH (Centre d'Etude du Polymorphisme Humain) Utah families have been used for quantitative genetic analysis by means of the variance-component approach. Narrow sense heritability of the traits with its significance, and genetic and environmental correlations between pairs of traits were estimated using SOLAR version 2.1.4 http:// www.sfbr.org/solar [4]. Genotypes were checked for Mendelian errors by use of the GENEHUNTER software version 2.1_r5 beta [5] and markers with Mendelian inconsistencies in any given family were replaced by missing values through the entire three-generation family. The same was done in pedigrees in which an obligate recombinant was observed between markers located at the same position. Only autosomal chromosomes were included in the analysis.
Multipoint identity-by-descent (IBD) probabilities were estimated by GENEHUNTER and imported into SOLAR to estimate the genetic variance attributable to a hypothetical quantitative trait loci (QTL) linked to any given location. A test for linkage was carried out by testing whether the QTL additive variance was significantly different from 0 by comparing the likelihood of this model with that of a restricted model in which the genetic variance at the same location was fixed at 0. LOD scores reported in the text and the tables are calculated by SOLAR as the log 10 of the likelihood ratio of these two models.
Genome-wide univariate and bivariate linkage analyses were carried out on four highly selected traits presenting non-null heritability. To follow up on the genome-wide results, univariate and bivariate analyses were also performed on a larger number of selected phenotypes only for markers located on chromosome 9.
The marker maps used for each chromosome were obtained by a marker position file released with the data and considering 1 cM ≈ 1 Mb. The linkage analysis on chromosome 9 was carried out with both the physical map and the Rutgers genetic map http://actin.ucd.ie/cgibin/rs2cm.cgi.
No covariates were taken into account in the genomewide analysis, while analysis for chromosome 9 was also repeated including sex, age, age × sex, age 2 and age 2 × sex as covariates. Information about age was taken from the CEPH website http://ccr.coriell.org/nigms/ceph/ ceph.html.
The p-values of the univariate LOD scores were calculated by means of an empirical null distribution obtained by simulation of 10,000 replicates of an unlinked marker. Asymptotic p-values for the bivariate LOD scores were calculated based on a 1/4 :1/2 :1/4 distribution for the 2ln(10) LOD transformation [6]. No correction for multiple testing was applied to the p-values.

Results and discussion
All five traits initially selected showed residual kurtosis within normal range. Four traits presented non-null heritability (h 2 ) ( Table 1) and were selected for genome-wide variance-component linkage analysis. Trait 209842_at showed h 2 = 0 and was discarded. Genetic and environmental correlations between the four traits with non-0 heritability are also shown in Table 1. Genetic correlation was significantly different from 0 only for 201387_s_at and 202154_x_at. We observed the highest univariate LOD scores for trait 201387_s_at at position 109 on chromosome 9 (LOD score = 2.66, p-value = 0.0005), and for trait 202154_x_at at position 99 on chromosome 11 (LOD score = 2.74, p-value = 0.0007). All other LODscores were ≤ 2. There was no linkage evidence of cis regulators for any of the traits (Table 2).
LOD-scores from bivariate genome-wide linkage analysis were similar to those from the univariate analysis (Table  3). The maximum LOD-score was observed for traits 201387_s_at and 202154_x_at at position 109 on chromosome 9 (LOD score = 2.66; p-value = 0.0005), suggesting the presence of a common regulator for the two traits in this genomic region. Two other traits (203440_at and 218501_at) in combination with 201387_s_at yielded LOD scores > 2 at the same location; however the bivariate LOD scores were lower than the univariate LOD score observed for 201387_s_at alone. Two additional LOD scores > 2 were observed on chromosome 11 in bivariate analyses that included trait 202154_x_at, but were lower than the maximum univariate LOD-score observed for trait 202154_x_at alone.
To follow up on the most significant finding, we extended the analysis of chromosome 9 to the other traits potentially involved in the ENS development included in the data set. Among these, 5 presented null heritability and were excluded. Of the remaining 57 traits, 10 had maximum chromosome 9 LOD scores at position 107 or 109 (data not shown). In particular, 2 traits had LOD scores > 2 at these positions (201862_s_at: LOD score = 2.01 at 107, and 209034_at: LOD score = 3.65 at 109).
We obtained similar results using the Rutgers chromosome 9 genetic map (data not shown). For instance, trait 209034_at (LOD score = 3.65 at position 109 in our first analysis) was mapped at 140 cM with LOD score of 3.52, and trait 201387_s_at (LOD score = 2.66 at position 109) was mapped at 139 cM with LOD score of 2.60. The region from position 107 to 110 in the physical map that we initially used corresponds to the region from 130 to 151 cM in the Rutger's genetic map, and in particular position 109 corresponds to 139 cM.
The quantitative analysis was also repeated to estimate the effect of several covariates (namely sex, age, age × sex, age 2 and age 2 × sex) on variability of all the traits. Overall, at least one of the covariates showed significant effects for 42 of the 67 traits (p < 0.1). Age was more frequently found to have a significant effect on trait variation (35/42), while sex was rarely significant (3/42). After inclusion of the significant covariates, only slight differences were found in the results of the chromosome 9 linkage analyses. Two traits gave the most discordant results (difference in maximum LOD scores > 1): 209267_s_at, for which the highest LOD score at position 116 went from 2.12 to 1.06; and 212120_at, for which the LOD-score at position 11 went from 3.06 to 1.33. Results for the traits that showed LOD scores > 2 around position 109 were consistent with and without covariates.  We further estimated the genetic and environmental cor-  relations and carried out bivariate linkage analysis on eight traits with maximum chromosome 9 LOD score ≥ 1 and LOD-1 support interval that included position 109. Genetic correlations ranged from 1.47% to 88.45%, and high genetic correlation was not particularly predictive of an increase in the bivariate LOD scores compared to the univariate ones (Tables 2, 3, 4). All maximum bivariate LOD scores occurred between position 107 and 116, with the majority occurring at position 109. The maximum bivariate LOD score was 5.17 for traits 209034_at and 201387_s_at, and occurred at position 109. Five additional LOD scores were greater than 3 and all occurred at position 109 or 110, strongly suggesting the presence of a common regulator for these traits in this genomic position. The finding of a possible "master regulator" for several traits involved in the ENS development is particularly interesting because an as-yet undiscovered HSCR predisposing gene has been mapped by linkage analysis to the same genomic region on chromosome 9q31 [7]. HSCR is a congenital disease characterized by absence of ganglia in the colon. Genes involved in the ENS development and their regulators are therefore particularly interesting as potential HSCR susceptibility candidate genes.
In general, bivariate analyses were more significant than univariate analyses for several pairs of traits. Among the maximum LOD scores for all 28 pairs, 18 were equal or higher than both univariate ones, 8 were higher than 1 of the 2, and only 2 were lower than both. The 2 that were decreased both included trait 203787_at, whose maximum univariate LOD-score occurred at position 116, and two other traits whose maximum univariate LOD-scores occurred at position 107. Interestingly, the largest bivariate LOD-score was obtained for two traits with a negative genetic correlation and a small positive environmental correlation. Several studies have reported that the power of bivariate analysis is increased when the correlations induced by the QTL and by other sources of variation act in opposite directions [6,8,9]. However, we also found that the overall genetic correlation was a poor predictor of the results of bivariate analysis, with LOD scores > 3 resulting from the analyses of pairs of traits with small, nonsignificant correlations.

Conclusion
We have adopted a strategy that started from a small number of highly selected traits based on biological hypotheses to investigate the presence of linkage at a genome-wide level, and then extended the analysis to a higher number of traits only for the most promising region in the genome. Such an approach resulted in the identification of a genomic region potentially containing a common expression regulator for several genes involved in the ENS development, localized on chromosome 9q31. This region overlaps with the location of a putative susceptibility gene for HSCR, a disease characterized by enteric malformation. The inclusion or exclusion of several covariates or the use of a physical rather than a genetic map did not significantly affect our findings. Our results confirm the increased power of bivariate analysis to detect linkage of related phenotypes to a common QTL exploiting the additional information contained in the correlation pattern between the two quantitative traits.
The proposed strategy of limiting the genome-wide analysis to a small number of well characterized phenotypes and following up the most promising results in a larger number of correlated traits proved successful and could be used for the analysis of other groups of genes involved in a common pathway.