This study used the GAW15 Problem 1 (P1) data set [11]. Immortalized B cell gene expression data for 8793 probe sets (probes) from each of 276 GeneChip® Human Genome Focus Arrays was available for 194 individuals (56 founders) from 14 three-generation Centre d'Etude du Polymorphisme Humain families. Quantitative trait phenotypes derived from the 3554 probes with the most variable expression identified by Morley et al. (P1QP) were also provided for 194 individuals.
RMA, GCRMA, MAS5, and two dChip gene expression values (DCHIPPM: only perfect-match probe data from each array was used for background correction; DCHIPMM: mismatch probe data was subtracted from perfect-match data during background correction) were estimated and log2-transformed with Bioconductor [12]. The duplicate arrays that were available for some individuals (n = 82) were not used.
As described by McClintick et al. [13], three probes with very strong sex-specific expression were found: 214218_s_at (female), 205000_at and 206700_s_at (male). Their expression values were in disagreement with the specified genders of individuals 1421–8 (male) and 1421–14 (female). The arrays and P1QP of those two individuals were excluded from analyses.
Heritabilities of all P1QP traits were estimated with SOLAR [14], using the tdist adjustment, which allows robust estimation of the mean and variance of a trait when its distribution deviates from multivariate normality. The p-value of each heritability estimate was then adjusted for the family-wise type I error rate (FWER) using the method of Sidak [15]. All probes from P1QP with a heritability ≥ 0.5 and a FWER-adjusted p-value ≤ 0.05 were selected as the quantitative traits for linkage.
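The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the single-step form of the Sidak adjustment, p_adj = 1 − (1 − p)^m with m the number of traits tested, which may differ from the exact variant applied in the study, and the function names and the input layout (a mapping from probe to a heritability and raw p-value pair) are hypothetical.

```python
import math


def sidak_adjust(p_values):
    """Single-step Sidak adjustment: p_adj = 1 - (1 - p)**m.

    Assumption: the single-step form is used here for illustration; the
    study only states that the adjustment of Sidak was applied.
    """
    m = len(p_values)
    adjusted = []
    for p in p_values:
        if p >= 1.0:
            adjusted.append(1.0)
        else:
            # expm1/log1p keep precision when p is very small.
            adjusted.append(-math.expm1(m * math.log1p(-p)))
    return adjusted


def select_traits(estimates, h2_min=0.5, alpha=0.05):
    """Return probes with heritability >= h2_min and adjusted p <= alpha.

    `estimates` is a hypothetical mapping: probe -> (heritability, raw p-value).
    """
    probes = list(estimates)
    adjusted = sidak_adjust([estimates[probe][1] for probe in probes])
    return [
        probe
        for probe, p_adj in zip(probes, adjusted)
        if estimates[probe][0] >= h2_min and p_adj <= alpha
    ]
```

Because the adjustment depends on the number of tests, all P1QP traits would need to be passed in together rather than adjusted in batches.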
The expression values of the selected traits defined the RMAQP, GCRMAQP, MAS5QP, DCHIPPMQP, and DCHIPMMQP phenotype sets according to the expression measure from which they originated. The phenotypes of the selected traits from P1QP were used as a baseline for heritability and LOD score comparisons (REFQP). Note that REFQP and MAS5QP should be highly correlated because both were derived from MAS5 expression values.
In addition, a "false-positive set" of phenotypes was derived from the phenotype sets described above (the "real-linkage set"). To preserve the heritability structure of the real phenotype sets, and because Hinrichs et al. [16] found a high intraclass correlation between sib phenotypes, phenotypes were randomly swapped between whole families by shuffling the family identifiers of the individuals while keeping the generational hierarchy intact.
A genetic map for linkage analyses was derived from the P1 physical map using the single-nucleotide polymorphism (SNP) Mapping web application at the University College Dublin (UCD) Conway Institute of Biomolecular and Biomedical Research [17]; the genetic positions of eight markers not mapped by it were linearly interpolated. Mendelian inheritance inconsistencies and double-recombinant genotypes were blanked from the P1 SNP genotypes according to mistyping probabilities from Simwalk [18]. Multipoint IBD matrices for all 2882 autosomal and X-linked SNP markers were constructed with Merlin [19] and Merlin's minx.
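As an illustration of the interpolation step, the following Python sketch places an unmapped marker on the genetic map by interpolating linearly between its two flanking mapped markers on the same chromosome. The function name and the input layout (a list of physical position and genetic position pairs sorted by physical position) are assumptions made for this example. How markers lying outside the mapped range were handled is not stated, so the sketch simply refuses to extrapolate.

```python
from bisect import bisect_left


def interpolate_cm(mapped, physical_bp):
    """Linearly interpolate a genetic position (cM) from a physical one (bp).

    `mapped` is a hypothetical list of (bp, cM) pairs for one chromosome,
    sorted by increasing bp.
    """
    bps = [bp for bp, _ in mapped]
    i = bisect_left(bps, physical_bp)
    if i < len(bps) and bps[i] == physical_bp:
        # The position coincides with a mapped marker.
        return mapped[i][1]
    if i == 0 or i == len(bps):
        raise ValueError("marker lies outside the mapped range")
    (bp0, cm0), (bp1, cm1) = mapped[i - 1], mapped[i]
    fraction = (physical_bp - bp0) / (bp1 - bp0)
    return cm0 + fraction * (cm1 - cm0)
```

For example, a marker halfway (in base pairs) between two mapped markers at 1.0 cM and 4.0 cM would be assigned 2.5 cM.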
Heritability and linkage analyses of the selected quantitative traits and phenotypes of all 14 families were performed with SOLAR, using the tdist adjustment and sex as the only covariate. LOD scores were calculated at 5-cM intervals along the 22 autosomes and the X chromosome, and at 1-cM intervals around LOD scores ≥ 2.
We focused our observations on LOD scores at or above two thresholds: 3 and 5. In addition, because the magnitudes of closely located high LOD scores are likely to be correlated, we summarized them as a signal (a QTL), defined by the highest local maximum over the range of contiguous LOD scores that passed the threshold. This gave us the highest peak of the local LOD score curve and its width (in centimorgans) at the threshold level. In this way, we expected to reduce the number of possibly redundant linkage signals in a region to a few. We arbitrarily defined cis linkages as those signals whose region included the location of the trait's probe (regardless of the length of the region), and trans linkages as those whose region did not. Note that this definition differs from the one used by Morley et al.
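The signal definition above can be sketched as follows in Python. This is a minimal illustration, not the code used in the study: the function names and the dictionary layout are hypothetical, the scan is assumed to be a list of positions and LOD scores for a single chromosome in map order, and the width is approximated by the distance between the first and last scan positions that pass the threshold rather than by the exact crossing points of the LOD curve.

```python
def _close_run(run):
    """Summarize one run of (position_cM, LOD) pairs as a signal."""
    peak_pos, peak_lod = max(run, key=lambda point: point[1])
    start, end = run[0][0], run[-1][0]
    return {
        "start": start,
        "end": end,
        "width": end - start,  # approximate width on the scan grid
        "peak_pos": peak_pos,
        "peak_lod": peak_lod,
    }


def summarize_signals(positions, lods, threshold):
    """Collapse contiguous LOD scores >= threshold into one signal each."""
    signals, run = [], []
    for pos, lod in zip(positions, lods):
        if lod >= threshold:
            run.append((pos, lod))
        elif run:
            signals.append(_close_run(run))
            run = []
    if run:
        signals.append(_close_run(run))
    return signals


def classify_signal(signal, signal_chrom, probe_chrom, probe_pos_cm):
    """Label a signal cis if its region contains the probe, else trans."""
    same_chrom = signal_chrom == probe_chrom
    inside = signal["start"] <= probe_pos_cm <= signal["end"]
    return "cis" if same_chrom and inside else "trans"
```

Running `summarize_signals` once with a threshold of 3 and once with a threshold of 5 yields the two sets of signals considered here; a signal consisting of a single scan position has width 0 under this approximation.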