The study subjects were 194 individuals from 14 Centre d'Etude du Polymorphisme Humain (CEPH) Utah pedigrees, for whom 2882 autosomal and X-linked single-nucleotide polymorphism (SNP) genotypes were available from The SNP Consortium [2]. Quantitative phenotype data were generated from immortalized B cells, and 8793 gene expression values were available from the raw microarray CEL (cell intensity) files from Affymetrix GeneChips and the HG-Focus CDF (chip description file). When a subject had more than one expression file, the first replicate was chosen so that one array per subject was used. To evaluate the performance of different scenarios, two data sets were used: 1) robust multi-array average (RMA) normalization using all individuals (arrays) as the normalization pool, and 2) RMA normalization applied within each pedigree, which allows for as many distributions as there are pedigrees. The 'affy' package v1.8 in R v2.3.1 was used to normalize the expression data [7].
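To make the difference between the two normalization pools concrete, the following is a minimal sketch of quantile normalization only, which is one step of RMA; it omits background correction and probe-set summarization and is not the 'affy' implementation. The function names, the `pool_of` mapping, and the toy intensities are hypothetical and serve only as illustration.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize a pool of arrays (dict: array name -> intensities).

    Every array in the pool is forced onto the same reference distribution:
    the mean of the sorted intensities across all arrays in the pool.
    Ties are broken by position, which is a simplification.
    """
    names = list(arrays)
    n = len(arrays[names[0]])
    sorted_cols = [sorted(arrays[k]) for k in names]
    reference = [mean(col[i] for col in sorted_cols) for i in range(n)]
    out = {}
    for k in names:
        # Indices of the probes ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: arrays[k][i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        out[k] = normalized
    return out


def normalize_within_pools(arrays, pool_of):
    """Normalize each pool (e.g. each pedigree) separately.

    pool_of maps an array name to its pool label (hypothetical structure).
    """
    pools = {}
    for name, values in arrays.items():
        pools.setdefault(pool_of[name], {})[name] = values
    out = {}
    for members in pools.values():
        out.update(quantile_normalize(members))
    return out


# Toy intensities, invented for illustration only.
arrays = {"a": [1, 2, 3], "b": [4, 6, 5]}
all_pool = quantile_normalize(arrays)
by_pedigree = normalize_within_pools(arrays, {"a": "ped1", "b": "ped2"})
```

With a single pool, both toy arrays share one distribution; with one pool per pedigree, each pedigree keeps its own distribution, which is the contrast between the two data sets described above.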
To examine the effect of normalization pools consisting of different types of individuals, we performed paired t-tests on the 8793 gene expression values of the four founders of one family. The values obtained when the four founders were normalized using themselves as the pool were compared with the paired values obtained after normalizing 1) using all family members, including the founders, as the pool, and 2) using independent individuals (grandparents from other families) as the pool [8]. The purpose of microarray normalization is to remove systematic variation while preserving biological variation. Therefore, if all of the normalization pools remove only random error variation, the data normalized using the four founders as their own pool should not differ significantly from the data normalized using either of the other two pools (family members or independent individuals), and the paired t-tests should yield non-significant p-values. We considered a t-test to indicate a significant difference between normalization methods if its p-value was below a conservative threshold of 0.001 (because we were performing over 8000 tests). We also evaluated the results using a significance threshold of 0.05.
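The per-gene comparison can be sketched as follows, assuming one paired t-test per gene with the four founders as the pairs, so that each test has 3 degrees of freedom. For 3 degrees of freedom the Student t distribution function has a closed form, which is used here in place of a statistics library. The function names and input layout are hypothetical, and this is an illustration rather than the code used in the analysis.

```python
from math import atan, pi, sqrt


def t_cdf_df3(t):
    """Student t distribution function for exactly 3 degrees of freedom."""
    u = t / sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + atan(u)) / pi


def paired_t_four(x, y):
    """Paired t-test for four pairs (df = 3); returns (t, two-sided p)."""
    if len(x) != 4 or len(y) != 4:
        raise ValueError("this sketch assumes exactly four founders")
    d = [a - b for a, b in zip(x, y)]
    mean_d = sum(d) / 4.0
    var_d = sum((v - mean_d) ** 2 for v in d) / 3.0  # sample variance
    if var_d == 0.0:
        # Identical differences: no evidence if they are all zero,
        # otherwise the statistic is unbounded.
        return (0.0, 1.0) if mean_d == 0.0 else (float("inf"), 0.0)
    t = mean_d / sqrt(var_d / 4.0)
    p = 2.0 * (1.0 - t_cdf_df3(abs(t)))
    return t, p


def count_significant(own_pool, other_pool, alpha=0.001):
    """Count genes whose paired test falls below alpha.

    own_pool and other_pool are lists of per-gene 4-value lists
    (one value per founder), normalized with different pools.
    """
    return sum(
        1
        for x, y in zip(own_pool, other_pool)
        if paired_t_four(x, y)[1] < alpha
    )


# Toy values, invented for illustration only.
t_same, p_same = paired_t_four([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
t_diff, p_diff = paired_t_four([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0])
```

Calling `count_significant` with `alpha=0.001` and again with `alpha=0.05` mirrors the two thresholds considered in the text.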
To examine the effect of normalization methods on linkage results, we selected 18 cis-acting transcriptional regulator phenotypes with previous evidence of cis-acting linkage to the known location of the corresponding structural gene [2]. Nonparametric quantitative linkage analysis was performed using Merlin v1.0 with the qtl option. We plotted both the negative log p-values of the nonparametric linkage score [8] and the allele-sharing LOD score of Kong and Cox [9]. Because normalization changes the mean and variance of the trait values in each array, some individuals may have different trait values under different normalization methods. FCOR in S.A.G.E. v5.1.0 was used to calculate familial correlations (e.g., parent-offspring, sibling, and grandparent-grandchild), which were compared for each trait across the normalization methods.
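As a rough illustration of comparing a familial correlation across normalization methods, the sketch below computes a plain Pearson correlation over relative pairs (for example, parent-offspring pairs) for one trait under two normalizations. This is not the FCOR estimator, which handles the weighting of pairs within pedigrees in its own way; the function names, individual identifiers, and trait values are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def pair_correlation(trait, pairs):
    """Correlate trait values over relative pairs.

    trait maps an individual ID to a normalized expression value;
    pairs is a list of (relative_1, relative_2) ID tuples.
    """
    xs = [trait[a] for a, _ in pairs]
    ys = [trait[b] for _, b in pairs]
    return pearson(xs, ys)


# Toy parent-offspring pairs and trait values, invented for illustration.
pairs = [("p1", "c1"), ("p2", "c2"), ("p3", "c3")]
trait_all_pool = {"p1": 1.0, "c1": 2.0, "p2": 2.0, "c2": 4.0, "p3": 3.0, "c3": 6.0}
trait_within_ped = {"p1": 1.0, "c1": 6.0, "p2": 2.0, "c2": 4.0, "p3": 3.0, "c3": 2.0}

r_all = pair_correlation(trait_all_pool, pairs)
r_within = pair_correlation(trait_within_ped, pairs)
```

Comparing `r_all` with `r_within` for each trait and each relationship type corresponds to the comparison across normalization methods described above.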