Simulations to test the impact of ENQT on power and type I error
The parental trait is determined by $H(Y_{ij})$, where $Y_{ij} = \beta_1 X_{1ij} + \beta_2 X_{2ij} + g_{ij} + G_{ij} + e_{ij}$ is the original trait value of individual $j$ in family $i$. The function $H(y) = e^{1+y} + (5 + y)^2$ transforms $Y_{ij}$ to a distribution with an average kurtosis of 54.1 and skewness of 4.98 if $Y_{ij}$ is normal, $N(0, 1.5)$. $X_{1ij}$ and $X_{2ij}$ are fixed covariates mimicking standardized age ($N(0, 1)$) and sex (male or female with equal probability), with $\beta_1 = -0.5$ and $\beta_2 = 0.5$. $g_{ij}$ is the major gene effect determined by the true QTL, which takes the value $-a$, 0, or $a$ for genotype AA, Aa, or aa, respectively. The major gene variance is therefore $\sigma_g^2 = 2pqa^2$ = [value missing in source], where $p$ and $q = 1 - p$ are the allele frequencies. $G_{ij}$ is the polygenic effect, which follows a normal distribution with mean 0 and variance $\sigma_G^2$. $e_{ij}$ is a normal random environmental effect with mean 0 and variance $\sigma_e^2$. The genetic heritability $h^2$ and major gene heritability $h_g^2$ are calculated as $h^2 = (\sigma_g^2 + \sigma_G^2)/\sigma^2$ and $h_g^2 = \sigma_g^2/\sigma^2$, respectively, where $\sigma^2 = \sigma_g^2 + \sigma_G^2 + \sigma_e^2$ is the total sample variance. The trait of offspring is determined in a similar way, but the offspring's polygenic effects are determined by [expression missing in source], where $G_{ij}^P$ and $G_{ij}^M$ are the polygenic effects of the father and mother, respectively.
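The parental trait model above can be illustrated with a minimal Python sketch. Several details are assumptions made for illustration and are not stated in the text: the allele frequency default `p=0.5`, genotypes drawn under Hardy-Weinberg proportions, sex coded as 0/1, and the function names themselves. The transformation uses $H(y) = e^{1+y} + (5 + y)^2$ as read from the text. The offspring step is not sketched because its polygenic expression is missing.

```python
import math
import random


def major_gene_effect(var_g, p):
    """Solve sigma_g^2 = 2*p*q*a^2 for a, with q = 1 - p."""
    return math.sqrt(var_g / (2.0 * p * (1.0 - p)))


def transform_h(y):
    """H(y) = exp(1 + y) + (5 + y)^2, as read from the text."""
    return math.exp(1.0 + y) + (5.0 + y) ** 2


def simulate_parent(rng, var_g, var_G, var_e, p=0.5, beta1=-0.5, beta2=0.5):
    """Return (Y, H(Y)) for one parent.

    Assumptions (not from the text): p is the frequency of allele A,
    genotypes follow Hardy-Weinberg proportions, and sex is coded 0/1.
    """
    a = major_gene_effect(var_g, p)
    x1 = rng.gauss(0.0, 1.0)          # standardized age, N(0, 1)
    x2 = rng.choice((0, 1))           # sex, equal probability
    # Count copies of allele A: 2 -> AA, 1 -> Aa, 0 -> aa
    n_copies_A = sum(rng.random() < p for _ in range(2))
    g = a * (1 - n_copies_A)          # AA -> -a, Aa -> 0, aa -> +a
    G = rng.gauss(0.0, math.sqrt(var_G))   # polygenic effect
    e = rng.gauss(0.0, math.sqrt(var_e))   # environmental effect
    y = beta1 * x1 + beta2 * x2 + g + G + e
    return y, transform_h(y)


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
parents = [simulate_parent(rng, 0.2, 0.8, 1.0) for _ in range(5)]
```

Because $H$ is a sum of an exponential and a square, every transformed value is positive, which is one quick sanity check on simulated output.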
We simulated the same six schemes as those in Diao and Lin [4]. Namely, we set $(\sigma_g^2, \sigma_G^2, \sigma_e^2)$ to (0, 1, 1), (0.2, 0.8, 1), (0.4, 0.6, 1), (0, 0.6, 1.4), (0.2, 0.4, 1.4), and (0.4, 0.2, 1.4) for schemes a through f, respectively. Among these, schemes a and d serve as null hypotheses because their major gene heritabilities are 0. For each setting, we generated 20,000 data sets. The variance-components method was applied to the original ($H(Y_{ij})$), perfectly back-transformed ($Y_{ij}$), and ENQT-transformed trait values. The SQTL method was also applied to the original trait values. The percentages of simulations with p-values less than 5%, 1%, and 0.1% are reported.
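As a worked check on the six schemes, the sketch below computes $h^2$ and $h_g^2$ from the variance components and shows how the reported percentages could be tallied from a list of p-values. The function names are hypothetical and the p-value list is a placeholder, not simulation output. In every scheme the total variance is 2, so $h^2$ is 0.5 for schemes a to c and 0.3 for schemes d to f, while $h_g^2$ is 0, 0.1, and 0.2 within each triple.

```python
# (sigma_g^2, sigma_G^2, sigma_e^2) for schemes a through f, from the text
SCHEMES = {
    "a": (0.0, 1.0, 1.0),
    "b": (0.2, 0.8, 1.0),
    "c": (0.4, 0.6, 1.0),
    "d": (0.0, 0.6, 1.4),
    "e": (0.2, 0.4, 1.4),
    "f": (0.4, 0.2, 1.4),
}


def heritabilities(var_g, var_G, var_e):
    """Return (h^2, h_g^2) from the three variance components."""
    total = var_g + var_G + var_e
    return (var_g + var_G) / total, var_g / total


def null_schemes(schemes):
    """Schemes with zero major gene variance, hence h_g^2 = 0."""
    return [name for name, (var_g, _, _) in schemes.items() if var_g == 0.0]


def rejection_rates(p_values, levels=(0.05, 0.01, 0.001)):
    """Fraction of replicates with p-value below each nominal level.

    Under a null scheme this estimates type I error; otherwise, power.
    """
    n = len(p_values)
    return {level: sum(p < level for p in p_values) / n for level in levels}


for name, components in SCHEMES.items():
    h2, hg2 = heritabilities(*components)
    print(f"scheme {name}: h2={h2:.2f}, hg2={hg2:.2f}")
```

Using a strict `<` comparison matches the wording "less than" in the text; a p-value exactly equal to a level is not counted as a rejection.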
Application to Problem 1 of GAW15
We took the expression data of Problem 1 of GAW15 and transformed each trait by ENQT. The resulting traits are normal with high p-values (>0.99) in normality tests. In addition to computing descriptive statistics (mean, variance, skewness, and kurtosis), we applied the Anderson-Darling normality test and used the variance-components method to estimate polygenic heritability. Using these initial statistics, we chose several groups of traits that are:
1. Normally distributed (p-value of Anderson-Darling normality test >0.7) with before-transformation heritability >0.3. This group has 81 traits.
2. Significantly non-normally distributed with p-value of Anderson-Darling normality test <0.0001 and with before-transformation heritability >0.4. This group has 43 traits.
3. Having high heritability (>0.6) before transformation. This group has 37 traits.
4. Having a large difference in heritability before and after transformation (>0.1). This group has 49 traits.
5. Having a small difference in heritability before and after transformation (<0.001), with before-transformation heritability >0.3. This group has 49 traits.
We use heritability as a criterion because traits with low heritability may not be of interest. These groups sometimes overlap. For example, 16 traits are common to the non-normal and high-heritability groups, which suggests that non-normality may exaggerate the heritability estimates.
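The five selection rules and the overlap between groups can be expressed as simple filters. The sketch below is illustrative only: the record type `TraitStats`, its field names, and the group labels are hypothetical, and the "difference in heritability" is assumed to be the absolute difference between the before- and after-transformation estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitStats:
    """Per-trait summary; field names are hypothetical."""
    name: str
    ad_p: float        # Anderson-Darling p-value before transformation
    h2_before: float   # heritability before transformation
    h2_after: float    # heritability after transformation


def h2_diff(t):
    # Assumption: the difference is taken in absolute value
    return abs(t.h2_before - t.h2_after)


# One predicate per group, using the thresholds given in the text
GROUP_RULES = {
    "normal": lambda t: t.ad_p > 0.7 and t.h2_before > 0.3,
    "non_normal": lambda t: t.ad_p < 0.0001 and t.h2_before > 0.4,
    "high_h2": lambda t: t.h2_before > 0.6,
    "large_diff": lambda t: h2_diff(t) > 0.1,
    "small_diff": lambda t: h2_diff(t) < 0.001 and t.h2_before > 0.3,
}


def assign_groups(traits):
    """Map each group label to the set of trait names satisfying its rule."""
    return {
        label: {t.name for t in traits if rule(t)}
        for label, rule in GROUP_RULES.items()
    }
```

Because each rule is evaluated independently, a trait can fall into several groups, and an overlap such as the one between the non-normal and high-heritability groups is a set intersection, for example `groups["non_normal"] & groups["high_h2"]`.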
For traits in these groups, we performed full genome-wide scans using the variance-components [1] and variance regression [2] methods and compared the LOD scores at the SNP markers before and after transformation.
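For readers who want to reproduce the "after transformation" side of these comparisons, a rank-based normal quantile transformation can be sketched as follows. The exact ENQT formula is not restated in this section, so the sketch assumes a common form: each value is replaced by $\Phi^{-1}((r - 0.5)/n)$, where $r$ is its rank among $n$ observations and tied values share their average rank. Both the offset 0.5 and the tie handling are assumptions, and the function name `enqt` is hypothetical.

```python
from statistics import NormalDist


def enqt(values):
    """Rank-based normal quantile transform (assumed form of ENQT).

    Each value is mapped to the standard normal quantile of
    (rank - 0.5) / n; tied values receive their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of values tied with position i
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1   # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


transformed = enqt([12.1, 3.4, 250.0, 7.7, 3.9])
```

The transform depends only on ranks, so it preserves the ordering of the observations and is unaffected by any monotone distortion such as $H$; this is why the extreme value 250.0 in the example no longer dominates after transformation.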