Statistical model
Let Y
ij
ndenote the expression of gene n in member j of family i and let G
ij
mbe the corresponding SNP genotype at marker m at location x
m
. For the means and covariances of the expression traits, we adopted a generalized estimating equations model of the form used by Thomas et al. [7]
E(Y
ij
n) ≡ μ
ij
n= α0n+ Σ
m
AnmG
ij
m (1)
E(C
ijk
n) ≡ χ
ijk
n= β0n+ BnZ
ijk
(Xn), (2)
where C
ijk
n= (Y
ij
n- μ
ij
n)(Y
ik
n- μ
ik
n) and Z
ijk
(x) are the estimated E(IBD
ijk
(x)|G
i
) at chromosomal location x for pairs (j, k) from nuclear family i, based on the complete multilocus marker data. Xnis a latent variable for location of the unobserved causal locus linked to expression trait n. For j = k, V(Y
ij
n) = χnmodels the gene expression variance in Eq. (2).
In Eq. (1), the regression coefficients Anmare modeled as a mixtures of null values with probabilities 1-πnmand a normal distribution of non-null values with means αnmexpressed in terms of row and column effects:
Anm~ (1 - πnm) δ (0) + πnmN(αnm, σ2), (3)
where
αnm= γ0A+ γ1AI(x
m
∈ R
n
) + e
m
A+ h
n
A (4a)
logit(πnm) = γ0P+ γ1PI(x
m
∈ R
n
) + e
m
P+ h
n
P. (4b)
The parameter γ1 distinguishes between cis and trans effects, a cis interaction occurs when the chromosomal location x
m
of SNP m is within the interval R
n
, the alignment region for the gene expression probe n. The random effects e and h are distributed as
(e
m
A, e
m
P) ~ N2(0, T) (5a)
(h
n
A, h
n
P) ~ N2(0, W) (5b)
and the γs, T, W have uninformative normal and Wishart priors.
The regression coefficients Bnin the covariance model in Eq. (2) are handled similarly, except that we assume each trait has at most one region linked to it. (This is not essential to the method, because Eq. (2) could be extended to a summation over multiple independent linkage regions, but it would not make sense to offer all marker locations simultaneously, since the IBD variables are highly correlated from one location to the next.) Thus, we assume
Bn~ N [γ0B+ γ1BI(Xn∈ R
n
), τ2] (6)
and pick a uniform prior on Xn; to simplify the calculations, we restrict Xnto the observed marker locations x
m
and compute IBD probabilities only at these locations. Thus, Xnhas a discrete distribution with prior masses inversely proportional to the local marker density, here estimated simply as |xm+1 - xm-1|. The full model is represented in the directed acyclic graph shown in Figure 1.
We fitted the model using a Markov-chain Monte Carlo (MCMC) approach, implemented in Matlab. Updates of all parameters except the location parameters Xnvalues involve standard Gibbs sampling from their respective full conditional distributions, e.g., [α0n|Yn, G, A], [β0n|Cn, Z, B], [Anm|Yn, Gm, α0n, πnm, αnm, σ2], etc. The updates of the X values are based on a Metropolis-Hastings procedure with a random walk proposal. The sequence was started ten times from several initial points chosen from an overdispersed prior around rough estimates. Half of the initial samples are discarded and the second half is kept. The number of kept samples, L = 4000, is chosen to be large enough so that for all parameters of interest the variance between sequences V
B
is comparable to that within sequence V
W
, R < 1.10:
(7)
The rationale behind this convergence monitoring procedure is described and justified by Gelman et al. [8].
Subjects, genotypes, and phenotypes
In order to keep the computation to a manageable level, we restricted this analysis to the SNP genotypes and expressed genes on chromosome 11, since previous analyses by Morley et al. [2] had found evidence of both cis and trans linkages at this chromosome. The final data set thus had 116 SNPs and 189 expressed genes. IBD status was estimated from the complete two-generation pedigrees (excluding grandparents) by a program written by JM based on the Lander-Green algorithm [9]. All 378 sib pairs (110 individuals) from the available 14 families were included in the phenotype analysis.