The generalized linear mixed model for linkage
The GLMM models the probability of RA as a function of a linear predictor (the sum of the covariate effects and a random effect) via a logistic link. In the case of two siblings indexed by $j = 1, 2$, the model is as follows: given observed covariates $X_j$ with population effect estimates $\beta$ and unobserved random effects $b_j$, the disease statuses of the two siblings are independent random variables $Y_j = 0/1$, with a distribution given by $\mathrm{logit}\, P(Y_j = 1 \mid X_j, b_j) = X_j\beta + b_j$. Given the proportion of alleles shared IBD ($\pi$) by the sib pair at a putative location with linkage effect $\gamma$, the unobserved random variables $b_j$ are normally distributed with mean 0, variance $\sigma^2$, and covariance $\sigma^2 \times [\rho + \gamma(\pi - 0.5)]$, as in the usual variance components model for continuous traits. In principle, estimating the model parameters $\beta$, $\sigma^2$, and $\rho$ requires population data on sib pairs. In practice, estimation is often difficult, and we have proposed an approximate ad hoc method that is fully described by Lebrec and van Houwelingen. We test for linkage, i.e., $\gamma > 0$ versus $\gamma = 0$, using a score test for this parameter in a pseudo-likelihood of the GLMM described above. The resulting statistic, Eq. (1), is a simple weighted average of the estimated IBD sharing between the two siblings in each family (indexed by $i = 1, \dots, N$).
The actual test is one-sided, so negative values of $T$ are set to 0 and the resulting statistic $T_+$ follows the usual $0.5\chi^2_0 + 0.5\chi^2_1$ mixture under the null hypothesis. The weight $w_i$ given to a specific sibling pair depends not only on the segregation parameters $\beta$, $\sigma^2$, and $\rho$ but also on its covariate values. The weight $w_i$ in the linkage statistic given by Eq. (1) is as follows: in a sib pair with individuals indexed by $j = 1, 2$, let , and , the weight given to this sibling pair in the test statistic is given by .
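The one-sided test can be sketched in Python as follows. The statistic is written here as a standardized weighted sum, $T = \sum_i w_i(\hat\pi_i - 0.5) / \sqrt{\sum_i w_i^2 \mathrm{Var}_0(\hat\pi_i)}$. This is an assumed form, chosen only because it is a weighted average of IBD sharing with mean 0 and variance 1 under the null hypothesis. The function names, the `var0` input, and the example numbers are hypothetical, and the weights are taken as given rather than computed from the segregation parameters. The p-value assumes that the mixture distribution applies to the square of the truncated statistic.

```python
import math


def linkage_statistic(weights, pi_hat, var0):
    """Standardized weighted sum of excess IBD sharing (assumed form).

    weights: per-family weights w_i, assumed to be precomputed
    pi_hat:  estimated proportion of alleles shared IBD for each sib pair
    var0:    null variance of pi_hat for each sib pair (supplied by the caller)
    """
    if not (len(weights) == len(pi_hat) == len(var0)):
        raise ValueError("inputs must have the same length")
    numerator = sum(w * (p - 0.5) for w, p in zip(weights, pi_hat))
    denominator = math.sqrt(sum(w * w * v for w, v in zip(weights, var0)))
    return numerator / denominator


def one_sided_p_value(t):
    """p-value of max(0, t)**2 under the 0.5*chi2_0 + 0.5*chi2_1 mixture."""
    if t <= 0:
        return 1.0  # the truncated statistic is 0, the smallest possible value
    # 0.5 * P(chi2_1 > t^2) equals the upper normal tail 1 - Phi(t)
    return 0.5 * math.erfc(t / math.sqrt(2.0))


# Hypothetical toy data: three sib pairs with fully informative markers,
# for which pi_hat takes the values 0, 0.5, 1 and has null variance 0.125.
example_t = linkage_statistic([1.0, 2.0, 1.0], [0.75, 0.5, 1.0], [0.125] * 3)
example_p = one_sided_p_value(example_t)
```

Sib pairs with larger weights contribute more to both the numerator and the variance term, so the standardization is unaffected by a common rescaling of the weights.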
Estimation of segregation parameters
Population data are required in order to obtain estimates of the segregation parameters $\beta$, $\sigma^2$, and $\rho$. Because full maximum-likelihood estimation is often difficult for this type of data, we advocate an approximate method of estimation, which has been applied here. Based on an arbitrary value $\rho = 0.8$, we estimate $\beta$ and $\sigma^2$ so as to best reflect the prevalence of RA in men and women (0.5% and 1.5%, respectively) as well as the overall recurrence risk ratio $\lambda_s = 6$. The choice of these estimates does not affect the asymptotic validity (type I error) of the method, because the test statistic has mean 0 and variance 1 under the null hypothesis; it only slightly influences the efficiency of the test.
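To illustrate how prevalence and the recurrence risk ratio depend on the segregation parameters, the sketch below evaluates both quantities for a random-intercept logistic model by numerical integration. It is a simplified illustration rather than the estimation procedure used here: it assumes a single intercept `beta0` (one sex stratum, both sibs sharing the same covariates), no linkage effect, and $\rho \ge 0$. All function names and the value $\sigma^2 = 4$ are hypothetical.

```python
import math


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def normal_expectation(f, n=201, half_width=8.0):
    """E[f(Z)] for Z ~ N(0, 1), trapezoid rule on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -half_width + k * h
        end_weight = 0.5 if k in (0, n - 1) else 1.0
        total += end_weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def prevalence(beta0, sigma2):
    """Marginal P(Y = 1) when logit P(Y = 1 | b) = beta0 + b, b ~ N(0, sigma2)."""
    s = math.sqrt(sigma2)
    return normal_expectation(lambda z: expit(beta0 + s * z))


def sib_recurrence_ratio(beta0, sigma2, rho):
    """P(Y1 = 1, Y2 = 1) / prevalence**2 for sibs with correlation rho in b.

    Uses b_j = sigma * (sqrt(rho) * u + sqrt(1 - rho) * e_j) with
    u, e_1, e_2 independent standard normal, so the sibs are
    conditionally independent given the shared component u.
    """
    s = math.sqrt(sigma2)
    a, c = math.sqrt(rho), math.sqrt(1.0 - rho)

    def affected_given_shared(u):
        return normal_expectation(lambda e: expit(beta0 + s * (a * u + c * e)))

    joint = normal_expectation(lambda u: affected_given_shared(u) ** 2)
    k = prevalence(beta0, sigma2)
    return joint / (k * k)


def solve_intercept(target_prevalence, sigma2, lo=-30.0, hi=10.0):
    """Bisection for beta0; prevalence is increasing in beta0."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if prevalence(mid, sigma2) < target_prevalence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical usage: match a 0.5% prevalence for an illustrative sigma2 = 4,
# then inspect the implied recurrence risk ratio at rho = 0.8.
beta0_example = solve_intercept(0.005, 4.0)
ratio_example = sib_recurrence_ratio(beta0_example, 4.0, 0.8)
```

In a calibration of this kind, $\sigma^2$ would be varied until the implied ratio approaches the target value, with the intercept re-solved at each step.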
For anti-CCP levels, we first used the available measurements in NARAC to estimate the distribution of anti-CCP levels separately in RA patients (one affected sib chosen at random per family) and in healthy individuals (only 11 unrelated individuals). In each group, the distribution was estimated with a nonparametric method (Gaussian kernel). Based on an overall prevalence of 1% for RA, we used Bayes' formula to derive the probability that an individual is an RA patient conditional upon his/her anti-CCP level. Given the very limited data available in healthy individuals, the resulting curve was highly unstable, especially for high anti-CCP levels, so we eventually smoothed the results by fitting a logistic curve.
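The three steps (kernel density estimation, Bayes' formula, logistic smoothing) can be sketched as follows. This is a minimal illustration under stated assumptions: the bandwidth is chosen by hand, the logistic curve is fitted by ordinary least squares on the logit scale (the source does not state the fitting method), and all function names and measurement values are hypothetical rather than NARAC data.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return x -> Gaussian kernel density estimate based on `sample`."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

    return density


def posterior_ra(x, f_case, f_control, prevalence=0.01):
    """Bayes' formula for P(RA | anti-CCP level = x)."""
    numerator = prevalence * f_case(x)
    denominator = numerator + (1.0 - prevalence) * f_control(x)
    if denominator == 0.0:
        return float("nan")  # both densities underflow far from the data
    return numerator / denominator


def fit_logistic_curve(xs, ps):
    """Least-squares line through (x, logit(p)); assumes every p is in (0, 1)."""
    ys = [math.log(p / (1.0 - p)) for p in ps]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)


# Hypothetical anti-CCP measurements, for illustration only.
cases = [35.0, 50.0, 120.0, 200.0, 300.0]
controls = [5.0, 6.0, 8.0, 10.0, 12.0]
f_case = gaussian_kde(cases, bandwidth=40.0)
f_control = gaussian_kde(controls, bandwidth=40.0)

grid = [0.0, 25.0, 50.0, 75.0, 100.0]
raw_curve = [posterior_ra(x, f_case, f_control) for x in grid]
intercept, slope = fit_logistic_curve(grid, raw_curve)
```

With so few control observations, the control density in the upper tail is determined almost entirely by the bandwidth, which is one way to see why the unsmoothed posterior curve is unstable at high levels.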