Data description
For our analyses, we used data on RF-IgM levels and genome-wide information on 730 microsatellite marker loci distributed over the 22 autosomal chromosomes. Our method utilizes marker genotype data on 1500 independent sib pairs and their parents for identity-by-descent (IBD) computations. The nonparametric regressions for the linkage scan are based on the RF-IgM and IBD data. We performed our analyses on all 100 available replicates.
Statistical methodology
Suppose $y_{ij}$ denotes the RF-IgM level of the $j$th sib in the $i$th family, $i = 1, 2, \ldots, 1500$; $j = 1, 2$; and $\hat{\pi}_i$ denotes the estimated IBD score for the $i$th sib pair at an arbitrary point $p$ on the genome. We define $U_i = (y_{i1} - y_{i2})^2$ and $V_i = (y_{i1} + y_{i2})^2$. The classical Haseman-Elston method [10] and its extensions [6, 7], which involve a linear regression of squared differences (or suitable alternative functions) of sib-pair trait values ($U_i$ values) on estimated marker IBD scores ($\hat{\pi}_i$ values), are adversely affected by the increase in dominance at the QTL. Thus, a more robust strategy is to estimate empirically the nature of the functional relationship between the two variables.
Following Ghosh and Majumder [3], we assume a nonparametric regression model:
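$$U_i = \psi(\hat{\pi}_i) + e_i; \quad i = 1, 2, \ldots, 1500,$$
where $\psi$ is a real-valued function of the estimated IBD score $\hat{\pi}_i$ and the $e_i$ values are random errors. The functional form of $\psi$ is estimated using a kernel smoothing technique [6] with kernel function:
$$k(x) = \begin{cases} \frac{3}{4}(1 - x^2), & |x| < 1; \\ 0, & \text{otherwise.} \end{cases}$$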
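The kernel above is commonly known as the Epanechnikov kernel. A minimal sketch of it as a function, with a crude numerical check that it integrates to one (the function name and the grid size are illustrative choices):

```python
def epanechnikov(x):
    """k(x) = 3/4 (1 - x^2) for |x| < 1, and 0 otherwise."""
    return 0.75 * (1.0 - x * x) if abs(x) < 1.0 else 0.0


# Midpoint-rule approximation of the integral of k over [-1, 1].
n = 2000
step = 2.0 / n
area = sum(epanechnikov(-1.0 + (i + 0.5) * step) for i in range(n)) * step
```

Because the kernel vanishes outside the unit interval, only sib pairs whose IBD score lies within one window length of the target point contribute to the fit there.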
Ghosh and Majumder [3] used the Nadaraya-Watson estimator to predict the $U_i$ values. There is now increasing evidence that local polynomials have lower prediction errors [6, 7] than the Nadaraya-Watson estimator. We used a local linear polynomial to predict $U_i$ as follows:
$$\hat{U}_i = \beta_0 + \beta_1 \hat{\pi}_i,$$
where $h$ is the "optimal" window length in the kernel smoothing procedure obtained using "leave-one-out" cross-validation; and $\beta_0$ and $\beta_1$ are the weighted least squares estimators of the local linear regression of $U_j$ on the estimated IBD scores $\hat{\pi}_j$ with weights $k\left((\hat{\pi}_j - \hat{\pi}_i)/h\right)$.
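The local linear fit and the leave-one-out choice of window length can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the candidate grid of window lengths, the fallback to a weighted mean when the local design is degenerate, and all function names are hypothetical, and the data are synthetic.

```python
def kernel(x):
    """Kernel from the text: 3/4 (1 - x^2) on |x| < 1, zero elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) < 1.0 else 0.0


def local_linear(x0, xs, ys, h, skip=None):
    """Fitted value at x0 from a kernel-weighted least squares line.

    `skip` optionally leaves one observation out (for cross-validation).
    Returns None when no observation falls inside the window.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for j, (x, y) in enumerate(zip(xs, ys)):
        if j == skip:
            continue
        d = x - x0
        w = kernel(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    if s0 == 0.0:
        return None
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        return t0 / s0  # assumption: degenerate design falls back to a weighted mean
    return (s2 * t0 - s1 * t1) / det  # intercept of the line centred at x0


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for window length h."""
    total = 0.0
    for i in range(len(xs)):
        pred = local_linear(xs[i], xs, ys, h, skip=i)
        if pred is None:
            return float("inf")  # window too narrow to predict this point
        total += (ys[i] - pred) ** 2
    return total / len(xs)


def choose_bandwidth(xs, ys, grid):
    """Window length in `grid` with the smallest leave-one-out error."""
    return min(grid, key=lambda h: loo_cv_score(xs, ys, h))


# Synthetic, exactly linear data: the local linear fit should reproduce it.
pi_hat = [i / 10 for i in range(11)]
U = [4.0 - 3.0 * p for p in pi_hat]
h_opt = choose_bandwidth(pi_hat, U, grid=[0.05, 0.25, 0.5, 1.0])
U_hat = [local_linear(p, pi_hat, U, h_opt) for p in pi_hat]
```

Centring the regressor at the target point makes the fitted value equal to the local intercept, which is algebraically the same as evaluating the uncentred line at that point.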
To assess the significance of our regression, we used a diagnostic measure [11]. We note that the proposed measure $\Delta$ is an analog of $R^2$, the square of the correlation coefficient between the response variable and the explanatory variable, which is used in linear regression as a measure of the proportion of variance of the response variable explained by the explanatory variable. One can evaluate the significance of the observed $\Delta$ empirically by generating random IBD scores under the null hypothesis of no linkage, while preserving the actual RF-IgM values.
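The empirical assessment can be sketched as below. Several pieces are assumptions made only for illustration: the text does not give the formula for the diagnostic, so the sketch uses the usual form of one minus the ratio of residual to total sum of squares; the null IBD scores are obtained by shuffling the observed scores across sib pairs, which is one simple stand-in for generating them under no linkage; and a straight-line fit stands in for the local linear smoother so that the block stays short. All names and data are hypothetical.

```python
import random


def delta(observed, fitted):
    """R^2-like diagnostic (assumed form): 1 - residual SS / total SS."""
    mean = sum(observed) / len(observed)
    rss = sum((u - f) ** 2 for u, f in zip(observed, fitted))
    tss = sum((u - mean) ** 2 for u in observed)
    return 1.0 - rss / tss


def line_fit(x, y):
    """Placeholder smoother: fitted values from an ordinary least squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return [my + slope * (a - mx) for a in x]


def empirical_p_value(pi_hat, U, fit, n_null=200, seed=0):
    """Observed diagnostic and its empirical p-value under shuffled IBD scores."""
    rng = random.Random(seed)
    observed = delta(U, fit(pi_hat, U))
    shuffled = list(pi_hat)
    exceed = 0
    for _ in range(n_null):
        rng.shuffle(shuffled)  # trait values stay fixed; IBD scores are reassigned
        if delta(U, fit(shuffled, U)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_null + 1)


# Synthetic data with a strong negative relationship plus a small perturbation.
pi_hat = [i / 19 for i in range(20)]
U = [4.0 - 3.0 * p + 0.1 * (-1) ** i for i, p in enumerate(pi_hat)]
delta_obs, p_value = empirical_p_value(pi_hat, U, line_fit)
```

Adding one to both the numerator and the denominator keeps the empirical p-value strictly positive for a finite number of null draws.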
There have been suggestions that using squared differences in conjunction with squared sums of sib-pair trait values may be a more powerful linkage strategy than using squared differences alone [6, 7]. To explore this hypothesis, we developed a nonparametric regression strategy combining the two phenotypic functions. For this purpose, we performed an additional nonparametric regression of the $V_i$ values on the estimated IBD scores ($\hat{\pi}_i$ values) using the local linear polynomial estimator described earlier. In this case, our diagnostic $\Delta$ is defined in terms of $\psi_1$ and $\psi_2$, the unknown regression functions of the estimated IBD score corresponding to $U_i$ and $V_i$, respectively.
Because the proposed $\Delta$ statistic does not consider the direction of the relationship between the squared sib-pair trait difference and the estimated marker IBD scores, there may be concern about an inflated false-positive rate due to a random positive relationship between the variables under the null hypothesis of no linkage. To circumvent this problem, we ensured that the correlation between the variables was negative at each of the marker positions showing significant evidence of linkage. When we considered the squared differences in conjunction with the squared sums, we additionally verified that the correlation between the squared sums and the estimated IBD scores was positive at each of the significant markers.
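The direction check described above can be sketched as a simple sign test on correlations. The text does not say which correlation coefficient was used, so the Pearson coefficient here is an assumption, as are the function names and the example values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def direction_consistent(pi_hat, U, V=None):
    """True if squared differences fall, and squared sums (if given) rise, with IBD."""
    ok = pearson(pi_hat, U) < 0
    if V is not None:
        ok = ok and pearson(pi_hat, V) > 0
    return ok


# Hypothetical values at one significant marker.
pi_hat = [0.0, 0.25, 0.5, 0.75, 1.0]
U = [4.0, 3.0, 2.5, 1.0, 0.5]
V = [1.0, 2.0, 2.0, 3.0, 5.0]
keep_marker = direction_consistent(pi_hat, U, V)
```

A marker that reaches significance on the diagnostic but fails this check would be discarded rather than reported as linked.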