New score statistic
Let G measure the total effect of the unobserved genes under the identified linkage peak. Without loss of generality, we assume that G has a mean value of zero. Further, we assume that the disease is rare. The relative effect RR of G is modelled by RR = exp(G). By using second-order Taylor approximations around G = 0, the conditional probability of being affected given G is proportional to (1 + G + 0.5 G^2), and the probability that two siblings are both affected given their genetic effects G1 and G2 is proportional to (1 + G1 + G2 + 0.5 G1^2 + 0.5 G2^2 + G1 G2).
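The two approximations can be checked by expanding the exponential to second order. The display below is only a worked restatement of that step, assuming (as the text implies) that the two siblings' risks multiply given G1 and G2; it introduces no symbols beyond those already defined.

```latex
\begin{align*}
\exp(G) &\approx 1 + G + \tfrac{1}{2}G^{2},\\
\exp(G_1)\exp(G_2) = \exp(G_1 + G_2)
  &\approx 1 + (G_1 + G_2) + \tfrac{1}{2}(G_1 + G_2)^{2}\\
  &= 1 + G_1 + G_2 + \tfrac{1}{2}G_1^{2} + \tfrac{1}{2}G_2^{2} + G_1 G_2 .
\end{align*}
```

The cross term G1 G2 is the one that carries the dependence between the siblings, which is why the IBD status enters the model.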
For each sibling pair, the information available on G consists of the pair of genotypes at the candidate SNP of interest, (S1, S2), and the IBD status at the candidate locus, IBDS. The conditional probability that two siblings are affected given (S1, S2) and IBDS is proportional to (1 + E(G1|S1) + E(G2|S2) + 0.5 Var(G1 + G2|S1, S2, IBDS)). By applying Bayes' rule and assuming Var(G1 + G2|S1, S2, IBDS) = Var(G1 + G2|IBDS), we obtain
with μ equal to E(S1). The assumption Var(G1 + G2|S1, S2, IBDS) = Var(G1 + G2|IBDS) means that, given the IBD status, the variance and covariance of the genetic effects do not depend on the SNP genotypes. The parameter δ measures linkage at the SNP location and the parameter β measures association of the SNP with the disease. The model extends the model of Kong and Cox [4] by also including an association term.
Now the log likelihood function is given by
where c is a constant independent of β. The corresponding score statistic U to test the null hypothesis H0: β = 0 given IBDS is given by
The parameter δ can be obtained by applying the method of Kong and Cox [4] to the affected sibling pairs (ASPs), and the parameter μ can be estimated from the controls. Under the null hypothesis, the statistic U has a mean value of zero. The variance of U can be estimated empirically or computed from the genotype frequencies under the null hypothesis.
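The empirical variance estimate mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that U is a sum of per-pair contributions that are independent across sibling pairs and have mean zero under H0, so that the sum of squared contributions estimates Var(U). The function `pair_contribution` and the IBD weights are hypothetical placeholders and do not reproduce the exact form of U.

```python
import math


def pair_contribution(s1, s2, ibd, mu, weight_by_ibd):
    """HYPOTHETICAL per-pair contribution: an IBD-dependent weight times the
    centred genotype sum. The weights are placeholders, not the paper's."""
    return weight_by_ibd[ibd] * ((s1 - mu) + (s2 - mu))


def score_test(contributions):
    """Return (U, empirical variance, z, two-sided p) for per-pair contributions.

    Assumes independent pairs with mean-zero contributions under H0, so the
    sum of squared contributions is an empirical estimate of Var(U).
    """
    u = sum(contributions)
    var_u = sum(c * c for c in contributions)
    if var_u == 0:
        raise ValueError("all contributions are zero; variance cannot be estimated")
    z = u / math.sqrt(var_u)
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, var_u, z, p_two_sided


# Toy data: (S1, S2, IBD status) per sibling pair; values are illustrative only.
pairs = [(1, 0, 2), (0, 0, 1), (1, 1, 2), (0, 1, 0)]
mu = 0.32                                # mean genotype, as estimated from controls
weights = {0: 1.0, 1: 1.0, 2: 1.0}       # placeholder weights (equal for all IBD groups)
contribs = [pair_contribution(s1, s2, ibd, mu, weights) for s1, s2, ibd in pairs]
u, var_u, z, p = score_test(contribs)
```

Summing over pairs rather than over individuals keeps the within-pair dependence of the genotypes inside each contribution, which is what makes the empirical variance appropriate here.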
The unstratified version of this statistic tests the null hypothesis of no association without accounting for linkage information. For this statistic, too, the genotypes of the siblings within a pair are not independent, and its variance can be estimated empirically. If the assumption Var(G1 + G2|S1, S2, IBDS) = Var(G1 + G2|IBDS) is violated, the stratification according to the IBD groups will not be optimal. The test statistic is still valid, but the gain in power compared to the unstratified test statistic will be smaller. Therefore, we also propose the statistic U*, which combines the unstratified statistic and the new score statistic by pooling the groups of sibling pairs sharing one and two alleles IBD.
Materials
To evaluate the performance of the new statistics U and U*, we analyzed the sibling pairs affected with RA of Replicates 1 to 100. For the simulation of the replicates, a lifetime prevalence of RA of 0.0107 and a lambda-sib (lifetime prevalence for siblings of affected individuals divided by the population prevalence) of 9.03 were used. The marker information was high due to a dense map (average spacing of 5 cM), a high marker heterozygosity (above 0.7), and the availability of parental genotypes. For association, we studied the DR locus with two risk variants (DR1 and DR4). The DR1 allele has a frequency of 0.1 and a genotype relative risk of 1.5 for homozygous carriers versus homozygous carriers of the DRx allele. The other variant, DR4, has a frequency of 0.25 and increases the risk for RA much more strongly: the genotype relative risk for homozygous carriers of DR4 versus homozygous carriers of the DRx allele is 30. Our aim was to identify the DR1 allele at the DR locus.
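As a small worked example of the lambda-sib definition given above, the implied lifetime prevalence among siblings of affected individuals follows directly from the two quoted simulation parameters. The variable names are illustrative.

```python
# Values quoted in the text.
population_prevalence = 0.0107
lambda_sib = 9.03

# lambda-sib = sibling prevalence / population prevalence, hence:
sibling_prevalence = lambda_sib * population_prevalence
print(f"{sibling_prevalence:.4f}")  # 0.0966
```

So a sibling of an affected individual has a lifetime risk of roughly 9.7% under the simulated model.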
We first removed the sibling pairs and controls who are homozygous carriers of the DR4 allele. The number of affected sibling pairs used for analysis varied from 211 to 270 with a mean number of 238 sibling pairs. The number of homozygous carriers in the controls was small, and around 2000 controls were available for association analysis. For each replicate we used Merlin [5] to estimate the parameter δ and to compute the multipoint IBDS at the DR locus, assuming an additive model for each sibling pair. For almost all sibling pairs, the IBD status was observed. When the IBD status was uncertain, the most likely IBD status was assigned to the sibling pair. In the 100 replicates, the parameter δ varied from 0.13 to 0.36 with a mean of 0.25. The genotype Sk for k = 1 or 2 was defined as the number of DR1 alleles carried by sibling k and its expectation μ was computed from the controls. In the replicates, the parameter μ varied from 0.28 to 0.37 with a mean of 0.32.
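The two data-preparation steps described above, assigning the most likely IBD status and coding genotypes as DR1 allele counts, can be sketched as follows. The input layouts (a tuple of three IBD probabilities per pair, and genotypes as pairs of allele labels) are assumptions made for illustration and do not correspond to any particular program's output format.

```python
def most_likely_ibd(ibd_probs):
    """Return the IBD status (0, 1 or 2) with the highest probability.

    ibd_probs: assumed layout (P(IBD=0), P(IBD=1), P(IBD=2)).
    Ties are resolved in favour of the lower IBD status.
    """
    return max(range(3), key=lambda k: ibd_probs[k])


def dr1_count(genotype):
    """Number of DR1 alleles in a genotype given as a pair of allele labels."""
    return sum(1 for allele in genotype if allele == "DR1")


def estimate_mu(control_genotypes):
    """Estimate mu as the mean DR1 allele count over the controls."""
    counts = [dr1_count(g) for g in control_genotypes]
    return sum(counts) / len(counts)


# Toy inputs, for illustration only.
ibd_status = most_likely_ibd((0.02, 0.18, 0.80))
controls = [("DR1", "DRx"), ("DRx", "DRx"), ("DR4", "DRx"), ("DR1", "DR1")]
mu_hat = estimate_mu(controls)
```

With these toy inputs the pair is assigned IBD status 2 and the estimate of μ is 0.75; real control samples would give values in the range reported above.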
We applied the new score statistic U, the unstratified test statistic, and the statistic U*, which pools the sibling pairs who share two alleles IBD with those who share one allele IBD. We estimated the variances of the three statistics empirically. Finally, for the method of Li et al. [2] we used the program LAMP, assuming an additive model and a disease prevalence of 0.01. We considered both the ASP design and the ASP-control design. Note that we did not include the uncertainty in the estimate of the parameter μ in the computation of the p-value for the score statistics. To be able to compare the performance of the score statistics with that of the method of Li et al., we made the uncertainty in the estimated allele frequencies in the controls negligible by replicating each control record four times.
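The effect of replicating control records on the apparent precision of a control-based estimate can be illustrated with a short sketch. It assumes that each control record ends up appearing four times in total and that the analysis treats the copies as independent observations; both are assumptions about the procedure, and the data are toy values.

```python
import math


def naive_se_of_mean(values):
    """Standard error of the mean, treating all records as independent.

    Uses the population variance (divisor n) so the scaling is exact.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance / n)


# Toy DR1 allele counts in controls (illustrative only).
controls = [0, 1, 0, 0, 2, 0, 1, 0]
replicated = controls * 4  # each record now appears four times

se_original = naive_se_of_mean(controls)
se_replicated = naive_se_of_mean(replicated)
ratio = se_replicated / se_original  # equals 1 / sqrt(4) = 0.5
```

The point estimate is unchanged while the naive standard error is halved, which is how replication reduces the uncertainty that a program attributes to the control allele frequencies.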