### Notation and assumptions

The study sample contains *n*_{1} cases and *n*_{0} controls (*n*_{1} + *n*_{0} = *n*) with corresponding allele frequencies *p*_{1} and *p*_{0} and common frequency *p* under the null hypothesis of no association. There are *m* cases from sibships with at least two sibs and *n*_{1} - *m* independent cases. At the candidate locus, each individual has two alleles, *X*_{i1 }and *X*_{i2 }(*i* = 1,..,*n*) coded as 0/1. Usually only the genotype *X*_{i. }= *X*_{i1 }+ *X*_{i2 }is known. For all individuals the affection status *y*_{
i
}= 0/1 is given. The cases from families comprise *k* = 1,...,*K* sibships of size *m*_{
k
}, and *z*_{
i
}denotes the sibship of individual *i*. For the cases, the *X*_{
ij
}values have a Bernoulli(*p*_{1}) distribution. Cases from different sibships are assumed to be independent, cases from the same sibship are not independent. To describe the correlation structure between sibs we use a model from population genetics that considers a population consisting of different subpopulations based on the coefficient *F*_{
ST
}and the inbreeding coefficient *F*_{
IT
}. Sibships are regarded as small subpopulations and *F*_{
ST
}denotes the correlation between two randomly chosen alleles of two individuals from the same sibship. Under the assumption of no population structure, correlations within sibships only arise from IBD sharing between sibs and *F*_{
ST
}equals the expected kinship coefficient between two siblings. *F*_{
IT
}measures the correlation of the two alleles within an individual and equals 0 under assumption of random mating and no further population structure.

### The test statistic

Based on the correlations *F*_{
ST
}and *F*_{
IT
}, the true variance of the numerator of the allelic *χ*^{2}-test statistic can be calculated. One component is the sum of all alleles from cases of sibships S={\displaystyle \sum {\phantom{\rule{0.5em}{0ex}}}_{i:{y}_{i}=1,{z}_{i}\in 1,\mathrm{...},K}}{X}_{i.}. Its true variance can be calculated as

Var(S)={p}_{1}(1-{p}_{1})2m[1+{F}_{IT}+2{F}_{ST}(({\displaystyle \sum {\phantom{\rule{0.5em}{0ex}}}_{k}{m}_{k}^{2}}/m)-1)],

where the term in square brackets, in the following denoted by *γ*, is the variance inflation in comparison to the variance of the sum of alleles from independent cases. If the data set only consists of affected sib pairs, the inflation factor simplifies to *γ* = 1 + *F*_{
IT
}+ 2*F*_{
ST
}. The total numerator can be expressed as the estimated allele frequency difference between cases and controls

T=1/(2{n}_{\mathrm{1)}}{\displaystyle \sum {\phantom{\rule{0.5em}{0ex}}}_{i:{y}_{i}=1}{X}_{i.}}-1/(2{n}_{0}){\displaystyle \sum {\phantom{\rule{0.5em}{0ex}}}_{i:{y}_{i}=0}{X}_{i.}}.

Under the null hypothesis of no association, its variance can be derived by dividing the sum of alleles within cases into two parts: one for affected sib pairs and one for independent cases, leading to

Va{r}_{\gamma}(T)=p(1-p)((m\gamma +{n}_{1}-m)/\mathrm{(2}{n}_{1}^{2})+1/(2{n}_{0})).

The inflation *γ* for the allelic *χ*^{2}-test *Var*_{γ=1}(*T*) is defined as *λ* = *Var*_{
γ
}*T*/*Var*_{γ=1}*T*.

### Strategies to determine the correlations *F*_{
ST
}and *F*_{
IT
}

To estimate *Var*(*T)*, different strategies for determining *F*_{
ST
}and *F*_{
IT
}were investigated. In strategy I ("no linkage") *F*_{
ST
}is directly calculated under the assumptions of no linkage and *F*_{
IT
}= 0. Here *F*_{
ST
}corresponds to the prior kinship coefficient of a sib pair. *F*_{
ST
}= 0.25, since 2*F*_{
ST
}is the probability that two alleles from the same parent of a sib pair are IBD. In the two other strategies *F*_{
ST
}is estimated to account for regions of linkage where the true *F*_{
ST
}is larger than 0.25.

In strategy II ("ANOVA") *F*_{
ST
}and *F*_{
IT
}are estimated by analysis of variance based on the marker data of the affected sibships at the candidate locus [7]. This strategy has no further assumptions and is based on a partitioning of the total sum of squares into three sums of squares: within individuals, within sibships, and between sibships. Each of them describes the additional variance compared to the lower level in the given order. Because *F*_{
ST
}and *F*_{
IT
}can be expressed as ratios of variance components, estimates for *F*_{
ST
}and *F*_{
IT
}can be derived as functions of the sums of squares.

Strategy III ("MULTI") uses a multipoint *F*_{
ST
}estimate assuming *F*_{
IT
}= 0, requiring genotype information at adjacent markers, e.g., for cases previously analyzed for linkage with these markers. *F*_{
ST
}can be directly estimated from the estimated mean number *Y* of alleles IBD within the affected sib pairs. The expectation of *Y* can be expressed as *E(Y) = 2N·2F*_{
ST
}, where 2N={\displaystyle \sum {\phantom{\rule{0.5em}{0ex}}}_{k=1}^{K}{m}_{k}({m}_{k}-1)} is the total number of allelic pairs considered and *2F*_{
ST
}, is the probability that such an allele pair is IBD. The estimated number Y of alleles IBD has to be calculated from individual IBD estimates. If there are only affected sib pairs in the data (*N* = *K*), *Y* can be derived from the nonparametric linkage-score (NPL- or Z-score), which is then equivalent to the classical mean test statistic Z=(Y-K)/\sqrt{K/2}. Here the same IBD measure is used as in linkage analysis.

To evaluate the strategies we implemented the test statistic in the computer program R. For strategy I *F*_{
ST
}= 0.25, for strategy II *F*_{
ST
}was estimated in the ANOVA framework implemented in R, and for strategy III we calculated NPL-scores with Merlin.

### Application to data from a candidate gene study for rheumatoid arthritis

The proposed methods were applied to case-control data from 20 candidate genes for rheumatoid arthritis previously analyzed by Plenge et al. [6]. The 839 cases were from the North American Rheumatoid Arthritis Consortium (NARAC) and include 717 cases from affected sibships and 122 unrelated cases. The 855 unrelated controls were selected from healthy individuals who were enrolled in the New York Cancer Project (NYCP). Because we have to include additional data for strategy III, we only investigated the introduced test statistics based on strategy I, II, and the traditional allelic *χ*^{2}-test based on allele frequencies ignoring familial correlations. We compared our results to Plenge et al. [6] who analyzed the same sample with only a few additional individuals.

### Application to the simulated data

Additionally, the SNP genome data from Replicate 1 of the simulated Genetic Analysis Workshop 15 data were analyzed knowing the solutions. The data contain 1500 families of two parents and an affected sib pair and 2000 controls. We calculated our test statistics based on strategies I-III for all 9187 SNPs of the genome scan comparing 3000 cases to 2000 controls. Subsequently, in order to remove true associations, we excluded SNPs in a region around ±3 cM of simulated disease loci to analyze data simulated under the null hypothesis of no association but allowing for linkage. For the remaining SNPs we verified the type I error rate of the test statistics. We also analyzed chromosome 6 containing the major disease locus to concentrate the analysis on a region of known linkage.