Material
The segregation of 730 microsatellite markers, spaced on 22 chromosomes with an average inter-marker distance of about 5 cM, was simulated on 100 replicates of 1500 families with at least two affected sibs.
Preliminary linkage analyses showed that no meaningful power comparison was possible with samples of this size: all linkage statistics were highly significant for detecting the role of HLA, while their power was very low (less than 5%) for the other loci. Therefore, we decided to focus on the detection of the susceptibility factor in the HLA region and to split each replicate into smaller family samples in order to obtain a lower, but not too low, power of detection. A sample size of 60 families seemed appropriate, so each replicate of 1500 families was split into 25 sub-samples. The study was thus performed on 2500 replicates of 60 families each. Parental status was considered unknown in all replicates, so that the linkage information consisted of the identity-by-descent (IBD) sharing between affected sibs.
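The splitting arithmetic can be checked with a short sketch. The text does not say how families were assigned to sub-samples, so the consecutive blocks used here (with an optional shuffle) are an assumption, and the function name and family identifiers are hypothetical.

```python
import random

def split_replicate(family_ids, sub_sample_size=60, seed=None):
    """Split one replicate into equal-sized, non-overlapping sub-samples.

    Assumption: families are taken in consecutive blocks, optionally after
    a seeded shuffle; the actual assignment rule is not given in the text.
    """
    ids = list(family_ids)
    if len(ids) % sub_sample_size != 0:
        raise ValueError("replicate size must be a multiple of the sub-sample size")
    if seed is not None:
        random.Random(seed).shuffle(ids)
    return [ids[i:i + sub_sample_size] for i in range(0, len(ids), sub_sample_size)]

# One replicate of 1500 families gives 25 sub-samples of 60 families.
sub_samples = split_replicate(range(1500))
total_small_replicates = 100 * len(sub_samples)  # 100 original replicates
```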
Linkage statistics
The data were analyzed with four LOD score statistics: MLS, KC-LOD, HLOD-S1, and HLOD-S2.
The MLS [3] is obtained by maximizing the likelihood of the IBD sharing vector subject to the possible-triangle constraints [10]. Under the null hypothesis, the expected IBD vector (probabilities of sharing 0, 1, or 2 alleles IBD) is [0.25; 0.50; 0.25]. Calculations were performed with the Mapmaker/Sibs software [11].
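As an illustration of the constrained maximization, the sketch below evaluates an MLS by grid search for the simplified case in which the IBD state of every sib pair is known. It assumes the usual form of the possible-triangle constraints (z1 ≤ 0.5 and 2·z0 ≤ z1, with the components summing to 1). It is not the multipoint algorithm of Mapmaker/Sibs, and the function names and pair counts are hypothetical.

```python
import math

# Expected probabilities of sharing 0, 1, 2 alleles IBD under no linkage.
NULL_IBD = (0.25, 0.50, 0.25)

def in_possible_triangle(z0, z1, z2, tol=1e-9):
    """Check the possible-triangle constraints (assumed form, see lead-in)."""
    return (
        z0 >= -tol
        and abs(z0 + z1 + z2 - 1.0) <= tol
        and z1 <= 0.5 + tol
        and 2.0 * z0 <= z1 + tol
    )

def lod_for_counts(counts, z):
    """LOD of sharing vector z against the null, for fully informative pairs."""
    lod = 0.0
    for n, zk, pk in zip(counts, z, NULL_IBD):
        if n == 0:
            continue
        if zk <= 0.0:
            return float("-inf")
        lod += n * math.log10(zk / pk)
    return lod

def mls_fully_informative(n0, n1, n2, steps=200):
    """Grid-search the possible triangle for the maximum LOD.

    n0, n1, n2 are the numbers of sib pairs sharing 0, 1, 2 alleles IBD.
    """
    best = 0.0  # the null vector lies in the triangle, so the MLS is at least 0
    for i in range(steps + 1):
        z1 = 0.5 * i / steps
        for j in range(steps + 1):
            z0 = 0.5 * z1 * j / steps  # enforces 2*z0 <= z1 by construction
            z = (z0, z1, 1.0 - z0 - z1)
            if in_possible_triangle(*z):
                best = max(best, lod_for_counts((n0, n1, n2), z))
    return best
```

With counts that match the null proportions exactly, such as 15, 30, and 15 pairs, the maximum is reached at the null vector and the MLS is zero; excess sharing moves the maximum into the interior of the triangle.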
The KC-LOD proposed by Kong and Cox [1] is maximized over a single parameter, δ, that represents the degree of allele sharing among affected individuals. Under the null hypothesis, δ is equal to 0, and larger values of δ correspond to greater allele sharing. KC-LOD analysis was carried out with Allegro v1.2 [12], using the "score pairs" option and the exponential model proposed by Kong and Cox.
HLOD-S1 was calculated as initially proposed by Greenberg et al. [4] under a dominant and a recessive model, each with a disease allele frequency of 0.01, a penetrance of 0.50, and no phenocopies. The LOD score function was maximized over these two models and over the heterogeneity parameter, α, which represents the proportion of families linked to the disease locus. In HLOD-S2, two additional models were considered, with a disease allele frequency of 0.2. The LOD score function was then maximized over these four genetic models and the parameter α. All HLOD calculations were done with the Allegro v1.2 software [12].
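The maximization over α and over genetic models can be sketched as follows, assuming the standard admixture form of the heterogeneity LOD, in which each family contributes log10(α·10^LOD + 1 − α). The per-family LOD scores would come from a linkage program; the values, model labels, and function names here are hypothetical, and the grid search over α is a simplification rather than Allegro's procedure.

```python
import math

def hlod(family_lods, alpha_steps=100):
    """Maximize the admixture heterogeneity LOD over alpha in [0, 1].

    family_lods: per-family LOD scores at one position under one model
    (assumed finite). Returns (maximum HLOD, alpha at the maximum).
    """
    best_value, best_alpha = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for k in range(1, alpha_steps + 1):
        alpha = k / alpha_steps
        value = sum(
            math.log10(alpha * 10.0 ** lod + (1.0 - alpha)) for lod in family_lods
        )
        if value > best_value:
            best_value, best_alpha = value, alpha
    return best_value, best_alpha

def hlod_over_models(lods_by_model):
    """Take the largest HLOD across genetic models (two for S1, four for S2)."""
    return max(hlod(lods)[0] for lods in lods_by_model.values())

# Hypothetical per-family LODs under two models at a single position.
example = {
    "dominant": [0.5, 0.5, -2.0],
    "recessive": [0.1, -0.3, -0.2],
}
best = hlod_over_models(example)
```

Because the set of models used for HLOD-S2 contains those of HLOD-S1, the maximized HLOD-S2 can never be smaller than HLOD-S1 at the same position.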
We first studied the distribution of these four statistics under the assumption of no linkage by analyzing the 16 chromosomes that did not harbor a susceptibility gene. The maximum value on each chromosome was recorded for each statistic, leading to 40,000 values (2500 replicates × 16 chromosomes). This provided the distribution of the maximum of each statistic for an average chromosome. Thus, for a full genome scan, one may apply a Bonferroni correction for 22 chromosomes. This procedure can be used either to determine the threshold for a genome-wide type I error of 5% (nominal p = 0.05/22 ≈ 0.002 per chromosome) or to determine the genome-wide p-value corresponding to a given value of the statistic.
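The two uses of the null distribution can be sketched as follows. The input list stands for the recorded per-chromosome maxima of one statistic; the function names and the choice of a simple order-statistic quantile are assumptions for illustration.

```python
def per_chromosome_alpha(genome_wide_alpha=0.05, n_chromosomes=22):
    """Bonferroni-corrected nominal level for a single chromosome."""
    return genome_wide_alpha / n_chromosomes

def empirical_threshold(null_maxima, alpha):
    """Smallest recorded value that at most a fraction alpha of the
    null maxima strictly exceed."""
    ordered = sorted(null_maxima)
    n_allowed_above = int(alpha * len(ordered))  # rounded down
    return ordered[len(ordered) - n_allowed_above - 1]

def genome_wide_p(observed, null_maxima, n_chromosomes=22):
    """Bonferroni-corrected empirical p-value for an observed statistic."""
    n_as_large = sum(1 for value in null_maxima if value >= observed)
    per_chromosome_p = n_as_large / len(null_maxima)
    return min(1.0, n_chromosomes * per_chromosome_p)

alpha = per_chromosome_alpha()  # 0.05 / 22, about 0.002
```

With 40,000 null maxima and this per-chromosome level, the threshold is the value exceeded by 90 of the recorded maxima.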
The power for detecting linkage was calculated as the proportion of replicates in which a given statistic exceeded the threshold corresponding to a genome-wide type I error of 5%. Two loci in the HLA region were known to be involved in the manifestation of the simulated disease. We considered the HLA region to be detected if there was evidence for linkage in the 20-cM interval around the HLA-DR locus, i.e., in the interval [STRP6_10-STRP6_13].
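The power estimate described here amounts to the following sketch. Each replicate is represented as a list of (position in cM, statistic) pairs along chromosome 6; the positions, the interval bounds, the threshold, and the function names are hypothetical, since the text gives only the flanking marker names.

```python
def region_max(scan, start_cm, end_cm):
    """Largest statistic observed inside the target interval."""
    in_region = [stat for pos, stat in scan if start_cm <= pos <= end_cm]
    if not in_region:
        raise ValueError("no scan positions fall inside the interval")
    return max(in_region)

def estimate_power(scans, threshold, start_cm, end_cm):
    """Proportion of replicates whose regional maximum exceeds the threshold."""
    detected = sum(
        1 for scan in scans if region_max(scan, start_cm, end_cm) > threshold
    )
    return detected / len(scans)

# Four hypothetical replicates, target interval 40 to 60 cM, threshold 3.0.
scans = [
    [(40, 1.0), (50, 3.5), (60, 2.0)],
    [(40, 0.2), (50, 1.0), (90, 4.0)],  # the peak at 90 cM lies outside the interval
    [(45, 3.1), (55, 0.5)],
    [(50, 2.9)],
]
power = estimate_power(scans, threshold=3.0, start_cm=40, end_cm=60)
```

A peak outside the interval does not count as a detection, which is why the second replicate is scored as a miss.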