Family data
To reduce PPL computation time, analyses were restricted to the first 500 families from the first 50 replicates; using the full data set and all replicates would have been wasteful of resources for our purposes here. (At the time of the initial draft of this paper, a single replicate could take up to a couple of weeks to complete. Since then, the program has improved greatly, reducing this time to less than a day [6].) Based on inspection of the answer file, we selected marker STRP16_6 for dichotomous trait (rheumatoid arthritis, RA) linkage analysis. This marker is at 27.44 cM on chromosome 16, 1.15 cM away from Locus A. Because Locus A and DR interact to increase RA risk (see the "Risk Multipliers" table in the answer file), we used the genotypes at DR to classify individuals into liability classes (LCs). For computational convenience [6], we restricted attention to just two liability classes: LC1 comprised individuals with two DR4 alleles (the high-risk group); LC2 comprised the remaining individuals (the low-risk group).
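The two-class coding described above can be sketched as follows. This is a minimal illustration, not the code used for the analyses; the allele labels, the individual identifiers, and the function name `liability_class` are hypothetical, and the handling of missing DR genotypes is not addressed.

```python
from typing import Dict, Tuple

# Assumption: the high-risk DR allele is coded with the label "DR4".
HIGH_RISK_ALLELE = "DR4"


def liability_class(dr_genotype: Tuple[str, str]) -> int:
    """Return 1 (LC1) for two copies of the high-risk allele, otherwise 2 (LC2)."""
    return 1 if dr_genotype.count(HIGH_RISK_ALLELE) == 2 else 2


# Hypothetical individuals, for illustration only.
genotypes: Dict[str, Tuple[str, str]] = {
    "ind1": ("DR4", "DR4"),
    "ind2": ("DR4", "DR1"),
    "ind3": ("DR1", "DRX"),
}
classes = {ind: liability_class(g) for ind, g in genotypes.items()}
```

Under this coding, only `ind1` falls in LC1; carriers of a single DR4 allele are placed in LC2 together with non-carriers.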
For quantitative trait (QT) analysis, we chose the phenotype anti-cyclic citrullinated peptide antibody (anti-CCP) and evaluated Locus E for linkage. As with RA, the generating model included an interactive effect of DR and Locus E on anti-CCP levels. Anti-CCP measures were standardized on the basis of all available parental values; no other changes to the phenotypes were made. Linkage analysis was applied to marker STRP18_22, located at 92.9 cM on chromosome 18, which is 1.4 cM away from Locus E. As above, individuals with two copies of the high-risk DR4 allele were coded as LC1, and all other individuals were coded as LC2.
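One plausible reading of the standardization step is a z-score transformation of every individual's value, using the mean and standard deviation estimated from parents only. The sketch below assumes that reading; the function name, the use of the sample (n − 1) standard deviation, and the example values are all assumptions rather than details taken from the analysis.

```python
from statistics import mean, stdev
from typing import List, Optional


def standardize_by_parents(
    values: List[Optional[float]], is_parent: List[bool]
) -> List[Optional[float]]:
    """Z-score all values using the mean and SD of the non-missing parental values."""
    parental = [v for v, p in zip(values, is_parent) if p and v is not None]
    mu, sd = mean(parental), stdev(parental)  # stdev uses the n - 1 denominator
    # Missing phenotypes (None) are passed through unchanged.
    return [None if v is None else (v - mu) / sd for v in values]


# Hypothetical anti-CCP values; None marks a missing measurement.
values = [10.0, 20.0, 30.0, None, 40.0]
is_parent = [True, True, False, True, False]
z = standardize_by_parents(values, is_parent)
```

Offspring values are rescaled with the parental parameters, so they need not have mean zero or unit variance after the transformation.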
Statistical analysis
The PPL is on the probability scale, can readily incorporate prior information, and is particularly suited to the accumulation of evidence across multiple, potentially heterogeneous, data sets [7–9]. The unknown trait model is treated as a vector of nuisance parameters, and integrated out of the constituent likelihoods [10, 11]; thus the method is essentially model-free, while retaining the strengths of likelihood-based analysis. Further, in application to quantitative traits, this framework does not assume normality at the population level or require population parameter estimates [12, 13] in order to address ascertainment.
As described in detail elsewhere [10, 11], the PPL can be computed from an ordinary LOD score, with the unknown parameters of the trait model integrated out rather than fixed at arbitrary values. The PPL can therefore in principle be extended to incorporate any form of likelihood for which LODs can be calculated. Thus far we have extended the original dichotomous trait PPL [7, 14], which already allowed for locus heterogeneity under the admixture model [15], to include allowance for linkage disequilibrium [16], sex-specific recombination [17, 18], quantitative traits [13], and combined quantitative/dichotomous traits (within the same pedigree) [12], implemented in two-point and/or multipoint forms [19].
The standard dichotomous trait PPL is parameterized in terms of the (sex-averaged) recombination fraction, the admixture parameter, a disease allele frequency, and three penetrances (one for each genotype, assuming a two-allele locus). The standard quantitative trait (QT) PPL is parameterized similarly, except that instead of three penetrances, the likelihood is written as a function of three genotypic means and three genotypic variances [12, 13]. (In the present application we have set the three variances equal to one another.) As elsewhere, we assume a 2% prior probability of linkage [20]. Thus PPLs > 2% represent (some level of) evidence in favor of linkage, while PPLs < 2% represent (some level of) evidence against linkage. Because it is a probability, the PPL is bounded by [0, 1]. For comparison purposes, we also report MODs [21], which are LODs parameterized identically to the PPLs but maximized over all parameters in the model (whereas the PPL is integrated over these same parameters).
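The contrast between integrating and maximizing can be illustrated numerically. The sketch below assumes that LOD scores have already been evaluated on a grid of parameter values under linkage, and that the integral is approximated by a weighted average of the corresponding likelihood ratios; the equal default weights, the function names, and the example LODs are assumptions, and the priors used in practice [10, 11] need not be uniform.

```python
from typing import List, Optional

PRIOR = 0.02  # prior probability of linkage, as stated in the text


def ppl_from_lods(
    lods: List[float], weights: Optional[List[float]] = None, prior: float = PRIOR
) -> float:
    """Posterior probability of linkage from LODs on a parameter grid.

    Each LOD is converted to a likelihood ratio (10 ** LOD); the ratios are
    averaged with the given weights (assumed to sum to 1) and combined with
    the prior by Bayes' rule.
    """
    if weights is None:
        weights = [1.0 / len(lods)] * len(lods)  # assumption: equal weights
    avg_lr = sum(w * 10.0 ** lod for w, lod in zip(weights, lods))
    return prior * avg_lr / (prior * avg_lr + (1.0 - prior))


def mod_from_lods(lods: List[float]) -> float:
    """MOD: the LOD maximized (rather than averaged) over the same grid."""
    return max(lods)


# Hypothetical grid of LODs, for illustration only.
example_lods = [-0.5, 0.0, 1.2, 2.0]
example_ppl = ppl_from_lods(example_lods)
example_mod = mod_from_lods(example_lods)
```

When every LOD on the grid is zero, the averaged likelihood ratio is 1 and the function returns the 2% prior, which is why values above and below 2% are read as evidence for and against linkage, respectively.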
Here we extend the PPL once again, to allow covariate-dependent penetrances. Specifically, we assign individuals to liability classes (LCs) based on covariate status, and allow different penetrances for individuals in different LCs. In the present application, we use this parameterization to condition on the causal genotype at the known risk locus DR; however, the same model could be used to condition on other covariates, such as age or sex [22]. We include a separate penetrance vector in the likelihood for each LC in the model. These penetrance vectors are integrated over, rather than fixed, to obtain a marginal posterior probability. In the dichotomous trait analyses, we have constrained each A-locus penetrance for individuals in LC1 to be greater than or equal to the corresponding penetrance in LC2. In the QT analyses, we have likewise constrained each E-locus genotypic mean for individuals in LC1 to be greater than or equal to the corresponding mean for individuals in LC2. We have recently implemented a suite of PPL statistics in a new package, KELVIN, designed for distributed parallel computation over the parameter space [6, 23]. KELVIN is based on a re-engineered version of VITESSE [24, 25], thus far incorporating two-point and multipoint linkage analysis of dichotomous and/or quantitative traits, marker-trait linkage disequilibrium, and LCs. Exportable software is currently under development. Unsupported, platform-specific versions can be made available by contacting the corresponding author.
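The ordering constraint between the two liability classes can be illustrated with a coarse grid over penetrance vectors. The sketch below is not KELVIN's implementation; it simply enumerates pairs of penetrance vectors (one per LC, three penetrances each), keeps those in which every LC1 penetrance is at least as large as the corresponding LC2 penetrance, and averages a user-supplied likelihood ratio over the admissible pairs. The grid values, the function names, and the `lr` callable are hypothetical.

```python
from itertools import product
from typing import Callable, Iterator, Sequence, Tuple

Penetrances = Tuple[float, float, float]  # one penetrance per trait genotype


def admissible_pairs(
    grid: Sequence[float],
) -> Iterator[Tuple[Penetrances, Penetrances]]:
    """Yield (LC1, LC2) penetrance vectors satisfying LC1 >= LC2 elementwise."""
    vectors = list(product(grid, repeat=3))
    for lc1 in vectors:
        for lc2 in vectors:
            if all(a >= b for a, b in zip(lc1, lc2)):
                yield lc1, lc2


def average_lr(
    lr: Callable[[Penetrances, Penetrances], float], grid: Sequence[float]
) -> float:
    """Average a likelihood ratio over the admissible grid points (equal weights)."""
    pairs = list(admissible_pairs(grid))
    return sum(lr(lc1, lc2) for lc1, lc2 in pairs) / len(pairs)


# Hypothetical coarse grid of penetrance values.
grid = [0.1, 0.5, 0.9]
n_pairs = sum(1 for _ in admissible_pairs(grid))
```

With three grid values, 6 of the 9 possible (LC1, LC2) combinations per genotype satisfy the constraint, giving 6 × 6 × 6 = 216 admissible pairs out of 729. The same filtering applies to genotypic means in the QT case.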