Exploration and comparison of methods for combining population- and family-based genetic association using the Genetic Analysis Workshop 17 mini-exome

We examine the performance of various methods for combining family- and population-based genetic association data. Several approaches have been proposed for situations in which information is collected from both a subset of unrelated subjects and a subset of family members. Analyzing these samples separately is known to be inefficient, and it is important to determine the scenarios for which differing methods perform well. Others have investigated this question; however, no extensive simulations have been conducted, nor have these methods been applied to mini-exome-style data such as that provided by Genetic Analysis Workshop 17. We quantify the empirical power and false-positive rates for three existing methods applied to the Genetic Analysis Workshop 17 mini-exome data and compare relative performance. We use knowledge of the underlying data simulation model to make these assessments.


Background
Study designs for genetic association studies fall into two broad categories: (1) population-based studies that recruit unrelated individuals and (2) family-based studies that collect pedigrees of related individuals. Often, both study designs are used for a particular investigation. For example, when a linkage study has been performed and family data are collected, follow-up analysis can include association using a new unrelated study population. The analytic methods appropriate for the two designs differ, which makes it difficult to aggregate association metrics across them. Heuristically, population-based metrics attempt to quantify a measure of correlation or association between some function of genotype at a given marker and the disease phenotype, whereas family-based association measures use properties of Mendelian transmissions from parents to offspring and are inherently conditional.
Because analyzing the disparate types of data in isolation most often results in nonoptimal statistical power, investigators have proposed several methods for efficiently combining these data. We briefly summarize three methods to be applied to the Genetic Analysis Workshop 17 (GAW17) data in the Methods section. Each approach is distinguished by the study designs for which it is appropriate, the assumptions necessary for valid inference, and the handling of population stratification (whether it is formally or informally tested or whether it is taken into account by means of adjustments). Operationally, these methods are distinguishable by computation and implementation considerations and by empirical performance. We assess the performance in this paper. Other researchers have investigated the question of relative performance [1]; however, no simulations have been conducted for comparison.
An important consideration to keep in mind throughout this investigation is the underlying causal model that was used to generate the GAW17 data [2]. First, rather than reflecting the common disease/common variant hypothesis that the established methods presented here address, the data-generating mechanism was consistent with the multiple rare variant or common disease/rare variant (CDRV) hypothesis, which suggests that common disease susceptibility is conferred by multiple rare variants with moderate to high penetrance. Intuitively, the current methods are not expected to perform well in identifying rare single-nucleotide polymorphisms (SNPs); in this paper we intend to assess this performance and to motivate possible modifications that would be successful when the CDRV hypothesis is true. In addition, the disease was simulated to have approximately 30% prevalence, which violates the often-invoked rare disease assumption.

Methods
The first attempts to combine population- and family-based association data were developed by Nagelkerke et al. [3], who used a likelihood framework to combine case-control data with family data by exploiting the likelihood formulation [4] of the transmission disequilibrium test (TDT) [5]. This approach assumes Hardy-Weinberg equilibrium (HWE), random mating, and a multiplicative model of allelic effect. Although no formal test of the appropriateness of combining the two types of data has been developed, we discuss ad hoc procedures.
Epstein et al. [6] generalized this work by relaxing the assumptions of HWE, random mating, and the assumed multiplicative mode of inheritance. In addition, they described a formal test for the appropriateness of combining case-control and case-trio data by comparing genotype relative risk (RR) estimates from between-individual and within-family analyses, respectively. The proposed two-stage procedure facilitates valid model selection in the presence of population stratification. Further extensions of this approach were made by Chen and Lin [7]. Their method uses weighted least squares to aggregate the disparate RRs and requires no assumptions for mating-type distributions.
Epstein et al.'s and Chen and Lin's methods rely on two strong assumptions: a rare disease and the absence of population stratification. Later work has been targeted at both relaxing the rare disease assumption and adjusting for population stratification. Zhu et al. [8] used a principal components strategy to adjust for population stratification and to aggregate families and case-control samples by means of a linear regression framework. Within-family correlations were empirically estimated from the data and incorporated into the variance of the test statistic. Zhang et al. [9] proposed a similar method in which they defined a score test and used generalized estimating equations [10] to account for familial correlation. Their method can be more easily applied to multivariate outcomes. Other useful approaches, some with a focus on genome-wide association, have been proposed but are not evaluated here [11][12][13][14][15][16][17][18][19][20][21].
Because the approach by Chen and Lin [7] is not immediately generalizable to pedigrees, we extracted nuclear families and then sampled 194 trios from the nuclear families to provide a uniform comparison between the methods. These sampled data (697 unrelated case or control individuals and 582 family members from the 194 trios) are used for our comparisons. We assume an additive mode of inheritance throughout.

Chen and Lin's method
Chen and Lin's [7] approach uses the conditional on parental genotypes (CPG) approach of Schaid and Sommer [22] to construct the likelihood of the case-trio samples. An estimate for the RR is obtained from the CPG likelihood and is denoted $\hat{\beta}_{trio}$. This estimate is then compared to a traditional logistic regression estimate of the genotype log odds ratio, $\hat{\beta}_{CC}$, using the case-control sample, which is composed of case-trio probands and the unrelated control subjects. Chen and Lin use a Wald-type test to determine whether the effect estimates are consistent. If this test does not reject the null hypothesis of equal effects, a weighted least-squares estimator for the combined genetic effect is then constructed for inference as: $\hat{\beta} = W_1 \hat{\beta}_{trio} + W_2 \hat{\beta}_{CC}$, where $W_1$ and $W_2$ are weights derived from linear model theory assuming the parameter estimates follow a multivariate normal distribution (see Chen and Lin [7] for details). Here, the assumptions of a rare disease and no population stratification are necessary for validity. However, the test used to reject the appropriateness of combining the RR estimates is not well powered, as evidenced by our simulations, in which the test often did not provide sufficient evidence to reject the null hypothesis of parameter equivalence even though the simulated disease is not, in fact, rare (rarity being a necessary condition for such equivalence). This method was designed for case-trio and unrelated control subjects; however, in our analyses control offspring from the control trios are added to the case-control subsample.
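The two steps described above (a Wald-type consistency check followed by a weighted combination) can be illustrated with a minimal sketch. It is not Chen and Lin's implementation: the function names are hypothetical, and the sketch assumes the two estimates are independent so that the weights reduce to inverse-variance weights. In Chen and Lin's setting the case probands contribute to both estimates, so their weights also account for the covariance between the estimates.

```python
import math


def wald_consistency_test(b_trio, se_trio, b_cc, se_cc):
    """Wald-type test of H0: both estimates target the same effect.

    Simplifying assumption (not from the source): the two estimates are
    independent, so the variance of their difference is the sum of variances.
    """
    z = (b_trio - b_cc) / math.sqrt(se_trio ** 2 + se_cc ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


def combine_estimates(b_trio, se_trio, b_cc, se_cc):
    """Weighted combination b = W1 * b_trio + W2 * b_cc with W1 + W2 = 1.

    Under the independence assumption the weights are inverse-variance weights.
    """
    precision_trio = 1.0 / se_trio ** 2
    precision_cc = 1.0 / se_cc ** 2
    w1 = precision_trio / (precision_trio + precision_cc)
    w2 = 1.0 - w1
    b = w1 * b_trio + w2 * b_cc
    se = math.sqrt(1.0 / (precision_trio + precision_cc))
    return b, se, (w1, w2)


# Illustrative (made-up) log-effect estimates and standard errors.
z, p_value = wald_consistency_test(0.40, 0.20, 0.25, 0.15)
b, se, (w1, w2) = combine_estimates(0.40, 0.20, 0.25, 0.15)
```

In this sketch the combined estimate would be used for inference only when the consistency test does not reject, mirroring the two-stage logic in the text.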

Zhu et al.'s method
In Zhu et al.'s [8] approach, principal components are calculated from the genotypes of all unrelated individuals (trio parents and unrelated case and control subjects), and both the genotypes and the phenotypes of these individuals are then separately regressed on the principal components. The resulting linear regression parameter estimates are used to calculate phenotypic and genotypic residuals, $\tilde{y}_{ij}$ and $\tilde{g}_{ij}$, respectively, where $i$ indexes families and $j$ indexes individuals within a family. The covariance between these residuals is measured as: $T = \frac{1}{N_T} \sum_{i=1}^{N} \sum_{j=1}^{k_i} \tilde{y}_{ij} \tilde{g}_{ij}$, where $N$ is the number of families, $k_i$ is the number of individuals in the $i$th family, and $N_T$ is the total number of individuals. Within-family correlations are taken into account in the calculation of the variance of $T$ to construct a Wald test. Although this method requires enough markers to estimate principal components, it has the distinct advantage of being robust to population stratification. It can incorporate more complex family structures and does not discard any of the GAW17 data for analysis. Software to apply this approach, FamCC, is available from Zhu et al. [8].
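The residual-based statistic can be sketched in a few lines. This is an illustration only and is not the FamCC implementation: it assumes the principal component scores are already available, uses a single principal component so that the regression has a closed form, and omits the variance of T (which requires the within-family correlations). All function and variable names are hypothetical.

```python
def fit_line(values, pc):
    """Least-squares intercept and slope of values regressed on one PC.

    Intended to be fitted on the unrelated individuals only.
    """
    n = len(values)
    mean_v = sum(values) / n
    mean_pc = sum(pc) / n
    sxx = sum((p - mean_pc) ** 2 for p in pc)
    sxy = sum((p - mean_pc) * (v - mean_v) for p, v in zip(pc, values))
    slope = sxy / sxx
    return mean_v - slope * mean_pc, slope


def residuals(values, pc, coef):
    """Residuals of values given previously estimated (intercept, slope)."""
    intercept, slope = coef
    return [v - (intercept + slope * p) for v, p in zip(values, pc)]


def residual_covariance(y_res, g_res):
    """T = (1 / N_T) * sum over families i and members j of y_res[i][j] * g_res[i][j].

    y_res and g_res are lists of per-family lists of residuals.
    """
    n_total = sum(len(family) for family in y_res)
    total = sum(y * g
                for fam_y, fam_g in zip(y_res, g_res)
                for y, g in zip(fam_y, fam_g))
    return total / n_total


# Toy data: three unrelated individuals, each treated as a family of size one.
pc_unrel = [-1.0, 0.0, 1.0]   # scores on the first principal component
y_unrel = [0.0, 1.0, 1.0]     # affection status
g_unrel = [0.0, 2.0, 1.0]     # additive genotype coding (0, 1, 2)

coef_y = fit_line(y_unrel, pc_unrel)
coef_g = fit_line(g_unrel, pc_unrel)
y_res = [[r] for r in residuals(y_unrel, pc_unrel, coef_y)]
g_res = [[r] for r in residuals(g_unrel, pc_unrel, coef_g)]
T = residual_covariance(y_res, g_res)
```

With several principal components the same idea applies, with the simple regression replaced by a multiple regression on all retained components.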

Zhang et al.'s method
Zhang et al.'s [9] method adapts a score test statistic proposed by Lange et al. [23] that applies generalized estimating equations to family-based association tests. To obtain estimates for the score test statistic, the components of the test statistic are decomposed into two mutually exclusive sets: the unrelated individuals and the trios. Traits are treated as constants so that the population genotype mean and variance are estimated for the unrelated individuals and the genotype mean and variance for the offspring are defined through Mendelian transmissions. Similar to Zhu et al.'s method, this framework allows for incorporation of covariates, but unlike the other methods considered, it can easily handle missing parents.
Zhang et al. [9] use principal components analysis (PCA) to adjust for population stratification. This is done separately for the two data subsets. The standard principal-components-based adjustment is used for the unrelated individuals in order to adjust the corresponding genotype and phenotype vectors by means of linear regression on the principal components, which results in a statistic $U$ defined in terms of $\bar{m}$ and $\bar{g}$, the adjusted population trait and genotype means, respectively. A TDT-like PCA that adjusts for population stratification in family data [24] is used within the set of related individuals to define a statistic $R$ in terms of $g_{im}$ and $g_{if}$, the mother's and father's genotypes in the $i$th family, respectively (see Zhang et al. [9] for the full expressions). The score $Z = U + R$ is squared and standardized by its variance to provide a score test. Zhang et al. [9] provide a Java-based program, GAP, for analysis.
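To make concrete what it means for the offspring genotype mean and variance to be defined through Mendelian transmissions, the following sketch computes these moments for a trio under an additive (0, 1, 2) genotype coding. It illustrates the general principle only and is not the statistic implemented in GAP; the function names are hypothetical.

```python
def mendelian_moments(g_mother, g_father):
    """Mean and variance of the offspring genotype given parental genotypes.

    Each parent transmits one allele: a heterozygous parent (genotype 1)
    transmits the minor allele with probability 1/2 (variance 1/4), and a
    homozygous parent transmits it with probability 0 or 1 (variance 0).
    """
    mean = (g_mother + g_father) / 2.0
    variance = sum(0.25 for g in (g_mother, g_father) if g == 1)
    return mean, variance


def transmission_deviation(g_child, g_mother, g_father):
    """Observed offspring genotype minus its Mendelian expectation."""
    mean, _ = mendelian_moments(g_mother, g_father)
    return g_child - mean


# Example trios (mother, father): both heterozygous; opposite homozygotes.
both_het = mendelian_moments(1, 1)
opposite_hom = mendelian_moments(2, 0)
deviation = transmission_deviation(2, 1, 1)
```

Only trios with at least one heterozygous parent have nonzero variance, which is why such trios are the ones that carry information in transmission-based tests.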

Results
For each method we tested all 24,487 SNPs from the GAW17 data using the 697 unrelated individuals in the case-control sample and the subsampled 194 trios (582 individuals) in each of the 200 simulation replicates, with affected status as the phenotype. Although an adjustment for multiple testing would be appropriate for this study design, we chose to use a 5% nominal level of significance throughout in order to better compare the methods. Although these methods readily generalize to handling other genetic models, we assumed an additive mode of inheritance throughout. Table 1 displays the average rejection rates across all noncausal and causal SNPs for each aggregation method. Although error rate inflation does not appear to be a problem, it is easy to see that all methods are low powered and that only the Zhang et al. [9] approach appears to have a discernible increase in the rejection rates from the null SNPs to the causal SNPs. It also appears that removing so-called spurious genes [25] from the noncausal SNPs lowers the error rate, as expected.
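The empirical rejection rates summarized above can be computed as in the following sketch, which assumes the per-replicate p-values for one method are stored as a list of lists and that a boolean flag marks each SNP as causal or noncausal. The data layout and names are hypothetical; the toy values are made up for illustration.

```python
def rejection_rates(pvalues, is_causal, alpha=0.05):
    """Empirical rejection rates at nominal level alpha.

    pvalues[r][s] is the p-value for SNP s in simulation replicate r.
    Returns the per-SNP rates and their averages over noncausal and causal SNPs
    (the average over noncausal SNPs estimates the false-positive rate, and the
    average over causal SNPs estimates the power).
    """
    n_rep = len(pvalues)
    n_snp = len(is_causal)
    per_snp = [sum(rep[s] <= alpha for rep in pvalues) / n_rep
               for s in range(n_snp)]
    null_rates = [r for r, c in zip(per_snp, is_causal) if not c]
    causal_rates = [r for r, c in zip(per_snp, is_causal) if c]
    false_positive_rate = sum(null_rates) / len(null_rates)
    power = sum(causal_rates) / len(causal_rates)
    return per_snp, false_positive_rate, power


# Toy example: 4 replicates, 3 SNPs, the first of which is causal.
toy_pvalues = [
    [0.010, 0.50, 0.20],
    [0.030, 0.04, 0.70],
    [0.200, 0.60, 0.01],
    [0.001, 0.30, 0.90],
]
toy_causal = [True, False, False]
per_snp, false_positive_rate, power = rejection_rates(toy_pvalues, toy_causal)
```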

SNP discovery power
Although the power averaged over causal SNPs was low, some of the SNPs were detectable at high rates. Figure 1 displays the empirical powers for each method plotted against the effect size and grouped into three categories of SNP minor allele frequency. Here, effect size is not directly for disease status but rather for an underlying distribution of disease susceptibility [2]. It is clear that many rare SNPs are not detectable for any of the examined methods. However, contrary to intuition, many of the rarer SNPs provide the highest levels of power.
Those SNPs with substantive power vary between small and large effect sizes. Examining SNPs for which there is at least modest power (Table 2) reveals that the Zhang et al. [9] approach most often has the highest power.

Discussion and conclusions
Several methods address the problem of combining population- and family-based genetic association data. These methods differ fundamentally in whether they incorporate within-family transmissions and rely on tests for population stratification to justify effect estimate aggregation or perform between-individual analyses using family data. Performance related to population stratification cannot be assessed here because no stratification was simulated in the GAW17 data. Although the Zhang et al. [9] method performed better than the other two methods considered, we did see that no method was well powered to detect causal SNPs in this scenario. Both the Zhang et al. [9] and the Zhu et al. [8] methods allow for more general pedigree structures than the trios-only analysis performed here and will likely perform more favorably when larger pedigrees are considered. In future work, we plan to adapt aggregation methods suitable for the CDRV hypothesis.

Table 2. Gene, effect size, minor allele frequency (MAF), and empirical rejection rate over the 200 replications from each method for the 21 causal SNPs conferring ≥20% empirical rejection rate from at least one of the three methods. The maximum empirical rejection rate over the three methods is in boldface for each causal SNP.
There are 160 causal SNPs, 2 of which confer susceptibility through two different components of the latent disease susceptibility distribution.