A general method for combining different family-based rare-variant tests of association to improve power and robustness of a wide range of genetic architectures

Current rare-variant, gene-based tests of association often suffer from a lack of statistical power to detect genotype–phenotype associations as a result of a lack of prior knowledge of genetic disease models combined with limited observations of extremely rare causal variants in population-based samples. The use of pedigree data, in which rare variants are often more highly concentrated than in population-based data, has been proposed as 1 possible method for enhancing power. Methods for combining multiple gene-based tests of association into a single summary p value are a robust approach to different genetic architectures when little a priori knowledge is available about the underlying genetic disease model. To date, however, little consideration has been given to combining gene-based tests of association for the analysis of pedigree data. We propose a flexible framework for combining any number of family-based rare-variant tests of association into a single summary statistic and for assessing the significance of that statistic. We show that this approach maintains type I error and improves the robustness, to different genetic architectures, of the statistical power of family- and gene-based rare-variant tests through application to simulated phenotype data from Genetic Analysis Workshop 19.


Background
Over the past decade, the rapid decrease in costs for DNA sequence data have made it possible to consider the association of rare single nucleotide variants (SNVs) and complex disease phenotypes. However, the power of any single variant test of genetic association on such rare variants is limited. As such, many gene-based tests of association (simultaneously testing all variants within a gene) have recently been proposed, with the intent to improve statistical power over single marker association tests. Recent insights have classified most of these gene-based tests of association into 2 broad classes: burden (alternatively: collapsing, length, linear) tests and variance component (alternatively: joint, quadratic) tests [1]. Burden tests collapse rare-variant signal into a single measure of total rare-variant "burden," which is then tested for association with the phenotype. Variance component tests determine the strength of association of a particular variant site with observed trait values, and then aggregate these associations across all variants in the gene.
Despite improvements over single-marker approaches, gene-based tests may still have limited utility in detecting causal rare variants because of an overall lack of power. This lack of power is exacerbated by the fact that optimizing the limited power of gene-based tests of association (by selecting the appropriate burden or variance component test) requires some a priori knowledge of the underlying genetic architecture. In particular, burden tests are most powerful when the variants within a SNV set are mostly causal and have the same causal direction, whereas variance component tests perform optimally in circumstances where the causal direction varies (ie, in the presence of both risk-reducing and riskincreasing SNVs) [1], even for family data [2].
In response, new tests of association have been proposed that simultaneously leverage the strengths of both burden and variance component tests [3][4][5]. These methods generally propose computing a burden test pvalue and a variance components test p-value on the same SNV set, then using some method to combine the individual p values from the gene-based tests of association into one summary p value for the gene, but are frequently limited by not being applicable to family-based data. One exception is the seqMeta package in R [6].
The use of family/pedigree data has become popular as another means of increasing statistical power to detect causal rare variants. Rare SNVs are more concentrated in affected families, meaning that causal SNVs may aggregate in pedigree data more than in the general population or in a traditional case-control study. In this article, we propose a general method of combining different rare-variant tests for use on family-based data. We then quantify to what extent combining family-based rare-variant tests of association increases power and maintains type I error rate for simulated phenotypes using data from Genetic Analysis Workshop 19 (GAW19).

General combination strategy
Our approach, which extends the approach by Derkach et al [5] to the case of family-based data, combines p values across a variety of family-based gene-level rarevariant tests of association into a single summary statistic. Given a set of k distinct family-and gene-based rarevariant tests of association, the method involves the following 5 steps: (a) Generate a vector of test statistics for the j th permutation. In this article we use , although other p value combination options are possible. (e) Compute the significance level of S 0 ð Þ by finding the percentage of S j ð Þ that are greater than S 0 ð Þ , out of m. Permuting phenotype values when pedigree structure exists in the samples fails to maintain appropriate type I error rates. To address this limitation, we modify the approach listed in the previous paragraph by first fitting a linear mixed effects model to the desired covariates, with familial correlation as a random effect, and the phenotype of interest as the response, as has been done in related settings [7]. We estimated fixed effects parameters for covariates, and the random effects of familial correlation assuming a covariance matrix proportional to the prespecified kinship matrix. Because the mixed effects model generates predicted values that account for familial status, the residuals are independent of kinship, and therefore independent between subjects. These residuals can then be permuted among the individuals, and the test statistics reevaluated with these "new" phenotypes. Thus, in the previous paragraph, for family data, each Q i ð Þ is a function of the residuals from random effects models.

Application
As a proof-of-concept, we applied the above method to data from GAW19. The data consists of real genotype data, on which blood pressure phenotypes were simulated, for 849 Mexican American individuals across 20 separate pedigrees. We considered the continuous response variable mean arterial pressure (MAP), calculated as (2/3)*(diastolic blood pressure) + (1/3)*(systolic blood pressure). We assessed the relationship between MAP and the 30 genes on chromosome 3 that possess at least 1 casual variant using 1000 permutations per test. Power was then calculated as the proportion of times among the 200 individual phenotype simulations where each gene was found to be significant, using a 0.05 threshold. Type I error was assessed in a similar method on the same 30 genes, but with the trait Q1 (which was simulated to be heritable although unrelated to genotype) as the phenotype of interest.

Tests used
We examined 8 different family-based rare variant tests of association, and compared their individual performance to the performance of the 8 tests combined. The 8 tests included 4 burden tests and 4 variance component tests, each of which varied in 2 respects: the choice of variant weighting system, and the power to which the score statistic for the i th variant, U i , was raised. In this the maximum likelihood estimators for the fixed effect parameters of the covariates and the inverse of the covariance matrix, respectively, and g i the i th genotype vector. In particular, 2 different variant weighting systems were considered. The weights suggested by Wu et al. [8] for use in the sequence kernel association test (SKAT), notated as w i for the i th variant, are calculated using a Beta distribution and substantially downweight common variants, (w i e Beta MAF; a 1 ; a 2 ð Þ ; where a 1 and a 2 are prespecified parametersÞ, while the weights suggested by Price et al. [9], notated as w Ã i , have a less-severe penalty by using 1 ffiffiffiffiffiffiffiffiffiffiffiffi ffi is the allele frequency in the controls. Following the example of Liu et al. [1], we also raised the score statistics to differing powers, varying the tests' statistical power in response to a small proportion of causal variants. In general, raising score statistics to higher powers makes the test more robust to the inclusion of noncausal variants. The following 8 tests were considered: We note that Q 2 is asymptotically equivalent to famS-KAT [2], the family-based version of SKAT [8], and Q 1 is related to standard burden tests.

Type I error
Across all 30 genes, for all tests, the type I error rate was generally controlled. The average type I error rate across the 30 genes fell within a range of 0.048 to 0.09 for the 9 different methods, which was within expected limits (99% confidence interval [CI]: 0.4%, 9.6%) ( Table 1).

Power
For nearly all genes analyzed, the empirical power was low, and typically well under 0.35 (Table 2). This was particularly true for those tests with Q 1 or Q 1 Ã , which had substantially lower power; in the case of Q 1 this is likely because the w i weights are designed to be squared, whereas in the case of Q 1 Ã it is likely because common variants were not excluded, and the w Ã i weights do not sufficiently downweight common variants relative to rare variants. However, despite the presence of these 2 underpowered tests in the combined p value procedure, the combined test had power in the same range as more optimal tests. We further note that, depending on the characteristics of the gene being examined, tests featuring either the w Ã i weights or the w i weights were often Table 1 Type I error rate for 10 genes selected to be representative of all 30 genes explored dramatically more powerful than the other. This can be attributed to the fact that w i Ã less dramatically downweights common variants relative to rare variants than does w i . This phenomenon is particularly evident in 2 genes: PDCD6IP, which contains a number of highly causal rare variants, and ZBTB38, which contains several causal common variants (Fig. 1). In each case, however, the combined test strikes a middle ground: not quite as strong as the better-chosen weights, but superior to the poorer weights (Fig. 1).

Correlation between tests
Pairwise correlation between p values of different tests was generally high with an average pairwise correlation of 0.78 (SD = 0.16; min = 0.42; max = 0.98; detailed results not shown).

Discussion
Few current methods allow for a flexible combination of family-based rare-variant tests of association. In response, we propose an adaptable framework for combining any number and type of gene-based rare-variant tests on pedigree data that condition on genotype data. Unlike current existing combination methods, our proposed method does not limit the researcher to combining just 1 family-based burden test with 1 family-based variance-component test, but instead allows for the combination of an arsenal of varied and specialized tests. Our proposed method also offers a distinct advantage over simply running all possible individual tests and choosing the most significant result; our combination approach appropriately controls for multiple testing and for the correlation between tests. When applied to the Table 2 Power for 10 genes selected to be representative of all 30 genes explored  Fig. 1 Empirical power of all 9 family-based rare-variant tests of association for 2 genes of interest, PDCD6IP and ZBTB38. When performing family-based rare-variant tests of association on gene PDCD61P, w i -weighted tests (Q2, Q1, Q3, Q4) saw higher empirical power than w i Ã -weighted tests (Q2*, Q1*, Q4*, Q3*); for gene ZBTB38, this trend was reversed. The combined test (black) maintained consistent power between the 2 genes, and was less susceptible to the change in genetic architecture GAW19 data, the combination method appeared to sufficiently maintain type I error and demonstrated generally robust statistical power, even when applied to a wide variety of genetic architectures, or when low-powered tests were included.
However, our current approach does have several limitations. In the above proof-of-concept application, the combined test incorporates 8 highly correlated individual tests of association; the additional noise from this correlation diminishes the power of the combined test to detect any meaningful genotype-phenotype associations. Further research is needed to explore what combination of tests minimizes such correlation while also maximizing statistical power. Further work is also necessary to evaluate the benefits and drawbacks of different choices of combined test statistics (eg, alternatives to using the minimum p value).The permutation component of our combination approach also poses computational challenges. The process of calculating p values when asymptotic distributions have not been derived is time intensive. The tests evaluated here were applied to only 30 genes in a single simulated data set. We note that if combining tests with known asymptotic distributions, it is straightforward to utilize asymptotically obtained p values from individual tests when creating p j ð Þ . Further testing and evaluation on other data sets with different characteristics is necessary. A single familybased rare-variant test of association, if properly chosen and adapted to match the genetic architecture, will always be more powerful than a combined approach. That being said, researchers may not definitively know what the underlying genetic architecture is, and there is always a tradeoff between power and robustness. Given this, combined tests on family-based rare-variant data show promise for ensuring the robustness of statistical power to different genetic architectures with minimal power loss. Finally, the permutation approach proposed here relies on the assumption that the regression model is true under the genetic null hypothesis, which will rarely be true in practice. However, both Chen et al. [2] and our results suggest that the type I error rate is controlled in practice. Further research is necessary to explore whether samples and conditions exists where the type I error rate will increase.

Conclusions
Our analysis acts as a proof-of-concept for combined tests on family-based rare-variant data. In particular, our analysis suggests that the type I error rate is controlled by the proposed method, while power may be more robust to differing genetic architectures than individual, gene-based rare-variant tests. Further work is needed to explore the trade-off between combining many tests versus combining a smaller set of diverse tests in other data sets/genes to ensure transferability of these findings. Additionally, we need to explore alternatives to permutation strategies to compute statistical significance resulting from the inherent computational limitations of this approach.