Identification of low frequency and rare variants for hypertension using sparse-data methods

Availability of genomic sequence data provides opportunities to study the role of low-frequency and rare variants in the etiology of complex disease. In this study, we conduct association analyses of hypertension status in the cohort of 1943 unrelated Mexican Americans provided by Genetic Analysis Workshop 19, focusing on exonic variants in MAP4 on chromosome 3. Our primary interest is to compare the performance of standard and sparse-data approaches for single-variant tests and variant-collapsing tests for sets of rare and low-frequency variants. We analyze both the real and the simulated phenotypes.


Background
Despite the success of genome-wide association studies, much of the genetic contribution to complex diseases and traits remains unexplained. Therefore, an increasing number of studies have turned to low-frequency and rare variant association analysis for additional explanation of disease risk or trait variability. For binary phenotypes, single-variant analyses of low-frequency and rare variants are challenging because the conventional logistic regression approaches often violate the largesample-size assumption for test statistics, resulting in poor type 1 error control or low statistical power [1,2]. The standard score test, in particular, can be extremely anticonservative under the null [3]. Variant-collapsing methods across multiple variants or sparse-data methods for single-variant analysis offer an alternative [1][2][3][4][5]. Furthermore, depending on the linkage disequilibrium (LD) structure, it is possible that even nonfunctional lowfrequency or common variants can capture functional rare variant signals [4]. On the other hand, because power is higher for a variant with a higher minor allele frequency (MAF), a common functional variant will usually be better detected by a single-variant test rather than as part of a collapsing test that incorporates nonfunctional variants.
In this report, we analyze the exome-sequence data and both the real and simulated phenotype data of the unrelated Mexican American sample to evaluate and compare the performance of single-variant and variantcollapsing methods for association analysis.

Methods
To relate genotypes to hypertension, we consider the logistic regression model where i = 1, …, 1943 indexes the individuals, HTN i indicates hypertension status of the i th individual (1 if the individual is hypertensive and 0, otherwise); AGE i is the age at the time of examination, SEX i is the gender of the individual, and G i ¼ G i 1 ; G i 2 ; …; G i m ð Þ indicates the vector containing the numbers of copies of the nonreference alleles at m variants (ie, additively coded genotype), and β g 0 ¼ β 1 ; β 2 ; …; β m À Á is the vector of the associated parameters.
For a single-variant analysis with m = 1, we apply 2 types of nonstandard approaches: Firth-type penalized logistic regression likelihood ratio (LR) tests [2,[6][7][8], and small-sample adjusted score tests [9], and compare them to standard LR and score tests. The LR and score tests are asymptotically equivalent but may be discrepant in finite samples. The penalized LR test is based on the penalized log-likelihood function where i(β) is the Fisher information matrix. This is a generalization of Haldane's statistic for sparse 2 × 2 table analysis, where 1 2 is added to each cell. For the smallsample-adjusted single-variant score tests, we apply an approach to adjust the null distribution of the test statistic by incorporating small-sample variance and/or kurtosis (see Lee et al. [9], pp. 226-227); this approach was originally recommended for variant-collapsing tests.
For variant-collapsing analysis, we consider a MAF-based weighted burden test [1], a nonburden sequence kernel association test (SKAT) and a unified approach (SKAT-O) that optimally combines a burden test and a SKAT (eg, Lee et al. [9]). For these tests, we first define K subregions, then pool the variants within each subregion, and test K null Applying these methods, we analyzed exonic variants within MAP4 gene on chromosome 3 in the real and the simulated phenotype data sets. For the imputed variants, we analyzed the predicted dosages rather than their best-guess genotypes. In addition, we examined all polymorphic variants, including the singletons to assess the extremes at which the tests break down. For the standard and penalized logistic regression tests, we used the R glm function and pmlr (Penalized Multinomial Logistic Regression) package [10], respectively. For the small-sample-adjusted score test and the variant-collapsing tests, we used the R package SKAT [11], with analytical variance estimates and empirical kurtosis estimates based on 10,000 bootstrap replicates. For the variant-collapsing methods, we let K = 6 based on a visual inspection of the physical positions of the variants (Fig. 1a).

Data preparation
In the real data set, we defined the hypertension phenotype using the conventional diagnostic criteria: a systolic blood pressure (SBP) greater than 140 mm Hg or a diastolic blood pressure (DBP) greater than 90 mm Hg. We also defined individuals on antihypertensive medication to be hypertensive regardless of their SBP and DBP levels.
For the simulated phenotypes, 2 data sets were available, "SIMQ1" and "SIMPHEN," each with 200 replicates. SIMQ1, designed for evaluating type 1 error rates, contained normally distributed Q 1 generated under no genetic effects. Because SIMQ1 did not have binary phenotypes, we dichotomized Q 1 to create hypothetical a b Fig. 1 Pairwise LD measures for markers within MAP4 region on chromosome 3 in 1943 unrelated samples. The hg19 genome assembly was used for annotation. In panel (a), each pixel represents pairwise LD, measured by the squared allelic correlation coefficient r 2 between 2 markers. In panel (b), LD is measured by Lewontin's |D ' |. The latter are generally higher because |D ' | takes into account that the correlation is constrained by the allele frequencies. As indicated by the color key, stronger LD is represented by red and weaker by white. The LD plot was produced using the LDheatmap package [16] disease status Q 2 , letting Q 2 correlate with AGE and SEX through Q 1 . We let Q 2 = 1 if Q 1 was greater than 51.2 and 0 otherwise, such that the disease prevalence for Q 2 was 17.8 %, the same as the prevalence of hypertension in SIMPHEN, which we used for evaluating power. The hypertension phenotype was derived from blood pressure phenotypes generated under a model with more than 1000 variants in more than 200 genes [12].

MAP4 variants in the unrelated sample
Of the 409 exonic MAP4 variants, only 90 were polymorphic in the sample of 1943 unrelated individuals. These variants had MAFs ranging from 0.00027 to 0.34. As expected, rare variants (MAF <1 %) were most prevalent in the sample; except for 4 common variants, all variants had MAF less than 5 % (Fig. 2, Table 1). As expected for rare variants (eg, Pritchard [13]), the pairwise LD in the 90 variants was generally weak, with the exception of a few variants in strong LD in an upstream region (see Fig. 1). However, the strong LD seems to arise because of their physical proximities (all the markers in the LD block are located within 39 bases).

Analysis of the real phenotype data
We found that the standard score test rejects the null hypothesis far more often than the other single-variant tests (results not shown), suggesting that it may be anticonservative. This agrees with published simulations under a case-control design [3] and is confirmed by our own unpublished simulation studies under a cohort design at the observed hypertension prevalence of   Table 1 and Fig. 1a, did not find the MAP4 gene to be significant either (minimum unadjusted p values = 0.12, 0.24, and 0.20, respectively).

Analysis of the simulated phenotype data
It has been demonstrated that for a genome-wide study with a large sample size, minor allele count (MAC) is the key parameter determining test calibration [3]. Because we analyzed predicted dosages, we do not have a MAC for all the variants. Hence, for the presentation of simulation results, we use the count of individuals with G > 0 dosage, denoted by g MAC , which is close to the MAC for a low MAF. For type 1 error rates of the single-variant tests, we pooled the results across all variants with the same values of g MAC. Power for the singlevariant tests was evaluated separately for each of the 26 functional variants. For the variant-collapsing tests, power was examined for each subregion containing at least one functional variant.

Test size and type I error
Examination of quantile-quantile (Q-Q) plots of the singlevariant test p values for rare variants revealed departures from the expected distribution under the null hypothesis of no genetic effects with some discrepancies among tests. For example, for var_3_47660325, with g MAC ¼ 1; all the single-variant tests showed unusual departures from the expected (Fig. 3a). For low-frequency variants, the p value distributions were close to the expected, except in the upper tail where all tests seemed to be anticonservative (eg, Fig. 3b). As expected, the common variant test p values were close to the null distribution with no discrepancy among the tests (eg, Fig. 3c).
Examination of empirical type 1 error rates for the single-variant tests demonstrates that no method performed uniformly better than others for the rare variants with very low MACs (Fig. 4). For example, when g MAC is less than 15, the standard score test tended to be anticonservative at a significance testing level of 0.01 (Fig. 4), but was conservative at the less stringent significance level of 0.05 (results not shown). The standard LR test tended to be conservative for low g MAC (eg, <10), but could be anticonservative when this count was between 10 and 20. Although the 2 small-sample score tests could also be anticonservative, and the penalized LR test tended to be conservative in general, the type 1 error rates of these tests were closer to the nominal level than the standard tests. When g MAC is 66 or greater (or MAF >1 %), all the single-variant tests seem to control type 1 error reasonably well.

Power
All the single-variant tests had power of less than 20 % to detect each of the rare variants, but had 100 % power for the low-frequency and the common variants at the significance levels of 0.01 and 0.05. For the low- frequency variants, tests had discrepant p values, and differential power at a stricter significance level. For example, in Fig. 5b, all the tests for var_3_47957996 (MAF = 0.0024) had p values of less than 0.01; however, the 2 LR tests had consistently lower p values than the 3 score tests. At a significance level of 1e-06, the standard and penalized LR tests had 91 and 82 % power, respectively, whereas the standard, the small-sample-variance, and the small-sample-variance-kurtosis score tests had less than 10 % power.
Among the 4 subregions with at least 1 functional variant, power was nonnegligible only in subregions 4 and 6 ( Table 2). Figure 5d-f shows the results from the variant-collapsing tests of the markers in subregion 4, which contains all 3 types of functional variants (rare, low-frequency, and common). As expected, the burden test tended to have lower power than SKAT or SKAT-O because the subregion includes both protective and deleterious variants. These tests all had low power when the subregion includes only rare variants (eg, Fig. 5d). The power improved when the subregion included both the rare and the low-frequency functional variants (eg, Fig. 5e). When, however, the common variants were additionally included, the power did not seem to improve further (Fig. 5f ). When compared with the single-variant tests of markers in the same subregion ( Fig. 5b and c), the results suggest that this subregion would have been detected by some of the single-variant tests, as well, even at the genome-wide significance level of 5e-08.

Discussion and conclusions
In this article, we evaluated standard and sparse-data methods for single-variant and variant-collapsing tests to examine the association between a hypertension phenotype and exonic variants in MAP4 gene on chromosome 3, using both the real and the simulated phenotypes in unrelated Mexican Americans.
In the analysis of the real phenotype data, none of the single-variant and the variant-collapsing methods detected MAP4 variants significantly associated with hypertension. A limitation of our analysis is that we did not make any adjustment for ancestry admixture/ population structure. In genetic association studies of admixed populations such as Mexican Americans, addressing differential ancestral backgrounds is important to avoid false positive or negative association signals [14,15].
In our simulation investigation, we found that the sparse-data approaches improve type 1 error control, but their power remains low for detecting the rare variant effects. Because power of the association tests depends on both frequency and effect size of rare variants, even with large effects, the tests may detect rare variants only in studies with large samples. We may be more successful in identifying rare variants when we use joint or meta-analyses combining data or summary statistics from different studies (eg, Ma et al. [3]). For the lowfrequency variants, all the single-variant tests seem to have improved type 1 error rates and power. It seems that the LR tests have higher power than the score tests  Table 1 and Fig. 2a). Panels (a) to (c) show the p values from the single-variant tests for a rare, low-frequency, and common functional variant, respectively; the tests are the standard likelihood ratio test (LRT), penalized likelihood ratio test (PLRT), standard score test (Score) and small-sample-adjusted score tests (Score-Var-Adj and Score-Var-Kurt-Adj), which are indicated by yellow squares, black circles, red point-down triangles, purple diamonds, and green point-up triangles, respectively. Panels (d) to (f) show the results from the variant-collapsing tests when they include the rare variants, rare and low-frequency variants, and all the variants within the region. The p values from the weighted burden tests, SKAT, and SKAT-O are, respectively, represented by pink circles, blue cross marks, and green diamonds. The calculations are based on 200 simulated data sets in SIMPHEN at a stringent significance level. However, we cannot make any concrete conclusions because of the limited number of replications provided in the simulation design. Although more thorough investigation is necessary, overall, the penalized LR test and the score test with small-sample variance and kurtosis seem to be better choices than the standard tests for the analyses of rare and low-frequency variants. Moreover, caution is indicated when different tests of the same hypothesis give inconsistent p values as it suggests large-sample approximations for test statistics may be invalid. Although previous simulation studies have shown that collapsing tests can have greater power than singlevariant tests (see, eg, Madsen and Browning [1]), our investigation suggests that power of collapsing tests can be low when the tests include only the rare variants (see, eg, Fig. 5d). In addition to MAF and effect size, power of collapsing tests depends on the number of associated variants, the number of neutral variants, and whether the direction of effects is consistent within gene, so that selection of good binning and weighting strategies may boost power for detecting regions containing only rare variants.