In our study, we propose a modified RBS and compare its performance with that of the burden and variance component tests mentioned earlier. The details of our modified RBS follow.
The modified replication-based weighted sum statistic: Sτ
We follow the notation from [4]. Let denote the number of variants in group (k',k), where k' denotes the number of copies of the minor allele that appear in the cases, k denotes the number of copies of the minor allele that appear in the controls, and k' > k. Let be defined as a similar count, except with k' < k. We define S+ and S- as in [4]; these are the two statistics for variants with k' > k and k' < k, respectively. Then we can define Sτ as the weighted sum of weighted sums: , where . Defining τ in this way combines the concepts behind Smax and Scomb. Smax models the extreme scenarios in which either all the rare variants are risk variants or all are protective, whereas Scomb treats risk and protective rare variants equally. Sτ lets the data determine the relative weight given to risk and protective variants, so the combined statistic can accommodate an unequal balance of protective and risk variants.
Data preparation and phenotype definition
Because the methods detailed are designed for case-control data, we focus on the unrelated individuals. Using all of the longitudinal data, we define as a case any individual who became hypertensive over the course of observation, and we define as a control any individual who did not exhibit signs of hypertension. Across the 200 replicates, we have an average of 48 controls and 55 cases for a total of 103 unrelated individuals. (Of the total 159 unrelated individuals, 103 had all the information needed for this analysis.) Because we are comparing the performance of methods on the same data, we focused on genes on chromosome 3 that were used in the simulation model. We used NCBI dbSNP to identify all the single-nucleotide polymorphisms (SNPs) belonging to each gene on chromosome 3 involved in the simulation model.
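The case and control definition above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each individual's longitudinal record is available as a list of per-examination hypertension indicators; the function name and the data layout are hypothetical and are not taken from the study's actual pipeline.

```python
def classify_individual(hypertensive_by_exam):
    """Label an individual from per-examination hypertension flags.

    hypertensive_by_exam: list of booleans, one per examination
    (hypothetical layout). An individual who is hypertensive at any
    examination is a case; otherwise the individual is a control.
    """
    return "case" if any(hypertensive_by_exam) else "control"


# Illustrative records only, not study data.
records = {
    "id_1": [False, False, True],   # became hypertensive at the third exam
    "id_2": [False, False, False],  # never hypertensive
}
labels = {ind: classify_individual(exams) for ind, exams in records.items()}
```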
We evaluated type 1 error using a resampling approach. Specifically, we simulated a dichotomous phenotype that is independent of the genotype, following the method in [8]. For each individual, we simulated a Bernoulli random variable with event probability 0.5. If the variable = 1, we switched the individual's original phenotype to the alternate group; if the variable = 0, we kept the original phenotype status.
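The phenotype resampling step described above can be sketched in Python as follows. This is a minimal illustration, not the implementation from [8]: the function name, the 0/1 coding of controls and cases, and the seed are assumptions made for the example.

```python
import random


def flip_phenotypes(phenotypes, rng, flip_prob=0.5):
    """Return a null phenotype vector that is independent of genotype.

    phenotypes: list of 0/1 labels (assumed coding: 0 = control, 1 = case).
    For each individual, draw a Bernoulli(flip_prob) variable; if it is 1,
    switch the label to the other group, otherwise keep the original label.
    """
    null_labels = []
    for status in phenotypes:
        flip = rng.random() < flip_prob  # Bernoulli(flip_prob) draw
        null_labels.append(1 - status if flip else status)
    return null_labels


rng = random.Random(2024)         # arbitrary seed, for reproducibility only
original = [1] * 55 + [0] * 48    # sizes mirror the average replicate
null_phenotype = flip_phenotypes(original, rng)
```

Because each label is flipped independently with probability 0.5, the resulting status carries no information about genotype, so the rejection rate of a test applied to such phenotypes estimates its type 1 error.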
Data analysis
For the burden tests, we computed a burden statistic for each gene. We computed Smax, Scomb, Sτ, C-alpha, and VT for chromosome 3 for each replicate using the simulated phenotype. We also used the freely available R package for computing SKAT, with the default values and the small sample size option [2]. For each replicate, we recorded whether each method declared one of the solution genes as significant, and we reported the power for each method as the proportion of the 200 replicates in which the method identified the gene as associated with the disease. We focused our analysis on rarer SNPs, those with minor allele frequencies (MAFs) less than 5%.
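The MAF filter and the power calculation described above can be sketched in Python as follows. This is a hedged illustration rather than the analysis code: the function names, the genotype coding (0, 1, or 2 copies of the counted allele), the exclusion of monomorphic SNPs, and the significance level of 0.05 are all assumptions, since the text does not specify them.

```python
def minor_allele_frequency(allele_counts):
    """MAF from per-individual allele counts coded 0, 1, or 2 (assumed coding)."""
    freq = sum(allele_counts) / (2 * len(allele_counts))
    return min(freq, 1 - freq)


def rare_snps(genotypes_by_snp, maf_threshold=0.05):
    """Keep SNPs with 0 < MAF < maf_threshold.

    Dropping monomorphic SNPs (MAF = 0) is an assumption of this sketch.
    """
    return [snp for snp, counts in genotypes_by_snp.items()
            if 0 < minor_allele_frequency(counts) < maf_threshold]


def empirical_power(p_values, alpha=0.05):
    """Proportion of replicates in which the gene is declared significant.

    alpha = 0.05 is an assumed significance level.
    """
    return sum(p < alpha for p in p_values) / len(p_values)


# Illustrative data only, not study data.
genotypes = {
    "snp_a": [0] * 19 + [1],       # MAF = 1/40 = 0.025, kept
    "snp_b": [1] * 10 + [0] * 10,  # MAF = 10/40 = 0.25, dropped
    "snp_c": [0] * 20,             # monomorphic, dropped
}
kept = rare_snps(genotypes)
power = empirical_power([0.01, 0.20, 0.03, 0.50])  # one p-value per replicate
```

Applied to phenotypes simulated to be independent of genotype, the same proportion estimates the type 1 error rather than the power.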