There are 259 participants with full genotype, expression, and blood pressure data available in the GAW19 family data set. Our analysis focuses on systolic blood pressure (SBP) and diastolic blood pressure (DBP) as outcomes of interest. The average SBP and DBP from all visits were used as the summary measurement respectively for every participant. To adjust for pedigree structure in this data, we estimated identity-by-descent (IBD) matrix (Σ) using genome-wide single nucleotide polymorphism (SNP) marker data obtained using the Illumina platform provided through GAW19. To incorporate the dependence information embedded in the family structure, we transformed the average blood pressure measurements: \( \mathrm{Y}\sim \mathrm{M}\mathrm{V}\mathrm{N}\left(\upmu, \Sigma \right),\kern0.5em {\mathrm{Y}}^{*}={\Sigma}^{-\frac{1}{2}}Y\sim \mathrm{M}\mathrm{V}\mathrm{N}\left(\upmu, \mathrm{I}\right) \), so that individuals from the same family are independent in the transformed phenotypic value (Y*). In addition, to account for the population stratifications that exist in the Mexican American population, we performed multidimensional scaling and calculated the first 3 principal components. In the following analysis, the residual from the transformed phenotypic value adjusting for the first 3 principal components were used. We considered genetic variant with a minor allele frequency of less than 0.01 as rare variant, and performed rare variants analysis for the genes reported by hg19 build on the odd-number autosomes that have less than 50 rare variants and more than 1 rare variant.
Sequential sum test
Consider k rare variants in a gene and that SNP
i
indicates the number of rare variant alleles in variant i in a general regression equation: \( g\left(E(Y)\right)={\beta}_0+{\displaystyle {\sum}_{i=1}^k}{\gamma}_iSN{P}_i{\beta}_c \), with γ
i
= ν
i
s
i
; where s
i
is 1, −1, or 0, indicating whether the effect of rare variant i is positive or negative or excluded from the equation, and v
i
is a weight assigned to rare variant i. In our analysis, we assumed vi = 1. In addition, β
c
represents the common odds ratio between the trait and the rare variants in the gene. We performed the Seq-aSum-VS approach described in Basu and Pan [3] and obtained p value for each gene with 500 permutations.
Constructing weights using expression measurements
After obtaining the p value for each gene from the Seq-aSum-VS test, we used gene expression information to construct weight for each gene. In Roeder et al. [9, 10], the authors suggested to use a weight (w
i
> 0) to adjust p value (p
i
) and to reject the null hypothesis if it belongs to the set of all gene i for which p
i
/w
i
≤ α. The weight adjustment procedure maintains the proper FWER control as long as w
i
> 0 and \( {\overline{w}}_i=1 \).
Building on the theoretical findings, we developed a novel weight-adjustment approach for rare variant association analysis. After weight adjustment, genes that have strong contributions to phenotype-associated gene expression will be assigned weights greater than 1, hence achieving smaller weight-adjusted p values. The weighting mechanism is as follows: we assign a weight w
i
to each gene and the weight is the product of 2 parts: \( {w}_{g_i{E}_j} \) and \( {w}_{E_jP} \). The first term\( ,\ {w}_{g_i{E}_j}, \)indicates the effect of gene g
i
on the j
th gene expression measurement, E
j
. The second term, \( {w}_{E_jP}, \) describes whether gene expression measurement (E
j
) is associated with the phenotypic outcome (P). Eq. (1) is applied to obtain \( {w}_{g_i{E}_j} \):
$$ {E}_j={\beta}_{0_{ij}}+{\beta}_{g_i{E}_j}\times {g}_i+{\gamma}_{ij}\times P+{\epsilon}_{ij} $$
(1)
and \( {w}_{g_i{E}_j}={\left(\frac{\widehat{\beta_{g_i{E}_j}}}{\mathrm{SE}\left(\widehat{\beta_{g_i{E}_j}}\right)}\right)}^2 \). In equation 1, g
i
is the number of total rare variants in gene i calculated by collapsing the genotypes across rare variant loci. A second equation (2) was implemented to obtain \( {w}_{E_jP}={\left(\frac{\widehat{\beta_{E_{jP}}}}{\mathrm{SE}\left(\widehat{\beta_{E_jP}}\right)}\right)}^2 \):
$$ P={\beta}_{0_{ij}}+{\beta}_{E_jP}\times {E}_j+{\gamma}_{ij}\times {g}_i+{\epsilon}_{ij} $$
(2)
The benefit of taking the product of 2 weights is that if either \( {w}_{g_i{E}_j} \) or \( {w}_{E_jP} \) is zero, then the resulting product will be zero. On the other hand, if both \( {w}_{g_i{E}_j} \) and \( {w}_{E_jP} \) are substantially large, then taking the product of the two parts will result in an amplified overall weight. In other words, if rare variants in the gene under consideration provide a strong contribution to outcome P through E
j
, then \( {w}_{g_i{E}_j}\times {w}_{E_jP} \) will be a large value.
A crude weight for gene i is set to be the maximum of the products among all gene expression measurements: \( {w}_{M{P}_i}={\mathrm{Max}}_j\left({w}_{g_i{E}_j}\times {w}_{E_jP}\right) \). To ensure that \( {\overline{w}}^{*}=1 \), we divide crude weights (\( {w}_{M{P}_i}\Big) \) by their average (\( \overline{{\mathrm{w}}_{\mathrm{MP}}} \)) as required by Roeder and Wasserman [10]: \( {w}_{M{P}_i}^{*}=\frac{w_{M{P}_i}}{\overline{w_{M{P}_i}}} \). If \( {w}_{M{P}_i} \) is larger than the average, then \( {w}_{M{P}_i}^{*} \) will be greater than 1 after dividing by the average. We calculate adjusted p value for the ith gene as: adjusted p value for gene \( i=\frac{p\ \mathrm{value}\ \mathrm{of}\ \mathrm{gene}\ i\ \mathrm{reported}\ \mathrm{b}\mathrm{y}\ \mathrm{S}\mathrm{e}\mathrm{q}\hbox{-} \mathrm{aSum}\hbox{-} \mathrm{V}\mathrm{S}\ \mathrm{test}\ }{w_{M{P}_i}^{*}} \). If after adjustment, p value becomes greater than 1, then it is set to 1.
For the genes with adjusted p value of less than 0.05, we performed gene set enrichment analysis using biological process categories defined in gene ontology (GO). To account for the hierarchical structure in GO terms, we implemented conditional hypergeometric test [11].