Dynamic pathway analysis of genes associated with blood pressure using whole genome sequence data
© Hu and Paterson; licensee BioMed Central Ltd. 2014
Published: 17 June 2014
Groups of genes assigned to a pathway, also called a module, have similar functions. Finding such modules, and the topology of the changes of the modules over time, is a fundamental problem in understanding the mechanisms of complex diseases. Here we investigated an approach that categorized variants into rare or common and used a hierarchical model to jointly estimate the group effects of the variants in a pathway for identifying enriched pathways over time using whole genome sequencing data and blood pressure data. Our results suggest that the method can identify potentially biologically meaningful genes in modules associated with blood pressure over time.
It has long been recognized that genetic analysis of longitudinal phenotypic data is important for understanding the genetic architecture and biological variations of complex diseases. The analysis can help identify the stage of disease development at which specific genetic variants play a role. However, the statistical methods to analyze longitudinal genetic data are limited. A commonly used approach is to analyze the longitudinal genetic traits by averaging multiple response measurements obtained at different time points from the same individual. This approach may miss a lot of useful information related to the variability of repeated genetic traits, although it is simple and computationally less expensive. Linear mixed models have also been used for repeated measures data .
Recently, there has been a shift to testing rare variants, mostly using next-generation sequence technologies, for association with complex diseases. We explored dynamic pathway-based analysis of genes associated with blood pressure over time using whole genome sequencing data. We first performed gene-based association analysis at each of the 3 time points by stratifying the variants into rare and common. Then we performed pathway enrichment analysis separately at each time point. Finally, we built pathway crosstalk network maps using the enriched pathways to identify potential subnetworks associated with blood pressure over time.
For genotype data, we analyzed sequencing data of the 142 unrelated individuals on chromosome 3, which includes 1,215,120 variants. For phenotype data, we analyzed the simulated phenotypes of replicate 1. We analyzed 2 quantitative traits: systolic blood pressure (SBP) and Q1. SBP was measured at 3 time points (T1, T2, and T3), and was close to normally distributed (data not shown) after treatment effect adjustment (see below). There are 31 functional loci (genes) on chromosome 3 that influence the simulated SBP. Q1 was simulated as a normally distributed phenotype but not influenced by any of the genotyped single-nucleotide polymorphisms. It also has no correlation with SBP measured at T1, T2, and T3. The Pearson correlation of SBP at the 3 time points with Q1 based on the 142 unrelated individuals is −0.09 (p value = 0.27), −0.02 (p value = 0.78), and −0.006 (p value = 0.94), respectively. Q1 was generated primarily to facilitate assessment of type 1 error.
Adjust treatment effects
It has been shown that the association between measured blood pressure and underlying genotype is potentially confounded by antihypertensive treatment . Following Cui et al , we adjusted SBP of subjects receiving antihypertensive medications by adding a constant value of 10 mm Hg at the 3 study exams (n = 22, 51, and 73; 15.5%, 35.9%, and 51.4%, respectively). Such a strategy had higher power than the alternatives .
Analyze common and rare genetic variants using hierarchical models
where is the predictor of main-effect for individual i at genetic variant j in group equaling to the number of minor alleles for an additive coding and where is the total number of variants. represents the group effect for variants in group . β is a vector of all the coefficients and the intercept β.
The data distribution is expressed as where θ is a dispersion parameter, and the distribution takes normal distribution. Because there are many highly correlated variants in a given gene in next-generation sequencing studies, a hierarchical framework is constructed for priors of the distributions of coefficients (g and β) in the model. The method was implemented in R package BhGLM (http://www.ssg.uab.edu/bhglm/).
We assigned the genetic variants to a gene if they were in the gene or within 10 kilobases (kb) of either side of the gene. We performed 2 analyses to evaluate the association between genotype and SBP at each study exam separately. First, we divided the variants within a gene into rare (k = 1) and common variants (k = 2). Separately we analyzed all the genetic variants in a gene, irrespective of allele frequency. Our main objective was to estimate gene effects and to test the hypothesis k = 1 (rare variants) and k = 2 (common variants) for the first analysis and k = 1 (rare and common variants) for the second analysis. We corrected for multiple testing using the Benjamini and Hochberg method .
Dynamic pathway analysis
We mapped the approximately 1200 genes on chromosome 3 to the c2 curated pathways (version 3) from the Broad Institute (http://www.broadinstitute.org/gsea/msigdb/), which includes 2934 gene sets collected from 186 Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/), 430 Reactome, 217 BioCarta pathways, 880 canonical pathways, 825 biological process, and 396 molecular function gene ontology terms. We kept only the pathways with at least 5 genes in our data set, which left 531 pathways for analysis.
There are different ways to test for genes associated with an excess of SBP in the same pathway. We used the "gene set enrichment test" implemented in the limma R package . The approach uses the Wilcoxon signed rank test to compute a p value to test the hypothesis that a given gene set tends to be more highly ranked than would be expected by chance. The ranking is based on a t-like test statistic, and here we used the z statistics from the hierarchical model described in above Section (Analyze common and rare genetic variants using hierarchical models). The test is essentially a streamlined version of the gene set enrichment analysis approach introduced by Subramanian et al .
We performed dynamic pathway crosstalk analysis between each pair of time points using the enriched pathways with a nominal p value of <0.05. Two pathways were considered to crosstalk if they shared at least 1 functional locus (gene). This ensures that each of pathway and its crosstalk has biological meaningfulness. We built pathway crosstalk subnetworks using Cytoscape (http://www.cytoscape.org/).
Significant association of causal genes with rare variants with SBP at T1 and T2 based on chromosome 3 gene-based tests
Time period and gene
False-positive rate (FPR) and false-negative rate (FNR) of gene-based analyses
Time period or trait
Gene-based stratified by allele frequency
Rare and common variants
To further evaluate the whole-spectrum of power of the approaches to identify causal genes, we drew the receiver operating characteristic curves for each type of strategy at each time point and estimated its area under the curve. We found that each analysis strategy has different power to identify causal genes at different time points. Overall, the analysis based on rare variants had the largest power at T3, which also had larger power to detect disease-causing genes than common variants at T2.
Enriched pathways found in T1, T2, and T3
No. of genes
No. of genes on chr. 3
No. of functional loci
pValue of T1
pValue of T2
pValue of T3
Actin filament-based process
Actin cytoskeleton organization and biogenesis
Cytoskeleton organization and biogenesis
In this study, we evaluated the associations between rare and common genetic variants and the simulated quantitative trait (SBP) measured at 3 time points at the gene and pathway levels. We found that joint modeling all the variants (rare and common) together had a high type I error, which may be a result of linkage disequilibrium between common and rare variants, or the average effect between rare and common variants. However, a strategy that categorized variants into rare or common and used a hierarchical model to jointly estimate the group effects showed rare variants had higher power to detect functional loci than did common variants. Although we did not find statistically significant pathways associated with SBP (FDR of the 0.05 level), we showed some enriched pathways shared across time at a nominal p value cutoff of 0.05. Interestingly, we also found a subnetwork with 3 enriched pathways that showed crosstalk between each pair of time points, suggesting the dynamic pathway crosstalk may have a key role in the pathogenesis of SBP. It should be noted the "'functional" loci defined in simulation answers provided by Genetic Analysis Workshop 18 (GAW18) organizers were polymorphic based on all individuals, but they may be not polymorphic in the unrelated individuals analyzed in this study. In this case, some functional loci (or genes) may not have effects in the unrelated data, which may lead to the bias in calculation of false negatives.
In summary, we proposed a framework to identify dynamic pathways which have the potential in regulating SBP via analyzing repeated traits with next-generating sequencing. This can generate insights into the progressive mechanisms of the underlying disease. This analysis strategy can also be applied to examine the mechanisms that drive the progression of complex diseases.
This work was supported by a grant from Genome Canada through the Ontario Genomics Institute. ADP holds a Canada Research Chair in Genetics of Complex Diseases.
The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
- Paterson AD, Waggott D, Boright AP, Hosseini SM, Shen E, Sylvestre MP, Wong I, Bharaj B, Cleary PA, Lachin JM, et al: A genome-wide association study identifies a novel major locus for glycemic control in type 1 diabetes, as measured by both A1C and glucose. Diabetes. 2010, 59: 539-549. 10.2337/db09-0653.PubMed CentralView ArticlePubMedGoogle Scholar
- Tobin MD, Sheehan NA, Scurrah KJ, Burton PR: Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat Med. 2005, 24: 2911-2935. 10.1002/sim.2165.View ArticlePubMedGoogle Scholar
- Cui JS, Hopper JL, Harrap SB: Antihypertensive treatments obscure familial contributions to blood pressure variation. Hypertension. 2003, 41: 207-210. 10.1161/01.HYP.0000044938.94050.E3.View ArticlePubMedGoogle Scholar
- Yi N, Liu N, Zhi D, Li J: Hierarchical generalized linear models for multiple groups of rare and common variants: jointly estimating group and individual-variant effects. PLoS Genet. 2011, 7: e1002382-10.1371/journal.pgen.1002382.PubMed CentralView ArticlePubMedGoogle Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 1995, 85: 289-300.Google Scholar
- Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3:: Article 3Google Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102: 15545-15550. 10.1073/pnas.0506580102.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.