Volume 3 Supplement 4
Globaltest and GOEAST: two different approaches for Gene Ontology analysis
© Hulsegge et al; licensee BioMed Central Ltd. 2009
Published: 16 July 2009
Gene set analysis is a commonly used method for analysing microarray data by considering groups of functionally related genes instead of individual genes. Here we present the use of two gene set analysis approaches: Globaltest and GOEAST.
Globaltest is a method for testing whether sets of genes are significantly associated with a variable of interest. GOEAST is a freely accessible web-based tool to test GO term enrichment within given gene sets. The two approaches were applied in the analysis of gene lists obtained from three different contrasts in a microarray experiment conducted to study the host reactions in broilers following Eimeria infection.
The Globaltest identified significantly associated gene sets in one of the three contrasts made in the microarray experiment whereas the functional analysis of the differentially expressed genes using GOEAST revealed enriched GO terms in all three contrasts.
Globaltest and GOEAST gave different results, probably due to the different algorithms and the different criteria used for evaluating the significance of GO terms.
Several methods have recently been developed for gene set analysis of microarray data [1, 2]. These methods evaluate differential gene expression patterns of groups of functionally related genes instead of individual genes. The aim is to discover gene sets whose expression patterns are associated with phenotypes of interest. Genes can be grouped into gene sets based on, for example, function (Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology (GO) [3]) or location (chromosome, cytoband). In this paper we present the results obtained with two different gene set analysis approaches: Globaltest [4] and Gene Ontology Enrichment Analysis Software Toolkit (GOEAST) [5]. Globaltest is a method for testing whether sets of genes are significantly associated with a variable of interest. The method is based on a model that predicts a response variable from the gene expression measurements of a set of genes. The null hypothesis tested is that the expression profile of the genes in the gene set is not associated with the response variable. GOEAST is a freely accessible web-based tool to test GO term enrichment within given gene sets. It supports the analysis of data from common commercial microarray platforms, and also customized arrays if a probe annotation file in the required format is provided.
These approaches were applied in the analysis of gene lists obtained from three different contrasts in a microarray experiment conducted to study the host reactions in broilers following Eimeria infection.
The Globaltest accepts different kinds of response variables and selects the appropriate model (logistic, linear or survival) according to the type of variable tested.
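The idea behind the test can be illustrated with a small sketch. It assumes the linear-model case, follows the general form of the score-type statistic described by Goeman et al. [4] (the average over genes of the squared covariance between each gene and the centred response), leaves out the scaling constant, and obtains the p-value by permutation rather than from the asymptotic distribution. The function names and the data layout are illustrative and are not part of the Globaltest package.

```python
import random


def global_test_q(expr, y):
    """Unscaled global test statistic for one gene set.

    expr: list of genes, each a list of n sample values.
    y:    list of n response values (e.g. 0/1 group labels).
    """
    n = len(y)
    mu = sum(y) / n
    resid = [value - mu for value in y]  # response centred under the null
    # Squared covariance of each gene with the centred response.
    per_gene = [sum(x * r for x, r in zip(gene, resid)) ** 2 for gene in expr]
    return sum(per_gene) / len(expr)


def permutation_p_value(expr, y, n_perm=1000, seed=0):
    """Permutation p-value: share of shuffled responses with Q >= observed."""
    rng = random.Random(seed)
    observed = global_test_q(expr, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if global_test_q(expr, y_perm) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)
```

Because every gene in the set contributes to the statistic, a set can score highly either through a few strongly associated genes or through many weakly associated ones.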
The Globaltest calculates the p-value using different methods, the most important ones being permutations and the asymptotic distribution. Here the asymptotic distribution was used. All p-values were corrected for multiple testing using Benjamini and Hochberg's False Discovery Rate (FDR) [6]. GO terms were considered significant if the p-value after correction for multiple testing was below 0.05. The influence of individual genes in a GO term was evaluated using the z-score calculated in Globaltest. Genes with z-scores greater than 2 were considered significant contributors to the GO term. GO terms that matched only one gene were excluded from the analysis.
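The Benjamini and Hochberg correction itself is simple to state in code. The following is a minimal sketch of the standard step-up adjustment, not the routine used inside the Globaltest package; the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

With the threshold used here, a GO term would be called significant when its adjusted value is below 0.05.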
The Globaltest package also offers plots to visualize the effects of individual genes and samples on the test result: (1) sample plot: how well a sample fits its phenotype; (2) checkerboard plot: correlation between samples; and (3) gene plot: influence of individual genes on the test statistic.
R version 2.8.0 was used to run the Globaltest package (version 4.12.0).
For GOEAST, all GO terms with fewer than 5 associated probes on the array were discarded from the test, because the statistical analysis would not be appropriate for such small sets.
Fisher's exact test, as available in GOEAST, was applied separately to the 2-fold upregulated and downregulated gene lists for each of the three contrasts. The p-values were adjusted using the Benjamini-Yekutieli method [7], with the cutoff for FDR control set at 0.1. The Benjamini-Yekutieli method controls the FDR for multiple tests that are dependent on one another, as is the case for enriched GO terms within gene lists. To reduce the false discoveries caused by over-representation of neighbouring GO terms due to their hierarchical dependency, Adrian Alexa's improved weighted scoring algorithm [8], which is implemented in GOEAST, was used.
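The enrichment test and the adjustment can be sketched as follows. The sketch assumes a one-sided Fisher's exact test, which for a 2x2 table is the upper tail of the hypergeometric distribution, followed by the Benjamini-Yekutieli step-up adjustment. How GOEAST implements these steps internally is not described here, so the function and parameter names are illustrative, and the weighted scoring algorithm is not included.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: genes in the list annotated to the GO term
    n: genes in the list
    K: genes on the array annotated to the GO term
    N: genes on the array
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted p-values in the input order."""
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Under the cutoff used in this study, a GO term would be reported as enriched when its adjusted p-value is below 0.1.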
The results from GOEAST analysis are presented in 3 ways: an HTML table providing detailed information of enriched GO terms and their associated genes; a plain-text file of enriched GO terms; and separate graphical output files showing the hierarchical relationships of enriched GO terms in the 3 GO categories.
Besides Fisher's exact test, GOEAST also supports the hypergeometric test and the χ2-test, as well as other methods for multiple testing correction (Hochberg, Bonferroni, Hommel).
The Globaltest takes into account the entire raw expression data. The overall gene expression profile for each of the three contrasts (MM8-PM8, MM8-MA8 and MM8-MM24) was significantly associated (p < 0.05) with its outcome; the p-values from the asymptotic method were 0.006, 0.032 and 0.021, respectively. This shows that the overall gene expression pattern of MM8 chickens differs significantly from that of PM8, MA8 and MM24 chickens. There is therefore potential for predicting infection from gene expression data.
Table: Top 5 GO terms in contrast MM8-PM8 identified by Globaltest
Columns: GO term ID | GO term description | Number of genes in GO term (a) | Number of genes affected (b) | FDR adjusted p-value (c)
GO term descriptions listed: actin filament bundle formation; organelle organization and biogenesis; lipid catabolic process; purine base biosynthetic process; glycine transmembrane transporter activity; troponin I binding; cyclin-dependent protein kinase...
Table: Summary of results of Fisher's exact test in GOEAST for the 3 contrasts
Quantities reported: number of significantly expressed genes (>2 fold up/down regulation); number of genes with GO annotation; number of enriched GO terms (adjusted p-value < 0.1)
In this study, two different approaches for gene set analysis were used to analyse three contrasts made in a microarray experiment. The Globaltest is a method for testing whether sets of genes are significantly associated with a variable of interest. GOEAST, a web-based tool, tests for enriched GO terms in specified gene sets.
The Globaltest is a direct gene set testing method: it does not start from a list of differentially expressed genes, but from the raw expression data. An advantage of Globaltest compared to GOEAST is its ability to identify GO terms containing genes with limited changes in expression. With Globaltest, a GO term can be significant either because a few genes are highly differentially expressed or because many genes are only slightly differentially expressed. This may help to distinguish the key player genes of the affected GO term. Identifying which genes contribute more or less to particular biological processes and molecular functions may be of great help in guiding further investigation of the pathways.
For Globaltest, given the small sample size (10 microarrays), a permutation distribution could not generate a unique p-value and therefore the asymptotic distribution was used. Although the asymptotic distribution is correct for large sample sizes, it also gives a good indication for small sample sizes.
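One way to see why permutations are limiting at this sample size is to count the distinct ways the group labels can be reassigned, since a permutation p-value can only take values that are multiples of one over that count. The sketch below assumes a two-group comparison; the 5 versus 5 split of the 10 arrays is a hypothetical example, because the group sizes per contrast are not given here.

```python
from math import comb


def distinct_labelings(n_samples, group_size):
    """Number of distinct ways to assign group_size of n_samples to one group."""
    return comb(n_samples, group_size)


def smallest_permutation_p(n_samples, group_size):
    """Smallest p-value an exhaustive permutation test can return."""
    return 1 / distinct_labelings(n_samples, group_size)


# Hypothetical split: 10 arrays divided 5 versus 5.
print(distinct_labelings(10, 5))      # 252
print(smallest_permutation_p(10, 5))  # 1/252, roughly 0.004
```

With so few possible labelings, many gene sets end up sharing the same few attainable p-values, which makes the results hard to rank and sensitive to multiple testing correction.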
From the GOEAST results, it was noted that several enriched GO terms were associated with only one or a few genes in the tested gene lists. Although these terms still appear to be statistically significant, their biological relevance should be examined carefully.
For example, three of the top five GO 'biological process' terms enriched in the list of downregulated genes of the contrast MM8-MA8 had one and the same gene, TICAM1, annotated to them. However, these terms may still be biologically relevant, since the TICAM1 gene is known to be involved in innate immunity against invading pathogens and is therefore important in the context of the experiment that generated the gene lists.
We found different results for the two methods, probably due to the different algorithms used and the different criteria used for evaluating the significance of GO terms. Differences between the results of different gene set analysis methods have previously been reported by other authors [2, 9].
The Globaltest and GOEAST gave different results, probably due to the different algorithms and also the different criteria used for evaluating the significance of GO terms. This confirms that different gene set analysis methods perform differently and that they do not necessarily lead to the same biological conclusions. A pitfall in interpretation of the results presented here is the lack of sufficient annotation of the probes used in this microarray experiment.
List of abbreviations used
FDR: False Discovery Rate
GOEAST: Gene Ontology Enrichment Analysis Software Toolkit
KEGG: Kyoto Encyclopedia of Genes and Genomes
The authors wish to acknowledge Dr. J.M.J. Rebel et al. from Animal Sciences Group Wageningen UR, Lelystad, The Netherlands for providing the Chicken infection data set and EADGENE for financial support (EU Contract No. FOOD-CT-2004-506416).
This article has been published as part of BMC Proceedings Volume 3 Supplement 4, 2009: EADGENE and SABRE Post-analyses Workshop. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S4.
1. Khatri P, Drăghici S: Ontological analysis of gene expression data: Current tools, limitations, and open problems. Bioinformatics. 2005, 21 (18): 3587-3595. 10.1093/bioinformatics/bti565.
2. Song S, Black M: Microarray-based gene set analysis: a comparison of current methods. BMC Bioinformatics. 2008, 9 (1): 502. 10.1186/1471-2105-9-502.
3. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: Tool for the unification of biology. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
4. Goeman JJ, van de Geer SA, de Kort F, van Houwelingen HC: A global test for groups of genes: Testing association with a clinical outcome. Bioinformatics. 2004, 20 (1): 93-99. 10.1093/bioinformatics/btg382.
5. Zheng Q, Wang XJ: GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res. 2008, 36 (Web Server issue): W358-W363. 10.1093/nar/gkn276.
6. Benjamini Y, Hochberg Y: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological). 1995, 57 (1): 289-300.
7. Benjamini Y, Yekutieli D: The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001, 29 (4): 1165-1188. 10.1214/aos/1013699998.
8. Alexa A, Rahnenführer J, Lengauer T: Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006, 22 (13): 1600-1607. 10.1093/bioinformatics/btl140.
9. Dinu I, Liu Q, Potter JD, Adewale AJ, Jhangri GS, Mueller T, Einecke G, Famulski K, Halloran P, Yasui Y: A Biological Evaluation of Six Gene Set Analysis Methods for Identification of Differentially Expressed Pathways in Microarray Data. Cancer Inform. 2008, 6: 357-368.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.