Methods for interpreting lists of affected genes obtained in a DNA microarray experiment
- Jakob Hedegaard1,
- Cristina Arce†2,
- Silvio Bicciato†3,
- Agnès Bonnet†4,
- Bart Buitenhuis†1,
- Melania Collado-Romero†2,
- Lene N Conley†1,
- Magali SanCristobal†4,
- Francesco Ferrari†5,
- Juan J Garrido†2,
- Martien AM Groenen†6,
- Henrik Hornshøj†1,
- Ina Hulsegge†7,
- Li Jiang†1,
- Ángeles Jiménez-Marín†2,
- Arun Kommadath†7,
- Sandrine Lagarrigue†8,
- Jack AM Leunissen†9,
- Laurence Liaubet†4,
- Pieter BT Neerincx†9,
- Haisheng Nie†6,
- Jan van der Poel†6,
- Dennis Prickett†10,
- María Ramirez-Boo†2,
- Johanna MJ Rebel†11,
- Christèle Robert-Granié†12,
- Axel Skarman†1,
- Mari A Smits†7,
- Peter Sørensen†1,
- Gwenola Tosser-Klopp†4 and
- Michael Watson†10
© Hedegaard et al; licensee BioMed Central Ltd. 2009
Published: 16 July 2009
The aim of this paper was to describe and compare the methods used and the results obtained by the participants in a joint EADGENE (European Animal Disease Genomic Network of Excellence) and SABRE (Cutting Edge Genomics for Sustainable Animal Breeding) workshop focusing on the post-analysis of microarray data. The participating groups were provided with identical lists of microarray probes, including test statistics for three different contrasts, and the normalised log-ratios for each array, to be used as the starting point for interpreting the affected probes. The data originated from a microarray experiment conducted to study the host reactions in broilers occurring shortly after a secondary challenge with either a homologous or heterologous species of Eimeria.
Several conceptually different analytical approaches, using both commercial and publicly available software, were applied by the participating groups. The following tools were used: Ingenuity Pathway Analysis, MAPPFinder, LIMMA, GOstats, GOEAST, GOTM, Globaltest, TopGO, ArrayUnlock, Pathway Studio, GIST and AnnotationDbi. The main focus of the approaches was to utilise the relations between probes/genes and their gene ontology annotations and pathways to interpret the affected probes/genes. However, the lack of a well-annotated chicken genome limited the possibilities to fully explore the tools. The main results from these analyses showed that the biological interpretation is highly dependent on the statistical method used, but that some common biological conclusions could be reached.
It is highly recommended to test different analytical methods on the same data set and compare the results to obtain a reliable biological interpretation of the affected genes in a DNA microarray experiment.
The previous Microarray Data Analysis Workshop organised by EADGENE (European Animal Disease Genomic Network of Excellence [1]) in November 2006 focussed on the analytical methods applied to raw microarray data to obtain lists of significantly affected genes. The results from the workshop were published in Genetics Selection Evolution [2–5]. This paper summarises the results obtained from a joint EADGENE and SABRE (Cutting Edge Genomics for Sustainable Animal Breeding [6]) workshop in November 2008, focusing on the interpretation of lists of significantly affected genes, thereby extending the work from the previous workshop. The aim of the workshop was to evaluate and present existing methods and software, and potentially to propose new methods for the post-analysis of microarray data, using real data sourced from within EADGENE and SABRE.
The initial objective of an analysis of a microarray data set is to produce a list of significantly affected probes/genes. This analysis can be relatively challenging, but the major challenge is to interpret the list of hundreds to thousands of affected genes and draw biological conclusions. To assist this process, a large number of statistical methods using quite different approaches have been proposed, which can consequently produce different results when applied to the same data set [7, 8]. Gene-set analysis is a popular method that aims to identify differentially expressed gene sets associated with, e.g., a phenotype of interest. Gene sets are commonly defined based on existing biological knowledge on gene function available from public databases, such as Gene Ontology (GO) [9, 10], Kyoto Encyclopaedia of Genes and Genomes (KEGG) [11, 12] and Gene Map Annotator and Pathway Profiler (GenMAPP) [13, 14]. Currently available tools for gene-set analysis have recently been reviewed by Huang et al [15], who define three classes of tools according to their underlying algorithms: singular enrichment analysis, gene set enrichment analysis and modular enrichment analysis. Singular enrichment analysis (SEA) is a widely used approach, which utilises gene sets derived from Gene Ontology or pathway databases and investigates the enrichment of specific gene sets in a list of significantly affected genes, defined by applying a cut-off threshold value. The weakness of SEA is its dependence on this cut-off threshold value, the level of which has a major impact on the obtained results [16]. To avoid this problem, a group of methods termed gene set enrichment analysis (GSEA) has been developed, which utilise the information from all probes/genes in a microarray experiment. Modular enrichment analysis (MEA) is based on SEA but integrates term-term/gene-gene relationships to reveal biological meaning not revealed by single term/gene analysis.
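As an illustration of the SEA principle described above, the following Python sketch tests a single gene set for over-representation in a cut-off-defined list using the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. It is a minimal sketch under stated assumptions, not the implementation used by any of the tools in the workshop; the function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn genes)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def sea_test(significant: set[str], universe: set[str], gene_set: set[str]) -> tuple[int, float]:
    """Singular enrichment analysis for one gene set (e.g. one GO term).

    Only genes in the universe (all annotated genes on the array) are counted,
    so the background is the array rather than the whole genome.
    Returns the overlap and a one-sided over-representation p-value.
    """
    sig = significant & universe
    members = gene_set & universe
    overlap = len(sig & members)
    return overlap, hypergeom_upper_tail(overlap, len(universe), len(members), len(sig))


# Toy example with hypothetical gene identifiers.
universe = {f"g{i}" for i in range(20)}      # 20 annotated genes on the array
gene_set = {"g0", "g1", "g2", "g3", "g4"}    # 5 genes annotated to the term
significant = {"g0", "g1", "g2", "g10"}      # 4 genes passing the cut-off
overlap, p = sea_test(significant, universe, gene_set)
print(overlap, round(p, 4))  # 3 0.032
```

Because `significant` is defined by a cut-off, re-running the test with a list obtained at a different threshold can change both the overlap and the p-value, which is exactly the sensitivity of SEA to the threshold discussed in the text.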
A common challenge faced during the interpretation of the affected probes is the lack of appropriate annotation of the probes on the microarray. An affected probe without annotation consequently cannot contribute to the interpretation of the results, and if a major fraction of the probes lack annotation, this may negatively influence subsequent analyses, such as GO enrichment analysis. A study of methods to improve the annotation of microarray probes was also part of this workshop and is described in the adjacent papers [17–20].
In this paper, the methods applied and the results obtained by the participating groups are summarised and some general conclusions are drawn.
The data – host reactions in broilers after a secondary challenge
Table 1. The contrasts used in the workshop. The number of significantly (FDR ≤ 0.05) affected probes for each of the three contrasts used in the workshop.
Annotation of the chicken microarray probes
A proper annotation of the individual probes on a microarray is a prerequisite for establishing a link between each probe and the associated biological knowledge, such as gene ontology and pathways. The annotation files used for interpreting the three provided gene lists were those produced as part of the workshop and described in the adjacent papers [17–20]. The most recent versions of the annotation files are available at the EADGENE Oligo Set Annotation Files homepage [22]. Version 2, released on 11 September 2008 and based on Ensembl version 50, was used for the workshop. Furthermore, the group from Wageningen University built a customized annotation utilising chicken-human orthologous gene information and performed separate analyses for each annotation [23], the group from Aarhus University investigated methods for predicting possible annotations for genes with unknown function from the expression data [24], and the participants from the Institute for Animal Health based their analysis on an annotation obtained with the IMAD system (see [18] for additional details).
Analysis of the data
Table 2. Summary of the analytical approaches and choice of software for each group
Key statistical method
University of Cordoba, Spain
Hypergeometric test; chi-square test
Ingenuity Pathway Analysis
Fisher's exact test
Institute for Animal Health, Compton, UK
Aarhus University, Denmark
Fisher's exact test; Kolmogorov-Smirnov test
Support vector machine classification; kernel principal components analysis
Animal Breeding and Genomics Centre, Wageningen University, Wageningen, The Netherlands
Animal Breeding and Genomics Centre, Animal Sciences Group, Lelystad, The Netherlands
Fisher's exact test; hypergeometric test; chi-square test
INRA, Toulouse and Rennes
Pathway Studio
Subnetwork Enrichment Analysis (SNEA); one-sided Mann-Whitney U-test
Ingenuity Pathway Analysis
Fisher's exact test
Results and discussion
Annotation of the chicken microarray probes
The challenge of mapping the probes/genes on the chicken microarray to biological knowledge, such as gene ontology and pathways, was encountered by all groups. In general, only about half of the probes could be mapped and thereby contribute to the biological interpretation of the data. Consequently, the lack of a well-annotated microarray had a detrimental effect on the results. However, improvements were obtained by using chicken-human orthologous gene annotation instead of chicken gene annotation, as reported by the group from Wageningen University [23]. The chicken-human orthologous gene information gave a higher power to detect significant GO terms because of the higher coverage of GO terms assigned to human genes compared with chicken genes, but as human and chicken are evolutionarily rather distant, care has to be taken when interpreting the obtained results [23]. The group from Aarhus University investigated methods for predicting the annotations of genes with unknown function from the expression data, and found that the methods may be of potential use, but that improvements in the chicken annotation, the availability of larger microarray data sets and careful validation of the predictions are needed to fully utilise these methods [24].
Analysis of the data
The results of the different analytical approaches applied by the participating groups showed, in general, that the biological interpretation is highly dependent on the statistical method used.
The analysis for enrichment of GO-terms based on singular enrichment analysis (SEA), applied by the different groups (Table 2), revealed differences in the number and identity of the GO-terms found to be affected. In general, many of the enriched GO-terms were represented by only a few (1 or 2) genes. Applying the commonly used filtering criterion of requiring a minimum number of genes (e.g. 10) to represent each GO-term would lead to the conclusion that very few GO-terms are affected.
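The minimum-gene filter mentioned above can be applied as a simple post-processing step on SEA results. The sketch below assumes hypothetical result records holding the number of significant genes annotated to each term; a term is kept only if it passes the p-value threshold and is represented by at least `min_genes` significant genes (10 by default, following the example in the text). Field names, term identifiers and values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class TermResult:
    term_id: str        # hypothetical term identifier
    n_list_genes: int   # significant genes annotated to the term
    p_value: float      # enrichment p-value from an SEA test


def filter_terms(results: list[TermResult], alpha: float = 0.05,
                 min_genes: int = 10) -> list[TermResult]:
    """Keep terms that are significant AND represented by at least min_genes genes."""
    return [r for r in results if r.p_value <= alpha and r.n_list_genes >= min_genes]


results = [
    TermResult("term_A", n_list_genes=2, p_value=0.001),   # significant, but only 2 genes
    TermResult("term_B", n_list_genes=12, p_value=0.01),   # significant and well represented
    TermResult("term_C", n_list_genes=15, p_value=0.20),   # well represented, not significant
]
print([r.term_id for r in filter_terms(results)])  # ['term_B']
```

With `min_genes=1`, term_A would also be reported, which corresponds to the situation described in the text where enriched terms rest on one or two genes.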
The commercial software Ingenuity Pathway Analysis (IPA) was used by the groups from the University of Cordoba, Spain [25] and from INRA, Toulouse and Rennes [26] to explore the affected pathways. The results obtained by the two groups were quite similar, even though the analyses were performed in different ways. In contrast to the group from the University of Cordoba, the INRA group compared the networks of the three gene lists to the networks obtained from the complete list of genes on the microarray, in order to identify networks that were significant relative to the microarray background.
Using GenMAPP/MAPPFinder, Prickett and Watson identified several biologically relevant pathways as being affected, thus demonstrating the usefulness of this tool for microarray analysis, especially with improved annotation [27].
The analytical methods based on gene set enrichment analysis (GSEA), i.e. GlobalTest applied by the groups from ASG [28] and DJF [24], generally resulted in a larger number of significant terms than the SEA-based methods. This was expected, as theoretical considerations indicate that this type of method is more powerful.
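To make the contrast with SEA concrete, the sketch below shows a cut-off-free gene set test that uses the statistics of all genes on the array. It is deliberately not GlobalTest, which is a self-contained score test built on a regression model; instead it substitutes a simpler competitive test, a one-sided Wilcoxon rank-sum test on absolute gene statistics with a normal approximation and no tie correction. Function names and the toy statistics are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ascending ranks; tied values receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_gene_set_test(stats: dict[str, float], gene_set: set[str]) -> float:
    """One-sided competitive test: do set members have larger |statistics|
    than the remaining genes? Every gene is used; no cut-off is applied.
    Normal approximation to the rank-sum distribution, without tie correction."""
    genes = list(stats)
    ranks = average_ranks([abs(stats[g]) for g in genes])
    in_set = [r for g, r in zip(genes, ranks) if g in gene_set]
    n, n1 = len(genes), len(in_set)
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("gene set must contain some, but not all, genes")
    mu = n1 * (n + 1) / 2                    # expected rank sum under H0
    sigma = sqrt(n1 * n2 * (n + 1) / 12)     # its standard deviation
    z = (sum(in_set) - mu) / sigma
    return 0.5 * erfc(z / sqrt(2))           # upper-tail normal p-value


stats = {f"g{i}": float(10 - i) for i in range(10)}  # hypothetical |t|, g0 largest
p_top = rank_sum_gene_set_test(stats, {"g0", "g1", "g2"})     # small p-value
p_bottom = rank_sum_gene_set_test(stats, {"g7", "g8", "g9"})  # close to 1
print(p_top, p_bottom)
```

Because the test ranks every gene, a set whose members are all moderately shifted can be detected even if none of them would pass an SEA cut-off.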
The tool topGO belongs to the modular enrichment analysis (MEA) class of methods and takes the GO structure into account when testing the gene sets. Comparing the results obtained using topGO with the results from the "classical" Fisher's exact test and the Kolmogorov-Smirnov test, both of which ignore the GO structure, showed that fewer significant terms were found with topGO [24], which may indicate increased specificity [30].
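The idea of taking the GO structure into account can be illustrated with a simplified version of the "elim" strategy of Alexa et al. [30]: terms are tested from the most specific upwards, and once a term is significant its genes are removed from all of its ancestors before those are tested. The sketch below is a minimal re-implementation under stated assumptions, not topGO's code: annotations are assumed to be already propagated up the graph (each term lists all genes of its descendants), every parent appears as a key in `annotations`, the background universe is kept fixed, and each node is scored with the one-sided hypergeometric test. All term and gene names are hypothetical.

```python
from collections import defaultdict, deque
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric (Fisher) p-value P(X >= k)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All terms reachable by following parent links upwards."""
    seen: set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def children_first(terms: list[str], parents: dict[str, set[str]]) -> list[str]:
    """Topological order in which each term comes after all of its children."""
    pending = {t: 0 for t in terms}  # children not yet processed
    for t in terms:
        for p in parents.get(t, ()):
            pending[p] += 1
    queue = deque(t for t in terms if pending[t] == 0)  # leaves first
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for p in parents.get(t, ()):
            pending[p] -= 1
            if pending[p] == 0:
                queue.append(p)
    return order


def elim_test(annotations: dict[str, set[str]], parents: dict[str, set[str]],
              significant: set[str], universe: set[str],
              cutoff: float = 0.01) -> dict[str, float]:
    """Simplified elim: genes of a significant term are removed from its ancestors."""
    sig = significant & universe
    removed: dict[str, set[str]] = defaultdict(set)
    pvalues = {}
    for term in children_first(list(annotations), parents):
        genes = (annotations[term] & universe) - removed[term]
        k = len(genes & sig)
        pvalues[term] = hypergeom_upper_tail(k, len(universe), len(genes), len(sig))
        if pvalues[term] < cutoff:
            for anc in ancestors(term, parents):
                removed[anc] |= annotations[term]
    return pvalues


# Hypothetical three-level graph: child -> parent -> root.
universe = {f"g{i}" for i in range(20)}
annotations = {
    "child": {"g0", "g1", "g2", "g3"},
    "parent": {f"g{i}" for i in range(6)},   # includes the child's genes
    "root": set(universe),
}
parents = {"child": {"parent"}, "parent": {"root"}, "root": set()}
significant = {"g0", "g1", "g2", "g3"}
# Only "child" stays significant; a classical test would also flag "parent".
print(elim_test(annotations, parents, significant, universe))
```

In this toy case the classical test gives "parent" a p-value of 15/4845 (about 0.003) only because it inherits the signal of "child"; after elimination "parent" has no significant genes left, which mirrors the smaller number of significant terms reported for topGO.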
The majority of the analytical tools provide options for correcting for multiple testing using various methods. It is common practice to apply a multiple testing correction to control the family-wise error rate in the result list, but there is little consensus on how to perform the correction and whether the correction improves the results. Several groups applied methods for multiple testing correction in their analysis of the workshop data and found only a few significant terms/pathways after correction [24, 27, 28]. The essential problems are that the structure of the GO graph and of pathways conflicts with the assumption of independence, and that most methods for multiple testing correction do not change the ranks, and therefore the relative importance, of the different GO terms.
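The observation that most corrections preserve the ranking of terms can be checked directly. The sketch below implements the Benjamini-Hochberg step-up adjustment, a widely used false discovery rate procedure chosen here for illustration (not necessarily the one applied by any particular group), and verifies that the adjusted p-values never reverse the order given by the raw p-values, although ties may be introduced. The p-values are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]   # hypothetical p-values for four GO terms
adj = benjamini_hochberg(raw)
print([round(a, 3) for a in adj])  # [0.02, 0.04, 0.04, 0.02]

# Adjusted values are non-decreasing along the raw-p ordering: ranks are kept.
by_raw = sorted(range(len(raw)), key=lambda i: raw[i])
assert all(adj[a] <= adj[b] for a, b in zip(by_raw, by_raw[1:]))
```

The correction changes which terms pass a given threshold, but the term at the top of the list stays at the top, so the relative importance of the terms is unaffected.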
Where genes are represented by more than one oligonucleotide, enrichment tests can be carried out at the level of either the gene or the oligonucleotide, and the two levels could potentially produce different results. However, only minor differences were found between enrichment tests at the gene level and those at the oligonucleotide level. It is difficult to generalise this result to other data sets, but if the number of replicate probes varies between genes, gene-based tests will often be preferable, since a gene represented by several probes would otherwise be counted several times.
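A gene-level test can be obtained by collapsing probes to genes before running the enrichment test, so that a gene with several replicate probes is counted once. The sketch below assumes a hypothetical probe-to-gene mapping in which unannotated probes map to `None` and are dropped, and it calls a gene significant if any of its probes is significant; other rules, such as requiring a majority of the probes, are equally possible.

```python
from typing import Optional


def collapse_to_genes(probe_to_gene: dict[str, Optional[str]],
                      significant_probes: set[str]) -> tuple[set[str], set[str]]:
    """Return (gene universe, significant genes), counting each gene once.

    Probes without a gene annotation (None or absent from the mapping) are
    dropped from both sets, so they cannot contribute to the enrichment test.
    A gene is called significant if any of its probes is significant.
    """
    universe = {g for g in probe_to_gene.values() if g is not None}
    significant = {probe_to_gene[p] for p in significant_probes
                   if probe_to_gene.get(p) is not None}
    return universe, significant


# Hypothetical mapping: GENE_A has three replicate probes, p5 is unannotated.
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_A",
                 "p4": "GENE_B", "p5": None}
sig_probes = {"p1", "p2", "p3", "p5"}
universe, sig_genes = collapse_to_genes(probe_to_gene, sig_probes)
print(len(sig_probes), sorted(sig_genes))  # 4 ['GENE_A']
```

At probe level GENE_A contributes three of the four significant probes; at gene level it is counted once, and the unannotated probe p5 drops out, which also illustrates how missing annotation shrinks the usable list.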
Despite differences in the specific GO-terms and pathways found to be affected by the groups, some common biological conclusions could nevertheless be reached for the three contrasts. Specific details of the biological conclusions can be found in the papers from the participating groups [23–28]. The interpretation of the genes affected between MM8 and PM8 shows, as expected, that a secondary immune response is induced by the homologous challenge, while the heterologous challenge induces a primary immune response. The lowest number of affected genes was found when comparing the expression profiles from homologous and heterologous challenge (MM8-MA8, Table 1). This indicates that an E. acervulina infection triggers a response similar to that triggered by an E. maxima infection. The identity of the affected genes between different time points of a homologous challenge (MM8-MM24) indicates that the secondary immune response increases from 8 to 24 hours.
Different analytical methods were applied by the teams of the joint EADGENE and SABRE workshop focusing on the extraction of biological meaning from lists of significantly affected genes. The analyses were in general negatively affected by the lack of a well-annotated microarray. However, the use of chicken-human orthologous gene annotation was found to improve the analyses. The results showed that the biological interpretation is highly dependent on the statistical method used, but that some common biological conclusions could be reached. It is hence recommended to test different analytical methods on the same data set and compare the results to obtain a reliable biological interpretation of the affected genes in a DNA microarray experiment.
The authors wish to acknowledge Caroline Channing and the other organisers for organising the workshop, Dr. Rebel and colleagues from Animal Science Group in Lelystad, The Netherlands, for providing the microarray data from the chicken infection experiment, the reviewers for valuable comments and EADGENE and SABRE for financial support.
This article has been published as part of BMC Proceedings Volume 3 Supplement 4, 2009: EADGENE and SABRE Post-analyses Workshop. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S4.
- EADGENE: European Animal Disease Genomic Network of Excellence. [http://www.eadgene.info/]
- de Koning DJ, Jaffrezic F, Lund MS, Watson M, Channing C, Hulsegge I, Pool MH, et al: The EADGENE Microarray Data Analysis Workshop (open access publication). Genet Sel Evol. 2007, 39: 621-631.
- Jaffrezic F, de Koning DJ, Boettcher PJ, Bonnet A, Buitenhuis B, Closset R, Dejean S, et al: Analysis of the real EADGENE data set: comparison of methods and guidelines for data normalisation and selection of differentially expressed genes (open access publication). Genet Sel Evol. 2007, 39: 633-650.
- Sorensen P, Bonnet A, Buitenhuis B, Closset R, Dejean S, Delmas C, Duval M, et al: Analysis of the real EADGENE data set: multivariate approaches and post analysis (open access publication). Genet Sel Evol. 2007, 39: 651-668.
- Watson M, Perez-Alegre M, Baron MD, Delmas C, Dovc P, Duval M, Foulley JL, Garrido-Pavon JJ, Hulsegge I, Jaffrezic F, Jimenez-Marin A, Lavric M, Le Cao KA, Marot G, Mouzaki D, Pool MH, Robert-Granie C, San CM, Tosser-Klopp G, Waddington D, de Koning DJ: Analysis of a simulated microarray dataset: comparison of methods for data normalisation and detection of differential expression (open access publication). Genet Sel Evol. 2007, 39: 669-683.
- SABRE: Cutting Edge Genomics for Sustainable Animal Breeding. [http://www.sabre-eu.eu/]
- Liu Q, Dinu I, Adewale AJ, Potter JD, Yasui Y: Comparative evaluation of gene-set analysis methods. BMC Bioinformatics. 2007, 8: 431.
- Song S, Black MA: Microarray-based gene set analysis: a comparison of current methods. BMC Bioinformatics. 2008, 9: 502.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29.
- The Gene Ontology Project. [http://www.geneontology.org/]
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28: 27-30.
- KEGG: Kyoto Encyclopedia of Genes and Genomes. [http://www.genome.ad.jp/kegg/]
- Dahlquist KD, Salomonis N, Vranizan K, Lawlor SC, Conklin BR: GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet. 2002, 31: 19-20.
- GenMAPP: Gene Map Annotator and Pathway Profiler. [http://www.genmapp.org/]
- Huang dW, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009, 37: 1-13.
- Pan KH, Lih CJ, Cohen SN: Effects of threshold choice on biological conclusions reached during analysis of gene expression by DNA microarrays. Proc Natl Acad Sci USA. 2005, 102: 8961-8965.
- Neerincx PBT, Casel P, Prickett D, Nie H, Watson M, Leunissen JAM, Groenen MAM, Klopp C: Comparison of three Microarray Probe Annotation Pipelines: Differences in Strategies and their Effect on Downstream Analysis. BMC Proceedings. 2009, 3 (Suppl 4): S1.
- Prickett D, Watson M: IMAD: Flexible annotation of microarray sequences. BMC Proceedings. 2009, 3 (Suppl 4): S2.
- Casel P, Moreews F, Lagarrigue S, Klopp C: sigReannot: an oligo-set re-annotation pipeline based on similarities with the Ensembl transcripts and Unigene clusters. BMC Proceedings. 2009, 3 (Suppl 4): S3.
- Neerincx PBT, Rauwerda H, Nie H, Groenen MAM, Breit TM, Leunissen JAM: OligoRAP – An Oligo Re-Annotation Pipeline to improve annotation and estimate target specificity. BMC Proceedings. 2009, 3 (Suppl 4): S4.
- ArrayExpress. [http://www.ebi.ac.uk/microarray-as/ae/]
- EADGENE Oligo Set Annotation Files. [http://www.eadgene.info/TheProject/Integration/BiologicalresourcesandfacilitiesWP11/EADGENEOligoSetsAnnotationFiles/tabid/324/Default.aspx]
- Nie H, Neerincx PBT, Poel JVD, Ferrari F, Bicciato S, Leunissen JAM, Groenen MA: Microarray data mining using Bioconductor packages. BMC Proceedings. 2009, 3 (Suppl 4): S9.
- Skarman A, Jiang L, Hornshøj H, Buitenhuis B, Hedegaard J, Conley LN, Sørensen P: Gene set analysis methods applied to chicken microarray expression data. BMC Proceedings. 2009, 3 (Suppl 4): S8.
- Jiménez-Marín A, Collado-Romero M, Ramirez-Boo M, Arce-Jiménez C, Garrido JJ: Biological pathway analysis by ArrayUnlock and Ingenuity Pathway Analysis. BMC Proceedings. 2009, 3 (Suppl 4): S6.
- Bonnet A, Lagarrigue S, Liaubet L, Robert-Granie C, Christobal MS, Tosser-Klopp G: Pathway results from the chicken data set using GOTM, Pathway Studio and Ingenuity softwares. BMC Proceedings. 2009, 3 (Suppl 4): S11.
- Prickett D, Watson M: Use of GenMAPP and MAPPFinder to analyse pathways involved in chickens infected with the protozoan parasite Eimeria. BMC Proceedings. 2009, 3 (Suppl 4): S7.
- Hulsegge I, Kommadath A, Smits MA: Globaltest and GOEAST: Two different approaches for Gene Ontology analysis. BMC Proceedings. 2009, 3 (Suppl 4): S10.
- Goeman JJ, Buhlmann P: Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics. 2007, 23: 980-987.
- Alexa A, Rahnenfuhrer J, Lengauer T: Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006, 22: 1600-1607.
- ArrayUnlock. [http://www.integromics.com/ArrayUnlock.php]
- Ingenuity Pathway Analysis. [http://www.ingenuity.com/]
- Salomonis N, Hanspers K, Zambon AC, Vranizan K, Lawlor SC, Dahlquist KD, Doniger SW, Stuart J, Conklin BR, Pico AR: GenMAPP 2: new features and resources for pathway analysis. BMC Bioinformatics. 2007, 8: 217.
- Doniger SW, Salomonis N, Dahlquist KD, Vranizan K, Lawlor SC, Conklin BR: MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol. 2003, 4: R7.
- Smyth GK: Limma: linear models for microarray data. Bioinformatics and Computational Biology Solutions using R and Bioconductor. Edited by: Gentleman RC, Carey VJ, Dudoit S, Irizarry R, Huber W. 2005, New York: Springer, 397-420.
- Goeman JJ, Geer van de SA, de Kort F, van Houwelingen HC: A global test for groups of genes: testing association with a clinical outcome. Bioinformatics. 2004, 20: 93-99.
- GIST. [http://www.bioinformatics.ubc.ca/gist/]
- AnnotationDbi. [http://www.bioconductor.org/packages/release/bioc/html/AnnotationDbi.html]
- Falcon S, Gentleman R: Using GOstats to test gene lists for GO term association. Bioinformatics. 2007, 23: 257-258.
- Zheng Q, Wang XJ: GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res. 2008, 36: W358-W363.
- Zhang B, Schmoyer D, Kirov S, Snoddy J: GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC Bioinformatics. 2004, 5: 16.
- Pathway Studio. [http://www.ariadnegenomics.com/products/pathway-studio/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.