The EADGENE and SABRE post-analyses workshop
© Jaffrezic et al; licensee BioMed Central Ltd. 2009
Published: 16 July 2009
Analysis of genome-wide gene expression using DNA microarrays has become pervasive in almost all areas of biology. This workshop addressed gene expression studies in livestock that examine transcriptomic differences between treatments, between genotypes, and between combinations of the two. Two years ago, we organized a workshop to discuss the best approaches to analyze two-colour DNA microarray data in our area of research, and the outcomes of that workshop have been published in four open access publications [1–4]. While there is currently a reasonable amount of consensus on the statistical analysis of a microarray experiment (i.e. getting a gene list), the subsequent analysis of the gene list is still an area of much confusion for many scientists.
During a three-day workshop in November 2008, we discussed five aspects of these so-called post analyses of microarray data: 1) re-annotation of the probe set on DNA microarrays, 2) pathway analyses to identify significantly affected biological processes from microarray results, 3) reverse engineering of regulatory networks from microarray results, 4) the integration of gene expression studies with QTL detection studies and 5) the prediction of phenotypic outcomes using gene expression results.
Prior to the workshop, we distributed two sets of data to the workshop participants. The first set of gene expression data deals with experimental challenge of chickens with two types of Eimeria. This experiment is described in some detail in one of the summary papers [5], while the actual data are available from ArrayExpress (http://www.ebi.ac.uk/microarray-as/ae/) under accession number E-MEXP-1972. The second experiment deals with the transcriptomic effects of adrenocorticotropic hormone (ACTH) treatment in two breeds of pigs. These gene expression results are available from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo, GSE8377 – DH06 Adrenal ACTH Sus scrofa).
Re-annotation of microarray probe set
Up-to-date annotation and target specificity are essential for functional analysis of microarray data. Three annotation pipelines were used to re-annotate 791 selected probes from the chicken microarray [6–8], and their results were subsequently compared [9]. The main difference between the annotation pipelines came from the thresholds that each applied in order to link a probe to a certain type of annotation. It was recommended to have flexible thresholds in order to evaluate the effect of stringency and strike the right balance between reliability and coverage of the annotation.
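The effect of stringency on coverage can be illustrated with a minimal sketch. It is not taken from any of the three pipelines: the function name `coverage_by_threshold`, the probe identifiers, and the percent-identity values are all hypothetical, and a real pipeline would combine several criteria rather than a single identity cut-off.

```python
def coverage_by_threshold(best_hit_identity, thresholds):
    """Fraction of probes that keep an annotation at each identity threshold.

    best_hit_identity maps a probe id to the percent identity of its best
    alignment against a transcript (hypothetical input format).
    """
    n_probes = len(best_hit_identity)
    if n_probes == 0:
        raise ValueError("no probes supplied")
    return {
        t: sum(1 for identity in best_hit_identity.values() if identity >= t) / n_probes
        for t in thresholds
    }


# Hypothetical best-hit identities for four probes.
best_hit_identity = {
    "probe_1": 100.0,
    "probe_2": 97.5,
    "probe_3": 92.0,
    "probe_4": 85.0,
}

# Stricter thresholds give more reliable links but annotate fewer probes.
coverage = coverage_by_threshold(best_hit_identity, [85.0, 95.0, 100.0])
for threshold, fraction in coverage.items():
    print(f"identity >= {threshold}: {fraction:.0%} of probes annotated")
```

Tabulating coverage over a range of thresholds, rather than fixing one value in advance, is one simple way to see where reliability starts to cost too much coverage.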
The application of pathway analyses
Several conceptually different analytical approaches, using both commercial and publicly available software, were applied by the participating groups to interpret the affected probes from the chicken experiment [10–15]. A total of twelve pathway-related software tools were tested on the chicken data. The main focus of the approaches was to use the relation between probes/genes and their gene ontology terms and pathways to interpret the affected probes/genes. The lack of a well-annotated chicken genome limited the possibilities to fully explore the tools. The main results from these analyses showed that the biological interpretation is highly dependent on the statistical method used but that some common biological conclusions could be reached.
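Many pathway and gene ontology tools rest on some form of over-representation test. The sketch below shows one common variant, an upper-tail hypergeometric test, purely as an illustration; it is not claimed to be the method used by any specific tool tested at the workshop. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_universe genes, of which n_in_pathway belong to the pathway."""
    max_overlap = min(n_in_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_universe - n_in_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 1000 annotated genes on the array, 40 of them in the
# pathway, 50 genes in the list of affected genes, 8 of which are in the pathway.
p_value = hypergeom_enrichment_p(
    n_universe=1000, n_in_pathway=40, n_selected=50, n_overlap=8
)
print(f"enrichment p-value: {p_value:.2e}")
```

The result depends directly on `n_universe` and `n_in_pathway`, both of which come from the annotation. With a sparsely annotated genome these counts are small and unstable, which is one reason different tools can reach different conclusions from the same gene list.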
Reverse engineering of regulatory networks
Graphical Gaussian models, as implemented in the R library GeneNet, were applied to 85 gene transcripts from the chicken experiment that were selected for their significance and lack of missing data. While a large number of significant relationships (edges) were found between these 85 genes, they could not be confirmed using pathway analyses because of limited annotation [16].
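In a graphical Gaussian model, an edge between two genes corresponds to a non-zero partial correlation, that is, a correlation that remains after conditioning on all other genes. The sketch below computes partial correlations from the inverse of a small correlation matrix. It is a textbook illustration in plain Python, not the GeneNet implementation: it assumes the matrix is invertible, whereas microarray studies with more genes than samples need a regularised estimator. The three-gene correlation values are hypothetical.

```python
from math import sqrt


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation matrix: -p_ij / sqrt(p_ii * p_jj), p = inverse of corr."""
    prec = invert(corr)
    n = len(corr)
    return [
        [1.0 if i == j else -prec[i][j] / sqrt(prec[i][i] * prec[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical chain A - B - C: A and C are correlated only through B.
corr = [
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
]
pcor = partial_correlations(corr)
print(f"A-B partial correlation: {pcor[0][1]:.3f}")
print(f"A-C partial correlation: {pcor[0][2]:.3f}")
```

In this example the marginal correlation between A and C is 0.64, yet their partial correlation vanishes once B is accounted for, so a graphical Gaussian model would draw no direct A-C edge.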
Integration of microarrays with QTL results
Using the pig experiment, three groups evaluated different ways to link the gene expression results to QTL results: 1) co-location between differentially expressed genes and QTL results from the same experiments [17, 18], 2) co-location between differentially expressed genes and QTL from the public domain, and 3) overlap between genes and QTL regions at the pathway level: genes and QTL may not co-locate, but differentially expressed genes share enriched pathways with genes in the QTL region [19]. Because the pig has only a preliminary draft genome sequence, comparative mapping approaches were also used to compare QTL locations and differentially expressed genes. Because of very limited annotations, no meaningful pathway comparisons could be made.
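Co-location as in approaches 1) and 2) reduces to testing whether two genomic intervals on the same chromosome overlap. The sketch below illustrates that test only; the `Region` type, the function name, and all coordinates are hypothetical, and it assumes genes and QTL intervals are already expressed on the same map, which for the pig data required comparative mapping.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive position on a shared map
    end: int    # inclusive position on a shared map


def co_located(genes, qtls):
    """Map each gene to the QTL intervals it overlaps on the same chromosome."""
    hits = {}
    for gene_name, gene in genes.items():
        matched = [
            qtl_name
            for qtl_name, qtl in qtls.items()
            if gene.chrom == qtl.chrom and gene.start <= qtl.end and qtl.start <= gene.end
        ]
        if matched:
            hits[gene_name] = matched
    return hits


# Hypothetical differentially expressed genes and QTL confidence intervals.
genes = {
    "gene_A": Region("SSC7", 1_000, 5_000),
    "gene_B": Region("SSC7", 90_000, 95_000),
    "gene_C": Region("SSC1", 2_000, 3_000),
}
qtls = {
    "qtl_1": Region("SSC7", 4_000, 20_000),
    "qtl_2": Region("SSC1", 10_000, 50_000),
}

hits = co_located(genes, qtls)
print(hits)
```

QTL confidence intervals are typically wide, so many genes can fall inside one interval; the overlap test only nominates positional candidates and does not rank them.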
Phenotypic prediction from microarray data
The pig data comprise two treatments and two genotypes. To predict these groupings from the microarray data, the authors used a Random Forest approach and also compared classical Partial Least Squares regression (PLS) with a novel approach called sparse PLS [20]. All methods performed well on this data set. The sparse PLS outperformed the PLS in terms of prediction performance and improved the interpretability of the results. Both approaches are well adapted to transcriptomic data, where the number of features is much greater than the number of individuals. Only a small number of genes (<20) was required to give perfect prediction of the four groups.
Take home message
The central theme of the meeting was the lack of annotation. The limitation was not a shortage of bioinformatics tools to link sequences between species but a clear lack of knowledge regarding gene function. This problem is not specific to livestock species, and considerable effort is required before pathway-based approaches will really come to fruition. In this context, there is a clear benefit in methods that do not require any level of annotation, such as reverse engineering of networks and phenotypic prediction from microarray data. One challenging opportunity is to catalogue this level of experimental annotation (e.g. 'up-regulated after infection with Eimeria') as an alternative means to derive functional links over time.
The authors gratefully acknowledge the local workshop organisers in Lelystad as well as crucial coordination by Caroline Channing. The authors acknowledge the EC-funded Integrated Project SABRE (EC contract number FOOD-CT-2006-01625) and the EC-funded Network of Excellence EADGENE (EC contract number FOOD-CT-2004-506416) for supporting the workshop and publication of this manuscript.
This article has been published as part of BMC Proceedings Volume 3 Supplement 4, 2009: EADGENE and SABRE Post-analyses Workshop. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S4.
1. de Koning DJ, Jaffrezic F, Lund MS, Watson M, Channing C, Hulsegge I, Pool MH, Buitenhuis B, Hedegaard J, Hornshoj H, et al: The EADGENE microarray data analysis workshop (open access publication). Genet Sel Evol. 2007, 39: 621-631. 10.1051/gse:2007028.
2. Jaffrezic F, de Koning DJ, Boettcher PJ, Bonnet A, Buitenhuis B, Closset R, Dejean S, Delmas C, Detilleux JC, Dovc P, et al: Analysis of the real EADGENE data set: Comparison of methods and guidelines for data normalisation and selection of differentially expressed genes (Open Access publication). Genet Sel Evol. 2007, 39: 633-650. 10.1051/gse:2007029.
3. Sorensen P, Bonnet A, Buitenhuis B, Closset R, Déjean S, Delmas C, Duval M, Glass L, Hedegaard J, Hornshoj H, et al: Analysis of the real EADGENE data set: Multivariate approaches and post analysis (Open Access publication). Genet Sel Evol. 2007, 39: 651-668. 10.1051/gse:2007030.
4. Watson M, Alegre MP, Baron MD, Delmas C, Dovc P, Duval M, Foulley JL, Pavon JJG, Hulsegge I, Jaffrezic F, et al: Analysis of a simulated microarray dataset: Comparison of methods for data normalisation and detection of differential expression (Open Access publication). Genet Sel Evol. 2007, 39: 669-683. 10.1051/gse:2007031.
5. Hedegaard J, Arce C, Bicciato S, Bonnet A, Ramerez-Boo M, Buitenhuis AJ, Collado-Romero M, Conley LN, SanCristobal M, Ferrari F, et al: Methods for interpreting lists of affected genes obtained in a DNA microarray experiment. BMC Proceedings. 2009, 3 (Suppl 4): S5.
6. Casel P, Moreews F, Lagarrigue S, Klopp C: sigReannot: an oligo-set re-annotation pipeline based on similarities with the Ensembl transcripts and Unigene clusters. BMC Proceedings. 2009, 3 (Suppl 4): S3. 10.1186/1753-6561-3-s4-s3.
7. Neerincx PBT, Rauwerda H, Nie H, Groenen MAM, Breit TM, Leunissen JAM: OligoRAP – An Oligo Re-Annotation Pipeline to improve annotation and estimate target specificity. BMC Proceedings. 2009, 3 (Suppl 4): S4. 10.1186/1753-6561-3-s4-s4.
8. Prickett D, Watson M: IMAD: Flexible annotation of microarray sequences. BMC Proceedings. 2009, 3 (Suppl 4): S2. 10.1186/1753-6561-3-s4-s2.
9. Neerincx PBT, Casel P, Prickett D, Nie H, Watson M, Leunissen JAM, Groenen MAM, Klopp C: Comparison of three Microarray Probe Annotation Pipelines: Differences in Strategies and their Effect on Downstream Analysis. BMC Proceedings. 2009, 3 (Suppl 4): S1. 10.1186/1753-6561-3-s4-s1.
10. Bonnet A, Lagarrigue S, Liaubet L, Robert-Granié C, SanCristobal M, Tosser-Klopp G: Pathway results from the chicken data set using GOTM, Pathway Studio and Ingenuity software. BMC Proceedings. 2009, 3 (Suppl 4): S11. 10.1186/1753-6561-3-s4-s11.
11. Jimenez-Marin A, Collado-Romero M, Ramerez-Boo M, Arce-Jimenez C, Garrido JJ: Biological pathway analysis by ArrayUnlock and Ingenuity Pathway Analysis. BMC Proceedings. 2009, 3 (Suppl 4): S6. 10.1186/1753-6561-3-s4-s6.
12. Prickett D, Watson M: Use of GenMAPP and MAPPFinder to analyse pathways involved in chickens infected with the protozoan parasite Eimeria. BMC Proceedings. 2009, 3 (Suppl 4): S7. 10.1186/1753-6561-3-s4-s7.
13. Nie H, Neerincx PBT, Poel van der JJ, Ferrari F, Bicciato S, Leunissen JAM, Groenen MAM: Microarray data mining using Bioconductor packages. BMC Proceedings. 2009, 3 (Suppl 4): S9. 10.1186/1753-6561-3-s4-s9.
14. Hulsegge IB, Kommadath A, Smits MA: Globaltest and GOEAST: Two different approaches for Gene Ontology analysis. BMC Proceedings. 2009, 3 (Suppl 4): S10. 10.1186/1753-6561-3-s4-s10.
15. Skarman A, Jiang L, Hornshoj H, Buitenhuis AJ, Hedegaard J, Conley LN, Sorensen P: Gene set analysis methods applied to chicken microarray expression data. BMC Proceedings. 2009, 3 (Suppl 4): S8. 10.1186/1753-6561-3-s4-s8.
16. Jaffrezic F, Tosser-Klopp G: Gene network reconstruction from microarray data. BMC Proceedings. 2009, 3 (Suppl 4): S12. 10.1186/1753-6561-3-s4-s12.
17. Désautes C, Bidanel JP, Milant D, Iannuccelli N, Amigues Y, Bourgeois F, Caritez JC, Renard C, Chevalet C, Mormède P: Genetic linkage mapping of quantitative trait loci for behavioral and neuroendocrine stress response traits in pigs. J Anim Sci. 2002, 80: 2276-2285.
18. Hazard D, Liaubet L, Sancristobal M, Mormède P: Gene array and real time PCR analysis of the adrenal sensitivity to adrenocorticotropic hormone in pig. BMC Genomics. 2008, 9: 101. 10.1186/1471-2164-9-101.
19. Jouffe V, Rowe SJ, Liaubet L, Buitenhuis AJ, Hornshoj H, SanCristobal M, Mormède P, de Koning DJ: Using microarrays to identify positional candidate genes for QTL: the case study of ACTH response in pigs. BMC Proceedings. 2009, 3 (Suppl 4): S14. 10.1186/1753-6561-3-s4-s14.
20. Robert-Granié C, Le Cao K-A, SanCristobal M: Predicting qualitative phenotypes from microarray data – the Eadgene pig data set. BMC Proceedings. 2009, 3 (Suppl 4): S13.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.