Global meta-analysis of transcriptomics studies

PLoS One. 2014 Feb 26;9(2):e89318. doi: 10.1371/journal.pone.0089318. eCollection 2014.

Abstract

Transcriptomics meta-analysis aims at re-using existing data to derive novel biological hypotheses, and is motivated by the public availability of a large number of independent studies. Current methods are based on breaking down studies into multiple comparisons between phenotypes (e.g. disease vs. healthy), based on the studies' experimental designs, followed by computing the overlap between the resulting differential expression signatures. While useful, in this methodology each study yields multiple independent phenotype comparisons, and connections are established not between studies, but rather between subsets of the studies corresponding to phenotype comparisons. We propose a rank-based statistical meta-analysis framework that establishes global connections between transcriptomics studies without breaking down studies into sets of phenotype comparisons. By using a rank product method, our framework extracts global features from each study, corresponding to genes that are consistently among the most expressed or differentially expressed genes in that study. Those features are then statistically modelled via a term-frequency inverse-document frequency (TF-IDF) model, which is then used for connecting studies. Our framework is fast and parameter-free; when applied to large collections of Homo sapiens and Streptococcus pneumoniae transcriptomics studies, it performs better than similarity-based approaches in retrieving related studies, using a Medical Subject Headings gold standard. Finally, we highlight via case studies how the framework can be used to derive novel biological hypotheses regarding related studies and the genes that drive those connections. Our proposed statistical framework shows that it is possible to perform a meta-analysis of transcriptomics studies with arbitrary experimental designs by deriving global expression features rather than decomposing studies into multiple phenotype comparisons.
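The two stages described in the abstract (rank-product feature extraction per study, then a TF-IDF model for connecting studies) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy expression values, the fixed `top_k` cutoff, the binary term frequency, and the use of cosine similarity are all assumptions made for illustration. The published framework is described as parameter-free, so the `top_k` cutoff in particular is a simplification.

```python
import math
from collections import Counter


def rank_product_features(expr, top_k):
    """Return the top_k genes with the smallest rank product in one study.

    expr maps gene -> list of expression values, one per sample.
    The rank product is taken here as the geometric mean of a gene's
    per-sample ranks (rank 1 = most expressed). top_k is an assumed cutoff.
    """
    genes = list(expr)
    n_samples = len(next(iter(expr.values())))
    log_rank_sum = {g: 0.0 for g in genes}
    for s in range(n_samples):
        # Rank genes within this sample, highest expression first.
        ordered = sorted(genes, key=lambda g: expr[g][s], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)
    rank_product = {g: math.exp(log_rank_sum[g] / n_samples) for g in genes}
    return set(sorted(genes, key=lambda g: rank_product[g])[:top_k])


def tfidf_vectors(features_by_study):
    """Treat each study as a document whose terms are its feature genes.

    Term frequency is binary (assumed); idf = log(N / document frequency).
    """
    n_studies = len(features_by_study)
    doc_freq = Counter(g for feats in features_by_study.values() for g in feats)
    return {
        study: {g: math.log(n_studies / doc_freq[g]) for g in feats}
        for study, feats in features_by_study.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: three studies, five genes, two or three samples each.
studies = {
    "s1": {"A": [9, 8, 9], "B": [7, 9, 8], "C": [1, 2, 1], "D": [2, 1, 2], "E": [3, 3, 3]},
    "s2": {"A": [8, 9], "B": [9, 7], "C": [2, 1], "D": [1, 3], "E": [3, 2]},
    "s3": {"A": [1, 2], "B": [2, 1], "C": [9, 8], "D": [8, 9], "E": [3, 3]},
}

features = {name: rank_product_features(expr, top_k=2) for name, expr in studies.items()}
vectors = tfidf_vectors(features)
sim_s1_s2 = cosine(vectors["s1"], vectors["s2"])
sim_s1_s3 = cosine(vectors["s1"], vectors["s3"])
```

In this toy setting, studies that share consistently high-ranked genes receive a higher similarity than studies with disjoint feature sets, and genes that appear as features in every study contribute nothing because their inverse document frequency is zero.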

Publication types

  • Meta-Analysis
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Streptococcus pneumoniae / genetics
  • Transcriptome*

Grants and funding

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT, Portugal) under contracts PestOE/EEI/LA0021/2011; through IDMEC, under LAETA (PestOE/EME/LA0022); and projects PneumoSyS (PTDC/SAUMII/100964/2008), InteleGen (PTDC/DTPFTO/1747/2012), and BacHBerry (FP7KBBE20137 singlestage, Grant Agreement no 613793). SV acknowledges support by Program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.