OutPredict: multiple datasets can improve prediction of expression and inference of causality

Sci Rep. 2020 Apr 22;10(1):6804. doi: 10.1038/s41598-020-63347-3.

Abstract

The ability to accurately predict the causal relationships from transcription factors to genes would greatly enhance our understanding of transcriptional dynamics. This could lead to applications in which one or more transcription factors could be manipulated to effect a change in genes leading to the enhancement of some desired trait. Here we present a method called OutPredict that constructs a model for each gene based on time series (and other) data and that predicts gene's expression in a previously unseen subsequent time point. The model also infers causal relationships based on the most important transcription factors for each gene model, some of which have been validated from previous physical experiments. The method benefits from known network edges and steady-state data to enhance predictive accuracy. Our results across B. subtilis, Arabidopsis, E.coli, Drosophila and the DREAM4 simulated in silico dataset show improved predictive accuracy ranging from 40% to 60% over other state-of-the-art methods. We find that gene expression models can benefit from the addition of steady-state data to predict expression values of time series. Finally, we validate, based on limited available data, that the influential edges we infer correspond to known relationships significantly more than expected by chance or by state-of-the-art methods.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / statistics & numerical data
  • Gene Regulatory Networks*
  • Machine Learning
  • Models, Genetic*
  • Reproducibility of Results
  • Transcription Factors / genetics*

Substances

  • Transcription Factors