Accelerated knowledge discovery from omics data by optimal experimental design

Nat Commun. 2020 Oct 6;11(1):5026. doi: 10.1038/s41467-020-18785-y.

Abstract

How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations led to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Anti-Bacterial Agents / pharmacology
  • Bacterial Proteins / genetics
  • Computational Biology / methods*
  • Disinfectants / pharmacology
  • Escherichia coli / drug effects*
  • Escherichia coli / genetics*
  • Gene Expression Regulation, Bacterial / drug effects
  • Machine Learning
  • Membrane Proteins / genetics
  • Models, Biological*
  • Molecular Chaperones / genetics
  • Research Design
  • Stress, Physiological / drug effects
  • Stress, Physiological / genetics

Substances

  • Anti-Bacterial Agents
  • Bacterial Proteins
  • Disinfectants
  • Membrane Proteins
  • Molecular Chaperones