From sequence to information

Philos Trans R Soc Lond B Biol Sci. 2020 Dec 21;375(1814):20190448. doi: 10.1098/rstb.2019.0448. Epub 2020 Nov 2.


Today, massive amounts of sequenced metagenomic and metatranscriptomic data from different ecological niches and environmental locations are available. Scientific progress depends critically on methods that extract useful information from the various types of sequence data. Here, we first discuss the types of information contained in the various flavours of biological sequence data, and how this information can be interpreted to increase our scientific knowledge and understanding. We argue that a mechanistic understanding of biological systems, analysed from different perspectives, is required to consistently interpret experimental observations, and that this understanding is greatly facilitated by the generation and analysis of dynamic mathematical models. We conclude that time-series data are of critical importance for constructing mathematical models and testing mechanistic hypotheses. We review diverse techniques to analyse time-series data and discuss various approaches by which time-series of biological sequence data have been successfully used to derive and test mechanistic hypotheses. Analysing the bottlenecks of current strategies for extracting knowledge and understanding from data, we further conclude that combined experimental and theoretical efforts should be implemented as early as possible during the planning phase of individual experiments and scientific research projects. This article is part of the theme issue 'Integrative research perspectives on marine conservation'.

Keywords: data; entropy; genome; information; modelling; sequence; time-series.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Conservation of Natural Resources / methods*
  • Ecosystem*
  • Gene Expression Profiling*
  • Metagenome*
  • Metagenomics*
  • Models, Biological
  • Transcriptome*