Genomics and data science: an application within an umbrella

Genome Biol. 2019 May 29;20(1):109. doi: 10.1186/s13059-019-1724-1.


Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term, encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of well-known 3 V data and 4 M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural "exports" and "imports" between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data science applications, in general, and are especially relevant to genomics, due to the persistent nature of DNA.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Data Science*
  • Genomics*