A Bioinformatics Primer to Data Science, with Examples for Metabolomics

Methods Mol Biol. 2020;2104:245-263. doi: 10.1007/978-1-0716-0239-3_14.


With the increasing importance of big data in biomedicine, skills in data science are a foundation for individual career development and for the progress of science. This chapter is a practical guide to working with high-throughput biomedical data. It covers how to understand and set up the computing environment, how to start a research project with proper and effective data management, and how to perform common bioinformatics tasks such as data wrangling, quality control, statistical analysis, and visualization, with examples on metabolomics data. Concepts and tools related to coding and scripting are discussed. Version control, knitr, and Jupyter notebooks are important to project management, collaboration, and research reproducibility. Overall, this chapter describes a core set of skills for working in bioinformatics, and it can serve as a reference text for a graduate-level course that interfaces with data science.

Keywords: Bioinformatics; Cloud computing; Data management; Data science; Data visualization; Metabolomics; Quality control; Scripting.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Cloud Computing
  • Computational Biology / methods*
  • Computational Biology / standards
  • Data Management
  • Data Science* / methods
  • Data Science* / standards
  • Database Management Systems
  • Databases, Factual
  • Humans
  • Metabolomics* / standards
  • Metabolomics* / statistics & numerical data
  • Software*