Simple and Efficient Data Analysis Dissemination for Individual Laboratories

J Proteome Res. 2020 Oct 2;19(10):4191-4195. doi: 10.1021/acs.jproteome.0c00454. Epub 2020 Aug 31.

Abstract

Scientific progress comes as we build upon the work of others. Implicit in this advance is that we have access to and can thoroughly examine the work of others. It is important to recognize that our scholarly work as scientists encompasses not only experimental design and data collection but also our analytical methods. Thus, when communicating biology experiments, especially those that utilize molecular omics data, the analysis methods that connect raw data to scientific conclusions must be presented with sufficient clarity that others can reproduce our exact work. Although there are many resources for sharing raw data files, there is currently no widely used method for sharing analysis methods. We present a semistructured pattern for sharing analysis methods that is simple and efficient and can be implemented by individual laboratories using existing software. This pattern requires three types of files in a publicly accessible repository, such as GitHub: (1) data files, (2) a universal I/O script that parses all data files, and (3) analysis scripts that create the figures and metrics reported in the manuscript. We suggest additional conventions to improve readability and provide a template repository for the pattern. Sharing our exact analysis methods as software, in addition to their narrative description in a manuscript, will ensure reproducibility and transparency. Importantly, the pattern we present does not require new infrastructure and can be achieved without advanced computing skills.
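
The three-file pattern described above can be illustrated with a minimal sketch. It is not taken from the authors' template repository: the file name `proteins.csv`, the column `intensity`, and the functions `load_all`, `mean_intensity`, and `demo` are all hypothetical, chosen only to show how data files, one shared I/O script, and per-figure analysis scripts might fit together. In a real repository each part would likely be a separate file; they are combined here so the example runs on its own.

```python
import csv
import statistics
import tempfile
from pathlib import Path


# (2) Universal I/O script: one place that knows how to parse every data file.
# Hypothetical helper; a real repository might keep this in its own module.
def load_all(data_dir):
    """Parse every CSV file in data_dir into {file stem: list of row dicts}."""
    tables = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            tables[path.stem] = list(csv.DictReader(handle))
    return tables


# (3) Analysis script: computes one metric that a manuscript might report.
# It only consumes the output of load_all and never parses files itself.
def mean_intensity(tables, table_name, column="intensity"):
    """Return the mean of a numeric column in one parsed table."""
    values = [float(row[column]) for row in tables[table_name]]
    return statistics.mean(values)


def demo():
    # (1) Data files: a tiny made-up table written to a temporary directory,
    # standing in for the data directory of a public repository.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "proteins.csv").write_text(
            "protein,intensity\nP1,10.0\nP2,20.0\nP3,30.0\n"
        )
        tables = load_all(tmp)
        return mean_intensity(tables, "proteins")


if __name__ == "__main__":
    print(demo())
```

Keeping all parsing in the single I/O function means that each analysis script starts from the same in-memory tables, so a reader who wants to check one figure or metric needs to follow only one short script.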

Keywords: bioinformatics; dissemination; open science; reproducibility.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Data Analysis*
  • Information Dissemination
  • Information Storage and Retrieval
  • Laboratories*
  • Reproducibility of Results
  • Software