Reprohackathons: promoting reproducibility in bioinformatics through training

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i11-i20. doi: 10.1093/bioinformatics/btad227.

Abstract

Motivation: The reproducibility crisis has highlighted the importance of improving the way bioinformatics data analyses are implemented, executed, and shared. To address this, various tools such as content versioning systems, workflow management systems, and software environment management systems have been developed. While these tools are becoming more widely used, there is still much work to be done to increase their adoption. The most effective way to ensure reproducibility becomes a standard part of most bioinformatics data analysis projects is to integrate it into the curriculum of bioinformatics Master's programs.

Results: In this article, we present the Reprohackathon, a Master's course that we have been running for the last 3 years at Université Paris-Saclay (France), and that has been attended by a total of 123 students. The course is divided into two parts. The first part includes lessons on the challenges related to reproducibility, content versioning systems, container management, and workflow systems. In the second part, students work on a data analysis project for 3-4 months, reanalyzing data from a previously published study. The Reprohackaton has taught us many valuable lessons, such as the fact that implementing reproducible analyses is a complex and challenging task that requires significant effort. However, providing in-depth teaching of the concepts and the tools during a Master's degree program greatly improves students' understanding and abilities in this area.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology*
  • Curriculum*
  • Data Analysis
  • Humans
  • Reproducibility of Results
  • Software