Code Review as a Simple Trick to Enhance Reproducibility, Accelerate Learning, and Improve the Quality of Your Team's Research

Am J Epidemiol. 2021 Oct 1;190(10):2172-2177. doi: 10.1093/aje/kwab092.

Abstract

Programming for data wrangling and statistical analysis is an essential technical tool of modern epidemiology, yet many epidemiologists receive limited formal training in strategies to optimize the quality of their code. In complex projects, coding mistakes are easy to make, even for skilled practitioners. Such mistakes can lead to invalid research claims that reduce the credibility of the field. Code review is a straightforward technique used by the software industry to reduce the likelihood of coding bugs. The systematic implementation of code review in epidemiologic research projects could not only improve science but also decrease stress, accelerate learning, contribute to team building, and codify best practices. In the present article, we argue for the importance of code review and provide recommendations for successful implementation for 1) the research laboratory, 2) the code author (the initial programmer), and 3) the code reviewer. We outline one feasible strategy for implementing code review; other implementation processes, including complementary practices to improve code quality, can be adapted to the resources and workflows of different research groups. Code review isn't always glamorous, but it is critically important for science and reproducibility. Humans are fallible; that's why we need code review.
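As a minimal, hypothetical illustration of the kind of silent data-wrangling mistake the abstract warns about (the cohort data and variable names below are invented, and the example uses Python with pandas as an assumed analysis stack): two comparison-based filters that look exhaustive will quietly drop any record with a missing value, and a second pair of eyes is well placed to catch it.

```python
import pandas as pd

# Hypothetical cohort; participant 2 has a missing age.
cohort = pd.DataFrame({"id": [1, 2, 3, 4],
                       "age": [70.0, None, 55.0, 80.0]})

# Easy-to-miss bug: these two filters look exhaustive, but a row with
# a missing age satisfies neither comparison and vanishes silently.
older = cohort[cohort["age"] >= 65]
younger = cohort[cohort["age"] < 65]
assert len(older) + len(younger) < len(cohort)  # participant 2 was dropped

# A reviewer would ask how missing ages should be handled; here we
# surface them explicitly so every participant is accounted for.
missing = cohort[cohort["age"].isna()]
assert len(older) + len(younger) + len(missing) == len(cohort)
```

The code runs without error because the buggy pair of filters really does lose a row; a reviewer reading the analysis script, rather than the author who wrote it, is often the one who notices that missingness was never addressed.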

Keywords: code review; implementation; learning; reproducibility crisis; team building.

MeSH terms

  • Benchmarking / methods*
  • Data Interpretation, Statistical*
  • Epidemiologic Measurements*
  • Epidemiologic Research Design
  • Epidemiology / education
  • Epidemiology / standards*
  • Feasibility Studies
  • Humans
  • Implementation Science
  • Reproducibility of Results
  • Software Validation*
  • Workflow