Ten simple rules on writing clean and reliable open-source scientific software

PLoS Comput Biol. 2021 Nov 11;17(11):e1009481. doi: 10.1371/journal.pcbi.1009481. eCollection 2021 Nov.

Abstract

Functional, usable, and maintainable open-source software is increasingly essential to scientific research, but there is a large variation in formal training for software development and maintainability. Here, we propose 10 "rules" centered on 2 best practice components: clean code and testing. These 2 areas are relatively straightforward and provide substantial utility relative to the learning investment. Adopting clean code practices helps to standardize and organize software code in order to enhance readability and reduce cognitive load for both the initial developer and subsequent contributors; this allows developers to concentrate on core functionality and reduce errors. Clean coding styles make software code more amenable to testing, including unit tests that work best with modular and consistent software code. Unit tests interrogate specific and isolated coding behavior to reduce coding errors and ensure intended functionality, especially as code increases in complexity; unit tests also implicitly provide example usages of code. Other forms of testing are geared to discover erroneous behavior arising from unexpected inputs or emerging from the interaction of complex codebases. Although conforming to coding styles and designing tests can add time to the software development project in the short term, these foundational tools can help to improve the correctness, quality, usability, and maintainability of open-source scientific software code. They also advance the principal point of scientific research: producing accurate results in a reproducible way. In addition to suggesting several tips for getting started with clean code and testing practices, we recommend numerous tools for the popular open-source scientific software languages Python, R, and Julia.
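As an illustration of the abstract's point that unit tests check specific, isolated behavior and also serve as usage examples, here is a minimal sketch in Python. The function gc_content and its tests are hypothetical and are not taken from the article; they only assume a small, single-purpose function of the kind that clean code practices encourage.

```python
import math


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Hypothetical example function, not from the article.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    invalid = set(sequence) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Each test checks one isolated behavior and doubles as a usage example.
def test_gc_content_typical_sequence():
    assert math.isclose(gc_content("ATGC"), 0.5)


def test_gc_content_is_case_insensitive():
    assert math.isclose(gc_content("ggcc"), 1.0)


def test_gc_content_rejects_unexpected_input():
    # Unexpected inputs should fail loudly rather than return a wrong value.
    for bad in ("", "ATGX"):
        try:
            gc_content(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")
```

Functions named test_* in this style can be collected by a test runner such as pytest, or called directly as ordinary functions.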

Publication types

  • Editorial
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / statistics & numerical data*
  • Programming Languages
  • Regression Analysis
  • Software Design*
  • Software*

Grants and funding

This research was funded in part by the Gordon and Betty Moore Foundation (https://www.moore.org/) through Grant GBMF3834 and by the Alfred P. Sloan Foundation (http://sloan.org) through Grant 2013-10-27 to the University of California, Berkeley. HHZ was funded by the Innovate for Health Data Science Fellowship (https://innovateforhealth.berkeley.edu/data-science-health-innovation-fellowship), a collaboration between U.C. Berkeley, UCSF, and Johnson & Johnson. CCM holds a Postdoctoral Enrichment Program Award from the Burroughs Wellcome Fund (https://www.bwfund.org/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. HHZ received a salary from the Innovate for Health Data Science Fellowship.