Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks

Julia Koehler Leman; Sergey Lyskov; Steven M Lewis; Jared Adolf-Bryfogle; Rebecca F Alford; Kyle Barlow; Ziv Ben-Aharon; Daniel Farrell; Jason Fell; William A Hansen; Ameya Harmalkar; Jeliazko Jeliazkov; Georg Kuenze; Justyna D Krys; Ajasja Ljubetič; Amanda L Loshbaugh; Jack Maguire; Rocco Moretti; Vikram Khipple Mulligan; Morgan L Nance; Phuong T Nguyen; Shane Ó Conchúir; Shourya S Roy Burman; Rituparna Samanta; Shannon T Smith; Frank Teets; Johanna K S Tiemann; Andrew Watkins; Hope Woods; Brahm J Yachnin; Christopher D Bahl; Chris Bailey-Kellogg; David Baker; Rhiju Das; Frank DiMaio; Sagar D Khare; Tanja Kortemme; Jason W Labonte; Kresten Lindorff-Larsen; Jens Meiler; William Schief; Ora Schueler-Furman; Justin B Siegel; Amelie Stein; Vladimir Yarov-Yarovoy; Brian Kuhlman; Andrew Leaver-Fay; Dominik Gront; Jeffrey J Gray; Richard Bonneau

doi:10.1038/s41467-021-27222-7

Nat Commun. 2021 Nov 29;12(1):6947. doi: 10.1038/s41467-021-27222-7.

Authors

Julia Koehler Leman^#^{1,2}, Sergey Lyskov^#³, Steven M Lewis^#⁴, Jared Adolf-Bryfogle^{5,6}, Rebecca F Alford³, Kyle Barlow⁷, Ziv Ben-Aharon⁸, Daniel Farrell^{9,10}, Jason Fell^{11,12,13}, William A Hansen^{14,15}, Ameya Harmalkar³, Jeliazko Jeliazkov¹⁶, Georg Kuenze^{17,18,19}, Justyna D Krys²⁰, Ajasja Ljubetič^{9,10}, Amanda L Loshbaugh^{21,22}, Jack Maguire²³, Rocco Moretti^{17,18}, Vikram Khipple Mulligan²⁴, Morgan L Nance¹⁶, Phuong T Nguyen²⁵, Shane Ó Conchúir²¹, Shourya S Roy Burman³, Rituparna Samanta³, Shannon T Smith^{18,26}, Frank Teets²⁷, Johanna K S Tiemann²⁸, Andrew Watkins²⁹, Hope Woods^{18,26}, Brahm J Yachnin^{14,15}, Christopher D Bahl^{30,31,32}, Chris Bailey-Kellogg³³, David Baker^{9,10}, Rhiju Das²⁹, Frank DiMaio^{9,10}, Sagar D Khare^{14,15}, Tanja Kortemme^{21,22}, Jason W Labonte³, Kresten Lindorff-Larsen²⁸, Jens Meiler^{17,18,19}, William Schief^{5,6}, Ora Schueler-Furman⁸, Justin B Siegel^{11,12,13}, Amelie Stein²⁸, Vladimir Yarov-Yarovoy²⁵, Brian Kuhlman²⁷, Andrew Leaver-Fay²⁷, Dominik Gront²⁰, Jeffrey J Gray³⁴, Richard Bonneau^{35,36,37}

Affiliations

¹ Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, 10010, USA. julia.koehler.leman@gmail.com.
² Department of Biology, New York University, New York, NY, 10003, USA. julia.koehler.leman@gmail.com.
³ Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA.
⁴ Cyrus Biotechnology, 1201 Second Ave, Suite 900, Seattle, WA, 98101, USA.
⁵ Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, 92037, USA.
⁶ IAVI Neutralizing Antibody Center, Scripps Research, La Jolla, CA, 92037, USA.
⁷ Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, 94158, USA.
⁸ Department of Microbiology and Molecular Genetics, Hebrew University, Hadassah Medical School, POB 12272, Jerusalem, 91120, Israel.
⁹ Department of Biochemistry, University of Washington, Seattle, WA, 98195, USA.
¹⁰ Institute for Protein Design, University of Washington, Seattle, WA, 98195, USA.
¹¹ Genome Center, University of California, Davis, CA, 95616, USA.
¹² Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, 95616, USA.
¹³ Department of Chemistry, University of California, Davis, CA, 95616, USA.
¹⁴ Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, 08904, USA.
¹⁵ Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, 08904, USA.
¹⁶ Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD, 21218, USA.
¹⁷ Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA.
¹⁸ Center for Structural Biology, Vanderbilt University, Nashville, TN, 37235, USA.
¹⁹ Institute for Drug Discovery, Medical School, Leipzig University, 04103, Leipzig, Germany.
²⁰ Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093, Warsaw, Poland.
²¹ Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, 94158, USA.
²² Biophysics Graduate Program, University of California San Francisco, San Francisco, CA, 94158, USA.
²³ Program in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
²⁴ Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, 10010, USA.
²⁵ Department of Physiology and Membrane Biology, School of Medicine, University of California, Davis, CA, 95616, USA.
²⁶ Chemical and Physical Biology Program, Vanderbilt University, Nashville, TN, 37235, USA.
²⁷ Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA.
²⁸ Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200, Copenhagen N., Denmark.
²⁹ Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, 94305, USA.
³⁰ Institute for Protein Innovation, Boston, MA, 02115, USA.
³¹ Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, 02115, USA.
³² Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA.
³³ Department of Computer Science, Dartmouth, Hanover, NH, 03755, USA.
³⁴ Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA. jgray@jhu.edu.
³⁵ Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, 10010, USA. bonneau@nyu.edu.
³⁶ Department of Biology, New York University, New York, NY, 10003, USA. bonneau@nyu.edu.
³⁷ Department of Computer Science, New York University, New York, NY, 10003, USA. bonneau@nyu.edu.

^# Contributed equally.

Abstract

Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework, and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Benchmarking
Binding Sites
Humans
Ligands
Macromolecular Substances / chemistry*
Macromolecular Substances / metabolism
Molecular Docking Simulation*
Protein Binding
Proteins / chemistry*
Proteins / metabolism
Reproducibility of Results
Software / standards*

Substances

Ligands
Macromolecular Substances
Proteins
