Making complex prediction rules applicable for readers: Current practice in random forest literature and recommendations

Anne-Laure Boulesteix; Silke Janitza; Roman Hornung; Philipp Probst; Hannah Busen; Alexander Hapfelmeier

doi:10.1002/bimj.201700243

Biom J. 2019 Sep;61(5):1314-1328. doi: 10.1002/bimj.201700243. Epub 2018 Aug 1.

Authors

Anne-Laure Boulesteix¹, Silke Janitza¹, Roman Hornung¹, Philipp Probst¹, Hannah Busen¹, Alexander Hapfelmeier²

Affiliations

¹ Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany.
² Institute for Medical Informatics, Statistics and Epidemiology, TUM Munich, Munich, Germany.

PMID: 30069934
DOI: 10.1002/bimj.201700243

Abstract

Ideally, prediction rules should be published in such a way that readers can apply them, for example, to make predictions for their own data. While this is straightforward for simple prediction rules, such as those based on the logistic regression model, it is much more difficult for complex prediction rules derived by machine learning tools. We conducted a survey of articles that reported prediction rules constructed using the random forest algorithm and were published in PLOS ONE in 2014-2015 in the field "medical and health sciences", with the aim of identifying issues related to their applicability. Making a prediction rule reproducible is one possible way to ensure that it is applicable; reproducibility is therefore also examined in our survey. The presented prediction rules were applicable in only 2 of the 30 identified papers, while for a further eight prediction rules it was possible to obtain the necessary information by contacting the authors. In the remaining cases, various problems, such as nonresponse of the authors, hampered the applicability of the prediction rules. Based on our experiences from this illustrative survey, we formulate a set of recommendations for authors who aim to make complex prediction rules applicable for readers. All data, including the description of the considered studies and the analysis code, are available as supplementary materials.
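
To illustrate why a logistic regression rule is considered straightforward to apply, the following minimal sketch shows how a reader could compute a predicted probability from nothing more than a published intercept and coefficient table. The coefficient values, the predictor names (`age`, `smoker`), and the function name `predict_probability` are hypothetical and chosen purely for illustration; they are not taken from the article or from any surveyed study. A random forest, by contrast, cannot be summarized in such a table and generally has to be shared as a fitted model object or as code plus data.

```python
import math

# Hypothetical published coefficients (illustrative only, not from any study).
INTERCEPT = -3.0
COEFFICIENTS = {"age": 0.04, "smoker": 0.8}


def predict_probability(patient):
    """Apply the logistic rule p = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    # Linear predictor: intercept plus the coefficient-weighted predictor values.
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    # The inverse logit maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


# A reader applies the rule to their own (here invented) observation.
p = predict_probability({"age": 50, "smoker": 1})
print(round(p, 3))
```

Everything needed to apply the rule fits in the two constants at the top, which is what makes this kind of rule easy to publish in applicable form.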

Keywords: logistic regression; machine learning; prediction rule; reproducibility; reproducible research.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biometry / methods*
Medicine
Science
Software