Reducing contributions from non-neuronal sources is a crucial step in functional magnetic resonance imaging (fMRI) connectivity analyses. Many viable strategies for denoising fMRI are used in the literature, and practitioners rely on denoising benchmarks for guidance in the selection of an appropriate choice for their study. However, fMRI denoising software is an ever-evolving field, and the benchmarks can quickly become obsolete as the techniques or implementations change. In this work, we present a denoising benchmark featuring a range of denoising strategies, datasets and evaluation metrics for connectivity analyses, based on the popular fMRIprep software. The benchmark is implemented in a fully reproducible framework, where the provided research objects enable readers to reproduce or modify core computations, as well as the figures of the article using the Jupyter Book project and the Neurolibre reproducible preprint server (https://neurolibre.org/). We demonstrate how such a reproducible benchmark can be used for continuous evaluation of research software, by comparing two versions of the fMRIprep software package. The majority of benchmark results were consistent with prior literature. Scrubbing, a technique which excludes time points with excessive motion, combined with global signal regression, is generally effective at noise removal. Scrubbing however disrupts the continuous sampling of brain images and is incompatible with some statistical analyses, e.g. auto-regressive modeling. In this case, a simple strategy using motion parameters, average activity in select brain compartments, and global signal regression should be preferred. Importantly, we found that certain denoising strategies behave inconsistently across datasets and/or versions of fMRIPrep, or had a different behavior than in previously published benchmarks. This work will hopefully provide useful guidelines for the fMRIprep users community, and highlight the importance of continuous evaluation of research methods. Our reproducible benchmark infrastructure will facilitate such continuous evaluation in the future, and may also be applied broadly to different tools or even research fields.
Keywords: Nilearn; fMRIPrep; functional connectivity; nuisance regressor; reproducibility; resting-state fMRI.