Random Forests Based Group Importance Scores and Their Statistical Interpretation: Application for Alzheimer's Disease

Marie Wehenkel; Antonio Sutera; Christine Bastin; Pierre Geurts; Christophe Phillips

doi:10.3389/fnins.2018.00411

Front Neurosci. 2018 Jun 29;12:411. doi: 10.3389/fnins.2018.00411. eCollection 2018.

Authors

Marie Wehenkel¹,², Antonio Sutera¹, Christine Bastin³, Pierre Geurts¹, Christophe Phillips¹,²

Affiliations

¹ Department of Computer Science and Electrical Engineering, Montefiore Institute, University of Liège, Liège, Belgium.
² GIGA-CRC in silico Medicine, University of Liège, Liège, Belgium.
³ GIGA-CRC in vivo Imaging, University of Liège, Liège, Belgium.

Abstract

Machine learning approaches have been increasingly used in the neuroimaging field for the design of computer-aided diagnosis systems. In this paper, we focus on the ability of these methods to provide interpretable information about the brain regions that are the most informative about the disease or condition of interest. In particular, we investigate the benefit of group-based, instead of voxel-based, analyses in the context of Random Forests. Assuming a prior division of the voxels into non-overlapping groups (defined by an atlas), we propose several procedures to derive group importances from the individual voxel importances produced by Random Forests models. We then adapt several permutation schemes to turn group importance scores into more interpretable statistical scores that make it possible to identify the truly relevant groups in the importance rankings. We first verify that these methods behave well on artificial datasets. Then, we apply them to our own dataset of FDG-PET scans to identify the brain regions involved in the prognosis of Alzheimer's disease.

Keywords: Alzheimer's disease; FDG-PET; feature selection; group-based method; machine learning; prognosis system; random forests.
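
The two steps outlined in the abstract (aggregating voxel importances into group importances, then turning group scores into statistical scores by permutation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the abstract does not name the aggregation functions, so sum and max are used as plausible choices; the label-permutation test is a generic scheme, not necessarily one of those adapted in the paper; and `mean_difference_importance` is a simple stand-in for a Random Forest importance so that the example runs with the standard library only. The function names, the toy data, and the group labels "A", "B", "C" are all hypothetical.

```python
import random
from statistics import mean


def group_importances(voxel_imp, groups, agg=sum):
    """Aggregate per-voxel importances into one score per atlas group."""
    by_group = {}
    for imp, g in zip(voxel_imp, groups):
        by_group.setdefault(g, []).append(imp)
    return {g: agg(vals) for g, vals in by_group.items()}


def mean_difference_importance(X, y):
    """Stand-in voxel importance (NOT a Random Forest):
    absolute difference between the two class means, per voxel."""
    importances = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        importances.append(abs(mean(pos) - mean(neg)))
    return importances


def permutation_pvalues(X, y, groups, importance_fn, agg=sum,
                        n_perm=200, seed=0):
    """Label-permutation test: compare each observed group importance
    with the scores obtained after shuffling the class labels."""
    rng = random.Random(seed)
    observed = group_importances(importance_fn(X, y), groups, agg)
    exceed = {g: 0 for g in observed}
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any link between voxels and labels
        null = group_importances(importance_fn(X, y_perm), groups, agg)
        for g in observed:
            if null[g] >= observed[g]:
                exceed[g] += 1
    # The +1 terms count the observed labelling as one permutation.
    pvals = {g: (1 + exceed[g]) / (1 + n_perm) for g in observed}
    return observed, pvals


# Hypothetical toy data: 40 subjects, 6 voxels in 3 groups of 2 voxels.
# Only the voxels of group "A" carry a class-dependent shift.
data_rng = random.Random(1)
groups = ["A", "A", "B", "B", "C", "C"]
y = [i % 2 for i in range(40)]
X = [[data_rng.gauss(0.0, 1.0) + (3.0 * lab if j < 2 else 0.0)
      for j in range(6)] for lab in y]

observed, pvals = permutation_pvalues(X, y, groups,
                                      mean_difference_importance)
for g in sorted(observed):
    print(g, round(observed[g], 3), round(pvals[g], 3))
```

To use an actual Random Forest, `importance_fn` could be replaced by a function that fits a forest on `(X, y)` and returns its per-feature importances, for instance the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`. Passing `agg=max` or `agg=mean` changes how strongly group size influences the group score, which is one reason several aggregation procedures are worth comparing.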