Evaluating Explanations from AI Algorithms for Clinical Decision-Making: A Social Science-based Approach

IEEE J Biomed Health Inform. 2024 Apr 25:PP. doi: 10.1109/JBHI.2024.3393719. Online ahead of print.

Abstract

Explainable Artificial Intelligence (XAI) techniques generate explanations for predictions from AI models. These explanations can be evaluated for (i) faithfulness to the prediction, i.e., its correctness about the reasons for prediction, and (ii) usefulness to the user. While there are metrics to evaluate faithfulness, to our knowledge, there are no automated metrics to evaluate the usefulness of explanations in the clinical context. Our objective is to develop a new metric to evaluate usefulness of AI explanations to clinicians. Usefulness evaluation needs to consider both (a) how humans generally process explanations and (b) clinicians' specific requirements from explanations presented by clinical decision support systems (CDSS). Our new scoring method can evaluate the usefulness of explanations generated by any XAI method that provides importance values for the input features of the prediction model. Our method draws on theories from social science to gauge usefulness, and uses literature-derived biomedical knowledge graphs to quantify support for the explanations from clinical literature. We evaluate our method in a case study on predicting onset of sepsis in intensive care units. Our analysis shows that the scores obtained using our method corroborate with independent evidence from clinical literature and have the required qualities expected from such a metric. Thus, our method can be used to evaluate and select useful explanations from a diverse set of XAI techniques in clinical contexts, making it a fundamental tool for future research in the design of AI-driven CDSS.