Background: Explainability, the aspect of artificial intelligence-based decision support (ADS) systems that allows users to understand why predictions are made, offers many potential benefits. One common claim is that explainability increases user trust, yet this has not been established in healthcare contexts. For advanced algorithms such as artificial neural networks, generating explanations is not trivial and requires the use of a second algorithm. The assumption of improved user trust should therefore be investigated to determine whether it justifies the additional complexity.
Methods: Biochemistry staff completed a wrong blood in tube (WBIT) error identification task with the help of an ADS system. Half of the volunteers were provided with both ADS predictions and explanations for those predictions, while the other half received predictions alone. The two groups were compared on their rate of agreement with ADS predictions, used as an index of user trust, and on their WBIT error detection performance. Since the AI model used to generate predictions was known to outperform laboratory staff, increased trust was expected to improve user performance.
Results: Volunteers reviewed 1590 sets of results. The volunteers provided with explanations showed no significant difference in their rate of agreement with the ADS system compared to volunteers receiving predictions alone (83.3% versus 81.8%, p = 0.46). The two volunteer groups also did not differ significantly in accuracy, sensitivity or specificity for WBIT error identification (all p-values > 0.78).
Conclusions: For a WBIT error identification task, there was no evidence to justify the additional complexity of explainability on the grounds of increased user trust.
Keywords: Wrong blood in tube; artificial intelligence; decision support; machine learning; sample mislabelling.
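Illustrative sketch: the quantities compared in the Methods and Results (rate of agreement with the ADS system, sensitivity and specificity, and a between-group comparison of proportions) could be computed along the following lines. This is a minimal sketch under stated assumptions, not the study's analysis code: the abstract does not name the statistical test used, so the pooled two-proportion z-test here is an assumption, and all function names and the toy inputs are hypothetical.

```python
import math


def agreement_rate(user_calls, ads_calls):
    """Fraction of cases in which the user's call matches the ADS prediction."""
    if len(user_calls) != len(ads_calls) or len(user_calls) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(u == a for u, a in zip(user_calls, ads_calls))
    return matches / len(user_calls)


def sensitivity_specificity(user_calls, truth):
    """Sensitivity and specificity of user calls (True = WBIT error flagged)."""
    tp = sum(u and t for u, t in zip(user_calls, truth))
    fn = sum((not u) and t for u, t in zip(user_calls, truth))
    tn = sum((not u) and (not t) for u, t in zip(user_calls, truth))
    fp = sum(u and (not t) for u, t in zip(user_calls, truth))
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for a difference between two proportions.

    Assumption: the abstract does not state which test produced its
    p-values; this is one common choice, shown for illustration only.
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Toy data only (hypothetical, not taken from the study).
    user = [True, False, True, False]
    ads = [True, True, True, False]
    truth = [True, True, False, False]
    print(agreement_rate(user, ads))
    print(sensitivity_specificity(user, truth))
    print(two_proportion_z_test(60, 100, 40, 100))
```

A real analysis would likely also need to account for repeated reviews by the same volunteer, which a simple pooled test ignores.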