Benchmarking Perturbation-Based Saliency Maps for Explaining Atari Agents
- PMID: 35910188
- PMCID: PMC9326049
- DOI: 10.3389/frai.2022.903875
Abstract
One of the most prominent methods for explaining the behavior of Deep Reinforcement Learning (DRL) agents is the generation of saliency maps that show how much each pixel contributed to the agents' decision. However, there is no work that computationally evaluates and compares the fidelity of different perturbation-based saliency map approaches specifically for DRL agents. Computationally evaluating saliency maps for DRL agents is particularly challenging since their decisions are part of an overarching policy that includes long-term decision making. For instance, the output neurons of value-based DRL algorithms encode both the value of the current state and the expected future reward after taking each action in that state. This ambiguity should be considered when evaluating saliency maps for such agents. In this paper, we compare five popular perturbation-based approaches to create saliency maps for DRL agents trained on four different Atari 2600 games. The approaches are compared using two computational metrics: dependence on the learned parameters of the agents' underlying deep Q-networks (sanity checks) and fidelity to the agents' reasoning (input degradation). During the sanity checks, we found that a popular noise-based saliency map approach for DRL agents shows little dependence on the parameters of the output layer. We demonstrate that this can be fixed by tweaking the algorithm so that it focuses on specific actions instead of the general entropy within the output values. For fidelity, we identify two main factors that influence which saliency map approach should be chosen in which situation. Particular to value-based DRL agents, we show that analyzing the agents' choice of action requires different saliency map approaches than analyzing the agents' state value estimation.
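To make the distinction drawn in the abstract concrete, below is a minimal sketch of a perturbation-based (occlusion-style) saliency map for a value-based agent. It illustrates the general technique only and is not any specific method benchmarked in the paper; the names (`occlusion_saliency`, `q_network`, `state`) and the grey-baseline perturbation are assumptions for this example. Setting `target="action"` attributes the Q-value of a chosen action (the agent's decision), while `target="value"` attributes the state value estimate, here taken as the maximum Q-value.

```python
# Minimal sketch (not the paper's algorithm): occlusion-based saliency for a
# DQN-style agent. Assumes q_network is a torch module mapping a batch of
# stacked frames (N, C, H, W) to Q-values (N, n_actions), and state is a
# (C, H, W) tensor.
import numpy as np
import torch


def occlusion_saliency(q_network, state, patch_size=4, target="action", action=None):
    """Perturb square patches of the input and record how the output changes.

    target="action": attribute the Q-value of a chosen action (the decision).
    target="value":  attribute the state value estimate, here max_a Q(s, a).
    """
    q_network.eval()
    with torch.no_grad():
        base_q = q_network(state.unsqueeze(0)).squeeze(0)  # (n_actions,)
    if action is None:
        action = int(base_q.argmax())                      # greedy action
    base_score = base_q[action] if target == "action" else base_q.max()

    _, h, w = state.shape
    saliency = np.zeros((h, w), dtype=np.float32)
    baseline = state.mean()                                 # simple grey baseline

    for y in range(0, h, patch_size):
        for x in range(0, w, patch_size):
            perturbed = state.clone()
            perturbed[:, y:y + patch_size, x:x + patch_size] = baseline
            with torch.no_grad():
                q = q_network(perturbed.unsqueeze(0)).squeeze(0)
            score = q[action] if target == "action" else q.max()
            # A large drop in the target output means the patch was important.
            saliency[y:y + patch_size, x:x + patch_size] = float(base_score - score)
    return saliency
```

Replacing each patch with a grey baseline is only one possible perturbation; blur-, noise-, or mask-based variants fit the same loop, and the choice of `target` determines whether the resulting map explains the action selection or the state value estimation.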
Keywords: deep reinforcement learning; explainable artificial intelligence (XAI); explainable reinforcement learning; feature attribution; interpretable machine learning; saliency maps.
Copyright © 2022 Huber, Limmer and André.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.