Model-based reinforcement learning enables an agent to learn across variable environments and tasks by optimizing its actions according to predicted states and outcomes. The brain is thought to implement a similar mechanism, yet exactly how it selects an internal model appropriate to the current environment has remained unclear. Here, we investigated the model selection algorithm used by the human brain during a reinforcement learning task. One primary theory holds that model selection in the brain is driven by sensory prediction errors; we compared this theory with the alternative possibility that internal models are selected using reward prediction errors. To distinguish the two, we devised a task that switches from a first-order to a second-order Markov decision process, signaling the environmental change through either a reward prediction error or a sensory prediction error. We then tested two representative computational models, each driven by a different prediction error: a sensory-prediction-error-driven Bayesian algorithm, which has been proposed as a representative internal model selection mechanism in animal reinforcement learning tasks, and a reward-prediction-error-driven policy gradient algorithm. Comparing the simulated behavior of these two computational models with human reinforcement learning behavior, the model fitting results favor the policy gradient algorithm over the Bayesian algorithm. This suggests that the human brain relies on reward prediction errors to select an appropriate internal model during reinforcement learning.
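To make the contrast between the two selection rules concrete, the following minimal Python sketch illustrates one plausible form of each update. The function names, the two-model setup, and the specific likelihood and reward-prediction-error values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bayesian_selection(prior, likelihoods):
    """Sensory-PE-driven selection: update a posterior over internal models
    from the likelihood each model assigns to the observed next state."""
    posterior = prior * likelihoods        # Bayes rule (unnormalized)
    return posterior / posterior.sum()     # normalize to a distribution

def policy_gradient_selection(weights, chosen, reward_pe, lr=0.1):
    """Reward-PE-driven selection: treat model choice as an action and
    nudge its selection weight by the reward prediction error
    (a REINFORCE-style update on a softmax over models)."""
    probs = np.exp(weights) / np.exp(weights).sum()  # softmax over models
    grad = -probs
    grad[chosen] += 1.0                    # d log pi(chosen) / d weights
    return weights + lr * reward_pe * grad

# Two candidate internal models (e.g., first- vs. second-order MDP).
posterior = bayesian_selection(np.array([0.5, 0.5]), np.array([0.2, 0.8]))
weights = policy_gradient_selection(np.zeros(2), chosen=1, reward_pe=1.0)
print(posterior, weights)
```

In this sketch, the Bayesian rule shifts belief toward whichever model better predicts the observed sensory outcome, whereas the policy gradient rule reinforces whichever model choice yielded a positive reward prediction error.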
Keywords: Bayesian; Internal model; Model-based; Policy gradient; Reinforcement learning.
Copyright © 2022 The Author(s). Published by Elsevier Ltd. All rights reserved.