The asymmetric learning rates of murine exploratory behavior in sparse reward environments

Neural Netw. 2021 Nov:143:218-229. doi: 10.1016/j.neunet.2021.05.030. Epub 2021 Jun 6.

Abstract

Goal-oriented behaviors of animals can be modeled by reinforcement learning algorithms. Such algorithms predict the future outcomes of selected actions using action values and update those values in response to positive and negative outcomes. In many models of animal behavior, the action values are updated symmetrically with a common learning rate, that is, in the same way for both positive and negative outcomes. However, animals in environments with scarce rewards may have asymmetric learning rates. To investigate this asymmetry between reward and non-reward outcomes, we analyzed the exploration behavior of mice in five-armed bandit tasks using a Q-learning model with separate learning rates for positive and negative outcomes. The positive learning rate was significantly higher in a scarce-reward environment than in a rich-reward environment, and conversely, the negative learning rate was significantly lower in the scarce environment. The ratio of positive to negative learning rate was about 10 in the scarce environment and about 2 in the rich environment. This result suggests that when the reward probability was low, the mice tended to ignore failures and exploit the rare rewards. Computational modeling analysis revealed that the increased learning-rate ratio could cause overestimation of, and perseveration on, rarely rewarding events, increasing total reward acquisition in the scarce environment but disadvantaging impartial exploration.
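The dual-learning-rate Q-learning model described above can be illustrated with a minimal simulation sketch. This is not the authors' fitting code; the reward probabilities, learning rates, and softmax inverse temperature below are illustrative assumptions, chosen only to show how separate positive and negative rates shape value updates in a five-armed bandit.

```python
import math
import random

def simulate_dual_rate_q(n_arms=5, reward_probs=None, alpha_pos=0.5,
                         alpha_neg=0.05, beta=3.0, n_trials=1000, seed=0):
    """Softmax Q-learning with separate learning rates for rewarded
    (alpha_pos) and unrewarded (alpha_neg) trials. All parameter values
    are illustrative assumptions, not fitted estimates from the study."""
    rng = random.Random(seed)
    if reward_probs is None:
        reward_probs = [0.1] * n_arms  # assumed scarce-reward environment
    q = [0.0] * n_arms
    total_reward = 0
    for _ in range(n_trials):
        # Softmax action selection over current action values.
        exps = [math.exp(beta * v) for v in q]
        s = sum(exps)
        r = rng.random() * s
        cum = 0.0
        for action, e in enumerate(exps):
            cum += e
            if r <= cum:
                break
        reward = 1 if rng.random() < reward_probs[action] else 0
        total_reward += reward
        # Asymmetric update: a high alpha_pos / alpha_neg ratio makes the
        # model latch onto rare rewards while largely ignoring failures.
        alpha = alpha_pos if reward else alpha_neg
        q[action] += alpha * (reward - q[action])
    return q, total_reward
```

With a high positive-to-negative ratio (here 10, matching the order of magnitude reported for the scarce environment), the simulated agent perseverates on arms that recently paid off, mirroring the overestimation effect described in the abstract.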

Keywords: Behavior; Dual learning rate; Exploration; Multi-armed bandit problem; Reinforcement learning.

MeSH terms

  • Algorithms
  • Animals
  • Exploratory Behavior*
  • Mice
  • Probability
  • Reinforcement, Psychology
  • Reward*