Multiple model-based reinforcement learning explains dopamine neuronal activity

Neural Netw. 2007 Aug;20(6):668-75. doi: 10.1016/j.neunet.2007.04.028. Epub 2007 Jun 6.

Abstract

A number of computational models have explained the behavior of dopamine neurons in terms of temporal difference learning. However, earlier models cannot account for recent results of conditioning experiments; in particular, they do not satisfactorily explain how dopamine neurons behave when the interval between a cue stimulus and a reward varies. We address this problem with a modular architecture in which each module consists of a reward predictor and a value estimator. A "responsibility signal", computed from the accuracy of the reward predictors' predictions, weights both the contributions and the learning of the value estimators. This multiple-model architecture gives an accurate account of the behavior of dopamine neurons in two specific experiments: when the reward is delivered earlier than expected, and when the stimulus-reward interval varies uniformly over a fixed range.
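
The responsibility-weighted scheme described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not state how the responsibility signal is computed, so the softmax over negative squared reward-prediction errors, the width `sigma`, the learning rate `alpha`, the discount factor `gamma`, and all function names below are assumptions made for the example.

```python
import math


def responsibilities(pred_errors, sigma=1.0):
    """Turn each module's reward-prediction error into a normalized weight.

    Assumption: a softmax over negative squared errors, so the module whose
    reward predictor is most accurate receives the largest responsibility.
    """
    scores = [-(e ** 2) / (2.0 * sigma ** 2) for e in pred_errors]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def combined_value(values, resp):
    """Responsibility-weighted sum of the modules' value estimates."""
    return sum(r * v for r, v in zip(resp, values))


def td_update(values_now, values_next, reward, resp, alpha=0.1, gamma=0.9):
    """One temporal difference step shared across modules.

    The TD error is computed from the combined value estimate, and each
    module's estimate moves in proportion to its responsibility.
    """
    v_now = combined_value(values_now, resp)
    v_next = combined_value(values_next, resp)
    delta = reward + gamma * v_next - v_now  # TD error
    new_values = [v + alpha * r * delta for v, r in zip(values_now, resp)]
    return delta, new_values


# Hypothetical example: module 0 predicts the reward well, module 1 poorly.
resp = responsibilities([0.1, 1.5])
delta, updated = td_update([0.2, 0.0], [0.0, 0.0], reward=1.0, resp=resp)
print(resp, delta, updated)
```

In this sketch the module with the smaller prediction error dominates both the combined value estimate and the update, which is the role the abstract assigns to the responsibility signal. The TD error `delta` is the quantity that models of this kind compare with dopamine neuron activity.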

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Computer Simulation
  • Dopamine / metabolism*
  • Models, Neurological*
  • Neurons / physiology*
  • Reinforcement, Psychology*

Substances

  • Dopamine