Reinforcement learning: computing the temporal difference of values via distinct corticostriatal pathways

Trends Neurosci. 2012 Aug;35(8):457-67. doi: 10.1016/j.tins.2012.04.009. Epub 2012 May 30.


Midbrain dopamine neurons are thought to encode reward prediction error, but how these error signals are computed remains elusive. Here, we propose a mechanism based on recent findings regarding corticostriatal circuits. Specifically, we propose that two distinct subpopulations of corticostriatal neurons differentially represent the animal's current and previous states/actions. This arises from unidirectional connectivity from one subpopulation to the other, together with strong recurrent excitation that exists only within the recipient subpopulation. These corticostriatal subpopulations selectively connect to the direct and indirect pathways of the basal ganglia, such that the temporal difference between the values of current and previous states/actions (the core of the error signal) can be computed. Our hypothesis suggests a unified view of basal ganglia functions and has important clinical implications.
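
For readers less familiar with the reinforcement-learning side, the quantity the abstract refers to can be illustrated with the standard temporal-difference (TD) error, reward plus the discounted value of the current state/action minus the value of the previous one. The sketch below is only an illustration of that textbook formula, not the circuit model proposed in the article. The function name, the argument names, and the discount factor of 0.9 are assumptions introduced here, and the abstract does not state which basal ganglia pathway carries which term.

```python
def td_error(reward, value_current, value_previous, gamma=0.9):
    """Textbook TD error: reward + gamma * V(current) - V(previous).

    All names and the default gamma are illustrative assumptions,
    not taken from the article.
    """
    # The current value enters with a positive sign and the previous value
    # with a negative sign; their difference is the "temporal difference
    # of values" that the abstract calls the core of the error signal.
    temporal_difference = gamma * value_current - value_previous
    return reward + temporal_difference


if __name__ == "__main__":
    # Unexpected reward with no change in predicted value: positive error.
    print(td_error(reward=1.0, value_current=0.0, value_previous=0.0))
    # Fully predicted reward (previous value already equals it): zero error.
    print(td_error(reward=1.0, value_current=0.0, value_previous=1.0))
```

In this form, a reward that the previous value already predicted yields an error of zero, while an unpredicted reward yields a positive error, which is the pattern usually attributed to dopamine neuron responses.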

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Brain / physiology*
  • Humans
  • Learning / physiology*
  • Models, Neurological*
  • Neural Pathways / physiology*
  • Reinforcement, Psychology*