Dopamine neurons learn to encode the long-term value of multiple future rewards
- PMID: 21896766
- PMCID: PMC3174584
- DOI: 10.1073/pnas.1014457108
Abstract
Midbrain dopamine neurons signal reward value, its prediction error, and the salience of events. If they play a critical role in achieving specific distant goals, they should also encode long-term future rewards, as reinforcement learning theories suggest. Here, we address this experimentally untested issue. We recorded 185 dopamine neurons in three monkeys that performed a multistep choice task. In this task, they explored a reward target among alternatives and then exploited that knowledge to receive one or two additional rewards by choosing the same target in a set of subsequent trials. An analysis of anticipatory licking for the water reward indicated that the monkeys did not anticipate only the immediately expected reward in individual trials; rather, they anticipated the sum of immediate and multiple future rewards. In accordance with this behavioral observation, the dopamine responses to the start cues and reinforcer beeps reflected the expected values of the multiple future rewards and their errors, respectively. More specifically, as the monkeys learned the multistep choice task over the course of several weeks, the responses of dopamine neurons came to encode the sum of the immediate and expected multiple future rewards. The dopamine responses were quantitatively predicted by theoretical descriptions of the value function with time discounting in reinforcement learning. These findings demonstrate that dopamine neurons learn to encode the long-term value of multiple future rewards, with distant rewards discounted.
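The "value function with time discounting" referred to above is, in standard reinforcement learning, the sum of the immediate reward and all future rewards, each weighted by a discount factor raised to its delay. The sketch below illustrates that textbook definition and the matching temporal-difference (TD) prediction error. It is not the authors' fitted model: the per-trial discounting, the reward size of 1.0 per rewarded trial, the discount factor of 0.5, and the function names `discounted_values` and `td_errors` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def discounted_values(rewards: Sequence[float], gamma: float) -> List[float]:
    """V[t] = sum_k gamma**k * rewards[t + k]: immediate plus discounted future rewards.

    Hypothetical illustration: one entry per trial, gamma is a per-trial discount.
    """
    values = [0.0] * len(rewards)
    running = 0.0  # holds V[t + 1]; taken as 0 after the last trial
    # Sweep backwards using the recursion V[t] = r[t] + gamma * V[t + 1].
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        values[t] = running
    return values


def td_errors(rewards: Sequence[float], values: Sequence[float], gamma: float) -> List[float]:
    """TD error delta[t] = r[t] + gamma * V[t + 1] - V[t], with V = 0 after the last trial."""
    deltas = []
    for t in range(len(rewards)):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(rewards[t] + gamma * next_value - values[t])
    return deltas


if __name__ == "__main__":
    # Hypothetical schedule: a rewarded exploration trial followed by two rewarded
    # exploitation trials (1.0 = one water reward); gamma = 0.5 is illustrative only.
    rewards = [1.0, 1.0, 1.0]
    v = discounted_values(rewards, gamma=0.5)
    print(v)  # [1.75, 1.5, 1.0]
    # Correct long-term values leave no prediction error.
    print(td_errors(rewards, v, gamma=0.5))  # [0.0, 0.0, 0.0]
    # A learner expecting only the immediate reward underestimates value early on.
    print(td_errors(rewards, [1.0, 1.0, 1.0], gamma=0.5))  # [0.5, 0.5, 0.0]
```

Under these assumptions, value is highest at the start of the sequence, when the most rewards remain, and falls as fewer future rewards are left; an estimate that ignores future rewards produces positive prediction errors until the final trial.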
Conflict of interest statement
The authors declare no conflict of interest.
Similar articles
- A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6. PMID: 10391468.
- Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005 Jul 7;47(1):129-41. doi: 10.1016/j.neuron.2005.05.020. PMID: 15996553. Free PMC article.
- Predictive reward signal of dopamine neurons. J Neurophysiol. 1998 Jul;80(1):1-27. doi: 10.1152/jn.1998.80.1.1. PMID: 9658025. Review.
- Coding of the long-term value of multiple future rewards in the primate striatum. J Neurophysiol. 2013 Feb;109(4):1140-51. doi: 10.1152/jn.00289.2012. Epub 2012 Nov 21. PMID: 23175806.
- Dopamine signals for reward value and risk: basic and recent data. Behav Brain Funct. 2010 Apr 23;6:24. doi: 10.1186/1744-9081-6-24. PMID: 20416052. Free PMC article. Review.
Cited by
- A Transient Dopamine Signal Represents Avoidance Value and Causally Influences the Demand to Avoid. eNeuro. 2018 May 15;5(2):ENEURO.0058-18.2018. doi: 10.1523/ENEURO.0058-18.2018. eCollection 2018 Mar-Apr. PMID: 29766047. Free PMC article.
- Neural Circuitry of Reward Prediction Error. Annu Rev Neurosci. 2017 Jul 25;40:373-394. doi: 10.1146/annurev-neuro-072116-031109. Epub 2017 Apr 24. PMID: 28441114. Free PMC article. Review.
- Age Differences in Striatal Delay Sensitivity during Intertemporal Choice in Healthy Adults. Front Neurosci. 2011 Nov 16;5:126. doi: 10.3389/fnins.2011.00126. eCollection 2011. PMID: 22110424. Free PMC article.
- Learning to represent reward structure: a key to adapting to complex environments. Neurosci Res. 2012 Dec;74(3-4):177-83. doi: 10.1016/j.neures.2012.09.007. Epub 2012 Oct 13. PMID: 23069349. Free PMC article.
- Hypotheses relating to the function of the claustrum. Front Integr Neurosci. 2012 Aug 2;6:53. doi: 10.3389/fnint.2012.00053. eCollection 2012. PMID: 22876222. Free PMC article.
