Signals in human striatum are appropriate for policy update rather than value prediction
- PMID: 21471387
- PMCID: PMC3132551
- DOI: 10.1523/JNEUROSCI.6316-10.2011
Abstract
Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.
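The contrast the abstract draws between value-based and policy-based learning can be illustrated with a minimal sketch. This is not the authors' fitted model: the function names, the learning rate `alpha`, the inverse temperature `beta`, and the exact update rules are assumptions chosen for illustration. In the value-based rule each action keeps its own reward expectancy, and the teaching signal is outcome minus expectancy. In the policy-based rule a single relative preference is pushed up by the obtained reward and down by the forgone reward, with no expectancy term.

```python
import math


def softmax_prob_a(pref_a, pref_b, beta=1.0):
    """Probability of choosing action A under a two-option softmax (beta is assumed)."""
    return 1.0 / (1.0 + math.exp(-beta * (pref_a - pref_b)))


def value_update(q, chosen, obtained, forgone, alpha=0.2):
    """Value-based sketch (Q-learning style, hypothetical parameters).

    q is [Q_A, Q_B]; chosen is 0 for A or 1 for B. Each action value moves
    toward its own outcome, so the error depends on the current expectancy.
    """
    unchosen = 1 - chosen
    q = list(q)  # copy so the caller's list is not mutated
    q[chosen] += alpha * (obtained - q[chosen])
    q[unchosen] += alpha * (forgone - q[unchosen])
    return q


def policy_update(w, chosen, obtained, forgone, alpha=0.2):
    """Policy-based sketch (hypothetical rule).

    w is the relative preference for A over B. The teaching signal is
    obtained minus forgone reward: the two outcomes act in opposite
    directions and no reward expectancy enters the update.
    """
    sign = 1.0 if chosen == 0 else -1.0
    return w + alpha * sign * (obtained - forgone)


if __name__ == "__main__":
    # One trial: the subject chose A, A paid 1.0 and B would have paid 0.0.
    q = value_update([0.5, 0.5], chosen=0, obtained=1.0, forgone=0.0)
    w = policy_update(0.0, chosen=0, obtained=1.0, forgone=0.0)
    print("action values:", q)
    print("P(A) from values:", softmax_prob_a(q[0], q[1]))
    print("relative preference:", w)
    print("P(A) from preference:", softmax_prob_a(w, 0.0))
```

Under these assumptions, a striatal signal that tracks the value-based error should vary with reward expectancy, whereas one that tracks the policy-based signal should vary only with obtained and forgone rewards, in opposite directions.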
Similar articles
- Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Netw. 2006 Oct;19(8):1242-54. doi: 10.1016/j.neunet.2006.06.007. Epub 2006 Sep 20. PMID: 16987637
- Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making. J Neurosci. 2007 Nov 21;27(47):12860-7. doi: 10.1523/JNEUROSCI.2496-07.2007. PMID: 18032658 Free PMC article.
- Overlapping prediction errors in dorsal striatum during instrumental learning with juice and money reward in the human brain. J Neurophysiol. 2009 Dec;102(6):3384-91. doi: 10.1152/jn.91195.2008. Epub 2009 Sep 30. PMID: 19793875
- Neural basis of reinforcement learning and decision making. Annu Rev Neurosci. 2012;35:287-308. doi: 10.1146/annurev-neuro-062111-150512. Epub 2012 Mar 29. PMID: 22462543 Free PMC article. Review.
- Neuronal Reward and Decision Signals: From Theories to Data. Physiol Rev. 2015 Jul;95(3):853-951. doi: 10.1152/physrev.00023.2014. PMID: 26109341 Free PMC article. Review.
Cited by
- Normative development of ventral striatal resting state connectivity in humans. Neuroimage. 2015 Sep;118:422-37. doi: 10.1016/j.neuroimage.2015.06.022. Epub 2015 Jun 16. PMID: 26087377 Free PMC article.
- The Effect of Counterfactual Information on Outcome Value Coding in Medial Prefrontal and Cingulate Cortex: From an Absolute to a Relative Neural Code. J Neurosci. 2020 Apr 15;40(16):3268-3277. doi: 10.1523/JNEUROSCI.1712-19.2020. Epub 2020 Mar 10. PMID: 32156831 Free PMC article.
- Reinforcement learning and dopamine in schizophrenia: dimensions of symptoms or specific features of a disease group? Front Psychiatry. 2013 Dec 23;4:172. doi: 10.3389/fpsyt.2013.00172. PMID: 24391603 Free PMC article. Review.
- Learning to obtain reward, but not avoid punishment, is affected by presence of PTSD symptoms in male veterans: empirical data and computational model. PLoS One. 2013 Aug 27;8(8):e72508. doi: 10.1371/journal.pone.0072508. eCollection 2013. PMID: 24015254 Free PMC article. Clinical Trial.
- Distinct Action Signals by Subregions in the Nucleus Accumbens during STOP-Change Performance. J Neurosci. 2024 Jul 17;44(29):e0020242024. doi: 10.1523/JNEUROSCI.0020-24.2024. PMID: 38897724 Free PMC article.