J Neurosci. 2011 Apr 6;31(14):5504-11.
doi: 10.1523/JNEUROSCI.6316-10.2011.

Signals in human striatum are appropriate for policy update rather than value prediction

Jian Li et al. J Neurosci. 2011.

Abstract

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.
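The distinction drawn in the abstract can be made concrete with a small sketch. It is not the authors' fitted model: the function names, the learning rate `alpha`, the default value of `kappa`, and the logistic choice rule are all assumptions made for illustration. Only the general forms come from the source, namely a prediction error per action (outcome minus expectancy) for value-based learning, and a teaching signal of the form Rc − κRu for policy learning.

```python
import math


def q_learning_update(q, chosen, r_chosen, r_unchosen, alpha=0.1):
    """Value-based sketch: each action value moves toward its own outcome.

    q is a two-element list of expected values; alpha is an assumed learning rate.
    Returns the updated values and the two prediction errors.
    """
    unchosen = 1 - chosen
    delta_chosen = r_chosen - q[chosen]        # depends on reward expectancy
    delta_unchosen = r_unchosen - q[unchosen]  # counterfactual prediction error
    new_q = list(q)
    new_q[chosen] += alpha * delta_chosen
    new_q[unchosen] += alpha * delta_unchosen
    return new_q, delta_chosen, delta_unchosen


def policy_update(pref, chosen, r_chosen, r_unchosen, alpha=0.1, kappa=0.5):
    """Policy-based sketch: one relative preference for action 0 over action 1.

    The teaching signal Rc - kappa * Ru contains no expectancy term.
    kappa weights the forgone reward; its default here is an assumption.
    """
    signal = r_chosen - kappa * r_unchosen
    direction = 1.0 if chosen == 0 else -1.0   # push toward the chosen action
    return pref + alpha * direction * signal, signal


def prob_choose_0(pref, beta=1.0):
    """Logistic choice rule on the preference (an assumed link function)."""
    return 1.0 / (1.0 + math.exp(-beta * pref))


if __name__ == "__main__":
    # Same outcome (chosen wins, forgone loses) under two different expectancies.
    for q in ([0.2, 0.8], [0.8, 0.2]):
        _, d_c, d_u = q_learning_update(q, chosen=0, r_chosen=1.0, r_unchosen=0.0)
        print("value-based errors:", d_c, d_u)
    _, s = policy_update(0.0, chosen=0, r_chosen=1.0, r_unchosen=0.0)
    print("policy signal:", s)
```

In this sketch the value-based errors change with the prior expectancy while the policy signal depends only on the two outcomes, which is the contrast the imaging analysis is designed to test.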

Figures

Figure 1.
Experimental design. A, Timeline of a single trial. B, Example from one subject of changing reward probabilities for both slots. ITI, Intertrial interval.
Figure 2.
A, Neural correlates of Rc − κRu. B, C, Overlapping view of the neural correlates of prediction errors of chosen choices (B, δchosen and −δchosen) and forgone choices (C, δunchosen and −δunchosen). p < 0.05.
Figure 3.
A, Effect of decision variables in striatum. BOLD activity positively correlates with Rc and −Ru (p < 0.05) but not with −Qc or Qu, even at a loose threshold [p = 0.01, uncorrected (unc)]. The dotted boxes surround the pairs of effects expected to be significant for value-based learning [Rc − Qc and −(Ru − Qu)] and the solid box surrounds those for policy learning (Rc − κRu). B, Similar activities (bilateral insula, anterior cingulate cortex, and dorsolateral prefrontal cortex) were positively correlated with −Rc and Ru (p < 0.05).
Figure 4.
A, Error signaling ROI in left ventral striatum, identified from the conjunction of Rc and −Ru across subjects [p < 0.001, uncorrected (unc)]. B, In left striatum (A, circles), the neural effect size for −Ru was positively correlated, across subjects, with the weight for the unchosen reward, κ, estimated from choice behavior (p = 0.018, Bonferroni corrected; r = 0.57).
