Comparative Study
Nat Commun. 2014 Jul 23;5:4390.
doi: 10.1038/ncomms5390.

Action-value comparisons in the dorsolateral prefrontal cortex control choice between goal-directed actions

Richard W Morris et al. Nat Commun. 2014.

Abstract

It is generally assumed that choice between different actions reflects the difference between their action values, yet little direct evidence confirming this assumption has been reported. Here we assess whether the brain calculates the absolute difference between action values or their relative advantage, that is, the probability that one action is better than the other alternatives. We use a two-armed bandit task during functional magnetic resonance imaging and modelled responses to determine both the size of the difference between action values (D) and the probability that one action value is better (P). The results show haemodynamic signals corresponding to P in the right dorsolateral prefrontal cortex (dlPFC), together with evidence that these signals modulate motor cortex activity in an action-specific manner. We find no significant activity related to D. These findings demonstrate that a distinct neuronal population mediates action-value comparisons and reveal how these comparisons are implemented to mediate value-based decision-making.
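
The distinction between D and P can be made concrete with a small sketch. The abstract does not specify how the authors computed either quantity, so everything below is an assumption made for illustration: each action's reward probability is given a Beta posterior under a uniform prior, D is taken as the difference between posterior means, and P is estimated by sampling as the probability that the left action's reward rate exceeds the right action's. The function names and the outcome counts are hypothetical.

```python
import random


def posterior_mean(rewards, omissions):
    """Mean of Beta(1 + rewards, 1 + omissions): a uniform prior updated by outcomes."""
    return (1 + rewards) / (2 + rewards + omissions)


def value_difference(left, right):
    """D (assumed form): difference between the two posterior mean reward rates."""
    return posterior_mean(*left) - posterior_mean(*right)


def relative_advantage(left, right, n_samples=20000, seed=0):
    """P (assumed form): Monte Carlo estimate of Pr(rate_left > rate_right)."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same estimate
    wins = 0
    for _ in range(n_samples):
        rate_left = rng.betavariate(1 + left[0], 1 + left[1])
        rate_right = rng.betavariate(1 + right[0], 1 + right[1])
        if rate_left > rate_right:
            wins += 1
    return wins / n_samples


# Hypothetical (rewards, omissions) counts for each action.
few_left, few_right = (3, 7), (2, 8)        # 10 outcomes per action
many_left, many_right = (30, 70), (20, 80)  # 100 outcomes per action

for label, left, right in [("few", few_left, few_right),
                           ("many", many_left, many_right)]:
    d = value_difference(left, right)
    p = relative_advantage(left, right)
    print(f"{label:>4} outcomes: D = {d:.3f}, P = {p:.3f}")
```

Under these assumptions the two signals come apart: D stays close to 0.1 in both cases, while P moves much nearer to 1 once more outcomes have been observed, because the posteriors narrow even though the gap between their means barely changes.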

Figures

Figure 1. Experimental stimuli, behavioural choices and causal ratings.
(a) Before the choice, no stimuli indicated which button was more likely to lead to reward. When the participant made a choice, the chosen button was highlighted (green), and on rewarded trials the reward stimulus was presented for 1,000 ms. After each block of trials, the participant rated how causal each button was. (b) Mean response rate (responses per second) was higher for the high-contingency action (blue) than for the low-contingency action (red) in each condition. (c) Causal ratings were higher for the high-contingency action (blue) than for the low-contingency action (red) in each condition. Both response rate and causal rating varied significantly with contingency (P<0.001). Vertical bars represent s.e.m.
Figure 2. Model values P and D predict choices.
(a) Trial-by-trial example of the actual choices made by Subject 7 (black vertical bars: left actions upwards, right actions downwards), and the model-predicted values in arbitrary units for P (red) and D (blue) across the first four blocks (from easy to hard). Note that P represents a sustained advantage across each block, while D decays towards the experimental contingency in each block. (b) Regression weights of P (red) and D (blue) across tertile bins of D values, showing that as the difference between QLeft and QRight approaches zero (middle tertile of D values) only P values significantly predict choice. (c) Regression weights of P and D across tertile bins of P values, showing that P and D are both significant predictors of choice across all tertiles of P.
Figure 3. Right dlPFC tracked the relative advantage signal.
(a) Cortical regions correlated with the relative advantage signal (P). Only the right dlPFC (BA9) was significant (FWEc P<0.05). Inset, per cent signal change in the right dlPFC was linearly related to P. (b) Fitted responses in arbitrary units showing action-specific modulation of brain activity (red and blue) by P, as well as non-specific activity due to left and right actions (button presses) in the right dlPFC.
Figure 4. Ventromedial PFC tracked post-choice values.
(a) Peak voxel in the medial orbitofrontal cortex region correlated with the chosen action value (expected reward). (b) Peak voxel in the ventromedial prefrontal cortex correlated with the unchosen action value.
Figure 5. Right dlPFC modulated motor cortex.
(a) Probability that one model is more likely than any other model. Inset, winning model with dlPFC modulating motor cortex activity in an action-specific manner. (b) How likely it is that a specific model generated the data of a random subject.
