J Neurosci. 2015 Aug 12;35(32):11233-51.
doi: 10.1523/JNEUROSCI.0396-15.2015.

The Good, the Bad, and the Irrelevant: Neural Mechanisms of Learning Real and Hypothetical Rewards and Effort

Jacqueline Scholl et al. J Neurosci. 2015.

Abstract

Natural environments are complex, and a single choice can lead to multiple outcomes. Agents should learn which outcomes are due to their choices and therefore relevant for future decisions and which are stochastic in ways common to all choices and therefore irrelevant for future decisions between options. We designed an experiment in which human participants learned the varying reward and effort magnitudes of two options and repeatedly chose between them. The reward associated with a choice was randomly real or hypothetical (i.e., participants only sometimes received the reward magnitude associated with the chosen option). The real/hypothetical nature of the reward on any one trial was, however, irrelevant for learning the longer-term values of the choices, and participants ought to have only focused on the informational content of the outcome and disregarded whether it was a real or hypothetical reward. However, we found that participants showed an irrational choice bias, preferring choices that had previously led, by chance, to a real reward in the last trial. Amygdala and ventromedial prefrontal activity was related to the way in which participants' choices were biased by real reward receipt. By contrast, activity in dorsal anterior cingulate cortex, frontal operculum/anterior insula, and especially lateral anterior prefrontal cortex was related to the degree to which participants resisted this bias and chose effectively in a manner guided by aspects of outcomes that had real and more sustained relationships with particular choices, suppressing irrelevant reward information for more optimal learning and decision making.

Significance statement: In complex natural environments, a single choice can lead to multiple outcomes. Human agents should only learn from outcomes that are due to their choices, not from outcomes without such a relationship. We designed an experiment to measure learning about reward and effort magnitudes in an environment in which other features of the outcome were random and had no relationship with choice. We found that, although people could learn about reward magnitudes, they nevertheless were irrationally biased toward repeating certain choices as a function of the presence or absence of random reward features. Activity in different brain regions in the prefrontal cortex either reflected the bias or reflected resistance to the bias.

Keywords: effort; frontal pole; hypothetical; learning; reward; vmPFC.


Figures

Figure 1.
Task description. A, In the decision phase, participants were shown two options (i.e., choices), overlaid with the probability of receiving a reward for each choice. They could only decide after an initial monitoring phase (1.4–4.5 s). The chosen option was then highlighted for 2.9–8 s. B, In the following outcome phase, participants first saw the outcome for the chosen option (1.9–2.1 s). The reward magnitude was shown as a purple bar (top of the screen); the effort magnitude was indicated by the position of a dial on a circle. Whether the reward was received was indicated by a tick mark (real reward, top display) or a red crossed-out sign over the reward magnitude (hypothetical reward, bottom display). If the reward was real, it was also added to a status bar at the bottom of the screen, which tracked rewards over the course of the experiment. A reminder of the chosen option was shown at the top of the screen. The reward and effort magnitudes of the unchosen option were then shown (1.9–6.9 s). Finally, participants performed the effort phase (C), in which the number of targets matched the chosen effort outcome. Importantly, participants had to perform the effort phase on every trial, regardless of whether the reward was real or hypothetical. An example schedule is shown in D, with the reward and effort magnitude values of both choices.
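The learning problem the task poses can be sketched with a simple delta-rule tracker. This is purely illustrative: the paper's analyses used Bayesian estimates of the magnitudes, and the learning rate, starting values, and outcome numbers below are made up.

```python
# Hypothetical delta-rule tracker for one option's reward and effort
# magnitudes (illustrative only; the paper used Bayesian estimation).
def update(estimate, outcome, learning_rate=0.3):
    """Move the running estimate toward the observed outcome."""
    return estimate + learning_rate * (outcome - estimate)

reward_est, effort_est = 50.0, 50.0          # arbitrary starting guesses
outcomes = [(80, 20), (70, 30), (90, 25)]    # (reward, effort) per trial

for reward_outcome, effort_outcome in outcomes:
    # Crucially, a rational learner updates from the outcome's magnitude
    # regardless of whether the reward was real or hypothetical.
    reward_est = update(reward_est, reward_outcome)
    effort_est = update(effort_est, effort_outcome)

print(round(reward_est, 1), round(effort_est, 1))
```

The point of the sketch is the comment in the loop: the real/hypothetical flag carries no information about future values, so it should not enter the update.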
Figure 2.
Correlations between the regressors included in the fMRI designs. A, Correlations (r values) between regressors in GLM1. The values are the mean of the absolute r values across all participants. No r value exceeded 0.33. B, In GLM2, no r value exceeded 0.38. C, chosen option; UC, unchosen option; PredRewMag, predicted reward magnitude; PredEffMag, predicted effort magnitude; hypoth., hypothetical outcome; RewMagOutcome/RewMagOutc, reward magnitude outcome; EffMagOutcome/EffMagOutc, effort magnitude outcome. *Events time-locked to the onset of the unchosen option's outcomes appearing on the screen.
Figure 3.
Behavioral results. A, Distribution of the Bayesian estimated reward and effort magnitude differences (Option 1 − Option 2) of the two options on the trials used in the task. B, How likely participants were to select one option over the other based on the predicted reward and effort magnitude differences between the options. Decisions were analyzed using a regression analysis (C). Participants were more likely to stay with an option (choose it again) rather than switch to the alternative if the option was associated with a higher displayed probability (“prob”) and higher past (one [t − 1], two [t − 2], or three [t − 3] trials ago) reward and lower past effort magnitudes than the alternative option. Furthermore, participants were more likely to stay if they had received a real rather than a hypothetical reward on the last trial (p = 0.008, highlighted in red). Effort exertion was analyzed using a regression analysis (D) predicting the clicking rate. The regressors were the effort and reward magnitude outcomes, separately for the option participants had chosen (“C”) or not chosen (“UC,” unchosen), and the reward type (i.e., whether the reward was real or hypothetical). Again, participants' behavior was influenced by whether the reward was real or hypothetical (p = 0.039, highlighted in red).
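The stay/switch analysis in C is a regression of stay-versus-switch decisions on value differences plus the last trial's reward type. The following is a minimal logistic-regression sketch on simulated data; all predictor names, effect sizes, and the simulated bias are hypothetical, not the paper's estimates or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 5000

# Hypothetical trial-wise predictors (previous choice minus alternative)
prob_diff = rng.normal(size=n_trials)        # displayed reward probability
rew_t1 = rng.normal(size=n_trials)           # reward magnitude, 1 trial back
eff_t1 = rng.normal(size=n_trials)           # effort magnitude, 1 trial back
real_reward = rng.integers(0, 2, n_trials)   # 1 = real, 0 = hypothetical

# Simulate "stay" decisions that include an irrational bias toward
# options that just paid a real reward (the 0.5 * real_reward term).
logit = 1.0 * prob_diff + 0.8 * rew_t1 - 0.8 * eff_t1 + 0.5 * real_reward
stay = (rng.random(n_trials) < 1 / (1 + np.exp(-logit))).astype(float)

# Fit a logistic regression by Newton-Raphson (no external dependencies)
X = np.column_stack([np.ones(n_trials), prob_diff, rew_t1, eff_t1, real_reward])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    w = p * (1 - p)                       # per-trial IRLS weights
    grad = X.T @ (stay - p)
    hess = X.T @ (X * w[:, None])
    beta += np.linalg.solve(hess, grad)

for name, b in zip(["intercept", "prob", "reward(t-1)",
                    "effort(t-1)", "real reward(t-1)"], beta):
    print(f"{name:>18s}: {b:+.2f}")
```

A positive fitted coefficient on the real-reward indicator is the signature of the bias the caption describes: higher past reward and lower past effort should predict staying, but reward type should not.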
Figure 4.
Brain activations in the outcome phase. A, Increases in BOLD activity correlating with the relative effort magnitude outcomes (chosen − unchosen option; red) and relative reward magnitude outcomes (chosen − unchosen option; brown) in the outcome phase. At the same time, whether a reward was real or hypothetical (B) led to widespread increases in BOLD activity throughout the brain (pink). All activations are cluster-corrected at p < 0.05.
Figure 5.
Correlations between the decision bias and the neural response to real versus hypothetical reward in the outcome phase. Participants who had a larger BOLD increase to real compared with hypothetical rewards in the vmPFC showed a larger behavioral bias (positive correlation; red). In contrast, participants who showed a larger activation to real compared with hypothetical rewards in the aPFC or the dACC showed a weaker behavioral bias (negative correlation; blue). All results are cluster-corrected at p < 0.05. For illustration, we also show scatter plots of these correlations (B, C) using averages within spherical ROIs with a radius of 3 voxels in MNI space.
Figure 6.
Time courses from selected regions showing the main effect of real versus hypothetical reward and how the coding of the relative reward and effort magnitude outcomes is affected by the reward being real. A–C, Locations of the ROIs. Relative effort magnitude outcomes (chosen − unchosen option) (D) led to a larger increase in BOLD when the reward was real rather than hypothetical in aPFC and FO/AI, but not in ventral striatum. Similarly, relative reward magnitudes (chosen − unchosen option) (E) led to a stronger decrease in BOLD when the reward was real rather than hypothetical in aPFC and FO/AI, but not in ventral striatum. F, Whether the reward was real or hypothetical led to an increase in BOLD not only in ventral striatum and vmPFC but also in aPFC. D, E, Significance was based on paired two-tailed t tests comparing the hemodynamically convolved time courses from trials on which the reward was real or hypothetical: *p < 0.05; **p < 0.01; ***p < 0.001. F, Significance tests were one-sample two-tailed t tests. All ROIs were selected on the basis of an orthogonal contrast: the aPFC, FO/AI, and dACC ROIs were selected based on the whole-brain-corrected contrast of relative effort magnitude at the time of outcome (chosen − unchosen option); the ventral striatum ROI was selected based on the whole-brain-corrected contrast of relative reward magnitude at the time of outcome (chosen − unchosen option); and the vmPFC ROI was selected based on the whole-brain-corrected contrast of real versus hypothetical reward outcome. The ROIs were 3 voxels in radius for all cortical regions (aPFC, FO/AI, dACC, and vmPFC) and 2 voxels in radius for the subcortical region (ventral striatum).
Figure 7.
Various signals present in aPFC during the outcome phase are consistent with a role in overcoming a bias to stay with the current choice when the reward is real rather than hypothetical. A, First, aPFC activity increases when rewards are real (pink), and participants with a stronger increase are better at overcoming the behavioral bias (blue). B, Second, the representation of effort magnitude outcomes increases in aPFC when the reward is real (yellow). We also show the relative effort magnitude outcome contrast (red) used to identify our aPFC ROI (used for the time courses in Fig. 6).
Figure 8.
Brain activations in the decision phase. Relative decision value (A), a linear contrast of the regressors for relative (chosen − unchosen option) predicted reward magnitude + shown reward probability − predicted effort magnitude, led to increases (orange) and decreases (blue) in BOLD at the time of the decision. Regions activated in this contrast are regions in which activity is covarying with the value information that ought, rationally, to guide decision-making. Importantly, we also found increases in BOLD (B) in the vmPFC (beige) when a real rather than a hypothetical reward had been received on the last trial. vmPFC activity, therefore, covaries with an outcome feature that led to irrational biases in behavior. All results are cluster-corrected at p < 0.05.
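The relative decision value in A is a linear contrast of three value terms. As a minimal numerical sketch (the per-trial magnitudes below are made up, not estimates from the study):

```python
import numpy as np

# Hypothetical per-trial estimates, each coded as chosen minus unchosen option
pred_reward_diff = np.array([0.3, -0.1, 0.5])   # predicted reward magnitude
shown_prob_diff  = np.array([0.2,  0.4, -0.2])  # displayed reward probability
pred_effort_diff = np.array([0.1,  0.2,  0.1])  # predicted effort magnitude

# Relative decision value: reward + probability - effort (chosen - unchosen)
decision_value = pred_reward_diff + shown_prob_diff - pred_effort_diff
print(decision_value)
```

Regions tracking this quantity carry the value information that ought, rationally, to guide the decision; effort enters with a negative sign because it is a cost.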
Figure 9.
Stay and switch signals in the decision phase. A, In the decision phase, in addition to decision signals (shown in Fig. 8), dACC activity decreased when participants chose to stay rather than switch relative to the last trial (blue; i.e., when they chose the same option again as on the last trial). This was independent of whether the reward had been real or hypothetical on the last trial. In contrast, regions including the amygdala (green) were differentially active on stay versus switch trials depending on whether the last trial's reward had been real or hypothetical. All results are cluster-corrected at p < 0.05. B, Specifically, when there had been a real reward on the last trial, the amygdala was more active on stay than on switch trials (purple line). In contrast, when the reward had been hypothetical on the last trial, the amygdala was less active on stay than on switch trials (blue). During decisions, amygdala activity was negatively coupled with aPFC activity (Ci), i.e., a negative correlation between activity in the two regions. This negative coupling was reduced (Cii) on trials when participants made choices consistent with the overall bias introduced by the last trial's reward type. *p < 0.05 (two-tailed one-sample test of correlation values).
