J Neurosci. 2012 Jan 11;32(2):551-62.
doi: 10.1523/JNEUROSCI.5498-10.2012.

Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain

Yael Niv et al. J Neurosci. 2012.

Abstract

Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.

Figures

Figure 1.
A reinforcement-learning task designed to assess the dynamic effects of risk on choice behavior and learning processes. In each trial, one or two slot machines differing in color and abstract symbol were presented on the left or right side of the screen. Subjects had to choose one of the two machines or indicate the location of the single machine. This triggered the rolling animation of the slot machine, followed by text describing the amount of money won.
Figure 2.
Risk sensitivity varied between subjects and across sessions within subjects. The percent choice of the sure 20¢ stimulus over the risky 0/40¢ stimulus in each of three experimental sessions is plotted for each subject (∼10 choices per subject per session).
Figure 3.
Three qualitatively different models for risk-sensitive choice. a, TD model: biased sampling that tends to choose the better of two options implies that fluctuations in the prediction of risky options below their mean persist longer than fluctuations above their mean. Simulation of five choice trials, starting from equal predictions V(A) = V(B) = 20, with η = 0.5, and preferential choice of the stimulus with the higher predicted value are shown. The sure option, A, dominates. b, Utility model: concave (solid) or convex (dashed) nonlinear subjective utility functions for different monetary rewards lead to risk-averse or risk-seeking behavior, respectively. c, RSTD model: positive prediction errors δ > 0 are weighted by η+ during learning, while negative prediction errors are weighted by η−. Whereas symmetric weighting (η+ = η−) results in an average predictive value of 20 for the 0/40¢ stimulus (left), losses loom larger if η+ < η−, leading to a value that is on average smaller than 20, and consequently to risk aversion (middle), and conversely for η+ > η− (right). d–f, Parameter fits for three different models that can potentially explain the subjects' risk-sensitive choices. Each subject's best-fit parameter values are plotted against the fraction of their choices of the 20¢ stimulus over the alternative 0/40¢ stimulus throughout the whole experiment. d, In the classic TD model, higher learning rates (η) account for more risk aversion (Niv et al., 2002). e, In the utility model, risk-neutral subjects have values of a near 2 (dashed horizontal line), implying a rather linear utility function for this range of monetary rewards; a is lower than 2 for risk-averse subjects (consonant with a concave utility function), and higher than 2 for risk seekers (consonant with a convex utility function). f, In the RSTD model, the normalized difference between η− and η+ is strongly correlated with risk sensitivity (Mihatsch and Neuneier, 2002).
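The asymmetric weighting described in panel c can be illustrated with a short simulation. This is a minimal sketch, not the authors' fitting code: it assumes a single-step prediction error δ = r − V with no successor-state term, a 50/50 payoff of 0 or 40 cents, and hypothetical function names and learning-rate values chosen only for illustration.

```python
import random

def rstd_update(value, reward, eta_plus, eta_minus):
    """One risk-sensitive update: scale the prediction error by its sign."""
    delta = reward - value  # single-step prediction error (assumed form)
    rate = eta_plus if delta > 0 else eta_minus
    return value + rate * delta

def mean_learned_value(eta_plus, eta_minus, n_trials=20000, seed=0):
    """Average learned value of a 0/40 cent stimulus with 50/50 outcomes."""
    rng = random.Random(seed)
    value, total = 20.0, 0.0  # start from the true mean, as in panel a
    for _ in range(n_trials):
        reward = rng.choice([0.0, 40.0])
        value = rstd_update(value, reward, eta_plus, eta_minus)
        total += value
    return total / n_trials

# Illustrative (assumed) learning rates, not fitted values from the study.
symmetric = mean_learned_value(0.10, 0.10)     # eta+ = eta-
risk_averse = mean_learned_value(0.05, 0.15)   # eta+ < eta-
risk_seeking = mean_learned_value(0.15, 0.05)  # eta+ > eta-
print(symmetric, risk_averse, risk_seeking)
```

Under these assumptions the long-run average value settles near 40·η+/(η+ + η−), so it falls below 20 when η+ < η− and rises above 20 when η+ > η−, matching the qualitative pattern in the caption.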
Figure 4.
Direct comparison of the posterior probability per choice trial for the utility and the RSTD models shows that the RSTD model assigns a higher average probability per trial to the choices of 15 out of 16 subjects. The average probability assigned to a choice trial for each subject is the likelihood of the data divided by the number of choice trials. This provides a measure of the choice variance explained by the model (chance, 0.5). The one subject for whom the utility model assigned a higher probability per choice is denoted by ×.
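A per-trial average probability of this kind can be sketched as below. The caption does not spell out the formula, so this assumes one common reading: the geometric mean of the probabilities a model assigned to the observed choices, exp(log-likelihood / number of trials), which gives 0.5 for a model at chance on binary choices. The function name and the example probabilities are hypothetical.

```python
import math

def average_probability_per_trial(choice_probs):
    """Geometric mean of the probabilities assigned to the observed choices.

    Assumed interpretation: exp(log-likelihood / number of choice trials).
    """
    if not choice_probs:
        raise ValueError("need at least one choice trial")
    log_likelihood = sum(math.log(p) for p in choice_probs)
    return math.exp(log_likelihood / len(choice_probs))

# A model that assigns 0.5 to every binary choice sits exactly at chance.
chance = average_probability_per_trial([0.5] * 30)
# Made-up probabilities for a model that predicts choices better than chance.
better = average_probability_per_trial([0.8, 0.6, 0.7, 0.9])
print(chance, better)
```

Two models can then be compared subject by subject by computing this quantity from each model's choice probabilities on the same trials.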
Figure 5.
a, Overlap between anatomical ROIs depicted on the average anatomical image of the subjects. Darker red (right NAC) and blue (left NAC) denote a higher degree of overlap between the ROIs of different subjects. b, The raw BOLD signal (in arbitrary units) as extracted from the ROIs in a, aligned on trial onset, averaged over all subjects and all trials, and separated according to chosen stimulus and payoff. Shading corresponds to the SEM for each trace. Compare with e. c, Illustration of the theoretical asymptotic prediction errors for each stimulus. Stimuli (presented at time t = 0) appear unpredictably and so induce prediction errors approximately equal to their mean values (40 and 20 for the sure stimuli and 20 for the risky stimulus, ignoring subject-specific risk-related perturbations for purposes of illustration only). For the sure stimuli, the outcomes (at t = 5) are fully predicted, and thus induce no further prediction errors. For the 0/40¢ stimulus, the 40¢ outcome induces a positive prediction error (red solid line) of ∼20; the 0¢ outcome induces a negative prediction error of approximately −20 (red dashed line). The 0¢ stimulus (data not shown) is not expected to generate a prediction error at time of stimulus onset or payoff. d, The canonical hemodynamic response function. e, Model prediction errors at the time of the stimulus and outcome for every subject for each condition (using individual best-fit parameters for the RSTD model) were convolved with the hemodynamic response function (d) and averaged to predict the grand average BOLD signal. The hemodynamic lag adds 5 s to the times to peak (dashed black lines). The yellow trace corresponds to the 0¢ stimuli and is below baseline due to the residual dip from the hemodynamic response in the previous trial.
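The step from panel c to panel e, convolving prediction errors with a hemodynamic response function, can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline used in the study: it uses the idealized asymptotic prediction errors from panel c rather than subject-specific RSTD fits, a 1 s sampling grid, and a generic double-gamma response whose shape parameters (6, 16, undershoot ratio 1/6) are assumed defaults. All function names are hypothetical.

```python
import math

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Double-gamma response at time t in seconds (assumed parameter values)."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - under_ratio * under

def convolve(signal, kernel):
    """Discrete convolution truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out

def predicted_bold(stimulus_pe, outcome_pe, outcome_time=5, duration=30):
    """Predicted response for one trial, sampled once per second."""
    errors = [0.0] * duration
    errors[0] = stimulus_pe             # prediction error at stimulus onset (t = 0)
    errors[outcome_time] += outcome_pe  # prediction error at outcome (t = 5)
    hrf = [double_gamma_hrf(t) for t in range(duration)]
    return convolve(errors, hrf)

sure_20 = predicted_bold(20.0, 0.0)       # sure 20 cents: outcome fully predicted
risky_win = predicted_bold(20.0, 20.0)    # 0/40 stimulus, 40 cent outcome
risky_loss = predicted_bold(20.0, -20.0)  # 0/40 stimulus, 0 cent outcome
print(sure_20.index(max(sure_20)))
```

With this assumed response shape, the three traces coincide until the outcome response begins and then diverge, with the 40¢ outcome above and the 0¢ outcome below the sure 20¢ trace.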
Figure 6.
Regression coefficients for the obtained reward and the expected value at time of outcome in the two anatomically defined ROIs. Each data point represents one subject. In each ROI, the coefficients are strongly correlated (right NAC, dashed line, ρ = 0.72; p = 0.002; regression slope, 0.66; left NAC, solid line, ρ = 0.70; p = 0.003; regression slope, 0.88).
Figure 7.
Risk sensitivity can be inferred from neural values extracted from prediction error signals. The neural values of the sure 20¢ stimulus and the risky 0/40¢ stimulus were extracted from the BOLD signal for each anatomical ROI and for each subject. a, Across subjects, the difference between the neural values of these two stimuli correlated with behavioral risk aversion, with similar correlations apparent when considering each of the ROIs separately. b, When considering each session separately (and averaging over both ROIs to reduce noise), the correlations between value differences and behavioral risk sensitivity were remarkably similar, despite the fact that behavioral risk aversion within subjects varied widely across sessions.

References

    1. Abler B, Walter H, Erk S, Kammerer H, Spitzer M. Prediction error as a linear function of reward probability is coded in human nucleus accumbens. Neuroimage. 2006;31:790–795.
    2. Barto AG. Adaptive critic and the basal ganglia. In: Houk JC, Davis JL, Beiser DG, editors. Models of information processing in the basal ganglia. Cambridge, MA: MIT; 1995. pp. 215–232.
    3. Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47:129–141.
    4. Bernoulli D. Exposition of a new theory on the measurement of risk. Econometrica. 1954;22:23–36.
    5. Breiter HC, Aharon I, Kahneman D, Dale A, Shizgal P. Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron. 2001;30:619–639.