Neuron. 2011 Sep 22;71(6):1141-52.
doi: 10.1016/j.neuron.2011.07.025. Epub 2011 Sep 21.

Hedging your bets by learning reward correlations in the human brain

Klaus Wunderlich et al. Neuron. 2011.

Abstract

Human subjects are proficient at tracking the mean and variance of rewards and updating these via prediction errors. Here, we addressed whether humans can also learn about higher-order relationships between distinct environmental outcomes, a defining ecological feature of contexts where multiple sources of rewards are available. By manipulating the degree to which distinct outcomes are correlated, we show that subjects implemented an explicit model-based strategy to learn the associated outcome correlations and were adept in using that information to dynamically adjust their choices in a task that required a minimization of outcome variance. Importantly, the experimentally generated outcome correlations were explicitly represented neuronally in right midinsula with a learning prediction error signal expressed in rostral anterior cingulate cortex. Thus, our data show that the human brain represents higher-order correlation structures between rewards, a core adaptive ability whose immediate benefit is optimized sampling.

Figures

Figure 1
Experimental Design (A) Subjects were presented with a slider to set portfolio weights that determine the fraction of each resource (wind or solar power) in the energy mix (screen 1). The weights could be set within the range from −1 to 2, under the constraint that both weights always add up to 1, i.e., w_wind = 1 − w_sun. The trial outcome (screen 2) displayed the individual resource values for sun and wind, and the portfolio value of the combined mix (calculated from the weights set on screen 1). (B) The optimal portfolio weight w_sun (w_wind = 1 − w_sun) increases as a function of the correlation coefficient between sun and wind outcomes. The background color indicates portfolio standard deviation (blue = small SD, red = large SD). Optimal portfolio weights (for variance minimization) are displayed as a white line; the gray lines indicate the 10% interval around the optimal choice (a deviation of that amount from the optimal weights would result in a 10% higher SD). (C) The correlation estimate ρ (red line) is updated from trial to trial (x axis) via a correlation prediction error ζ (green stems) and is then, in a second step, used to allocate weights in every trial. ζ is calculated as the cross-product of the two resource outcome prediction errors (gray bars). The correlation coefficient used to generate the data in this illustration is −0.60 during the first ten trials and afterward changes to +0.80 (dashed line). Learning of ρ from ζ is depicted here for a learning rate of 0.2.
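The two steps described in panels (B) and (C) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The weight formula is the standard minimum-variance solution for two assets whose weights sum to 1. The update rule is one plausible reading of the caption: it assumes that the cross-product of the outcome prediction errors is normalized by the two standard deviations and compared with the current estimate to give ζ. The function names, the clipping of ρ to [−1, 1], and the standard deviations used in the usage note are all hypothetical.

```python
from typing import Tuple

W_MIN, W_MAX = -1.0, 2.0  # slider range stated in the caption


def min_variance_weight(rho: float, sd_sun: float, sd_wind: float) -> float:
    """Weight on sun that minimizes the variance of w*sun + (1 - w)*wind."""
    cov = rho * sd_sun * sd_wind
    # denom equals Var(sun - wind), so it is never negative.
    denom = sd_sun**2 + sd_wind**2 - 2.0 * cov
    if denom < 1e-12:
        # Equal SDs and rho = 1: every mix has the same variance.
        return 0.5
    w_sun = (sd_wind**2 - cov) / denom
    # Keep the answer inside the range the slider allows.
    return max(W_MIN, min(W_MAX, w_sun))


def update_correlation(
    rho: float,
    pe_sun: float,
    pe_wind: float,
    sd_sun: float,
    sd_wind: float,
    learning_rate: float = 0.2,
) -> Tuple[float, float]:
    """One delta-rule step for the correlation estimate.

    ASSUMPTION: zeta is the normalized cross-product of the two outcome
    prediction errors minus the current estimate rho. The caption does not
    spell out the normalization or the subtraction.
    """
    cross_product = (pe_sun * pe_wind) / (sd_sun * sd_wind)
    zeta = cross_product - rho
    new_rho = rho + learning_rate * zeta
    # ASSUMPTION: clip so the estimate stays a valid correlation.
    return max(-1.0, min(1.0, new_rho)), zeta
```

With hypothetical standard deviations of 1 for sun and 2 for wind, `min_variance_weight` rises from 0.8 at ρ = 0 to the slider maximum of 2 at ρ = 1, matching the increasing curve described in panel (B). If the two standard deviations were equal, the optimal weight would be 0.5 regardless of ρ, so unequal variances are needed for the correlation to matter.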
Figure 2
Model Fit and Behavior (A) The correlation learning model explained subjects' behavior best. Plotted are the Bayesian information criteria, which are corrected for the different levels of complexity in the models (smaller values are better). The r² value represents the proportion of behavioral variance explained by each model. (B) Regression of actual weights on model-predicted weights. Data are pooled over all subjects; for single-subject results see Table 1. Note that the deviations at the extremes result from bounding the possible weight range at −1 and 2; any behavioral errors at the boundary could therefore happen in only one direction. Error bars = SEM. (C) Both the response of a representative subject (blue) and the model-predicted weights (red) approach the normative best response under full knowledge of the generative correlation (black line) with some lag, which results from the time necessary to observe changes in correlation. Subjects responded after a 20-trial-long observation-only phase (not shown).
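As an illustration of how a complexity-corrected fit measure such as the one in panel (A) can be computed, the sketch below uses the textbook Bayesian information criterion for a model with Gaussian residuals, with additive constants dropped, alongside the usual r². Whether the authors used exactly this form is an assumption; the function names and arguments are hypothetical.

```python
import math
from typing import Sequence


def gaussian_bic(
    actual: Sequence[float], predicted: Sequence[float], n_params: int
) -> float:
    """BIC = n*ln(RSS/n) + k*ln(n), assuming Gaussian residuals.

    Constants shared by all models are dropped, so only differences
    between models fitted to the same data are meaningful.
    Raises ValueError if the fit is perfect (RSS = 0).
    """
    n = len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return n * math.log(rss / n) + n_params * math.log(n)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Proportion of variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total
```

For two models that fit the same data equally well, the one with more free parameters receives the larger (worse) BIC, which is the complexity correction the caption refers to.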
Figure 3
Neural Representation of Correlation Strength (A) Neural activity in midinsular cortex correlated with the trial-by-trial model-predicted correlation strength between the two resource values at the presentation of the outcome screen. (B) Effect size plots (average percent signal change across subjects). Data are plotted separately for trials in which the model-predicted correlation strength was low and high, in four bins (25/50/75/100 percentile of correlation range; error bars = SEM). Activity in insula increased linearly with the correlation coefficient (which, in contrast to the covariance, is normalized by the standard deviations of the resources). Data were extracted using a cross-validation (leave-one-out) procedure to ensure independence of the data used for localization and for the effect measure. (C) Time course plot of effect size for the correlation coefficient regressor. The correlation coefficient is represented at the time of the outcome screen, when new evidence becomes available, but not during the choice period. Thin lines = SEM. (D) Comparison of explained variance in the behavioral model with the explained variance in the fMRI analysis. Fluctuations in BOLD activity in midinsula are particularly well explained in those subjects whose behavior is also well explained by the model (r = 0.50, p = 0.03). Each dot represents one subject and the line is the regression slope.
Figure 4
Neural Representation of Correlation Prediction Errors (A) Activity in rostral cingulate cortex correlated with the correlation prediction error. (B) Effect size plots (similar to Figure 3B) for the cluster confirm a linear effect.
Figure 5
Absolute Weight Updates. Activity in ACC/DMPFC and anterior insula correlated, at the time of the outcome screen, with the absolute amount by which subjects updated the resource allocation weights during the following choice.


