PLoS Comput Biol. 2019 Apr 8;15(4):e1006973. doi: 10.1371/journal.pcbi.1006973. eCollection 2019 Apr.

Contextual influence on confidence judgments in human reinforcement learning

Maël Lebreton et al. PLoS Comput Biol. 2019.

Abstract

The ability to correctly estimate the probability of one's choices being correct is fundamental to optimally re-evaluate previous choices or to arbitrate between different decision strategies. Experimental evidence nonetheless suggests that this metacognitive process (confidence judgment) is susceptible to numerous biases. Here, we investigate the effect of outcome valence (gains or losses) on confidence while participants learned stimulus-outcome associations by trial and error. In two experiments, participants were more confident in their choices when learning to seek gains than when learning to avoid losses, despite equal difficulty and performance in those two contexts. Computational modelling revealed that this bias is driven by the context-value, a dynamically updated estimate of the average expected value of the choice options that is necessary to explain equal performance in the gain and loss domains. The biasing effect of context-value on confidence, revealed here for the first time in a reinforcement-learning context, is therefore domain-general, with likely important functional consequences. We show that one such consequence emerges in volatile environments, where the (in)flexibility of individuals' learning strategies differs when outcomes are framed as gains or losses. Despite apparently similar behavior, profound asymmetries might therefore exist between learning to avoid losses and learning to seek gains.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Experiment 1 task schematic, learning and confidence results.
(A) Behavioral task. Successive screens displayed in one trial are shown from left to right, with durations in ms. After a fixation cross, participants viewed a pair of abstract symbols displayed on either side of the computer screen and had to choose between them. They were then asked to report their confidence in their choice on a numerical scale graded from 0 to 10. Finally, the outcome associated with the chosen symbol was revealed. (B) Task design and contingencies. (C) Performance. Trial-by-trial percentage of correct responses in the partial (left) and complete (middle) information conditions; filled colored areas represent mean ± sem. Right: individual averaged performance in the different conditions. Connected dots represent individual data points in the within-subject design, and the error bars displayed beside the scatter plots indicate the sample mean ± sem. (D) Confidence. Trial-by-trial confidence ratings in the partial (left) and complete (middle) information conditions; filled colored areas represent mean ± sem. Right: individual averaged confidence ratings in the different conditions. Connected dots represent individual data points in the within-subject design, and the error bars displayed beside the scatter plots indicate the sample mean ± sem.
Fig 2. Experiment 2 task schematic, learning and confidence results.
(A) Behavioral task. Successive screens displayed in one trial are shown from left to right, with durations in ms. After a fixation cross, participants viewed a pair of abstract symbols displayed on either side of the computer screen and had to choose between them. They were then asked to report their confidence in their choice on a numerical scale graded from 50 to 100%. Finally, the outcome associated with the chosen symbol was revealed. (B) Task design and contingencies. (C) Performance. Trial-by-trial percentage of correct responses in the partial (left) and complete (middle) information conditions; filled colored areas represent mean ± sem. Right: individual averaged performance in the different conditions. Connected dots represent individual data points in the within-subject design, and the error bars displayed beside the scatter plots indicate the sample mean ± sem. (D) Confidence. Trial-by-trial confidence ratings in the partial (left) and complete (middle) information conditions; filled colored areas represent mean ± sem. Right: individual averaged confidence ratings in the different conditions. Connected dots represent individual data points in the within-subject design, and the error bars displayed beside the scatter plots indicate the sample mean ± sem.
Fig 3. Incentive mechanism and overconfidence.
(A) Incentive mechanism. In Experiment 2, on the payout-relevant trials a lottery number L is randomly drawn from the 50–100% interval and compared to the confidence rating C. If L > C, the lottery is implemented: a wheel of fortune with an L% chance of losing is displayed and played out, and feedback then informs participants whether the lottery resulted in a win or a loss. If C > L, a clock is displayed together with the message “Please wait”, followed by feedback that depends on the correctness of the initial choice. With this mechanism, participants can maximize their earnings by reporting their confidence accurately and truthfully. (B) Overconfidence. Individual averaged calibration as a function of the Experiment 2 conditions (same color code as in Figs 1 and 2). Connected dots represent individual data points in the within-subject design, and the error bars displayed beside the scatter plots indicate the sample mean ± sem.
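
As a reading aid, the comparison rule described in (A) can be simulated in a few lines. The sketch below is an illustrative Python rendering of the caption's matching-probability mechanism, not the authors' implementation; the function name and the binary win/loss return value are assumptions, and payout amounts are not modelled.

    import random

    def payout_trial(choice_correct, confidence):
        """One payout-relevant trial of the Fig 3 mechanism (illustrative sketch).
        confidence is the reported rating expressed in [0.5, 1.0] (i.e. 50-100%);
        returns True for a win and False for a loss."""
        lottery = random.uniform(0.5, 1.0)          # L, drawn in the 50-100% interval
        if lottery > confidence:
            # Lottery implemented: wheel of fortune with an L% chance of losing.
            return random.random() > lottery
        # Otherwise the initial choice is implemented: win iff it was correct.
        return choice_correct

Under this rule, truthful reporting maximizes expected earnings, as the caption states: overstating confidence forfeits lotteries that are better than one's true chance of being correct, while understating it accepts lotteries that are worse.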
Fig 4. Modelling results: Fits.
Behavioral results and model fits in Experiments 1 (A) and 2 (B). Top: learning performance (i.e., percent correct). Middle: choice rate in the transfer test; symbols are ranked by expected value (L75: symbol associated with a 75% probability of losing 1€; L25: symbol associated with a 25% probability of losing 1€; G25: symbol associated with a 25% probability of winning 1€; G75: symbol associated with a 75% probability of winning 1€). Bottom: confidence ratings. In all panels, colored dots and error bars represent the actual data (mean ± sem) and filled areas represent the model fits (mean ± sem). Model fits were obtained with the RELATIVE reinforcement-learning model for the learning performance (top) and the choice rate in the transfer test (middle), and with the FULL glme for the confidence ratings (bottom). Dark grey diamonds in the transfer-test panels (middle) indicate the fit from the ABSOLUTE model.
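
For orientation, the expected values that define the L75/L25/G25/G75 ranking above follow directly from the stated contingencies (assuming the complementary outcome is 0€). The softmax choice rule appended below is a standard assumption for this family of reinforcement-learning models and is included only as an illustration, with an arbitrary inverse temperature.

    import numpy as np

    # Expected value (EUR) of each transfer-test symbol, from the contingencies in the caption,
    # assuming the complementary outcome is 0 EUR.
    expected_value = {"L75": -0.75, "L25": -0.25, "G25": 0.25, "G75": 0.75}

    def softmax_choice_prob(q_values, beta=3.0):
        """Choice probabilities from option values under a softmax rule (beta is illustrative)."""
        q = np.asarray(q_values, dtype=float)
        expq = np.exp(beta * (q - q.max()))        # subtract max for numerical stability
        return expq / expq.sum()

    # Example: choosing between G75 and L25 when learned values equal the expected values.
    print(softmax_choice_prob([expected_value["G75"], expected_value["L25"]]))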
Fig 5. Modelling results: Lesioning approach.
Three nested models are compared in their ability to reproduce the pattern of interest observed in averaged confidence ratings, in Experiment 1 (A) and Experiment 2 (B). In the FULL model, confidence is modelled as a function of three factors: the absolute difference between option values, the confidence reported on the previous trial, and the context value. In the REDUCED model 1, confidence is modelled as a function of only two factors: the absolute difference between option values and the confidence reported on the previous trial; this model therefore omits the context value as a predictor of confidence. In the REDUCED model 2, confidence is modelled as a function of only two factors: the absolute difference between option values and the context value; this model therefore omits the previous-trial confidence as a predictor. Left: pattern of confidence ratings observed in the behavioral data. Middle-left: pattern of confidence ratings estimated from the FULL model. Middle-right: pattern of confidence ratings estimated from REDUCED model 1. Right: pattern of confidence ratings estimated from REDUCED model 2. Statistics from a repeated-measures ANOVA are reported in red where the alternative model fails to reproduce important statistical properties of confidence observed in the data. Connected dots represent individual data points in the within-subject design, and the error bars displayed beside the scatter plots indicate the sample mean ± sem.
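
In regression form, the three nested models read as follows. This is a schematic Python rendering of the caption (Wilkinson-style formula strings with placeholder variable names); the paper's actual specification, e.g. the mixed-effects structure and any transformation of the predictors, is not reproduced here.

    # Placeholder column names: conf, abs_dQ (|Q_chosen - Q_unchosen|), prev_conf, context_value.
    FULL      = "conf ~ abs_dQ + prev_conf + context_value"
    REDUCED_1 = "conf ~ abs_dQ + prev_conf"        # omits the context value
    REDUCED_2 = "conf ~ abs_dQ + context_value"    # omits the previous-trial confidence

    # A hypothetical fit on a trial-level DataFrame `df` with matching columns could use, e.g.:
    # import statsmodels.formula.api as smf
    # smf.mixedlm(FULL, df, groups=df["subject"]).fit()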
Fig 6. Summary of the modelling results.
The schematic illustrates the computational architecture that best accounts for the choice and confidence data. In each context (or state) ‘s’, the agent tracks option values (Q(s,:)), which are used to decide amongst alternative courses of action, together with the value of the context (V(s)), which quantifies the average expected value of the decision context. In all contexts, the agent receives the outcome associated with the chosen option (Rc), which is used to update the chosen option value (Q(s,c)) via a prediction error (δc) weighted by a learning rate (αc). In the complete feedback condition, the agent also receives information about the outcome of the unselected option (Ru), which is used to update the unselected option value (Q(s,u)) via a prediction error (δu) weighted by a learning rate (αu). The available feedback information (Rc and Ru in the complete feedback contexts, and Q(s,u) in the partial feedback contexts) is also used to update the value of the context (V(s)), via a prediction error (δV) weighted by a specific learning rate (αV). Option and context values jointly contribute to the generation of confidence judgments.
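
Written out as delta-rule updates, the caption's architecture takes roughly the following form. This is a plain Rescorla-Wagner-style sketch with illustrative names; in particular, how Rc and Ru (or Q(s,u) under partial feedback) are combined for the context-value update is an assumption (a simple average is used here), and the paper's RELATIVE model may define the prediction errors differently, e.g. relative to V(s).

    def update_values(Q, V, s, c, u, R_c, alpha_c, alpha_V, R_u=None, alpha_u=None):
        """One learning step of the Fig 6 architecture (illustrative delta-rule sketch).
        Q[s][option] holds the option values and V[s] the context value for state s;
        c and u are the chosen and unchosen options; pass R_u and alpha_u only
        in the complete-feedback condition."""
        # Chosen-option update via prediction error delta_c, weighted by alpha_c.
        delta_c = R_c - Q[s][c]
        Q[s][c] += alpha_c * delta_c

        # Unchosen-option update, only when its outcome is shown (complete feedback).
        if R_u is not None:
            delta_u = R_u - Q[s][u]
            Q[s][u] += alpha_u * delta_u

        # Context-value update from the available feedback; under partial feedback the
        # unchosen outcome is replaced by the current estimate Q[s][u] (assumed averaging).
        feedback = (R_c + (R_u if R_u is not None else Q[s][u])) / 2.0
        delta_V = feedback - V[s]
        V[s] += alpha_V * delta_V
        return Q, V

    # Example (hypothetical values): Q = {"gain": {0: 0.0, 1: 0.0}}; V = {"gain": 0.0}
    # update_values(Q, V, "gain", c=0, u=1, R_c=1.0, alpha_c=0.3, alpha_V=0.3)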
Fig 7. Experiment 3 task schematic, reversal learning and confidence results.
(A) Task design and contingencies. (B) Performance. Trial-by-trial percentage of correct responses in the partial (left) and complete (middle-left) information conditions; filled colored areas represent mean ± sem. Middle-right and right: individual averaged performance in the different conditions, before (middle-right) and after (right) the reversal; the orange shaded area highlights post-reversal behavior. Connected dots represent individual data points in the within-subject design, and the error bars displayed beside the scatter plots indicate the sample mean ± sem. (C) Confidence. Trial-by-trial confidence ratings in the partial (left) and complete (middle-left) information conditions; filled colored areas represent mean ± sem. Middle-right and right: individual averaged confidence ratings in the different conditions, before (middle-right) and after (right) the reversal; the orange shaded area highlights post-reversal behavior. Connected dots represent individual data points in the within-subject design, and the error bars displayed beside the scatter plots indicate the sample mean ± sem. GSta: Gain Stable; LSta: Loss Stable; GRev: Gain Reversal; LRev: Loss Reversal.


Grants and funding

This work was supported by startup funds from the Amsterdam School of Economics, awarded to JBE. JBE and ML gratefully acknowledge support from Amsterdam Brain and Cognition (ABC). ML is supported by an NWO Veni Fellowship (Grant 451-15-015), a Swiss National Fund Ambizione grant (PZ00P3_174127) and the Fondation Bettencourt Schueller. SP is supported by an ATIP-Avenir grant (R16069JS), the Programme Emergence(s) de la Ville de Paris, the Fondation Fyssen and Fondation Schlumberger pour l’Education et la Recherche. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.