Clinical Trial
Proc Natl Acad Sci U S A. 2017 May 9;114(19):E3859-E3868.
doi: 10.1073/pnas.1615773114. Epub 2017 Apr 24.

Brain networks for confidence weighting and hierarchical inference during probabilistic learning

Florent Meyniel et al. Proc Natl Acad Sci U S A.

Abstract

Learning is difficult when the world fluctuates randomly and ceaselessly. Classical learning algorithms, such as the delta rule with constant learning rate, are not optimal. Mathematically, the optimal learning rule requires weighting prior knowledge and incoming evidence according to their respective reliabilities. This "confidence weighting" implies the maintenance of an accurate estimate of the reliability of what has been learned. Here, using fMRI and an ideal-observer analysis, we demonstrate that the brain's learning algorithm relies on confidence weighting. While in the fMRI scanner, human adults attempted to learn the transition probabilities underlying an auditory or visual sequence, and reported their confidence in those estimates. They knew that these transition probabilities could change simultaneously at unpredicted moments, and therefore that the learning problem was inherently hierarchical. Subjective confidence reports tightly followed the predictions derived from the ideal observer. In particular, subjects managed to attach distinct levels of confidence to each learned transition probability, as required by Bayes-optimal inference. Distinct brain areas tracked the likelihood of new observations given current predictions, and the confidence in those predictions. Both signals were combined in the right inferior frontal gyrus, where they operated in agreement with the confidence-weighting model. This brain region also presented signatures of a hierarchical process that disentangles distinct sources of uncertainty. Together, our results provide evidence that the sense of confidence is an essential ingredient of probabilistic learning in the human brain, and that the right inferior frontal gyrus hosts a confidence-based statistical learning algorithm for auditory and visual sequences.
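
The contrast drawn in the abstract between a constant-rate delta rule and confidence weighting can be sketched in a few lines. This is a minimal illustration under a simplifying assumption that is not the paper's model: the world is treated as stationary and a single probability is learned with a Beta-Bernoulli update. The function names and the default learning rate are hypothetical. The point is only that, in a Bayesian update, the effective learning rate shrinks as evidence (and hence confidence) accumulates.

```python
def delta_rule(observations, learning_rate=0.1, initial=0.5):
    """Constant-learning-rate delta rule: estimate += rate * prediction error."""
    estimate = initial
    trajectory = []
    for y in observations:
        estimate += learning_rate * (y - estimate)
        trajectory.append(estimate)
    return trajectory


def confidence_weighted(observations, prior_a=1.0, prior_b=1.0):
    """Beta-Bernoulli update in a stationary world (illustrative assumption).

    The posterior mean moves by (y - mean) / (a + b + 1), so the effective
    learning rate 1 / (a + b + 1) decreases as observations accumulate.
    """
    a, b = prior_a, prior_b
    trajectory, rates = [], []
    for y in observations:
        rates.append(1.0 / (a + b + 1.0))  # weight given to the new observation
        a += y
        b += 1 - y
        trajectory.append(a / (a + b))     # posterior mean after the update
    return trajectory, rates


if __name__ == "__main__":
    obs = [1, 0, 1, 1, 0, 1, 1, 1]
    print(delta_rule(obs))
    print(confidence_weighted(obs))
```

In the volatile setting studied in the paper, confidence can also drop when a change is suspected, which this stationary sketch does not capture.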

Keywords: Bayesian; confidence; functional MRI; learning; probabilistic inference.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
A normative role for confidence during probabilistic inference. (A) Subjects were asked to continuously estimate the probabilities p(A|B) and p(A|A) governing the successions between two stimuli, while these probabilities underwent occasional stepwise changes. The probability of a change point was held constant, but transition probabilities and change times varied randomly between sessions and participants. Subjective confidence levels were probed occasionally, at the trials indicated by red marks. (B) Example sequence. Actual A and B stimuli corresponded to clearly distinguishable pairs of sounds or visual symbols, presented in distinct auditory and visual sessions. (C) The likelihood distributions of transition probabilities were estimated by a Bayesian observer, based on the observations received so far and the generative model. Likelihood distributions are presented as slices as a function of time. Each slice corresponds to a marginal distribution from the actual 2D space defined by combinations of p(A|B) and p(A|A). Confidence in the estimated transition probabilities corresponds to the (negative log) SD of those marginal distributions. (D) Bayesian theory predicts that confidence should serve as a weighting factor that modulates the incoming evidence during optimal inference: the update induced by a new observation should be less pronounced when confidence in the prior estimation is high. (E) Values of three key variables of Bayesian inference were sorted into bins and averaged across all observations presented to subjects. Notations: y is the sequence of observations received, θ the estimated transition probability, σ its SD, and KL (Kullback–Leibler divergence) quantifies the distance between two probability distributions; high/low C stands for high/low confidence. Blue/orange corresponds to expected (likelihood >0.5)/unexpected (likelihood <0.5) outcomes. Note that confidence in the current transition does not depend on the eventual outcome because it is not yet observed (the blue/orange curves overlap perfectly).
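
The definition of confidence as the negative log SD of a marginal distribution can be illustrated with a short sketch. It assumes, purely for illustration, that the marginal over one transition probability is a Beta(a, b) distribution; in the paper the marginals come from the hierarchical ideal observer, not from a closed-form Beta. The function names are hypothetical.

```python
import math


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    n = a + b
    return math.sqrt(a * b / (n * n * (n + 1.0)))


def confidence(a, b):
    """Confidence defined as the negative log SD of the distribution."""
    return -math.log(beta_sd(a, b))


if __name__ == "__main__":
    # More observations narrow the distribution and raise confidence.
    for a, b in [(1, 1), (2, 2), (20, 20)]:
        print(a, b, round(beta_sd(a, b), 4), round(confidence(a, b), 4))
```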
Fig. S1.
Description of the transition-probability estimation task. (A) Subjects were presented with series of auditory or visual stimuli (denoted A and B). Each session presented a series of 380 stimuli (only an example portion is shown here). The red dots depict the position of the occasional questions that interrupted the sequence to ask subjects for their estimate of transition probability (e.g., B → A) and their confidence in this estimate. The actual stimuli are illustrated by gray screenshots: in different sessions, stimuli were either visual (a line of dots tilted clockwise or anticlockwise) or auditory (vowels “A” or “O” played through a loudspeaker). A fixation dot was interleaved between visual stimuli and remained present on screen during auditory sessions. (B) During training, when questions appeared on screen, the previous stimulus (e.g., B) was displayed and subjects indicated on a slider their estimate of the probability for the next stimulus to be A or B. In the actual display, A and B were replaced by the corresponding visual symbols or vowels. Once subjects had validated their probability estimate, they were asked to rate on a slider how confident they were in that estimate. Subjects also had to report online when they detected changes in the transition probabilities: they could stop the sequence at any time by pressing a key to indicate how long ago the change had supposedly occurred. During fMRI recording, the report was simplified: subjects reported confidence about their probability estimates (that remained covert) on a four-step scale. Reports were self-paced, and the stimulus sequence was resumed without feedback.
Fig. S2.
Fast convergence of volatility. (A) The process generating the observed sequence can be cast as a hidden Markov model. At any given trial n, the stimulus y_n remains the same (A or B) as y_{n−1} with a probability determined by θ_n. Note that θ_n has two dimensions, one corresponding to p(A|A) and the other to p(A|B). The transition probabilities θ_n themselves usually maintain the same values as θ_{n−1} with a probability 1 − ν, but occasionally, with a small probability ν (called volatility), a change occurs and the θ_n are resampled randomly. The ideal observer inverts this generative process using optimal hierarchical inference. To do so, it maintains running estimates of the distributions of possible values for θ_n and for ν. In such a task, the sense of confidence is therefore hierarchically organized: there is a distinct sense of confidence in having learned the specific parameters θ_n, as well as in the likelihood that they changed. (B) The ideal-observer model returns, in particular, the posterior distributions on volatility ν at any given trial given the observations received so far. The heat map shows posterior distributions averaged across sequences presented to participants; blue dots show the most likely values. At the end of the training session, the maximum a posteriori value was 0.022 ± 0.003; at the end of the first MRI session it was 0.019 ± 0.002 (paired difference 0.0025 ± 0.0021, t(20) = 1.19, P = 0.25). Note that in practice, subjects probably did not start with a uniform prior about volatility because they were informed that changes were rare. Therefore, actual convergence of the estimated volatility may have been even faster than illustrated here.
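
The generative process described in panel A can be sketched as a short simulation. Several details below are assumptions made for illustration and are not stated in the caption: the transition probabilities are resampled uniformly on [0, 1], the volatility value of 0.02 is arbitrary, and the first stimulus is conditioned on a randomly chosen "previous" stimulus. The session length of 380 stimuli follows the task description; the function name is hypothetical.

```python
import random


def generate_sequence(n_trials=380, volatility=0.02, seed=0):
    """Simulate an A/B sequence whose transition probabilities change abruptly.

    theta maps the previous stimulus to p(next stimulus is A | previous).
    With probability `volatility` per trial, both entries of theta are
    resampled at once (a global change point).
    """
    rng = random.Random(seed)
    theta = {"A": rng.random(), "B": rng.random()}
    previous = rng.choice("AB")  # arbitrary starting context (assumption)
    sequence, change_points = [], []
    for n in range(n_trials):
        if n > 0 and rng.random() < volatility:
            theta = {"A": rng.random(), "B": rng.random()}
            change_points.append(n)
        stimulus = "A" if rng.random() < theta[previous] else "B"
        sequence.append(stimulus)
        previous = stimulus
    return sequence, change_points


if __name__ == "__main__":
    seq, changes = generate_sequence()
    print("".join(seq[:40]))
    print("change points:", changes)
```

An ideal observer would invert this process to infer θ_n and ν from the sequence alone; that inference step is not implemented here.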
Fig. 2.
Brain regions whose signals correlated with the internal updates predicted by the ideal-observer model. The statistical maps show the group-level significance of positive (red) and negative (blue) regression coefficients, when fMRI activity was regressed at every trial and all sessions on the optimal (Bayesian) amount of estimation update (GLM1). Inset shows the results of the analysis restricted to audio and visual sessions (see also Fig. S3). Maps are thresholded at the voxel level (P < 0.001 uncorrected) and the cluster level (P < 0.05 FWE).
Fig. S3.
Modality-specific areas and surprise signals. (A) Maps showing regions more activated by visual stimuli than auditory stimuli in green and the opposite contrast in yellow. (B) Optimal surprise levels were regressed on fMRI signals (GLM3). Average regression coefficients in the auditory and visual cortices (using the clusters shown in A) are plotted separately for auditory and visual sessions. Maps are thresholded at the voxel level (P < 0.001 uncorrected) and the cluster level (P < 0.05 FWE); error bars are SEM across subjects. *P < 0.05 (t test).
Fig. S4.
Results of the multiple linear regression. (A–C) The fMRI signal was regressed on the optimal confidence and surprise levels in the same model (GLM11). The panels show significance maps for the confidence (A) and surprise (B) regressors, thresholded at the voxel level (P < 0.001 uncorrected) and the cluster level (P < 0.05 FWE), and their conjunction (C) at a more liberal threshold: voxel-level P < 0.001 uncorrected, cluster-level P < 0.05 uncorrected. These results replicate most aspects of the ANOVA approach presented in the main text and Figs. 3–5. However, one difference between the two approaches is that the multiple linear regression yields much stronger and larger correlates for confidence than for surprise. (D) The multiple regression results may be biased here because regressors are correlated (the mean Pearson ρ across participants is −0.44), and the skewed distribution of surprise values saturates in subjects by comparison with the optimal algorithm. The distribution is skewed because unexpected outcomes are less frequent than expected ones when learning captures the probabilities of observations. However, the subjects’ estimates of those probabilities are slightly less extreme than the optimal ones (see figure 4A in ref. 13). This difference should impact mostly the surprise levels corresponding to unexpected outcomes and thereby bias the regression analysis. This difference should also impact confidence levels; however, their distribution being remarkably symmetrical, it should not bias the regression analysis. Because confidence and surprise levels are negatively correlated, the multiple regression will favor the confidence regressor at the expense of the surprise regressor in the presence of saturation. We verified this with simulations. We introduced a sigmoid, $f(x) = \left(1 + e^{-s(x - \bar{x})}\right)^{-1}$, with $\bar{x}$ as the mean of $x$, to saturate the optimal surprise levels or the optimal confidence levels in distinct simulations. We then convolved these time series with the hemodynamic response function, and we regressed these simulated data on a design matrix that contained the nonsaturated, HRF-convolved optimal levels of confidence and surprise. Mean weights ±SEM across participants (using the actual observations they received) are shown for s = 4, but the qualitative result does not depend on this choice: a saturation indeed produces a bias in favor of the confidence regressor. Note that our simulation does not include noise; therefore, the regression problem may be aggravated in real data.
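
The saturation step of the simulation can be sketched as follows, assuming the sigmoid is the standard logistic function centered on the mean of the series, f(x) = 1 / (1 + exp(−s(x − mean(x)))), with slope s = 4 as in the caption. The function name is hypothetical, and the HRF convolution and regression steps are not reproduced here.

```python
import math


def saturate(values, s=4.0):
    """Apply a logistic sigmoid centered on the mean of `values`.

    Values far from the mean are compressed toward 0 or 1, which mimics a
    saturation of extreme surprise (or confidence) levels.
    """
    mean = sum(values) / len(values)
    return [1.0 / (1.0 + math.exp(-s * (x - mean))) for x in values]


if __name__ == "__main__":
    surprise = [0.1, 0.2, 0.3, 0.5, 1.0, 2.5, 4.0]  # made-up, skewed values
    for raw, sat in zip(surprise, saturate(surprise)):
        print(f"{raw:4.1f} -> {sat:.3f}")
```

With a skewed input such as the made-up series above, the large values are all mapped close to 1, so differences among the most extreme levels are largely lost.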
Fig. 3.
Cortical correlates of confidence. (A) Main effect of confidence in an ANOVA also controlling for surprise and predictability levels (GLM4). Maps are thresholded at the voxel level (P < 0.001 uncorrected) and the cluster level (P < 0.05 FWE). (B) The fMRI signals (plain lines) are plotted following the categorical approach presented in Fig. 1E. Signals were extracted with cross-validation: voxels were identified from ANOVA in one session type (auditory or visual) and tested in the other (Materials and Methods). The plot shows the average of both cross-validated extractions. To facilitate visual comparison, optimal confidence levels (dashed lines) are overlaid after adjusting for offset and scaling for each cluster (and not each subplot). Blue vs. orange correspond to expected vs. unexpected outcomes, following the ideal observer’s estimates. (C) Cross-validated variations in fMRI signals as a function of the theoretical confidence predicted by the ideal-observer model (binned into 6 percentiles) (GLM5). Squares correspond to the signal extracted in auditory sessions, triangles to visual sessions. Fitted lines correspond to the average of fitted individual data. Error bars are SEM across subjects (B and C). (D) Interindividual variations in neural signals predict interindividual variations in behavior: A significant between-subjects correlation was observed between the neural fit (regression coefficients between fMRI signals and ideal-observer confidence levels; GLM2) and behavioral fit (regression coefficients between subjective confidence reports and ideal-observer confidence levels).
Fig. 4.
Cortical correlates of surprise. (A) Main effect of surprise, in an ANOVA also controlling for confidence and predictability levels (GLM4). Maps are thresholded at the voxel level (P < 0.001 uncorrected) and the cluster level (P < 0.05 FWE). In B, dashed lines represent the optimal surprise signal and in C, bins correspond to 6 percentiles of the optimal surprise levels (GLM6). Same format as Fig. 3 (including cross-validation for B and C).
Fig. 5.
Cortical correlates of confidence-weighted update. (A) Conjunction of the main effects of surprise and confidence in an ANOVA also controlling for predictability levels (GLM4), shown at P < 0.005 uncorrected. In B dashed lines represent the optimal update signal and in C, bins correspond to 6 percentiles of optimal update levels (GLM7), same format as Fig. 3 (including cross-validation). (D) Conjunction of functional connectivity with the intraparietal sulcus (signaling confidence; Fig. 3A) and the frontal eye field (signaling surprise; Fig. 4A) [results are shown at a voxel level P < 0.05 FWE (GLM8)]. The two seed clusters are shown in blue.
Fig. 6.
Evidence for a hierarchical representation of confidence. (A) In the ideal-observer model, two transition probabilities are simultaneously monitored, each with its own confidence level (Fig. 1C). However, changes in the generative process are global, impacting both transition probabilities at once. Therefore, when a change is suspected, confidence in all transition probabilities should drop, even if this suspicion arises from the observation of a single succession type. To provide evidence of this effect with the actual sequences presented to subjects, optimal confidence levels were averaged during streaks with three or more repetitions (ABBBA, BAAAB, ABBBBA, etc.; note that A and B play symmetric roles). The plot shows confidence in the transition probability that is relevant during the streak (black) and confidence in the other transition probability (colors), separately for streaks within which confidence increases (no change is suspected, green) and within the others (in which a change is suspected, purple). When confidence in the observed succession type increases, the confidence for the unobserved succession type remains stable, but when confidence in the observed succession type decreases, the other one also drops: there is an interaction between streak type and the post- versus pre-streak optimal confidence levels. (B) The predicted interaction was observed in the inferior–middle frontal gyrus (I/MFG). The plot shows the difference in fMRI responses on trials that preceded and followed a streak of repeated items (filled, colored circles in A). Note that the activity in this region relates negatively to confidence; we therefore reversed the y axis to facilitate the visual comparison with the ideal observer. Error bars correspond to SEM across subjects; they were extremely small in A and thus omitted. *P < 0.05.

References

    1. Behrens TEJ, Woolrich MW, Walton ME, Rushworth MFS. Learning the value of information in an uncertain world. Nat Neurosci. 2007;10:1214–1221.
    2. Sutton R. Gain adaptation beats least squares? In: Proceedings of the 7th Yale Workshop on Adaptive and Learning Systems. New Haven, CT: Yale University; 1992. pp. 161–166.
    3. Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco: Morgan Kaufmann; 1997.
    4. Dayan P, Kakade S, Montague PR. Learning and selective attention. Nat Neurosci. 2000;3:1218–1223.
    5. Doya K. Metalearning and neuromodulation. Neural Netw. 2002;15:495–506.
