Neuron. 2020 Jun 17;106(6):1044-1054.e4.
doi: 10.1016/j.neuron.2020.03.024. Epub 2020 Apr 20.

Prefrontal Cortex Predicts State Switches during Reversal Learning



Ramon Bartolo et al. Neuron.

Abstract

Reinforcement learning allows organisms to predict future outcomes and to update their beliefs about value in the world. The dorsolateral prefrontal cortex (dlPFC) integrates information carried by reward circuits, which can be used to infer the current state of the world under uncertainty. Here, we explored the dlPFC computations related to updating current beliefs during stochastic reversal learning. We simultaneously recorded the activity of populations of up to 1,000 neurons in two male macaques while they performed a two-armed bandit reversal learning task. Behavioral analyses using a Bayesian framework showed that the animals inferred reversals and switched their choice preference rapidly, rather than slowly updating choice values, consistent with state inference. Furthermore, dlPFC neural populations accurately encoded choice preference switches. These results suggest that prefrontal neurons dynamically encode decisions associated with Bayesian subjective values, highlighting the role of the PFC in representing a belief about the current state of the world.
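The abstract contrasts rapid state inference with slowly updating choice values. As a minimal sketch of the latter, the Python below implements a generic Rescorla-Wagner learner with a softmax choice rule for a two-armed bandit. The function name and the default learning rate `alpha` and inverse temperature `beta` are illustrative assumptions, not the parameters fitted in the paper.

```python
import numpy as np


def rescorla_wagner(choices, rewards, alpha=0.3, beta=5.0):
    """Incremental (Rescorla-Wagner) value learning on a two-armed bandit.

    choices: sequence of chosen option indices (0 or 1).
    rewards: sequence of outcomes (1 = reward, 0 = no reward).
    alpha, beta: illustrative learning rate and softmax inverse temperature.
    Returns the model's probability of choosing option 0 on each trial
    (computed before that trial's update) and the trial-by-trial reward
    prediction errors.
    """
    values = np.zeros(2)
    p_choose_0 = np.empty(len(choices))
    rpes = np.empty(len(choices))
    for t, (c, r) in enumerate(zip(choices, rewards)):
        # Two-option softmax, i.e. a logistic function of the value difference.
        p_choose_0[t] = 1.0 / (1.0 + np.exp(-beta * (values[0] - values[1])))
        # Prediction error for the chosen option, then a partial update.
        rpes[t] = r - values[c]
        values[c] += alpha * rpes[t]
    return p_choose_0, rpes
```

Because each update moves the chosen option's value only a fraction `alpha` of the way toward the outcome, the predicted choice probability drifts back gradually after a reversal instead of switching abruptly.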

Keywords: Bayesian update; large-scale recordings; macaques; model-based; neural ensemble; prefrontal cortex; reversal learning; state inference.


Conflict of interest statement

Declaration of Interests: The authors declare no competing interests.

Figures

Figure 1.
Task and Recording Sites. A. Schematic of the reversal learning task. On each trial, the animals were required to fixate centrally; after a variable fixation time, the fixation spot was extinguished and two targets were presented simultaneously to the left and right. The animals then made a saccade to select a target and held fixation on it for 500 ms to complete the trial successfully. Reward was delivered stochastically, with one option having a higher reward probability (p=0.7 vs. p=0.3). In each block of 80 trials, the reward probability mapping was defined in one of two ways, yielding two block types: in What blocks, reward probabilities were associated with the images independently of where they were presented, whereas in Where blocks, probabilities were associated with locations (left/right) independently of the image presented at that location. The animals explored the available options to identify both the block type and the best option, thereby acquiring a choice preference. Then, on a random trial within a switch window (trials 30–50), the reward mappings were flipped across options according to the block type, dividing the block into acquisition and reversal phases. Block type was held constant within a given block. B. Location of the 8 microelectrode arrays (96 electrodes, 10×10 arrangement) on the prefrontal cortex, surrounding the principal sulcus. C. Bayesian estimates of the posterior probability of a reversal in the choice-outcome mapping (Ideal Observer (IO) model, P(reversal∣M=IO)) and in the choice preference (Behavioral (BHV) model, P(reversal∣M=BHV)). These curves were generated by averaging the posteriors across blocks, trial by trial. D. Bayesian estimate of the posterior probability of a reversal in choice preference, aligned to the point estimate of the trial on which the reversal occurred. These curves were generated by calculating the expected value of P(reversal∣M=BHV) in each block, then aligning P(reversal∣M=BHV) on that estimate before averaging across blocks. E. Boxplots of the difference between the reversal point estimates derived from the posterior P(reversal∣M) distributions of the BHV and IO models. Positive values indicate that the reversal in choice preference occurred after the reward mapping switched. F. Choice and Rescorla-Wagner model data aligned to the IO reversal estimate. Because the reversal trial varied across blocks, the choice and model data from each block were split into acquisition (i.e., trials before the IO reversal trial) and reversal (i.e., the IO reversal trial and all later trials) phases. The data were then interpolated so that the acquisition and reversal phases each had 40 trials, and the interpolated data were averaged. Plots show the fraction of trials on which the animals chose the option that initially had the higher reward probability, split by block type. Overlays are choice probability estimates from the Rescorla-Wagner model fit. G. Same as F, except that the acquisition and reversal phases were defined by the BHV reversal point. H. Same as G, except that overlays are Pearce-Hall model choice probabilities. F–H show means±SEM across sessions (n=8).
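Panel C refers to an Ideal Observer posterior over the reversal trial. The caption does not give the model equations, so the following is only a minimal sketch of one standard way to compute such a posterior, assuming that option 0 starts as the better option, that the reward probabilities (0.7 vs. 0.3) are known, that the reversal is equally likely on any trial of the 30–50 switch window, and that the mapping stays flipped from the reversal trial onward. The function name is hypothetical, and the BHV model, which concerns choice preference rather than rewards, is not reproduced here.

```python
import numpy as np


def ideal_observer_posterior(choices, rewards, window=(30, 50), p_high=0.7):
    """Posterior over the trial on which the reward mapping reversed.

    choices: chosen option (0 or 1) on each trial of one block.
    rewards: outcome (1 = reward, 0 = none) on each trial.
    window: first and last candidate reversal trials (1-based, inclusive).
    Assumes option 0 is the better option before the reversal and that
    reward probabilities p_high / 1 - p_high are known to the observer.
    """
    choices = np.asarray(choices)
    rewards = np.asarray(rewards)
    trials = np.arange(1, len(choices) + 1)  # 1-based trial numbers
    candidates = np.arange(window[0], window[1] + 1)
    log_lik = np.empty(len(candidates))
    for i, r in enumerate(candidates):
        # Better option on each trial if the reversal happened on trial r.
        good = np.where(trials < r, 0, 1)
        p_reward = np.where(choices == good, p_high, 1.0 - p_high)
        log_lik[i] = np.sum(
            np.where(rewards == 1, np.log(p_reward), np.log(1.0 - p_reward))
        )
    # Uniform prior over the window: posterior is the normalized likelihood.
    post = np.exp(log_lik - log_lik.max())
    post /= post.sum()
    # Expected value of the posterior serves as the point estimate.
    point_estimate = float(np.sum(candidates * post))
    return candidates, post, point_estimate
```

For example, if option 0 is chosen on all 80 trials and rewarded only on trials 1–39, the posterior peaks at trial 40 and, because the 30–50 window is symmetric around 40, the expected-value point estimate is also 40.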
Figure 2.
Neural responses. A. Raster plots of an example neuron during What and Where blocks. Each row of blue ticks represents the spikes on one trial. Red dots along each row mark trial start, cue onset, and outcome time/end of trial. Because the images varied from block to block, trials were sorted by preferred (Image B) and non-preferred (Image A) image in each block. B. Spike densities of the example unit for each combination of option and block type. C. Associations between activity and behavior in the population of recorded single units. The plot shows the average fraction of neurons across sessions (mean±SEM) with significant main effects for the indicated factors in an ANOVA on spike counts from a sliding window (300 ms width, 20 ms step). A total of 6,081 neurons were recorded.
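Panel C is based on spike counts in a sliding window (300 ms width, 20 ms step). A minimal NumPy sketch of that counting step is shown below, assuming spike times in seconds already aligned to a task event; the function name is hypothetical, and the ANOVA itself, whose factor structure the caption does not spell out, is not reproduced.

```python
import numpy as np


def sliding_window_counts(spike_times, t_start, t_stop, width=0.300, step=0.020):
    """Spike counts in overlapping windows (defaults: 300 ms width, 20 ms step).

    spike_times: spike times (s) for one neuron on one trial, aligned to an
    event such as cue onset. Windows cover [start, start + width) and must
    lie entirely within [t_start, t_stop].
    Returns the window start times and the spike count in each window.
    """
    spike_times = np.sort(np.asarray(spike_times, dtype=float))
    # Small tolerance guards against floating-point error in the division.
    n_windows = int(np.floor((t_stop - t_start - width) / step + 1e-9)) + 1
    starts = t_start + step * np.arange(n_windows)
    # Binary search gives the number of spikes with start <= t < start + width.
    lo = np.searchsorted(spike_times, starts, side="left")
    hi = np.searchsorted(spike_times, starts + width, side="left")
    return starts, hi - lo
```

Counts produced this way for every neuron and trial are the kind of values a per-window ANOVA would then be run on.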
Figure 3.
Decoding of reversal from activity 0–300 ms after cue onset. A. Sum of Squared Residuals (SSResid) across neurons. The residual for each neuron on each trial was squared, and the squares were summed across neurons. The average SSResid (red) on each trial within the switch window is overlaid on the Bayesian posterior P(reversal∣M=BHV) (blue). The inset shows the correlation between the two curves; the red line is the best linear fit. B. Reward Prediction Error (RPE) from a Rescorla-Wagner model. SSResid (red) is overlaid on the RPE around the trial of the behavioral reversal (green). Inset, same as in A. C. Neural posterior distribution, P(reversal∣Neural Response), from a Linear Discriminant Analysis (black) overlaid on P(reversal∣M=BHV) (blue). Note that the decoding algorithm generates a posterior over reversal trials for each block; this plot shows the average of those posteriors. D. Histogram of decoded reversal trials. Within a window around the actual reversal in each block, we searched for the trial with the maximum posterior under the neural decoding model, trial = argmax over trials of P(reversal=trial∣Neural Response), and used this trial as the predicted reversal. Each decoded reversal was expressed as a decoding error, i.e., the number of trials between it and the Bayesian point estimate of the behavioral reversal. The red dashed line shows chance level. The histogram of decoded trials (D) usually, but not always, matches the average posterior (C). The 5th and 95th percentiles of the decoding errors were −9 and 7 trials relative to reversal, respectively. The distribution of decoding errors did not differ significantly between What and Where blocks (KS test, D(94,96)=0.072, p=0.96), so the two block types were pooled. Plots show mean±SEM across sessions (n=8).
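Two computations in this caption are simple enough to sketch: SSResid (squared residuals summed over neurons on each trial) and the decoding error (the argmax of the neural posterior within a window around the reversal, minus the behavioral point estimate). The NumPy sketch below assumes the residuals are already available (the caption does not say which model they come from), that the behavioral point estimate has been rounded to an integer trial, and that `half_window` is an illustrative search radius; the function names are hypothetical.

```python
import numpy as np


def ss_resid(residuals):
    """Squared residuals summed across neurons, one value per trial.

    residuals: array of shape (n_trials, n_neurons).
    """
    residuals = np.asarray(residuals, dtype=float)
    return np.sum(residuals ** 2, axis=1)


def decoding_error(posterior, trials, behavioral_reversal, half_window=10):
    """Decoded reversal (argmax of the posterior) minus the behavioral estimate.

    posterior: P(reversal = trial | neural response) for each entry of trials.
    behavioral_reversal: integer (rounded) behavioral point estimate.
    Only trials within +/- half_window of that estimate are searched.
    """
    posterior = np.asarray(posterior, dtype=float)
    trials = np.asarray(trials)
    in_window = np.abs(trials - behavioral_reversal) <= half_window
    # Exclude out-of-window trials from the argmax.
    masked = np.where(in_window, posterior, -np.inf)
    decoded = trials[np.argmax(masked)]
    return decoded - behavioral_reversal
```

The correlation shown in the insets could be computed as `np.corrcoef(ssr, posterior)[0, 1]`, the percentiles reported in D as `np.percentile(errors, [5, 95])`, and a two-sample KS comparison of What and Where blocks with `scipy.stats.ks_2samp`.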
Figure 4.
Decoding error and noise in behavior. A. Bayesian P(reversal∣M=BHV) distribution averaged across blocks as a function of absolute neural decoding error. Color code indicates probability. Triangle markers to the right of the plot mark decoding error values of 0 (blue), 5 (green), and 10 (red). B. P(reversal∣M=BHV) distributions (mean±SEM, n=8 sessions) around the behavioral reversal point for three decoding error values. C. Entropy of the P(reversal∣M=BHV) distributions as a function of decoding error. Black dots are entropy values for individual blocks, and blue circles are means across blocks with the same absolute decoding error. The mean regression line across sessions is shown in red; shading is the SEM of the regression line.
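Panel C relates the entropy of each block's P(reversal∣M=BHV) to the absolute decoding error. A sketch of that analysis follows, assuming Shannon entropy in bits (the caption does not state the logarithm base) and an ordinary least-squares line fitted with `np.polyfit`; the function names are hypothetical.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy (bits) of a discrete distribution.

    The input is renormalized; zero-probability bins contribute nothing.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))


def entropy_vs_error(posteriors, abs_errors):
    """Per-block entropies and a least-squares line: entropy ~ |decoding error|.

    posteriors: one P(reversal | M=BHV) distribution per block.
    abs_errors: absolute decoding error for each block.
    """
    h = np.array([entropy_bits(p) for p in posteriors])
    slope, intercept = np.polyfit(np.asarray(abs_errors, dtype=float), h, deg=1)
    return h, slope, intercept
```

A flat posterior over the 21 trials of the switch window has entropy log2(21) ≈ 4.39 bits, while a posterior concentrated on a single trial has entropy 0, so higher entropy corresponds to a less certain behavioral reversal estimate.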
Figure 5.
Decoding of reversal across trial execution. A. Decoding error distributions for a sliding time window (300 ms width, 50 ms step) across the trial, using data aligned to cue onset. Color code is fraction of blocks. B. Peak decoding error during trial execution. The gray shaded area depicts the time window in which the trials ended and the outcome (reward/no reward) became known to the animals. C. Decoding error distributions for data aligned to the trial outcome/end of trial. Color code is fraction of blocks. D. Peak decoding error during trial execution around outcome time. The dashed line marks decoding error = −1. E. Mean posterior probability P(trial=reversal∣Neural Response) distribution for a 300 ms window starting at outcome time. The red dashed line marks trial −1 relative to the behavioral reversal. Values in B, D, and E are means±SEM across sessions (n=8).
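Panels A and C show, for each time window, the fraction of blocks at each decoding error, and panels B and D report a peak decoding error. Reading "peak" as the most frequent (modal) error in each window, which is an assumption, a minimal NumPy sketch is below; the function name and the ±10-trial error range are illustrative.

```python
import numpy as np


def error_distributions(errors, max_abs_error=10):
    """Fraction of blocks at each decoding error, per time window.

    errors: integer array (n_windows, n_blocks), one decoding error per block
    per sliding window. Errors beyond +/- max_abs_error are not binned.
    Returns the error bins, an (n_windows, n_bins) matrix of fractions of
    blocks, and the modal ("peak") error in each window.
    """
    errors = np.asarray(errors, dtype=int)
    bins = np.arange(-max_abs_error, max_abs_error + 1)
    n_blocks = errors.shape[1]
    # For each bin, count blocks with that error in every window.
    frac = np.stack([(errors == b).sum(axis=1) for b in bins], axis=1) / n_blocks
    # Ties resolve to the most negative error (first maximum).
    peak = bins[np.argmax(frac, axis=1)]
    return bins, frac, peak
```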
Figure 6.
Sum of Squared Residuals (SSResid) during the first 20 trials in the block. A. SSResid for a window 0–300 ms after cue onset. B. SSResid for a window 0–300 ms after trial outcome. Means±SEM across sessions (n=8).
Figure 7.
Neural state-space trajectories. A. Neural trajectory across trials for an example recording session. The curve represents the average trajectory over all blocks in the session. Color code is trial number within the block. Orange shading illustrates the final state-space region occupied by the neural activity, centered on the average of the last 20 trials in the block. B. Euclidean distance between each trial's location in PCA space and the centroid of the final state-space region (means±SEM across sessions). C. Neural trajectories within each trial along the second principal component for an example session. Each trace corresponds to one trial around the reversal (trials −10 to 9 relative to reversal, color coded), averaged over blocks. Dashed lines divide the trial periods (see Fig. 1A). Arrows and numbers point to the trial period in which the trajectory of the indicated trial (−2, −1, and 0) deviates most from the average trajectory of all other trials. D. Distance from the average trajectory around the reversal (trials −2 to 0 relative to reversal) to the average trajectory during initial acquisition (first 5 trials in the block, blue) and to the average trajectory at the end of the block (last 10 trials in the block, red). Distances were normalized by the maximum observed value and thus range between 0 and 1. E–H. Distances between the trajectory of each individual trial and the average of all other trials in different trial periods. Data are means±SEM across sessions (n=8).
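Panel B measures the Euclidean distance in PCA space between each trial and the centroid of the last 20 trials. The sketch below assumes one population vector per trial (for example, spike counts from one trial epoch), computes PCA by singular value decomposition of the mean-centred data, and keeps three components; these choices and the function name are assumptions, since the caption does not specify them.

```python
import numpy as np


def pca_distance_to_final(activity, n_components=3, n_final=20):
    """Distance of each trial's population state from the end-of-block centroid.

    activity: (n_trials, n_neurons) array, one population vector per trial.
    n_components: number of principal components kept (illustrative).
    n_final: number of final trials defining the "final state-space" centroid.
    """
    X = np.asarray(activity, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of vt are principal axes; project trials onto the leading ones.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T
    centroid = scores[-n_final:].mean(axis=0)
    return np.linalg.norm(scores - centroid, axis=1)
```

The max-normalization described for panel D would amount to dividing the relevant distances by their maximum, e.g. `d / d.max()`.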
Figure 8.
Effect of population size on decoding of reversal. A. Distribution of the decoded posterior P(reversal∣Neural Response) over trials around the estimated reversal for different population sizes (grayscale coded). B. Histogram of decoded reversal trials. The dashed line shows chance level. Data are means±SEM across sessions.
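The caption does not describe how populations of different sizes were formed. One common approach, assumed here, is to draw random subsets of neurons without replacement and rerun the decoder on each subset; the hypothetical helper below only generates those subsets, and the number of draws and the seed are illustrative.

```python
import numpy as np


def subsample_populations(n_neurons, sizes, n_draws=10, seed=0):
    """Random neuron subsets for probing how decoding depends on population size.

    Returns {size: [index arrays]}, each subset drawn without replacement.
    Sizes larger than the recorded population are skipped.
    """
    rng = np.random.default_rng(seed)
    return {
        size: [rng.choice(n_neurons, size=size, replace=False) for _ in range(n_draws)]
        for size in sizes
        if size <= n_neurons
    }
```

Each index array `idx` would then select columns, e.g. `activity[:, idx]`, before the decoder is applied.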


