Nat Neurosci. 2020 Oct;23(10):1267-1276.
doi: 10.1038/s41593-020-0688-5. Epub 2020 Aug 10.

A quantitative reward prediction error signal in the ventral pallidum

David J Ottenheimer et al. Nat Neurosci. 2020 Oct.

Abstract

The nervous system is hypothesized to compute reward prediction errors (RPEs) to promote adaptive behavior. Correlates of RPEs have been observed in the midbrain dopamine system, but the extent to which RPE signals exist in other reward-processing regions is less well understood. In the present study, we quantified outcome history-based RPE signals in the ventral pallidum (VP), a basal ganglia region functionally linked to reward-seeking behavior. We trained rats to respond to reward-predicting cues, and we fit computational models to predict the firing rates of individual neurons at the time of reward delivery. We found that a subset of VP neurons encoded RPEs and did so more robustly than the nucleus accumbens, an input to the VP. VP RPEs predicted changes in task engagement, and optogenetic manipulation of the VP during reward delivery bidirectionally altered rats' subsequent reward-seeking behavior. Our data suggest a pivotal role for the VP in computing teaching signals that influence adaptive reward seeking.

Figures

Extended Data Figure 1. Placements for random sucrose/maltodextrin, random sucrose/maltodextrin/water, and blocked sucrose/maltodextrin rats.
Recording locations for nucleus accumbens (left) and ventral pallidum (right) rats.
Extended Data Figure 2. Evaluation of model fitting.
(a) Distribution of the learning rate, α, for RPE neurons in VP (green) and NAc (orange). (b) Likelihood (LH) per trial for RPE (n=72) and Current outcome (n=126) neurons for RPE and Current outcome models, relative to the LH per trial of the Unmodulated model. Lower (more negative) indicates a better fit. Line represents median, box represents 25th and 75th percentile, and whiskers extend to 1.5 times the interquartile range. Red highlights the AIC-selected model. Median [25th to 75th percentile; min to max] ΔLH/trial are: RPE neurons, RPE model −0.21 [−0.39 to −0.14; −3.16 to −0.05], RPE neurons, Current outcome model −0.15 [−0.32 to −0.09; −3.03 to −0.02], Current outcome neurons, RPE model −0.12 [−0.23 to −0.07; −0.174 to −0.03], Current outcome neurons, Current outcome model −0.12 [−0.22 to −0.07; −1.73 to −0.03]. Median [25th-75th percentile] LH per trial for RPE neurons was 2.29 [2.04 to 2.49] and for Current outcome neurons was 2.15 [1.92 to 2.37]. (c) Model recovery, plotted as the fraction of neurons simulated with each model recovered as that model. (d) Distribution of difference between the true value of the parameters used to simulate the neurons in (c) and the values recovered by MLE.
Extended Data Figure 3. Placements for optogenetic experiments.
(a) Expression of ArchT3.0:YFP and fiber tip placement for the rats included in the ArchT3.0 group for the optogenetic experiment in Figure 5. (b) Expression of ChR2:GFP and fiber tip placement for the rats included in the ChR2 group. The pattern of results remained unchanged with or without inclusion of the rat with the most caudal placement.
Extended Data Figure 4. Supplemental optogenetic data.
(a) Mean(+/−SEM) port occupancy in time surrounding reward delivery on laser and no laser trials for YFP (left, n=7 rats) and ArchT (right, n=7 rats) groups. (b) Mean(+/−SEM) port occupancy in time surrounding reward delivery on laser and no laser trials for GFP (left, n=7 rats) and ChR2 (right, n=11 rats) groups. To account for the disruption of port occupancy by laser stimulation, we ran our distance from port analysis on the time beyond 15s past reward delivery and found the same pattern of results. (c) Additional optogenetic experiment in ChR2 rats and controls where the 2 sec of laser stimulation was at the onset of the cue. (d) Mean(+/−SEM) distance from port in the ITI following laser stimulation did not differ from no laser trials for GFP (p = 0.94, Wilcoxon signed-rank test, two-sided, n=7 rats) or ChR2 (p = 0.11, Wilcoxon signed-rank test, two-sided, n=10 rats) groups. (e) The effect of laser was similar across both groups (median: 0.06 GFP, n=7 rats; −0.09 ChR2, n=10 rats; p = 0.36, Wilcoxon rank-sum test, two-sided).
Extended Data Figure 5. Value encoding in VP at the time of cue onset in the random sucrose/maltodextrin task.
(a) Schematic of model-fitting and neuron classification process. For each neuron, the reward outcome and spike count following reward delivery on each trial were used to fit two models: Value and Unmodulated. Akaike information criterion (AIC) was used to select which model best fit each VP neuron’s activity (right). (b) Mean(+/−SEM) activity of neurons best fit by each of the models, plotted according to previous outcome. (c) Coefficients(+/−SE) for outcome history linear regression for each class of neurons (n=39 Value and 397 Unmodulated neurons). (d) Mean(+/−SEM) activity of all Value neurons with trials binned by model-derived Value. (e) Mean(+/−SEM) population activity of simulated and actual Value neurons according to each trial’s Value (V). (f) Model recovery, plotted as the fraction of neurons simulated with each model recovered as that model.
Extended Data Figure 6. Value encoding at the time of cue onset in the random sucrose/maltodextrin/water task.
(a) Fraction of VP neurons best fit by the Value and Unmodulated models in the random sucrose/maltodextrin/water task. (b) Mean(+/−SEM) activity of neurons best fit by each of the models, plotted according to previous outcome. (c) Coefficients(+/−SE) for outcome history linear regression for each class of neurons (n=38 Value and 216 Unmodulated neurons). (d) Mean(+/−SEM) population activity of simulated and actual Value neurons according to each trial’s Value (V). (e) Mean(+/−SEM) activity of all Value neurons with trials binned by model-derived Value. (f) Distribution of correlations between individual VP neurons’ firing rates at cue onset on each trial and the distance from the port during the previous ITI. * = p = 0.00001 for significant negative shift in mean correlation coefficient (vertical line) compared to 1000 shuffles of data for Value neurons, Wilcoxon signed-rank test, two-sided, as well as p = 0.0000002 for more negative coefficients for Value neurons compared to Unmodulated neurons, Wilcoxon rank-sum test, two-sided. See also Fig. 4c-d.
Extended Data Figure 7. Placements for predictable and random sucrose/maltodextrin rats.
Recording locations for rats from predictable and random sucrose/maltodextrin experiment in Extended Data Fig. 8.
Extended Data Figure 8. Cue-derived and outcome history-derived predictions separately impact VP firing.
(a) Three distinct auditory cues indicated three trial types: a 50/50 probability of receiving sucrose or maltodextrin solutions, a 100% probability of receiving sucrose, or a 100% probability of receiving maltodextrin, as seen in the example session (right). (b) Median latency to enter reward port following onset of cue for each trial type, plotted as the mean(+/−SEM) across all sessions for each rat (gray lines, n=8, 9, 10, and 10 sessions for the 4 rats) and the overall mean(+/−SEM) (n=37 sessions). (c) Percentage sucrose of total solution consumption in a two-bottle choice, before (“Initial”) and after (“Final”) recording. (d) Mean(+/−SEM) lick rate relative to pump onset for each trial type. (e) Mean(+/−SEM) activity of all neurons recorded in the predictable and random sucrose/maltodextrin task, aligned to reward delivery. (f) Schematic of cue model-fitting and neuron classification process. The reward outcome and spike count from each trial were used to fit six models: RPE, Current outcome, and Unmodulated with and without the cue effect, which allowed a different weight for the impact of each cue. Neurons were classified according to Akaike information criterion. (g) Fraction of the population best fit by each model. (h) Coefficients(+/−SE) for outcome history regression for each class of neurons with no cue effect (n=38 RPE, 135 Current outcome, and 204 Unmodulated neurons). (i) Mean(+/−SEM) activity of all RPE neurons with no cue effect (n=38 neurons). The trials for each neuron are binned according to their model-derived RPE. (j) Population activity of simulated and actual VP RPE neurons with no cue effect according to each trial’s RPE value. (k) Scatterplot of each cue effect neuron’s weight for specific sucrose and maltodextrin cues (n=7 RPE, 33 Current outcome, and 70 Unmodulated cells with cue effects). The percentage of neurons falling in each quadrant is indicated. 
The percentage in our quadrant of interest (bottom right, positive value for sucrose and negative value for maltodextrin) did not differ from chance (p > 0.09 for exact binomial test compared to null of 25%). (l) Mean(+/−SEM) activity of neurons with sucrose values > 0 and maltodextrin values < 0, consistent with a value-based cued expectation modulation. (m) Neurons with cue effects for cue-evoked signaling, rather than reward-evoked signaling, as in (g). (n) As in (k), for activity at the time of the cue rather than time of reward (n=143 neurons with cue effects). * = p < 0.0001 for exact binomial test compared to null of 25%. (o) As in (l), for activity at the time of the cue rather than time of reward.
Extended Data Figure 9. Classifying neurons with BIC instead of AIC.
(a) Fraction of neurons classified as RPE, Current outcome, and Unmodulated in VP and NAc in the random sucrose/maltodextrin task using Bayesian information criterion (BIC) as the selection criterion. (b) Coefficients(+/−SE) for outcome history regression for VP neurons of each BIC subset (n=37 RPE, 110 Current outcome, and 289 Unmodulated cells). (c) Population mean(+/−SEM) of all VP BIC RPE neurons, binned according to the model-derived RPE. (d) Mean(+/−SEM) population activity of simulated and actual BIC RPE neurons according to each trial’s RPE value for VP (left) and NAc (right). (e) Distribution of correlations between model-predicted and actual spiking for all RPE neurons from each region. (f) Distribution of α for RPE neurons in VP (green) and NAc (orange). (g) Mean(+/−SEM) activity of VP neurons classified as RPE by AIC but not BIC according to current and previous outcome. (h) Coefficients(+/−SE) for outcome history regression for these neurons (n=35 neurons). (i) Mean(+/−SEM) activity of these neurons binned according to model-derived RPE on each trial.
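The difference between the two selection criteria can be illustrated with a minimal sketch. The negative log-likelihoods, parameter counts, and trial count below are hypothetical values chosen for illustration and are not taken from the study. BIC penalizes each extra parameter by ln(n) rather than 2, so a neuron whose RPE fit is only slightly better can be classified as RPE under AIC but not under BIC.

```python
import math

def aic(nll, n_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2 * n_params + 2 * nll

def bic(nll, n_params, n_trials):
    """Bayesian information criterion; the penalty grows with ln(n_trials)."""
    return n_params * math.log(n_trials) + 2 * nll

def select(fits, criterion):
    """Return the name of the model with the lowest criterion value."""
    return min(fits, key=lambda name: criterion(*fits[name]))

# Hypothetical fits for one neuron: model name -> (negative log-likelihood, parameter count).
fits = {
    "RPE": (100.0, 3),
    "Current outcome": (102.0, 2),
    "Unmodulated": (110.0, 1),
}
n_trials = 100  # assumed number of trials in the session

by_aic = select(fits, aic)
by_bic = select(fits, lambda nll, k: bic(nll, k, n_trials))
print(by_aic, by_bic)
```

With these made-up numbers the two criteria disagree, which is the kind of case panels (g) to (i) examine.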
Figure 1. A subset of ventral pallidum neurons signal preference-based reward prediction errors.
(a, c-f) are adapted from (9). (a) Task: entering the reward port during a 10s cue triggered reward delivery. (b) The cue indicated 50/50 probability of receiving sucrose or maltodextrin solutions, as seen in example session (right). (c) Percentage sucrose of total solution consumption in a two-bottle choice, before (“Initial”) and after (“Final”) recording. (d) Mean(+/−SEM) lick rate relative to pump onset. (e) Mean(+/−SEM) activity of all recorded neurons on sucrose (Suc) and maltodextrin (Mal) trials. Gray rectangle indicates window used for analysis in (g-h,j) and all equivalent analyses in subsequent figures. (f) Mean(+/−SEM) activity of all recorded neurons on trials sorted by previous and current outcome. (g) Coefficients(+/−SE) from a linear regression fit to the z-scored activity of all neurons (n=436 neurons) and the outcomes on the current and preceding 10 trials. (h) Schematic of model-fitting and neuron classification process. For each neuron, the reward outcome and spike count following reward delivery on each trial were used to fit three models: RPE, Current outcome, and Unmodulated. Akaike information criterion (AIC) was used to select which model best fit each neuron’s activity (right). (i) Mean(+/−SEM) activity of neurons best fit by each of the three models, plotted according to previous and current outcome. (j) Coefficients(+/−SE) for outcome history linear regression for each class of neurons (n=72 RPE, 126 Current outcome, and 238 Unmodulated neurons).
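The model-fitting idea in (h) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the delta-rule update, the 1/0 coding of sucrose and maltodextrin, the starting value of 0.5, the exponential link from RPE to expected spike count, and the Poisson noise model are all assumptions, and the spike counts are made up.

```python
import math

def rpe_trace(outcomes, alpha, v0=0.5):
    """Trial-by-trial RPEs from a delta-rule (Rescorla-Wagner style) update."""
    value = v0
    rpes = []
    for reward in outcomes:
        delta = reward - value   # RPE: received outcome minus current expectation
        rpes.append(delta)
        value += alpha * delta   # move the expectation toward the outcome
    return rpes

def poisson_nll(spike_counts, rates):
    """Negative log-likelihood of spike counts under Poisson rates (assumed noise model)."""
    return sum(rate - k * math.log(rate) + math.lgamma(k + 1)
               for k, rate in zip(spike_counts, rates))

def rpe_model_rates(outcomes, alpha, slope, baseline):
    """Hypothetical link: expected count = exp(baseline + slope * RPE)."""
    return [math.exp(baseline + slope * d) for d in rpe_trace(outcomes, alpha)]

outcomes = [1, 1, 0, 1, 0, 0]   # 1 = sucrose, 0 = maltodextrin (illustrative coding)
counts = [6, 4, 1, 7, 2, 3]     # made-up spike counts in the analysis window
rates = rpe_model_rates(outcomes, alpha=0.5, slope=1.0, baseline=1.0)

print(rpe_trace(outcomes, alpha=0.5))
print(poisson_nll(counts, rates))
```

In a full fit, the learning rate, slope, and baseline would be chosen to minimize the negative log-likelihood for each neuron, and the resulting likelihood would then be compared across models using AIC.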
Figure 2. RPE encoding is more prevalent and robust in VP than in NAc.
(a) Raster of an individual VP neuron’s spikes on each trial, aligned to reward delivery, and sorted by the model-derived RPE value for each trial. Gray shaded region indicates window used for analysis. (b) Population mean(+/−SEM) of all VP RPE neurons identified in Fig. 1. The trials for each neuron are binned according to their model-derived RPE. (c) Proportion of the population in VP and NAc classified as RPE, Current outcome, or Unmodulated. NAc had fewer RPE cells than VP (8% versus 17%, χ2 = 8.3, p = 0.004) and fewer Current outcome cells (14% in NAc versus 29% in VP, χ2 = 13.6, p = 0.0002). (d) Mean(+/−SEM) population activity of simulated and actual RPE neurons according to each trial’s RPE value for VP (top) and NAc (bottom). (e) The model-predicted and actual spike counts on each trial for one RPE neuron each from VP (top) and NAc (bottom). These neurons were the 85th percentile for correlation for each respective region. (f) Distribution of correlations between model-predicted and actual spiking for all RPE neurons from each region.
Figure 3. An expanded value space reveals stronger RPE signaling in VP.
(b-c) are adapted from (9). (a) A white noise cue indicated 1/3 probability each of receiving sucrose, maltodextrin, or water, as seen in the example session (right). (b) Mean(+/−SEM) lick rate relative to pump onset. (c) Mean(+/−SEM) activity of all recorded neurons on sucrose, maltodextrin, and water trials. (d) Fraction of the population of neurons recorded in this task best fit by each of the three models. (e) Coefficients(+/−SE) for outcome history regression for each of the three classes of neurons (n=74 RPE, 108 Current outcome, and 72 Unmodulated neurons). (f) Raster of an individual neuron’s spikes on each trial, aligned to reward delivery, and sorted by the model-derived RPE value for each trial. Gray shaded region indicates window used for analysis. (g) Population mean(+/−SEM) of all RPE neurons. The trials for each neuron are binned according to their model-derived RPE. (h) Mean(+/−SEM) population activity of simulated and actual VP RPE neurons according to each trial’s RPE value. (i) The model-predicted and actual spike counts on each trial for the RPE neuron with the 85th percentile correlation. (j) Distribution of correlations between model-predicted and actual spiking for all RPE neurons.
Figure 4. VP reward activity tracks changes in trial-by-trial task engagement.
(a) All locations of a rat from an example session during the intertrial interval (ITI) following sucrose (left), maltodextrin (center), and water (right) delivery. Each circle is one location during a 0.2s bin. X marks the location at cue onset for the subsequent trial. Chamber is 32.4cm x 32.4cm (approximately 306 x 306 pixels). (b) Mean(+/−SEM) distance from the port during ITI following sucrose (orange), maltodextrin (pink), and water (blue) trials during recording sessions (n=4 sessions from 3 rats). Gray lines represent mean for one subject in one session. * = ρ = −0.86, p = 0.0004, Spearman’s rank correlation coefficient between distance from port and reward preference ranking. (c) Approach for correlating the activity of individual VP cells with distance from the port on a trial-by-trial basis. (d) Distribution of correlations between individual VP neurons’ firing rates on each trial and the distance from the port during the subsequent ITI. * = significant shift in mean correlation coefficient (vertical lines) compared to 1000 shuffles of data for RPE (p = 8 × 10^−10), Current outcome (p = 3 × 10^−18), and Unmodulated neurons (p = 0.00008), Wilcoxon signed-rank test, two-sided.
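The per-neuron comparison against shuffled data in (c-d) can be sketched as below. The caption does not specify the correlation type or what exactly was shuffled, so the use of Pearson correlation and the shuffling of the distance values across trials are assumptions, and the data are synthetic. The sketch covers a single neuron only; the population-level Wilcoxon signed-rank test is not shown.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def shuffled_correlations(firing, distance, n_shuffles=1000, seed=0):
    """Correlations after shuffling distance across trials (assumed null)."""
    rng = random.Random(seed)
    shuffled = list(distance)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(pearson(firing, shuffled))
    return null

# Synthetic example: higher firing at reward is followed by a shorter ITI distance.
firing = [float(i) for i in range(20)]
distance = [50.0 - 2.0 * f for f in firing]

observed = pearson(firing, distance)
null = shuffled_correlations(firing, distance)
print(observed, statistics.fmean(null))
```

A negative observed correlation that sits well below the shuffled distribution is the pattern the asterisk in (d) refers to at the population level.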
Figure 5. Manipulation of VP reward activity bidirectionally alters task engagement.
(a) Optogenetic inhibition of VP with ArchT3.0. (b) Experimental approach to evaluate the contribution of VP to task engagement. Rats received a sucrose reward on every completed trial; on 50% of trials, they also received laser inhibition (left). Specifically, entry into the reward port during the 10s cue triggered delivery of sucrose 500ms later and 5s of constant green laser (right). We then evaluated the rats’ distance from the port in the subsequent ITI. (c) All locations of a rat from an example session during the intertrial interval (ITI) following sucrose delivery without laser (left) and with laser (right). Each circle is one location during a 0.2s bin. X marks the location at cue onset for the subsequent trial. Chamber is 29.2cm x 24.4cm (approximately 542 x 460 pixels). (d) Mean(+/−SEM) distance from the port in the ITI following sucrose with and without laser for animals receiving a control virus (YFP, left, n=7 rats) or the ArchT3.0 virus (right, n=7 rats). Individual rats’ data shown in gray lines. * = p = 0.02, Wilcoxon signed-rank test, two-sided. (e) Fractional change in ITI distance from port for each rat (median: −0.01 YFP, n=7 rats; 0.15 ArchT3.0, n=7 rats), * = p = 0.01, Wilcoxon rank-sum test, two-sided. (f) Optogenetic stimulation of VP with ChR2. (g) Like (b), but entry into the reward port during the cue triggered delivery of 2s of blue laser at 40Hz, 10ms pulse width (right). (h) All locations of a rat from an example session during the intertrial interval (ITI) following sucrose delivery without laser (left) and with laser (right). (i) Mean(+/−SEM) distance from the port in the ITI following sucrose with and without laser for animals receiving a control virus (GFP, left, n=7 rats) or the ChR2 virus (right, n=11 rats). Individual rats’ data shown in gray lines. * = p = 0.001, Wilcoxon signed-rank test, two-sided. 
(j) Fractional change in ITI distance from port for each rat (median: 0.09 GFP, n=7 rats; −0.14 ChR2, n=11 rats), * = p = 0.001, Wilcoxon rank-sum test, two-sided.
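The caption does not define the fractional change reported in (e) and (j). One plausible reading, assumed here, is the difference between laser and no-laser ITI distance divided by the no-laser distance, computed per rat and then summarized by the group median. The distances below are hypothetical.

```python
import statistics

def fractional_change(laser, no_laser):
    """Assumed definition: (laser - no_laser) / no_laser."""
    return (laser - no_laser) / no_laser

# Hypothetical per-rat mean ITI distances from the port (arbitrary units).
no_laser = [100.0, 80.0, 120.0]
laser = [115.0, 84.0, 150.0]

changes = [fractional_change(l, n) for l, n in zip(laser, no_laser)]
print(statistics.median(changes))
```

Under this definition, a positive value means the rat stayed farther from the port after laser trials, and a negative value means it stayed closer.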
Figure 6. VP RPE neuron signaling adapts across reward blocks.
(a) A white noise cue indicated an overall 50/50 probability of receiving sucrose or maltodextrin solutions, but the order of trials was structured into blocks of thirty trials, as seen in example session (right). (b) Mean(+/−SEM) lick rate relative to pump onset. (c) Proportion of neurons best fit by each of the three models in the random and blocked sucrose/maltodextrin tasks. (d) Mean(+/−SEM) activity of all RPE neurons from the blocked task aligned to cue onset and to reward delivery. (e) Mean(+/−SEM) activity of all RPE neurons from the random sucrose/maltodextrin task aligned to cue onset and to reward delivery. (f) RPE model simulations (left) and mean(+/−SEM) activity of RPE, Current outcome, and Unmodulated cells from the random sucrose/maltodextrin task, plotted in bins of three trials evenly spaced throughout all completed sucrose and maltodextrin trials. (g) As in (f), for blocked sessions with sucrose first. (h) As in (f) and (g), for blocked sessions with maltodextrin first.

References

    1. Sutton RS & Barto AG. Introduction to reinforcement learning (MIT Press, Cambridge, 1998).
    2. Rescorla RA & Wagner AR. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical conditioning II: Current research and theory 2, 64–99 (1972).
    3. Schultz W, Dayan P & Montague PR. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
    4. Bayer HM & Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141 (2005).
    5. Smith KS, Tindell AJ, Aldridge JW & Berridge KC. Ventral pallidum roles in reward and motivation. Behavioural brain research 196, 155–167 (2009).