Neurobiol Learn Mem. 2015 Jan;117:84-92.
doi: 10.1016/j.nlm.2014.07.010. Epub 2014 Aug 27.

Dynamic shaping of dopamine signals during probabilistic Pavlovian conditioning


Andrew S Hart et al. Neurobiol Learn Mem. 2015 Jan.

Abstract

Cue- and reward-evoked phasic dopamine activity during Pavlovian and operant conditioning paradigms is well correlated with reward-prediction errors from formal reinforcement learning models, which feature teaching signals in the form of discrepancies between actual and expected reward outcomes. Additionally, in learning tasks where conditioned cues probabilistically predict rewards, dopamine neurons show sustained cue-evoked responses that are correlated with the variance of reward and are maximal to cues predicting rewards with a probability of 0.5. Therefore, it has been suggested that sustained dopamine activity after cue presentation encodes the uncertainty of impending reward delivery. In the current study we examined the acquisition and maintenance of these neural correlates using fast-scan cyclic voltammetry in rats implanted with carbon fiber electrodes in the nucleus accumbens core during probabilistic Pavlovian conditioning. The advantage of this technique is that we can sample from the same animal and recording location throughout learning with single trial resolution. We report that dopamine release in the nucleus accumbens core contains correlates of both expected value and variance. A quantitative analysis of these signals throughout learning, and during the ongoing updating process after learning in probabilistic conditions, demonstrates that these correlates are dynamically encoded during these phases. Peak CS-evoked responses are correlated with expected value and predominate during early learning while a variance-correlated sustained CS signal develops during the post-asymptotic updating phase.
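The two quantities contrasted in the abstract can be illustrated with a short sketch. It assumes each trial is a Bernoulli outcome: a reward of fixed size delivered with probability p. The function name `reward_statistics` and the unit reward magnitude are illustrative assumptions, not taken from the paper. Under this assumption the expected value rises linearly with p, while the variance p(1 - p) peaks at p = 0.5.

```python
def reward_statistics(p, magnitude=1.0):
    """Return (expected value, variance) of a reward of size `magnitude`
    delivered with probability p (a Bernoulli outcome; an assumption here)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    expected_value = p * magnitude
    variance = p * (1.0 - p) * magnitude ** 2
    return expected_value, variance


# Reinforcement probabilities of the five groups described in the figure captions.
probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]
stats = {p: reward_statistics(p) for p in probabilities}

for p, (ev, var) in stats.items():
    print(f"p={p:.2f}  expected value={ev:.2f}  variance={var:.4f}")
```

A signal that tracks expected value should therefore be largest in the 1.0 group, whereas a signal that tracks variance should be largest in the 0.5 group and equal in the 0.25 and 0.75 groups.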

Keywords: Dopamine; Pavlovian; Reinforcement learning; Reward-prediction error; Uncertainty.

Figures

Figure 1
Rats in all groups approached the CS during the first session of training, but by session 6, rats in the 0 probability group (n = 4) responded significantly less than rats in the 0.25 (n = 4), 0.5 (n = 5), 0.75 (n = 4), and 1.0 (n = 5) groups. Bars show mean plus standard error (***: P < 0.001 with respect to non-paired group, Bonferroni-corrected t-test).
Figure 2
Coronal sections of rat brain show the locations of the voltammetry electrodes (n = 24) chronically implanted in the nucleus accumbens core. Brain atlas sections are from Paxinos and Watson (2005).
Figure 3
(a-c) Example dopamine traces from individual trials recorded at individual electrodes in the 0, 0.5, and 1.0 groups are shown. For the 0.5 group, reward trials and reward omission trials are shown separately, though they were randomly interleaved during training. Traces were smoothed with a 5-point running average. Gray points indicate data that were excluded because the residual error after principal components regression was large enough to reject the null hypothesis (P < 0.05) that the error was due to random noise. (d-f) Mean ± SEM of session-averaged traces for the 0 (n = 5), 0.5 (n = 6), and 1.0 (n = 5) groups are shown. Traces from reward and reward omission trials for each session were averaged separately to illustrate differential responding at the time of the US. Colored boxes represent the analysis windows during CS presentation (red: early CS, green: peak CS, blue: late CS).
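The 5-point running average mentioned in this caption can be sketched as follows. The caption does not say whether the window is centered or how the ends of a trace are handled, so the centered window and the shortened window at the edges are assumptions, as are the function name `running_average` and the made-up example trace.

```python
def running_average(trace, window=5):
    """Centered moving average of a list of samples.

    Assumption: near the ends of the trace, where a full window is not
    available, the average is taken over the samples that do exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(trace)):
        lo = max(0, i - half)
        hi = min(len(trace), i + half + 1)
        segment = trace[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Made-up trace with a single sharp transient, for illustration only.
example = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
smoothed = running_average(example)
print(smoothed)
```

Smoothing of this kind spreads a one-sample transient over the whole window, which reduces noise at the cost of some temporal precision.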
Figure 4
Mean ± SEM of session-averaged traces for the 0.25 (n = 4) and 0.75 (n = 4) groups are shown. Traces from reward and reward omission trials for each session were averaged separately to illustrate differential responding at the time of the US. Colored boxes represent the analysis windows during CS presentation (red: early CS, green: peak CS, blue: late CS).
Figure 5
Regression weights for B1 (a) from the first-order model and B2 (b) from the second-order model are shown for each time point (-1 s to 8 s from CS onset, 0.1 s interval) for sessions 1 through 6. (c-e) F-statistics for significant (P < 0.05) least-squares fits for the first- (c) and second-order (d) models, as well as for the comparison between the two models (e), are shown for the regressions in a and b. Gray indicates that the F test was not significant for that time point. (f-h) r² values are shown for the first- (f) and second-order (g) models in a and b, as well as the net increase in r² (h) for the second-order over the first-order model at each time point.
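One plausible reading of the first- and second-order models in this caption is a polynomial regression of the dopamine response on reinforcement probability p: B0 + B1·p for the first-order model and B0 + B1·p + B2·p² for the second-order model, compared with a nested-model F statistic. The page does not give the model equations, so this form is an assumption, and the data below are made up for illustration. The helper names (`solve`, `fit_polynomial`, `nested_f`) are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(x, y, order):
    """Least-squares polynomial fit via the normal equations.

    Returns (coefficients [B0, B1, ...], residual sum of squares, r squared).
    """
    k = order + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(k)]
    coef = solve(xtx, xty)
    pred = [sum(c * xi ** i for i, c in enumerate(coef)) for xi in x]
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return coef, rss, 1.0 - rss / tss


def nested_f(rss_small, rss_big, n, k_small, k_big):
    """F statistic testing whether the larger nested model reduces residual error."""
    return ((rss_small - rss_big) / (k_big - k_small)) / (rss_big / (n - k_big))


# Made-up responses (two hypothetical electrodes per probability group).
p = [0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 0.75, 0.75, 1.0, 1.0]
y = [0.1, -0.1, 0.8, 0.7, 1.1, 0.9, 0.7, 0.8, 0.1, -0.1]

coef1, rss1, r2_first = fit_polynomial(p, y, order=1)
coef2, rss2, r2_second = fit_polynomial(p, y, order=2)
f_compare = nested_f(rss1, rss2, n=len(y), k_small=2, k_big=3)

print("B2 =", coef2[2])
print("net increase in r^2 =", r2_second - r2_first)
print("F (second- vs first-order) =", f_compare)
```

In this sketch a negative B2 with a large net increase in r² indicates an inverted-U dependence on probability, the pattern expected of a variance-like signal, while a significant B1 in the first-order model indicates an expected-value-like signal. Repeating the fit at every time point would yield time courses of the kind the caption describes.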
Figure 6
(a-b) Group mean ± SEM for dopamine responses in early-CS (left), peak-CS (middle), and late-CS (right) epochs for sessions 2 (a) and 6 (b) are plotted with respect to reinforcement probability. Curves for first-order (blue) and second-order (red) model fits are shown for each epoch. The heavier curve in each plot is for the model that produces the greater net increase in r². All first-order fits in session 2 are significant. Second-order fits for peak-CS and late-CS in session 6 are significant. The first-order fit for early-CS in session 6 borders on significance (see Table 1 for statistics).
Figure 7
(a) Slopes (B1) and intercepts (B0) are shown for linear regressions at each time point (-1 s to 6 s relative to US onset, 0.1 s interval) for rewarded trials in sessions 1 through 6. Significant (P < 0.05) F statistics (b) and r² values (c) are shown for the linear regressions in a.
Figure 8
Group mean ± SEM for dopamine responses in the peak-US epoch are shown for reward and reward omission trials in session 2 (a) and session 6 (b). Linear regressions are significant for responses from reward trials in both sessions, but not for responses from omission trials.
Figure 9
(a-c) Scatter plots show the mean dopamine response over the peak-CS epoch from trials following two rewards vs. the mean response from trials following two omissions in the early learning (a), late learning (b), and asymptotic (c) stages for electrodes in the uncertain probability groups. Points above the line indicate that the signal following two rewards is greater than the signal following two omissions. (d) Bar graph shows mean ± SE of the difference for within-electrode contrasts for the responses in a-c (early: n = 14, late: n = 14, asymptotic: n = 12). (f-g) Scatter plots and bar graph show the same data as a-d but for the late-CS epoch. (**: P < 0.01, *: P < 0.05, paired t-test. Holm-Bonferroni correction was applied to α levels.)
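The Holm-Bonferroni correction named in this caption adjusts the α level for each test in a step-down fashion. A minimal sketch of the standard procedure follows. The function name `holm_bonferroni` and the example p-values are made up for illustration and are not results from the paper.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    The k-th smallest p-value (k = 1..m) is compared with alpha / (m - k + 1);
    testing stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # rank is zero-based, so m - rank equals m - k + 1.
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # step-down: all larger p-values are also not rejected
    return rejected


# Made-up p-values for three hypothetical stage contrasts.
example_p = [0.004, 0.030, 0.026]
decisions = holm_bonferroni(example_p)
print(decisions)
```

Because the threshold relaxes as hypotheses are rejected, this procedure is never less powerful than a plain Bonferroni correction while still controlling the family-wise error rate.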
