Neurobiol Learn Mem. 2018 Sep;153(Pt B):137-143.
doi: 10.1016/j.nlm.2018.01.013. Epub 2018 Jan 31.

Orbitofrontal neurons signal reward predictions, not reward prediction errors

Thomas A Stalnaker et al. Neurobiol Learn Mem. 2018 Sep.

Abstract

Neurons in the orbitofrontal cortex (OFC) fire in anticipation of and during rewards. Such firing has been suggested to encode reward predictions and to account in some way for the role of this area in adaptive behavior and learning. However, it has also been reported that neural activity in OFC reflects reward prediction errors, which might drive learning directly. Here we tested this question by analyzing the firing of OFC neurons recorded in an odor discrimination task in which rats were trained to sample odor cues and respond left or right on each trial for reward. Neurons were recorded across blocks of trials in which we switched either the number or the flavor of the reward delivered in each well. Previously we have described how neurons in this dataset fired to the predictive cues (Stalnaker et al., 2014); here we focused on the firing in anticipation of and just after delivery of each drop of reward, looking specifically for differences in firing based on whether the reward number or flavor was unexpected or expected. Unlike dopamine neurons recorded in this setting, which exhibited phasic error-like responses after surprising changes in either reward number or reward flavor (Takahashi et al., 2017), OFC neurons showed no such error correlates and instead fired in a way that reflected reward predictions.

Keywords: Learning; Orbitofrontal; Rat; Reward prediction error; Single unit.


Figures

Fig. 1.
Behavior and Histology. A. Task sequence. After the rat initiated a trial with a nosepoke, an odor was delivered for 500 ms, after which the rat responded at one of the two fluid wells for 1 or 3 drops of chocolate or vanilla milk solution, delivered 500 ms after the well poke. Two odors indicated forced choices, left or right; a third odor indicated free choice. Reward contingencies were stable across blocks of ~60 trials, but switched in number of drops (dashed lines) or flavor (dotted lines) in four unsignaled transitions. Rewards in the two directions always differed in both number of drops and flavor (only one of the four possible block sequences is shown). B. Chocolate and vanilla milk were equally preferred in a ten-minute consumption test in a separate group of rats (t(10) = 0.1, p = .93). C. Free-choice rates in the task reflected the number of drops but not the flavor. Number block switches (left panel) had a similarly large effect on choice rates for chocolate → chocolate compared to vanilla → vanilla switches. Flavor block switches (right panel) had no effect on choice rates for big vanilla → big chocolate or big chocolate → big vanilla switches. Line figures show average trial-by-trial choice rates on free-choice trials on an x-axis scale that includes all interleaved correct free- and forced-choice trials; inset bar graphs compare average choice rate on all free-choice trials within the last 25 trials before a block switch and the first 25 trials after a block switch (again, this 25-trial period includes interleaved correct free- and forced-choice trials).
ANOVA on difference in choice rates across transitions, with factors transition type and initial flavor: main effect of transition type (F(1,92) = 195.7, p < .001), driven by significant changes across number transitions (planned contrast, F(1,92) = 445.9, p < .0001) and nonsignificant changes across flavor transitions (planned contrast, F(1,92) = 1.3, p = .27); no effect of initial flavor (F(1,92) = 0.0, p = .93); no differences between vanilla-to-chocolate and chocolate-to-vanilla (planned contrast, F(1,92) = 2.3, p = .13). A focused comparison of the magnitude of changes across the number and flavor transitions included in the subsequent analyses, between this experiment and a separate one in which dopamine neurons were recorded, revealed no difference across included flavor transitions (t(137) = 0.6, p = .57) and larger changes in choice rate across number transitions in this experiment than in the dopamine experiment (t(249) = 3.1, p < .01). D. Reaction time (top panel) and accuracy (bottom panel) reflected the number of drops expected but not the flavor. Bar graphs show average reaction time (from end of odor to start of movement) or accuracy on forced-choice trials within the last 25 trials of blocks. Within-subjects ANOVAs on reaction time and accuracy: main effects of reward number (F(1,93) = 62.2, p < .001; F(1,93) = 182.3, p < .001) but not flavor (F(1,93) = 0.3, p = .57; F(1,93) = 5.3, p = .024), nor interactions (F(1,93) = 0.1, p = .73; F(1,93) = 5.1, p = .027). Two additional ANOVAs on reaction time and accuracy compared this experiment and a separate one in which dopamine neurons were recorded. These revealed no interactions of flavor or reward number with experiment (reward number: F(1,124) = 0.5, p = .49; F(1,124) = 1.6, p = .20; flavor: F(1,124) = 0.0, p = .98; F(1,124) = 0.0, p = .96). E. Recording sites in OFC. The black boxes indicate the approximate location from which recordings were made in each rat (in the left hemisphere).
The width represents the estimated span of the electrode bundle (~1 mm), and the height represents the approximate extent of recording across all sessions. Bregma +2.8 to 3.6 mm. Not significant if a corrected p-value criterion is used (p < .0167 by Bonferroni correction), chosen so that the family-wise p-criterion across the three separate ANOVAs on flavor, in panels C–D, was equal to 0.05.
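The corrected criterion quoted in the legend follows from dividing the family-wise criterion by the number of tests. A minimal sketch of that arithmetic is given here; the function name and the labels attached to each p-value are illustrative assumptions, and the pairing of each reported flavor p-value with one of the three ANOVAs is inferred from the legend rather than stated in it.

```python
# Values taken from the legend: family-wise criterion 0.05 across three ANOVAs.
FAMILY_WISE_ALPHA = 0.05
N_TESTS = 3


def bonferroni_threshold(alpha, n_tests):
    """Per-test criterion that keeps the family-wise criterion at `alpha`."""
    return alpha / n_tests


threshold = bonferroni_threshold(FAMILY_WISE_ALPHA, N_TESTS)

# Flavor-related p-values reported in the legend. The labels are an
# assumption about which of the three flavor ANOVAs each value belongs to.
flavor_p_values = {
    "choice rate (panel C)": 0.27,
    "reaction time (panel D)": 0.57,
    "accuracy (panel D)": 0.024,
}

# A test counts as significant only if its p-value is below the threshold.
significant = {name: p < threshold for name, p in flavor_p_values.items()}

print(f"corrected criterion: p < {threshold:.4f}")
for name, is_sig in significant.items():
    print(f"{name}: {'significant' if is_sig else 'not significant'}")
```

Under this reading, the accuracy effect of flavor (p = .024) falls below the uncorrected .05 criterion but not below the corrected one.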
Fig. 2.
Reward-evoked activity of reward-responsive orbitofrontal cortical neurons (n = 347) after shifts in reward number. (A and B) Average baseline-subtracted firing on first five (red) and last five (blue) trials after a shift in reward number, from one drop to three drops (A) and from three drops to one drop (B). Both correct free- and forced-choice trials were included. Shading represents the standard error at each bin. (C and D) Distribution of difference scores for the epoch from 100 to 300 ms after the unexpected second drop (C), in which dopamine neurons reflect a positive prediction error (inset in A), and for the epoch from 200 ms before to 100 ms after the time of the omitted second drop (D), which precedes the dopamine negative prediction error response (inset in B). OFC neurons thus fail to signal prediction errors (A and C) but do signal outcome predictions (B and D). Statistics above histograms show average difference score and p-value for a t-test on the population (for C, t(346) = 0.79; for D, t(346) = 3.7). The dopamine population had significantly higher positive prediction error indices than those of the OFC population shown in C (t(405) = 5.1, p < .0001), and the OFC population had significantly higher anticipatory indices, shown in D, than those in the dopamine population (t(405) = 2.9, p < .01).
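The population test described in this legend can be sketched as follows. This is an illustration only: the legend does not define the difference score, so the code assumes it is the mean firing rate in the epoch on the first five trials minus that on the last five trials, and all firing rates below are hypothetical. The degrees of freedom follow the pattern in the legend, where 347 neurons give 346.

```python
import math
import statistics


def difference_score(early_rates, late_rates):
    """Assumed definition: mean epoch firing rate early minus late in the block."""
    return statistics.mean(early_rates) - statistics.mean(late_rates)


def one_sample_t(scores):
    """t statistic and degrees of freedom for testing mean(scores) against zero."""
    n = len(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return statistics.mean(scores) / standard_error, n - 1


# Hypothetical firing rates (spikes/s) in the analysis epoch for four neurons:
# (first five trials after the shift, last five trials of the block).
neurons = [
    ([5.0, 6.0, 5.5, 6.5, 6.0], [4.0, 4.5, 5.0, 4.5, 4.0]),
    ([3.0, 2.5, 3.5, 3.0, 3.0], [3.0, 3.5, 2.5, 3.0, 3.0]),
    ([8.0, 7.5, 8.5, 8.0, 8.0], [7.0, 7.5, 7.0, 6.5, 7.0]),
    ([2.0, 2.5, 2.0, 1.5, 2.0], [2.5, 3.0, 2.5, 2.0, 2.5]),
]

# One difference score per neuron, then a single test across the population.
scores = [difference_score(early, late) for early, late in neurons]
t_stat, df = one_sample_t(scores)

print(f"difference scores: {scores}")
print(f"t({df}) = {t_stat:.2f}")
```

A population whose scores are centered on zero, as reported for panel C, would give a small t statistic; a population that fires more early in the block, as reported for panel D, would give a reliably positive one.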
Fig. 3.
Reward-evoked activity of reward-responsive orbitofrontal cortical neurons after shifts in reward flavor. (A and B) Average baseline-subtracted firing on first five trials after a shift in reward flavor (red) versus last five trials from the previous block (green), on the 3-drop side (A) and on the 1-drop side (B). Both correct free- and forced-choice trials were included. The number of drops on each side remained constant across the shift. Shading represents the standard error at each bin. (C and D) Distribution of difference scores for the epochs from 100 to 300 ms after the first and second drops of the new flavor on the 3-drop side (C), and after the first (and only) drop of the new flavor on the 1-drop side (D). Dopamine neurons showed a significantly positive prediction error score in response to flavor changes at each of these timepoints (insets in A and B), whereas the OFC population did not. Statistics above the histograms show average difference score and p-value for a t-test on the population, with each neuron × shift providing a datapoint (for C, t(431) = 1.5 for the left panel and t(346) = 1.2 for the right panel; for D, t(431) = 0.47). Flavor shifts were only included when behavior showed evidence of the shift (104 of 176 total flavor shifts, during which 296 neurons were recorded; see Section 2 for definition of behavioral evidence of a shift). The dopamine population had a significantly higher prediction error score than the OFC population for each bolus of new flavor (first drop on 3-drop side: t(478) = 2.1, p < .05; second drop on 3-drop side: t(478) = 2.4, p < .05; first drop on 1-drop side: t(478) = 2.1, p < .05).
Fig. 4.
The OFC response in anticipation of the absent second drop after a flavor shift, comparing the response when the rat showed behavioral evidence of the shift (left) to that when the rat showed no evidence of the shift (right). (A and B) Average baseline-subtracted firing on first two trials after a shift in reward flavor (red) versus last five trials from that block (blue), on the 1-drop side when behavior reflected the change (A) and when it did not (B). Both correct free- and forced-choice trials were included. Colored shading represents standard error at each bin. The phasic increase in A early in the block in the gray-shaded epoch shows that the OFC population made a reward prediction based on the flavor change, even though that change did not elicit a prediction error signal in this population (see main text). (C and D) Distribution of difference scores for the epoch from 200 ms before the time that the second drop would be expected, to 100 ms after it. Statistics above the histograms show average difference score and p-value for a t-test on the population, with each neuron × shift providing a datapoint (for C, t(431) = 2.7; for D, t(261) = −1.0; for the comparison between C and D, t(692) = −2.5, p = .014). See Section 2 for definition of behavioral evidence of a flavor shift.


