eLife. 2019 Nov 6;8:e49744.
doi: 10.7554/eLife.49744.

Lateral orbitofrontal cortex promotes trial-by-trial learning of risky, but not spatial, biases

Christine M Constantinople et al. eLife. 2019.

Abstract

Individual choices are not made in isolation but are embedded in a series of past experiences, decisions, and outcomes. The effects of past experiences on choices, often called sequential biases, are ubiquitous in perceptual and value-based decision-making, but their neural substrates are unclear. We trained rats to choose between cued guaranteed and probabilistic rewards in a task in which outcomes on each trial were independent. Behavioral variability often reflected sequential effects, including increased willingness to take risks following risky wins, and spatial 'win-stay/lose-shift' biases. Recordings from lateral orbitofrontal cortex (lOFC) revealed encoding of reward history and receipt, and optogenetic inhibition of lOFC eliminated rats' increased preference for risk following risky wins, but spared other sequential effects. Our data show that different sequential biases are neurally dissociable, and the lOFC's role in adaptive behavior promotes learning of more abstract biases (here, biases for the risky option), but not spatial ones.

Keywords: decision-making; learning; neuroscience; orbitofrontal cortex; rat; reinforcement learning; sequential bias.

Conflict of interest statement

CC, AP, PB, AA, CK, CB: No competing interests declared.

Figures

Figure 1. Behavioral task: Rats performing the task exhibit stable performance over months, but also trial-by-trial learning dynamics.
(A) Example trial: the rat initiates a trial by nose-poking and fixating in the center. On each side, light flashes and click rates convey reward probability and water volume, respectively. One side (here, the right port) offers guaranteed reward (‘safe’); safe and risky sides vary randomly over trials. (B) Relationship between flashes and probability, and click rates and reward volumes (6, 12, 24, or 48 μL) in one version of the task. The risky side could have a reward probability between 0 and 1 (in increments of 0.1). (C) Offered reward volumes and probabilities. (D) Behavioral performance in units of ‘efficiency’ for five representative rats in the final training stage (Materials and methods). We compared the average expected value (reward × probability) per trial that the rat received with that of an agent choosing randomly and that of an agent that always chose the option with the greater expected value (‘ideal performance’). The dashed line is criterion performance for each rat (see ‘Materials and methods’). (E) Percent of trials on which one rat chose the safe option for each of the four safe volumes. Axes show probability and volume of risky alternatives. (F) Difference in probability of choosing the safe option following guaranteed rewards and risky rewards (relative to the mean probability of choosing safe) for all rats (black is mean). Rats were more likely to gamble following risky rewards (p=8.35e-16, paired t-test). (G) The magnitude of the risky win-stay bias exhibits graded dependence on the reward probability of the gamble (mean across rats). p=0.0035 for the slope parameter of the least-squares regression line (dashed line). The riskier the gamble that won, the more likely rats were to gamble again. See also Figure 1—figure supplement 1. (H) Change in the probability of repeating left or right choices following rewarded or unrewarded trials. Asterisks indicate that rats’ ‘win-stay’ biases were significantly different from zero (p=2.06e-13, paired t-test), as were their ‘lose-switch’ biases (p=2.65e-15).
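The sequential biases in panels F and H are differences between conditional choice probabilities. The sketch below is a minimal illustration, assuming hypothetical per-trial lists `chose_safe`, `chose_left`, and `rewarded` (1 or 0 per trial). These names, and the use of the overall repeat probability as the baseline for the spatial bias, are assumptions for illustration and not the authors' analysis code.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def delta_p_safe(chose_safe, rewarded):
    """Change in P(choose safe) after a safe reward and after a risky reward,
    each expressed relative to the overall P(choose safe)."""
    baseline = mean(chose_safe)
    after_safe_win, after_risky_win = [], []
    for prev in range(len(chose_safe) - 1):
        if not rewarded[prev]:
            continue  # only rewarded previous trials enter this comparison
        bucket = after_safe_win if chose_safe[prev] else after_risky_win
        bucket.append(chose_safe[prev + 1])
    return mean(after_safe_win) - baseline, mean(after_risky_win) - baseline


def spatial_bias(chose_left, rewarded):
    """Change in P(repeat previous side) after rewarded and unrewarded trials.
    Baseline (assumption): the overall probability of repeating a side."""
    repeats = [int(chose_left[t] == chose_left[t - 1])
               for t in range(1, len(chose_left))]
    baseline = mean(repeats)
    # repeats[i] describes trial i + 1, so pair it with the outcome of trial i
    after_win = [r for r, rew in zip(repeats, rewarded) if rew]
    after_loss = [r for r, rew in zip(repeats, rewarded) if not rew]
    return mean(after_win) - baseline, mean(after_loss) - baseline
```

Under these assumptions, a more negative first-function value after risky rewards than after safe rewards would correspond to the risky win-stay bias, and a positive win value with a negative loss value from the second function would correspond to win-stay/lose-switch.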
Figure 1—figure supplement 1. Supplemental behavioral analyses.
(A) Distribution of inter-trial intervals (ITIs) for three representative rats. Trials were self-paced: rats were free to initiate trials within 100–200 ms of the preceding trial. If rats terminated the trial early by breaking center fixation, they were penalized with a time-out (those trials are not shown). (B) Average number of trials per session for each rat, excluding trials that were terminated prematurely. Mean of this distribution (368 trials/session) is shown by the red arrow. (C) Mean behavioral performance across rats, including >2.5 million trials. Percent of trials on which all rats chose the safe option for each of the four safe side volumes. Axes show the probability and volume of risky alternatives. Mean performance across 36 rats (normalized to max before averaging). (D) Estimates of conditional probabilities in finite sequential data can have small biases (Miller and Sanjurjo, 2015). If this bias were driving sequential effects in our data, such as increased willingness to take risks following risky wins, we reasoned that computing this bias from random flips of a weighted coin (sequences of the same length as our data) would also reveal an effect. Therefore, we generated random choices for each rat with a generative probability corresponding to the mean probability of choosing the safe option for that rat. We then calculated the change in the probability of choosing the safe option based on reward history for the simulated choices; the same number of trials as in Figure 1F was used in this analysis. There was no observable risky win-stay bias in the simulated dataset, indicating that the effect we observed did not reflect biased estimates of conditional probabilities. (E) Difference in probability of choosing the safe option following guaranteed rewards and risky rewards of different probabilities (relative to the mean probability of choosing safe) for simulated data, as in B. Randomly simulated choices with the same sample sizes as the data (Figure 1G) did not exhibit a bias for risky choices with a graded dependence on reward probability. p=0.80 for the slope parameter of the least-squares regression line (dashed line). Therefore, the risky win-stay bias we observe, with graded dependence on reward probability, does not reflect biased estimation of conditional probabilities. (F) Difference in probability of choosing the safe option following guaranteed rewards, or risky unrewarded choices. There was no systematic, significant change in probability of choosing safe following unrewarded trials (paired t-test comparing change in probability of choosing safe).
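The weighted-coin control described in panels D and E can be sketched as follows: choices are drawn independently with a fixed probability of choosing safe, and the same conditional-probability analysis is applied to them. This is a minimal sketch under stated assumptions. The value `p_safe=0.6`, the single fixed `p_risky_win=0.5`, the trial count, and all function names are hypothetical; the caption only specifies that the generative probability matched each rat's mean probability of choosing safe.

```python
import random


def simulate_choices(p_safe, n_trials, rng):
    """Independent weighted coin flips: 1 = safe choice, 0 = risky choice."""
    return [1 if rng.random() < p_safe else 0 for _ in range(n_trials)]


def simulate_rewards(chose_safe, p_risky_win, rng):
    """Assumption: safe choices always pay; risky choices pay with p_risky_win."""
    return [1 if safe else int(rng.random() < p_risky_win) for safe in chose_safe]


def risky_win_stay_bias(chose_safe, rewarded):
    """P(safe | previous safe reward) - P(safe | previous risky reward)."""
    after_safe_win, after_risky_win = [], []
    for prev in range(len(chose_safe) - 1):
        if rewarded[prev]:
            bucket = after_safe_win if chose_safe[prev] else after_risky_win
            bucket.append(chose_safe[prev + 1])
    return (sum(after_safe_win) / len(after_safe_win)
            - sum(after_risky_win) / len(after_risky_win))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
choices = simulate_choices(p_safe=0.6, n_trials=200_000, rng=rng)
rewards = simulate_rewards(choices, p_risky_win=0.5, rng=rng)
simulated_bias = risky_win_stay_bias(choices, rewards)
```

Because the simulated choices do not depend on trial history, `simulated_bias` should be close to zero, which is the logic of the control: any bias produced by the estimator alone would show up here.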
Figure 2. lOFC encodes reward history during the cue period.
(A) lOFC neuron with activity aligned to trial initiation. This neuron’s firing rate reflected whether the previous trial was rewarded. (B) Mean encoding of reward history (discriminability or d’) across lOFC neurons that exhibited significantly different spike counts based on reward history. Mean ± s.e.m. See also Figure 2—figure supplement 1. (C) Fraction of neurons with significantly different spike counts based on reward history, with more spikes following unrewarded (no rew >rew) or rewarded (rew >no rew) trials. (D) Schematic of analysis (TCA/PARAFAC) used to discover low dimensional descriptions of trial-by-trial population dynamics. See also Figure 2—figure supplement 2. (E) Result of TCA/PARAFAC from one recording session. Y-axis is in arbitrary units (A.U.; see Materials and methods). (F) Mean (± s.d.) shuffle-corrected reward (blue) and no-reward (black) triggered averages of trial factors across all sessions (see Materials and methods). (G) Correlation between trial factors and reward history for each session. Gray bars indicate significance.
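Panel B summarizes reward-history encoding with a discriminability index (d’). The captions do not give the formula, so the sketch below uses one common definition as an assumption: the difference in mean spike counts divided by the root of the average of the two sample variances. The function name and the toy spike counts are hypothetical.

```python
import math
import statistics


def d_prime(counts_a, counts_b):
    """Discriminability between two sets of spike counts.

    Assumed definition (not confirmed by the source):
    (mean_a - mean_b) / sqrt(0.5 * (var_a + var_b)), using sample variances.
    """
    mean_diff = statistics.mean(counts_a) - statistics.mean(counts_b)
    pooled_sd = math.sqrt(
        0.5 * (statistics.variance(counts_a) + statistics.variance(counts_b))
    )
    return mean_diff / pooled_sd


# Toy spike counts in one time bin, split by the previous trial's outcome
after_reward = [2, 3, 4, 3, 3]
after_no_reward = [5, 6, 7, 6, 6]
example_d = d_prime(after_reward, after_no_reward)
```

In this toy example the neuron fires more after unrewarded trials, so `example_d` is negative; applying the same function to each time bin would give a time course like the one plotted.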
Figure 2—figure supplement 1. Method for identifying putatively identical waveforms over days.
(A) Distribution of d1 and d2 values comparing waveforms across rats produces a null distribution (gray, see Materials and methods). Distribution of values comparing waveforms within rats from subsequent recording sessions (red). Dashed lines are empirically chosen thresholds. (B) Subplot from panel A. (C) Neuron that was identified as putatively identical across four recording sessions. Raster plots only show 150 trials (out of 400–600 each day) for display purposes (upper panels). PSTHs (derived from all trials) are shown below (lower panels). (D) Figure 2B was reproduced combining putatively identical units recorded over multiple days. Mean discriminability index (d’) depending on whether the previous trial was rewarded or not, computed in 50 ms bins. Error bars are ± s.e.m.
Figure 2—figure supplement 2. TCA/PARAFAC tensor decomposition applied to neural data.
(A–C) Method used to determine model rank. We performed 20 random initializations and compared the similarity of the factors recovered from each iteration to those recovered from the previous one. We show the distribution of similarity indices for example recording sessions that were determined to be rank 1, 2, and 3 (A,B,C, respectively). Black lines are mean ± s.e.m. The majority of the data was either rank 1 or 2 (50/105 sessions were rank 1, 50/105 were rank 2, 5/105 were rank 3), so for simplicity, we fit a rank one model to each session. (D) Four neurons from the recording session in panel 3D; firing rates are plotted when the trial factor was high (>85th percentile) or low (<15th percentile). (E) Mean (± s.d.) shuffle-corrected reward (blue) and no-reward (black) triggered averages of trial factors across all sessions (see Materials and methods), excluding cells that had significantly different spike counts following rewarded or unrewarded trials. (F) Mean (± s.d.) shuffle-corrected reward (blue) and no-reward (black) triggered averages of trial factors, excluding random subsets of cells, of the same number that were excluded in panel E. (G) Distribution of simultaneously recorded units across all recording sessions. (H) Relationship between the Pearson’s correlation between trial factors and reward history, and number of units recorded in each session. (I) Relationship between the absolute magnitude of the Pearson’s correlation between trial factors and reward history, and number of units recorded in each session.
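A rank-one PARAFAC (CP) model approximates a neurons × time × trials tensor as the outer product of a neuron factor, a time factor, and a trial factor. The sketch below fits such a model by alternating least squares in plain Python and compares trial factors across two random initializations. It is an illustration only: the toy tensor, the function names, and the use of cosine similarity as the similarity index are assumptions, and the authors' actual fitting procedure is described in their Materials and methods.

```python
import math
import random


def rank1_cp(X, n_iter=20, seed=0):
    """Fit X[i][j][k] ~ a[i] * b[j] * c[k] by alternating least squares."""
    rng = random.Random(seed)
    I, J, K = len(X), len(X[0]), len(X[0][0])
    b = [rng.random() + 0.1 for _ in range(J)]  # positive random start
    c = [rng.random() + 0.1 for _ in range(K)]
    for _ in range(n_iter):
        bb, cc = sum(v * v for v in b), sum(v * v for v in c)
        # Least-squares update of each factor with the other two held fixed
        a = [sum(X[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
             / (bb * cc) for i in range(I)]
        aa = sum(v * v for v in a)
        b = [sum(X[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
             / (aa * cc) for j in range(J)]
        bb = sum(v * v for v in b)
        c = [sum(X[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
             / (aa * bb) for k in range(K)]
    return a, b, c


def cosine(u, v):
    """Cosine similarity between two factor vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


# Toy rank-one tensor: 2 neurons x 3 time bins x 4 trials (hypothetical values)
a_true, b_true, c_true = [1.0, 2.0], [1.0, 0.5, 2.0], [0.0, 1.0, 0.0, 1.0]
X = [[[ai * bj * ck for ck in c_true] for bj in b_true] for ai in a_true]

fits = [rank1_cp(X, seed=s) for s in (0, 1)]
similarity = abs(cosine(fits[0][2], fits[1][2]))  # trial factors across inits
```

On an exactly rank-one tensor the trial factors from different initializations agree up to scale, so `similarity` is essentially 1; lower agreement across initializations at a given rank is the kind of signal the caption uses to choose model rank.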
Figure 3. Optogenetic perturbation of lOFC during the cue period does not affect spatial or risky trial history biases.
(A) Schematic of bilateral optogenetic perturbations. For CaMKIIα-eNpHR3.0 rats (n = 8), we used continuous illumination from a green laser for photoinhibition. For Pvalb-iCre-ChR2 rats (n = 5), a blue laser was pulsed at 20 Hz. See also Figure 3—figure supplement 1. While the schematic shows a 3 s trial, trial durations were variable (2.6–3.35 s); photoinhibition persisted for the duration of the cue period. (B) Histological section from Pvalb-iCre-ChR2 rats also stained for DAPI and parvalbumin (PV) immunoreactivity. (C) Virus injection in a wild type rat expressing CaMKIIα-eNpHR3.0. Fiber locations were estimated from damage at the brain surface and fiber tracks. (D) Magnitude of spatial win-stay and lose-switch biases (difference in probability of repeating a left or right choice) on control and laser trials. Error bars are normal approximations of 95% confidence intervals (Materials and methods). (E) Magnitude of risky win-stay bias (difference in probability of choosing the safe option following safe or risky rewards) on control and laser trials.
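The error bars in panels D and E are described as normal approximations of 95% confidence intervals on a difference in probabilities. One standard way to compute such an interval is the Wald interval for a difference of two proportions, sketched below. Whether the authors used exactly this form is an assumption (their procedure is in Materials and methods), and the function name and the counts in the example are hypothetical.

```python
import math


def diff_of_proportions_ci(k1, n1, k2, n2, z=1.96):
    """Wald (normal-approximation) interval for p1 - p2.

    k1/n1 and k2/n2 are event counts over trial counts in two conditions;
    z=1.96 corresponds to a two-sided 95% interval.
    """
    p1, p2 = k1 / n1, k2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: repeats after rewarded vs. unrewarded trials
bias, ci_low, ci_high = diff_of_proportions_ci(60, 100, 45, 100)
```

If the resulting interval excludes zero, the bias would be judged significantly different from zero at the 5% level under this approximation.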
Figure 3—figure supplement 1. Characterization of photoinhibition in Pvalb-iCre-ChR2 rats.
(A) Representative unit recorded from Pvalb-iCre rats expressing ChR2. Three epochs of photoinhibition (blue lines) reliably suppressed spiking activity. Blue laser was pulsed at 20 Hz, 10 ms pulse width, for 8 s. (B) Normalized activity of the cell shown in A; mean ± s.e.m. over 30 photoinhibition epochs. (C) Mean suppression over 65 recorded units from two rats. (D) Activity change of each unit plotted as a function of its distance from the optical fiber. Units were recorded in four tracks, 250, 500, 750, or 1000 μm from the fiber tip. Robust photoinhibition was observed in all tracks. (E) Example injection site shown in Figure 4C; inset shows putative fiber track. (F) Percent of parvalbumin-immunoreactive cells that co-expressed eYFP in Pvalb-iCre rats expressing eYFP-ChR2 (left), and fraction of eYFP-expressing cells co-labeled for parvalbumin immunoreactivity (right).
Figure 4. At time of choice report, lOFC neurons represent risk, reward, and left/right choice.
(A) Example lOFC neuron with activity aligned to when the rat left the center poke to report his choice. This neuron’s firing rate reflected whether the rat chose the risky (magenta) or safe (black) option on the current trial, analyzing rewarded trials only. (B) Mean d’ across lOFC neurons with significantly different spike counts on trials with risky or safe choices. See also Figure 4—figure supplement 1. (C,D) Mean z-scored firing rate of neurons in panel B aligned to entering the center poke (C), or leaving it to report choice (D). (E) Fraction of neurons in panels B-D that preferred trials when rats made risky or safe choices. Higher firing rates on trials in which rats chose the safe reward could reflect encoding of decision confidence or reward expectation (Lak et al., 2014). (F) Mean d’ reflecting whether rats chose the left/right ports, or whether rats received reward, averaged across neurons with significantly different spike counts on those trials. See also Figure 4—figure supplement 1. (G) Venn diagram of overlap between neurons whose activity differentiated between left/right choices and rewarded/unrewarded trials. (H) Fraction of neurons in panels F,G preferring left/right choices or rewarded/unrewarded trials.
Figure 4—figure supplement 1. Results do not depend on whether units are treated independently over days.
(A,B) Figure 4B (A) and 4F (B) were reproduced combining putatively identical units recorded over multiple days. Mean discriminability index (d’) depending on whether the rat chose the safe or risky option on rewarded trials only (A), chose left or right (B, purple), or was rewarded (B, yellow), computed in 50 ms bins. Error bars are ± s.e.m. (C) Of the units with significantly different spike counts on trials in which rats chose risky or safe, the fraction selective (or not) for choosing the left or right port.
Figure 5. Photoinhibition of lOFC at the time of choice report selectively eliminates the risky win-stay bias.
(A) For choice reporting period perturbations, the laser was triggered when rats left the center poke, and persisted for 4 s into the inter-trial-interval. See also Figure 5—figure supplement 1. (B) Spatial win-stay/lose-switch biases following photoinhibition during the choice reporting period; sham rats also exhibited a significant reduction in lose-switch biases, and trended towards a reduction in win-stay biases. Control data are replotted from Figure 3D. (C) Magnitude of the risky win-stay bias following choice reporting period inactivations. Control data are replotted from Figure 3E. Error bars are 95% confidence intervals.
Figure 5—figure supplement 1. Photoinhibition during the choice reporting period does not affect baseline performance, but selectively reduces the risky win–stay bias.
(A) Psychometric performance for each CaMKIIα-eNpHR3.0 rat on control trials (black) and trials following photoinhibition during the choice period. These plots include all trials, regardless of trial history, so the elimination of the risky win-stay bias is not evident. (B) Psychometric performance for each Pvalb-iCre-ChR2 rat on control trials (black) and trials following photoinhibition during the choice period. (C) Difference in logistic regression coefficients (control - photoinhibition) parameterizing different choice biases. Data are mean ± standard deviation across rats. Asterisks indicate significant Bonferroni-corrected p-value from one-way ANOVA (p=0.0063).

References

    1. Abrahamyan A, Silva LL, Dakin SC, Carandini M, Gardner JL. Adaptable history biases in human perceptual decisions. PNAS. 2016;113:E3548–E3557. doi: 10.1073/pnas.1518786113.
    2. Akaishi R, Kolling N, Brown JW, Rushworth M. Neural mechanisms of credit assignment in a multicue environment. Journal of Neuroscience. 2016;36:1096–1112. doi: 10.1523/JNEUROSCI.3159-15.2016.
    3. Akrami A, Kopec CD, Diamond ME, Brody CD. Posterior parietal cortex represents sensory history and mediates its effects on behaviour. Nature. 2018;554:368–372. doi: 10.1038/nature25510.
    4. Aronov D, Tank DW. Engagement of neural circuits underlying 2D spatial navigation in a rodent virtual reality system. Neuron. 2014;84:442–456. doi: 10.1016/j.neuron.2014.08.042.
    5. Blanchard TC, Wilke A, Hayden BY. Hot-hand Bias in rhesus monkeys. Journal of Experimental Psychology: Animal Learning and Cognition. 2014;40:280–286. doi: 10.1037/xan0000033.
