Behav Neurosci. 2021 Aug;135(4):550-570.
doi: 10.1037/bne0000474.

Unique features of stimulus-based probabilistic reversal learning

Carl Harris et al. Behav Neurosci. 2021 Aug.

Abstract

Reversal learning paradigms are widely used assays of behavioral flexibility, and their probabilistic versions are better suited to studying how reward outcomes are integrated over time. Prior research suggests differences between initial and reversal learning, including higher learning rates, a greater need for inhibitory control, and more perseveration after reversals. However, it is not well understood which aspects of stimulus-based reversal learning are unique to reversals, and whether and how observed differences depend on reward probability. Here, we used a visual probabilistic discrimination and reversal learning paradigm in which male and female rats selected between a pair of stimuli associated with different reward probabilities. We compared accuracy, rewards collected, omissions, latencies, win-stay/lose-shift strategies, and indices of perseveration across two different reward probability schedules. We found that discrimination and reversal learning are behaviorally more distinct than similar: fits of reinforcement learning models to choice behavior revealed a lower sensitivity to the difference in subjective reward values (greater exploration) and higher learning rates for the reversal phase. We also found that latencies to choose the better option were greater in females than in males, but only for the reversal phase. Further, animals employed more win-stay strategies during early discrimination and showed increased perseveration during early reversal learning. Interestingly, a consistent reward probability group difference emerged, with a richer environment associated with longer reward collection latencies than a leaner environment. Future studies should systematically compare the neural correlates of fine-grained behavioral measures to reveal possible dissociations in how the circuitry is recruited in each phase. (PsycInfo Database Record (c) 2021 APA, all rights reserved).

Conflict of interest statement

There is no conflict of interest or need for disclosure. One of the senior authors (Alicia Izquierdo) is an Associate Editor of Behavioral Neuroscience.

Figures

Figure 1. Task Design
Note. Schematic of the probabilistic learning task. Rats initiated a trial by nosepoking the center stimulus (displayed for 40 s) and then selected between two visual stimuli, presented pseudorandomly on either the left or the right side of the screen (displayed for 60 s) and assigned as the better (B) and worse (W) options. Correct nosepokes were rewarded with a sucrose pellet with probability pR(B) = .90 or .70 versus pR(W) = .30. If a trial was not rewarded [pNR(B) or pNR(W)], a 5 s time-out would commence. If a stimulus was not chosen, the trial was considered a choice omission and a 10 s ITI would commence. Rats could also fail to initiate a trial, in which case the trial was scored as an initiation omission. If a trial was rewarded, a 10 s ITI would follow reward collection. Other prominent measures collected on a trial-by-trial basis were trial initiation latency (time to nosepoke the center white square), choice latency (time to select between the two stimuli), and reward latency (time to collect reward in the pellet magazine).
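The trial contingencies in this note can be illustrated with a minimal simulation sketch. It is not the authors' task code: the function name `simulate_trial`, the `SCHEDULES` dictionary, and the choice labels are hypothetical, and only the reward probabilities and the 5 s and 10 s delays are taken from the note. Trial initiation and latencies are not modeled.

```python
import random

# Reward probabilities from the figure note: pR(B) = .90 or .70, pR(W) = .30.
# The dictionary keys are illustrative labels, not names used by the authors.
SCHEDULES = {"90-30": (0.90, 0.30), "70-30": (0.70, 0.30)}


def simulate_trial(choice, schedule, rng):
    """Return (rewarded, delay_s) for one initiated trial.

    choice is "B" (better), "W" (worse), or None for a choice omission.
    """
    if choice is None:
        return False, 10  # choice omission: 10 s ITI
    p_better, p_worse = SCHEDULES[schedule]
    p_reward = p_better if choice == "B" else p_worse
    rewarded = rng.random() < p_reward
    # Rewarded: 10 s ITI after reward collection. Unrewarded: 5 s time-out.
    return rewarded, (10 if rewarded else 5)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
outcomes = [simulate_trial("B", "90-30", rng)[0] for _ in range(10_000)]
print(sum(outcomes) / len(outcomes))  # close to the nominal .90
```

Even on the 90-30 schedule, choosing the better option goes unrewarded on roughly one trial in ten, which is why choice has to be learned by integrating outcomes over many trials.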
Figure 2. Greater Number of Completed Sessions for the 70–30 Reward Probability Group in Both Discrimination and Reversal Learning
Note. (A and B) Plotted are the number of subjects per session (A) and the number of sessions to criterion (B) during the discrimination (prereversal) phase. The 70–30 reward probability group completes significantly more sessions during discrimination than the 90–30 group. (C and D) The same as (A) and (B) but during reversal learning. The 70–30 reward probability group again completes significantly more sessions than the 90–30 group. Bars indicate ±SEM. * p ≤ .05.
Figure 3. Both Reward Probability Groups and Both Sexes Increase Their Collected Rewards Over Time but Animals Choose the Better Option More Often in the Discrimination Phase
Note. (A and B) Proportion of better option selections (A) and number of rewards in a session (B) in the discrimination (prereversal) phase, showing the first 10 and 15 trials. Both groups increase selection of the better option and receive more rewards per session over time, with no significant differences between reward probability groups or sex. (C and D) Same as (A) and (B), but in the reversal phase. Again, animals in both reward probability groups improve accuracy and collected rewards over time, with no differences by group or sex. Notably, there was a significant phase difference in choice of the better option, with discrimination > reversal.
Figure 4. Patterns of Latencies by Sex and Reward Probability Group During Discrimination and Reversal Learning
Note. (A) There were no group or sex differences in initiation omissions in discrimination. (B) There were no group or sex differences in better choice latencies in discrimination. (C) There were significant probability group differences in reward collection latencies in discrimination, with the 90–30 reward probability group exhibiting longer latencies than the 70–30 reward probability group. (D) There were no group or sex differences in initiation omissions in reversal. (E) There were sex differences in better choice latencies in the reversal phase, with females taking longer to make a choice of the better option than males (with and without controlling for the number of discrimination sessions to criterion). (F) There were significant probability group differences in reward collection latencies in the reversal phase, with the 90–30 reward probability group exhibiting longer latencies than the 70–30 reward probability group (with and without controlling for the number of discrimination sessions to criterion). Bars indicate ±SEM. # p = .07. * p ≤ .05. *** p ≤ .001.
Figure 5. Greater Overall Win-Stay and Win-Stay on the Better Option in the 90–30 Group During Discrimination
Note. (A–C) Plotted are the proportions of win-stay responses overall (A), after choosing the better option (B), and after choosing the worse option (C) during discrimination. Overall win-stay and win-stay on the better option are used more often in the 90–30 group than in the 70–30 group. (D–F) The same as (A)–(C) but for reversal. We find no significant effects on overall win-stay or win-stay on the better option, but do find that females are more likely than males to apply a win-stay strategy after choosing the worse option. Bars indicate ±SEM. * p ≤ .05.
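As an illustration of how win-stay and lose-shift proportions are commonly computed from trial-by-trial data, here is a minimal sketch. The function name `win_stay_lose_shift` and the input format are hypothetical, and it assumes the usual definitions (win-stay: repeating a choice after a rewarded trial; lose-shift: switching after an unrewarded trial). The authors' exact scoring rules, such as their handling of omissions, may differ.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win_stay, lose_shift) proportions for one session.

    choices: sequence of choice labels, e.g. "B" or "W"
    rewards: sequence of 0/1 outcomes aligned with choices
    """
    wins = stays = losses = shifts = 0
    # Pair each trial's choice and outcome with the choice on the next trial.
    for prev_choice, prev_reward, next_choice in zip(choices, rewards, choices[1:]):
        if prev_reward:
            wins += 1
            stays += next_choice == prev_choice
        else:
            losses += 1
            shifts += next_choice != prev_choice
    win_stay = stays / wins if wins else float("nan")
    lose_shift = shifts / losses if losses else float("nan")
    return win_stay, lose_shift


choices = ["B", "W", "W", "B", "B"]
rewards = [1, 0, 0, 1, 1]
print(win_stay_lose_shift(choices, rewards))  # (0.5, 0.5)
```

Restricting the loop to trials where `prev_choice` is "B" or "W" would give the option-specific measures plotted in panels (B) and (C).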
Figure 6. Greater Perseveration During Reversal and Lower Repetition Index Measures for Males as Compared to Females in Both Phases, and for the 70–30 Group as Compared to the 90–30 Group in the Discrimination Phase
Note. (A–D) Plotted are the perseveration index (A), overall repetition index (RI; B), repetition index for the better option (RIB; C), and repetition index for the worse option (RIW; D) in the discrimination phase. Though we find no significant differences in perseveration index, the 70–30 group shows a significantly lower RI and RIB, and a marginally significantly lower RIW. Additionally, males have significantly lower values for all three repetition index measures. (E–H) Same as (A)–(D), but for reversal. We observe a significantly lower perseveration index and marginally significantly lower RI, RIB, and RIW in males than in females. Bars indicate ±SEM. # p = .06. * p ≤ .05.
Figure 7. Higher Learning Rate and Lower Sensitivity to Difference in Subjective Reward Values in Reversal Compared to Discrimination Learning
Note. (A and B) Learning parameters and sensitivity to difference in reward values for the single-learning rate model during the discrimination (A) and reversal (B) phases. We find no significant difference in parameter values between reward probability groups during discrimination or reversal. However, we do find significantly higher learning rates and significantly lower sensitivity to difference in reward values parameters following reversal.
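To make the two fitted parameters concrete, the following is a minimal sketch of a standard single-learning-rate reinforcement learning model with a softmax (logistic) choice rule. It is an assumed textbook formulation, not necessarily the exact model the authors fit: the function name `negative_log_likelihood`, the initial values of 0, and the choice labels are hypothetical. Here `alpha` is the learning rate and `beta` is the sensitivity to the difference in subjective reward values.

```python
import math


def negative_log_likelihood(choices, rewards, alpha, beta):
    """Negative log-likelihood of a choice sequence under a delta-rule model.

    choices: sequence of "B"/"W" labels; rewards: aligned 0/1 outcomes.
    alpha: learning rate in [0, 1]; beta: sensitivity to the value difference.
    """
    q = {"B": 0.0, "W": 0.0}  # assumed initial subjective values
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        other = "W" if choice == "B" else "B"
        # Logistic choice rule on the difference in subjective values.
        p_choice = 1.0 / (1.0 + math.exp(-beta * (q[choice] - q[other])))
        nll -= math.log(p_choice)
        # Delta-rule update of the chosen option only.
        q[choice] += alpha * (reward - q[choice])
    return nll


choices = ["B", "B", "W", "B", "B"]
rewards = [1, 1, 0, 1, 0]
print(negative_log_likelihood(choices, rewards, alpha=0.3, beta=2.0))
```

In this formulation a lower `beta` makes choices less dependent on the value difference, which corresponds to the "greater exploration" reading in the abstract, while a higher `alpha` makes values track recent outcomes more quickly. Fitting would minimize this quantity over `alpha` and `beta` for each animal and phase.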
