J Neurosci. 2021 Mar 17;41(11):2406-2419.
doi: 10.1523/JNEUROSCI.2588-20.2021. Epub 2021 Feb 2.

Coordinated Prefrontal State Transition Leads Extinction of Reward-Seeking Behaviors

Eleonora Russo et al. J Neurosci. 2021.

Abstract

Extinction learning suppresses conditioned reward responses and is thus fundamental for adapting to changing environmental demands and for controlling excessive reward seeking. The medial prefrontal cortex (mPFC) monitors and controls conditioned reward responses. Abrupt transitions in mPFC activity anticipate changes in conditioned responses to altered contingencies. It remains unknown, however, whether such transitions are driven by the extinction of old behavioral strategies or by the acquisition of new, competing ones. Using in vivo multiple single-unit recordings of mPFC in male rats, we studied the relationship between single-unit and population dynamics during extinction learning, with alcohol as a positive reinforcer in an operant conditioning paradigm. To examine the fine temporal relation between neural activity and behavior, we developed a novel behavioral model that allowed us to identify the number, onset, and duration of extinction-learning episodes in the behavior of each animal. We found that single-unit responses to conditioned stimuli changed even under stable experimental conditions and behavior. However, when behavioral responses to task contingencies had to be updated, unit-specific modulations became coordinated across the whole population, pushing the network into a new stable attractor state. Thus, extinction learning is not associated with suppressed mPFC responses to conditioned stimuli, but is anticipated by single-unit coordination into population-wide transitions of the internal state of the animal.

SIGNIFICANCE STATEMENT

The ability to suppress conditioned behaviors when they are no longer beneficial is fundamental for the survival of any organism. While pharmacological and optogenetic interventions have shown a critical involvement of the mPFC in the suppression of conditioned responses, the neural dynamics underlying this process are still largely unknown. Combining novel analysis tools to describe behavior, single-neuron responses, and population activity, we found that widespread changes in neuronal firing become temporally coordinated across the whole mPFC population in anticipation of behavioral extinction. This coordination leads to a global transition in the internal state of the network, driving extinction of conditioned behavior.

Keywords: alcohol; attractor states; behavioral model; change-point analysis; extinction learning; prelimbic cortex.

Figures

Figure 1.
Behavioral paradigm and recording sites. A, B, Behavioral task (A) and schematic of the trial timeline during reinforced trials (B). During reinforced trials, reward was delivered exclusively on pressing the cued lever (active lever). C, Histologically verified recording sites within PL of the 10 rats.
Figure 2.
Recording stability. A, Absolute value of the average spike amplitude of the first and last 100 spikes of the session for each unit recorded in the maintenance and extinction sessions. The clustering of points along the diagonal rules out drift in the amplitude of the recorded spikes during the session. B, Histogram of the ratio between spike amplitude and the spike detection threshold, showing that spike amplitudes are at least twice the detection threshold both at the beginning and at the end of the recording session. Ratios were computed on the average amplitude of the first and of the last 100 spikes of the session.
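The stability criterion of panel B can be made concrete with a short sketch. It assumes, hypothetically, that each unit's spike amplitudes are available as a list in recording order and that the detection threshold is expressed in the same units; the function name `stability_summary` and the pass rule (both ratios at least 2) are illustrative assumptions, not the authors' code.

```python
from statistics import mean


def stability_summary(amplitudes, threshold, n=100, min_ratio=2.0):
    """Hypothetical stability check for one unit.

    amplitudes: spike amplitudes in recording order (assumed input format).
    threshold: spike detection threshold, same units as amplitudes.
    Returns the mean |amplitude| of the first and last n spikes, their ratios
    to |threshold|, and whether both ratios reach min_ratio.
    """
    first = mean(abs(a) for a in amplitudes[:n])  # average of the first n spikes
    last = mean(abs(a) for a in amplitudes[-n:])  # average of the last n spikes
    ratio_first = first / abs(threshold)
    ratio_last = last / abs(threshold)
    passes = ratio_first >= min_ratio and ratio_last >= min_ratio
    return first, last, ratio_first, ratio_last, passes


# Toy unit whose amplitude shrinks slightly from -60 to -50 (uV) over the session.
amps = [-60.0] * 100 + [-50.0] * 100
print(stability_summary(amps, threshold=-25.0))
```

Plotting `first` against `last` for every unit gives a scatter like panel A; points near the diagonal indicate little amplitude drift.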
Figure 3.
PL activity remains modulated by conditioned cues during extinction learning. A, B, Percentage of active lever presses during maintenance and the last 18 trials of extinction (A) and throughout within-session extinction (B). Dashed lines show percentages for individual animals. Solid line and error bars show the mean ± SEM. Asterisk and hash symbols mark Benjamini–Hochberg-corrected p < 0.05 and p < 0.08, respectively. C, The z-scored activity of significantly responding units (number of units shown for each curve) following cue light and lever presentation (see Materials and Methods). Horizontal dotted lines mark the significance threshold and testing window. Solid lines and shading show the mean ± SEM. D, AUC for z-scored single-unit response (see Materials and Methods) of all units computed on trial blocks of steady-state behavior (early/late: first/last 12 trials during maintenance and extinction; reinforced: 9 reinforced trials during within-session extinction). Boxplot whiskers extend to points within 1.5 × IQR of the quartiles. Horizontal dotted lines mark the significance threshold.
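Several captions (here and in Figure 6) report Benjamini–Hochberg-corrected p values. The standard step-up adjustment of Benjamini and Hochberg (1995) can be sketched as follows; the raw p values are made up for illustration, and the 0.05 and 0.08 cut-offs simply mirror the asterisk and hash symbols of panels A and B.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p value, keeping adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]  # illustrative raw p values
q = benjamini_hochberg(p)
star = [qi < 0.05 for qi in q]  # "*" in the caption
hash_ = [0.05 <= qi < 0.08 for qi in q]  # "#" in the caption
print(q, star, hash_)
```

Because the correction depends on the number of tests m, it matters which p values are pooled; Figure 6, for example, corrects only over the seven histogram bins.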
Figure 4.
Whole-trial PL population activity reflects behavioral changes during extinction learning. A, Examples of the behavioral models (orange) of four representative animals and their respective population CPs computed over the whole-trial firing rate of the population during within-session extinction (light blue). Filled black circles indicate the trial-specific behavioral choice. Dashed line indicates the onset of extinction trials. Numbers at the top right of each panel indicate the number of recorded units. B, Distribution of likelihood ratio test statistic for relating the set of behavioral response models during within-session extinction to maintenance population CPs (λmaint; left) and extinction population CPs (λext; right). Points in magenta and cyan correspond to λext values from two animals, which were consistently close to or <0, indicating a poor match between population CPs and behavior in these two animals. Boxplot whiskers extend to points within 1.5 × IQR of the quartiles. C, Number of units recorded from each rat against its corresponding likelihood ratio test statistic values λext. Points in magenta and cyan pertain to λext values from the corresponding rats in B.
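Population and single-unit change points (CPs) are detected on trial-wise firing rates. The paper's own CP detection algorithm is described in its Materials and Methods and is not reproduced here; as a generic stand-in, the sketch below finds the single split of a sequence of spike counts that most increases a Poisson log-likelihood. The minimum segment length `min_seg` is an illustrative assumption, and deciding whether the gain is large enough to accept a CP would require a separate test.

```python
import math


def poisson_loglik(counts):
    """Poisson log-likelihood at the MLE rate, omitting the log(k!) terms
    (they are the same for every split, so they cancel in comparisons)."""
    total, n = sum(counts), len(counts)
    if total == 0:
        return 0.0
    rate = total / n
    return total * math.log(rate) - n * rate


def best_single_change_point(counts, min_seg=3):
    """Return (k, gain) for the split counts[:k] | counts[k:] with the largest
    log-likelihood gain over one constant rate; (None, 0.0) if nothing beats it."""
    base = poisson_loglik(counts)
    best_k, best_gain = None, 0.0
    for k in range(min_seg, len(counts) - min_seg + 1):
        gain = poisson_loglik(counts[:k]) + poisson_loglik(counts[k:]) - base
        if gain > best_gain + 1e-9:  # small tolerance against rounding noise
            best_k, best_gain = k, gain
    return best_k, best_gain


# Toy whole-trial spike counts whose rate drops after trial 10.
counts = [10] * 10 + [2] * 10
print(best_single_change_point(counts))
```

Applying the search recursively to each segment (binary segmentation) would yield multiple CPs; that extension is also a generic choice, not necessarily the one used in the paper.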
Figure 5.
PL single-unit dynamics during extinction learning are indistinguishable from those during maintenance. A, Four examples (same animal) of single-unit whole-trial firing rates (black dots) during within-session extinction, with single-unit CPs (light blue filled circles) and the firing rate as inferred by the CP detection algorithm (light blue solid line). The behavioral model is shown in orange as in Figure 4A. B, Five task windows of interest within which five sets of population and single-unit CPs were identified from population and single-unit firing rates. Windows are defined relative to light onset as follows: ITI, seconds −3 to −1; cue light, seconds 0 to 0.5; delay period, seconds 3 to 5; lever presentation, seconds 5 to 5.5; and whole trial, seconds 0 to 15. C, Distribution of single-unit CPs across maintenance and extinction trials (60 and 69 trials, respectively) for each task window, pooled from all animals. D–F, Number of single-unit CPs per unit (D), relative change in firing rate (E), and positive-to-negative sign ratio (F) computed in four task windows. Plots show the mean ± 1.96 SEM (red and gray) and SD (blue or orange). Open circles indicate the mean for individual animals. The three quantities (D–F) are statistically indistinguishable when compared between sessions. Gray line in F marks a sign ratio of 1, where positive and negative rate changes are balanced. G, Population firing rate per task window over blocks of six consecutive trials during within-session extinction (trials 1–3 excluded; Fig. 3B). Dashed line indicates the onset of extinction trials. Solid lines and error bars show the mean ± SEM. H, Sensitivity analysis showing increased and decreased single-unit whole-trial firing rates of a representative animal during the first and last 12 trials of maintenance (left) and extinction (middle). Empirical distribution functions (right) of the sensitivity index d for all recorded single units from all animals in maintenance (blue) and extinction (orange) show no significant difference, despite the difference in behavior. Dotted lines mark the threshold of significant change in firing rate. I, P-P plot comparing the empirical distribution functions of single-unit CPs over maintenance and within-session extinction trials (compare C) for the four task windows.
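Panel H compares single-unit firing rates between the first and last 12 trials of each session with a sensitivity index d. The caption does not give its formula; the sketch below assumes the common pooled-standard-deviation form, d = (μ_late − μ_early) / sqrt((σ²_early + σ²_late) / 2), and uses made-up rates. The significance threshold marked by the dotted lines is not reproduced.

```python
from statistics import mean, variance


def sensitivity_index(early, late):
    """Assumed pooled-SD sensitivity index between two sets of trial-wise rates.
    Positive values mean the rate increased from early to late trials."""
    pooled_sd = ((variance(early) + variance(late)) / 2) ** 0.5
    if pooled_sd == 0:
        raise ValueError("rates have zero variance; d is undefined")
    return (mean(late) - mean(early)) / pooled_sd


# Hypothetical whole-trial rates (Hz) for the first and last 12 trials of a session.
early = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7, 4.3, 4.0, 4.1, 3.9]
late = [5.0, 5.4, 4.9, 5.2, 5.1, 5.6, 4.8, 5.3, 5.0, 5.5, 5.2, 4.9]
print(round(sensitivity_index(early, late), 2))
```

Collecting d for every unit in each session and sorting the values gives the empirical distribution functions shown on the right of panel H.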
Figure 6.
PL baseline rate and task-evoked responses change in anticipation of behavioral extinction. A, The z-scored whole-trial response of all recorded units from one representative animal during maintenance (left) and within-session extinction (right), overlaid with population CPs (blue dashed lines) and single-unit CPs (blue triangles). Triangle directions indicate whether the CP results from an increase or decrease in the firing rate of the corresponding unit. The z scores are shown with the same scale in both sessions. Dashed white line indicates the onset of extinction trials. B, Number of population CPs per animal. Plots show the mean ± 1.96 SEM (red and gray) and SD (blue or orange). Open circles indicate the numbers for individual animals. C, Onset (yellow) and center (red) of an extinction-learning episode for one representative animal. Behavioral CP10% and behavioral CP50% correspond to 10% and 50% drops in response probability, respectively. D, E, Single-unit CP distributions for different task windows (whole trial, cue light, delay period, and lever presentation) pooled across animals and aligned with respect to behavioral CP10% (D) and CP50% (E) from the within-session extinction of each animal. Single-unit CPs of the extinction session coordinated at extinction onset in all windows (top), while those of the maintenance session showed no significant coordination when aligned to the extinction onset of the extinction session (bottom). Statistical tests performed via bootstrap (see Materials and Methods). The p values assigned to each trial lag (center of the bin) are reported on a logarithmic scale for visibility. The Benjamini–Hochberg correction for multiple comparisons was performed only on the p values of the seven bins of the displayed histogram. Asterisks mark p < 0.05 (black) and p < 0.1 (gray) after correction. Horizontal dotted lines mark the log(0.05) threshold over the tested window. F, G, Same as D and E, respectively, for single-unit CPs computed on the ITI window.
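Panels C–G align single-unit CPs to behavioral CP10% and CP50%, the trials at which the modeled response probability has dropped by 10% and 50%. The sketch below shows that thresholding under two hypothetical assumptions: the behavioral model yields one response probability per trial, and drops are measured relative to the probability at extinction onset. The alignment step then reduces to one trial lag per single-unit CP; all names and values here are illustrative.

```python
def behavioral_change_points(prob, onset, drops=(0.10, 0.50)):
    """First trial (index into prob) at which the modeled response probability
    has fallen by each fraction in `drops`, relative to prob[onset].
    A drop that is never reached maps to None."""
    reference = prob[onset]
    result = {}
    for drop in drops:
        target = reference * (1.0 - drop)
        result[drop] = next(
            (t for t in range(onset, len(prob)) if prob[t] <= target), None
        )
    return result


def lags_to_behavior(unit_cps, behavioral_cp):
    """Trial lag of each single-unit CP relative to a behavioral CP
    (negative values: the unit changed before behavior did)."""
    return [cp - behavioral_cp for cp in unit_cps]


# Hypothetical modeled probabilities: 5 reinforced trials, then extinction trials.
prob = [0.9] * 5 + [0.9, 0.85, 0.8, 0.6, 0.4, 0.2, 0.1]
cps = behavioral_change_points(prob, onset=5)
print(cps)
print(lags_to_behavior([6, 7, 11], cps[0.10]))
```

Histogramming such lags across units and animals gives distributions like those in panels D to G; the bootstrap test of coordination is not sketched here.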
Figure 7.
Reorganization in PL activity is predictive of behavioral extinction in all task windows. A, B, Classifier performance in predicting the behavioral state of the animal defined with respect to behavioral CP10% (A) and CP50% (B) values from population firing rates during four task windows (Fig. 5B). Significance was assessed via bootstrap (see Materials and Methods). Differences between data and bootstrapped Cohen's κ are reported by showing the mean ± 1.96 SEM (red and gray) and SD (purple). Open circles indicate differences for individual animals. Population rates are predictive of extinction onset in the ITI, delay-period, and lever-presentation windows. C, To the left, raster plots of two representative units from the same animal, showing rate progression across extinction (bottom to top). Filled and open blue circles mark trials with reinforced and unreinforced lever presses, respectively. Vertical gray lines indicate cue-light onset and lever presentation. Single-unit firing rates based on CP detection in four task windows are color coded as in Figure 5B. To the right, average spike waveform for the first (black) and last (red) 100 spikes of the session, confirming that the observed rate changes could not be ascribed to recording artifacts (Fig. 2). D, Fraction of single units for which the evolution of firing rates within one window significantly correlates with that within a second window. E, Firing-rate changes during ITI are most coordinated with those occurring during the delay period. Absolute distance in trials between the occurrence of a single-unit CP in ITI and the closest single-unit CP of the same unit in the cue-light, delay-period, and lever-presentation windows. Plots show the mean ± 1.96 SEM (red and gray) and SD (purple). Open circles mark values for individual animals.
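Classifier performance in panels A and B is reported as the difference between Cohen's κ on the data and on resampled surrogates. The sketch below computes κ for binary behavioral-state labels and subtracts the mean κ obtained after shuffling the true labels. Label shuffling is a stand-in chosen here for simplicity and is not necessarily the bootstrap described in the paper's Materials and Methods; labels and predictions are toy values.

```python
import random
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    n = len(y_true)
    observed = sum(a == b for a, b in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if expected == 1.0:
        return 0.0  # degenerate case: one shared class only, kappa is undefined
    return (observed - expected) / (1.0 - expected)


def kappa_above_shuffle(y_true, y_pred, n_shuffles=1000, seed=0):
    """Observed kappa minus the mean kappa over label-shuffled surrogates."""
    rng = random.Random(seed)
    labels = list(y_true)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        null.append(cohens_kappa(labels, y_pred))
    return cohens_kappa(y_true, y_pred) - sum(null) / n_shuffles


# Toy labels: 0 = before the behavioral CP, 1 = after; predictions from a classifier.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
print(cohens_kappa(y_true, y_pred), round(kappa_above_shuffle(y_true, y_pred), 2))
```

κ is used instead of raw accuracy because it discounts agreement expected by chance, which matters when one behavioral state covers more trials than the other.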

References

Babayan BM, Uchida N, Gershman SJ (2018) Belief state representation in the dopamine system. Nat Commun 9:1891. 10.1038/s41467-018-04397-0
Bartolo R, Averbeck BB (2020) Prefrontal cortex predicts state switches during reversal learning. Neuron 106:1044–1054.e4. 10.1016/j.neuron.2020.03.024
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 57:289–300. 10.1111/j.2517-6161.1995.tb02031.x
Brebner LS, Ziminski JJ, Margetts-Smith G, Sieburg MC, Reeve HM, Nowotny T, Hirrlinger J, Heintz TG, Lagnado L, Kato S, Kobayashi K, Ramsey LA, Hall CN, Crombag HS, Koya E (2020) The emergence of a stable neuronal ensemble from a wider pool of activated neurons in the dorsal medial prefrontal cortex during appetitive learning in mice. J Neurosci 40:395–410. 10.1523/JNEUROSCI.1496-19.2019
Caballero JP, Scarpa GB, Remage-Healey L, Moorman DE (2019) Differential effects of dorsal and ventral medial prefrontal cortex inactivation during natural reward seeking, extinction, and cue-induced reinstatement. eNeuro 6:ENEURO.0296-19.2019. 10.1523/ENEURO.0296-19.2019
