Sci Rep. 2016 Jun 1;6:27056.
doi: 10.1038/srep27056.

Neuronal activity in dorsomedial and dorsolateral striatum under the requirement for temporal credit assignment

Eun Sil Her et al. Sci Rep. 2016.

Abstract

To investigate neural processes underlying temporal credit assignment in the striatum, we recorded neuronal activity in the dorsomedial and dorsolateral striatum (DMS and DLS, respectively) of rats performing a dynamic foraging task in which a choice has to be remembered until its outcome is revealed for correct credit assignment. Choice signals appeared sequentially, initially in the DMS and then in the DLS, and they were combined with action value and reward signals in the DLS when choice outcome was revealed. Unlike in conventional dynamic foraging tasks, neural signals for chosen value were elevated in neither brain structure. These results suggest that dynamics of striatal neural signals related to evaluating choice outcome might differ drastically depending on the requirement for temporal credit assignment. In a behavioral context requiring temporal credit assignment, the DLS, but not the DMS, might be in charge of updating the value of chosen action by integrating choice, action value, and reward signals together.

Figures

Figure 1. Animal behavior.
(a) Behavioral task. In each trial, following a 2-s delay at the delay point (D), the animal was required to choose either the left or right target (T) by checking a photobeam sensor (blue dashed lines on top), return to the reward location (R, circle), and wait for 1 s to obtain the water reward. Approximate spatial positions for the divergence (outbound) and convergence (inbound) of left- and right-choice-associated movement trajectories, which were determined separately for each session, are indicated as A (approach) and C (convergence), respectively (red dashed lines). (b) Example movement trajectories from one session. Blue, left choice; red, right choice. Each dot represents the animal’s head position at 33.3 ms time resolution. (c) Determination of the convergence point. X-coordinates of the animal’s head position data were temporally aligned to the time point 3 s prior to the reward stage onset. The first time point at which the difference in X-coordinates between the left- and right-choice trials was no longer statistically significant (t-test, p > 0.05) and remained so for at least nine consecutive points (300 ms) was taken as the convergence point (red vertical line). Top, X-coordinates of all left-choice and right-choice trials in an example session; middle, mean (±SD across trials) X-coordinates of the left-choice and right-choice trials of the same session; bottom, mean X-coordinates of the left-choice and right-choice trials averaged across sessions (±SD). The data were aligned to the convergence point (time 0) that was determined separately for each session. (d) Choice behavior of one animal in one example session. Tick marks denote trial-by-trial choices of the animal (upper, left choice; lower, right choice; long, rewarded trial; short, unrewarded trial). Vertical lines denote block transitions and numbers at the top indicate mean reward probabilities associated with left and right choices in each block. The black line shows the probability of choosing the left target (PL) in a moving average of 10 trials, and the gray line shows PL predicted by the hybrid model.
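The convergence-point rule in panel (c) can be illustrated with a minimal Python sketch. It assumes the per-frame t-test p-values have already been computed elsewhere, and that the run of nine non-significant frames includes the first one; the function name `find_convergence_index` and the example p-values are hypothetical, not taken from the paper.

```python
def find_convergence_index(p_values, alpha=0.05, min_run=9):
    """Return the index of the first frame that starts a run of at least
    `min_run` consecutive frames with p > alpha, or None if no such run exists.

    Assumption: the first non-significant frame counts toward the run.
    """
    run_start = None
    run_len = 0
    for i, p in enumerate(p_values):
        if p > alpha:
            if run_len == 0:
                run_start = i  # a new candidate run begins here
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            run_len = 0  # a significant difference breaks the run
    return None


# Hypothetical p-values, one per 33.3-ms video frame.
p_values = [0.001] * 5 + [0.2] * 3 + [0.01] + [0.3] * 12
idx = find_convergence_index(p_values)
print("convergence frame index:", idx)
```

In this made-up series, the short non-significant run of three frames is rejected because a significant frame interrupts it before nine frames accumulate.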
Figure 2. Recording locations and unit classification.
(a) Single units were recorded from the DMS and the DLS. The diagrams are coronal section views of three rat brains at 0.48 mm anterior to bregma. Each diagram represents one rat and each circle represents one recording site that was determined based on histology and electrode advancement history. One to six units were recorded simultaneously from each site. Modified from ref. with permission from Elsevier. (b) Unit classification. Recorded units were classified into putative MSNs and putative interneurons based on mean discharge rate and spike width. Those units with mean firing rates <6 Hz and spike widths ≥0.24 ms were classified as putative MSNs, and the rest were classified as putative interneurons.
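The classification rule in panel (b) amounts to two thresholds, sketched below in Python. The thresholds (6 Hz and 0.24 ms) come from the caption; the function name `classify_unit` and the example values are hypothetical.

```python
def classify_unit(mean_rate_hz, spike_width_ms,
                  max_rate_hz=6.0, min_width_ms=0.24):
    """Label a unit as a putative MSN if its mean firing rate is below
    `max_rate_hz` AND its spike width is at least `min_width_ms`;
    every other unit is labeled a putative interneuron."""
    if mean_rate_hz < max_rate_hz and spike_width_ms >= min_width_ms:
        return "putative MSN"
    return "putative interneuron"


# Hypothetical units: (mean firing rate in Hz, spike width in ms).
for rate, width in [(1.2, 0.30), (12.0, 0.30), (1.2, 0.20)]:
    print(rate, width, classify_unit(rate, width))
```

Note that the rate threshold is strict (<6 Hz) while the width threshold is inclusive (≥0.24 ms), as stated in the caption.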
Figure 3. Temporal profiles of neuronal activity.
Mean discharge rates of putative MSNs were compared between the DMS (red) and DLS (blue). The graphs show z-scores of discharge rates of putative MSNs across different behavioral stages (delay, approach to target, target selection, memory, and reward). A spike density function was constructed for each neuron by applying a Gaussian kernel (σ = 100 ms) and then z-normalized based on the mean and SD of discharge rate in 10-ms time bins. Shading indicates 95% confidence interval. Each solid vertical line indicates the beginning of a given behavioral stage. Dashed vertical lines denote delay stage offset (left) and the animal’s arrival at the reward location (right).
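A rough Python sketch of a z-scored spike density function follows. The kernel width (σ = 100 ms) and bin size (10 ms) come from the caption; everything else is an assumption made for illustration: spikes are binned first and then smoothed, the kernel is truncated at ±3σ and renormalized at the edges, and the z-score uses the mean and SD of the smoothed trace. The function name `spike_density_z` is hypothetical.

```python
import math
from statistics import mean, pstdev


def spike_density_z(spike_times_s, t_start, t_stop, bin_s=0.010, sigma_s=0.100):
    """Bin spikes, smooth with a Gaussian kernel, and z-score the result."""
    n_bins = int(round((t_stop - t_start) / bin_s))
    counts = [0] * n_bins
    for t in spike_times_s:
        k = int((t - t_start) / bin_s)
        if 0 <= k < n_bins:
            counts[k] += 1
    rate = [c / bin_s for c in counts]  # spikes per second in each bin

    # Gaussian kernel truncated at +/- 3 sigma (an assumption).
    half = int(round(3 * sigma_s / bin_s))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * ((j * bin_s) / sigma_s) ** 2) for j in offsets]

    smoothed = []
    for i in range(n_bins):
        num = den = 0.0
        for j, w in zip(offsets, kernel):
            k = i + j
            if 0 <= k < n_bins:  # renormalize near the edges
                num += w * rate[k]
                den += w
        smoothed.append(num / den)

    mu, sd = mean(smoothed), pstdev(smoothed)
    if sd == 0:
        return [0.0] * n_bins  # flat trace: no modulation to normalize
    return [(x - mu) / sd for x in smoothed]


# Hypothetical spike train: three spikes near t = 0.5 s in a 1-s window.
z = spike_density_z([0.50, 0.51, 0.52], 0.0, 1.0)
print(len(z), "bins; peak at bin", z.index(max(z)))
```

Z-scoring each neuron before averaging keeps high-rate units from dominating the population mean, which is presumably why the figure reports z-scores.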
Figure 4. Neural activity related to the animal’s choice and its outcome.
(a) Shown are fractions of DMS and DLS MSNs that significantly modulated their activity according to the animal’s choice (C), its outcome (R), and their interaction (X) in the current (t) and previous (t − 1) trials in a 500-ms time window that was advanced in 100-ms time steps across different behavioral stages. Shading indicates chance level (binomial test, alpha = 0.05) for the DLS (7.86%), which is slightly higher than that for the DMS (7.55%). Large open circles indicate significantly different fractions (χ² test, p < 0.05) between DMS and DLS. Behavioral stages and vertical lines are as shown in Fig. 3. (b) Examples of MSNs responsive to the animal’s choice (C) or its outcome (R) in the current trial (t). Top, spike raster plots. Each row is one trial, and each dot represents a spike. Bottom, spike density functions. Trials were divided into two groups according to the animal’s target choice (left vs. right) or reward (rewarded vs. unrewarded).
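The sliding analysis window and the binomial chance level in panel (a) can be sketched in Python as follows. The window width (500 ms), step (100 ms), and alpha (0.05) come from the caption. The definition of chance level used here, the smallest fraction of neurons whose count is significantly above the per-neuron false-positive rate, is an assumption about how such a threshold is commonly computed, not a confirmed description of the authors' procedure. The function names and the neuron counts are hypothetical.

```python
from math import comb


def sliding_windows(t_start_ms, t_stop_ms, width_ms=500, step_ms=100):
    """Return (start, end) pairs for windows that fit fully inside the span."""
    windows = []
    t = t_start_ms
    while t + width_ms <= t_stop_ms:
        windows.append((t, t + width_ms))
        t += step_ms
    return windows


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chance_level(n_neurons, false_positive_rate=0.05, alpha=0.05):
    """Smallest fraction k/n such that observing k or more significant
    neurons out of n has probability below alpha under the null."""
    for k in range(n_neurons + 1):
        if binomial_tail(n_neurons, k, false_positive_rate) < alpha:
            return k / n_neurons
    return 1.0


print(sliding_windows(0, 1000))
# Hypothetical population sizes; the real neuron counts are not given here.
for n in (100, 400):
    print(n, "neurons -> chance level", chance_level(n))
```

Under this definition the threshold depends on the number of neurons tested, which would be consistent with the two structures having slightly different chance levels.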
Figure 5. Temporal profiles of value-related neural activity.
(a) Shown are fractions of DMS and DLS MSNs that significantly modulated their activity according to left action value (QL), right action value (QR), and chosen value (QC). Analysis time windows and shading are as shown in Fig. 4a. (b) Two examples of DLS MSNs coding action value. Top, spike raster plots. Bottom, spike density functions. Trials were divided into four groups according to the level of QL (left) or QR (right).
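As a small illustration of the value variables in this figure, the Python sketch below derives chosen value from the two action values and splits trials into four groups by value level. Treating QC as the action value of the option actually chosen is the usual definition, and the equal-sized, rank-based grouping is an assumption, since the caption does not state how the four levels were formed. The function names and example values are hypothetical.

```python
def chosen_value(q_left, q_right, choice):
    """Return the action value of the chosen target ('L' or 'R')."""
    return q_left if choice == "L" else q_right


def quartile_groups(values):
    """Assign each trial to one of four equal-sized groups (0 = lowest)
    by rank of its value. Rank-based grouping is an assumption."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 4 // n
    return groups


# Hypothetical per-trial left action values.
q_left_by_trial = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
print(quartile_groups(q_left_by_trial))
print(chosen_value(0.6, 0.3, "L"))
```

Spike density functions can then be averaged within each group to visualize whether firing scales with value, as in panel (b).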


