J Neurosci. 2015 Feb 25;35(8):3499-514. doi: 10.1523/JNEUROSCI.1962-14.2015.

Distinct neural representation in the dorsolateral, dorsomedial, and ventral parts of the striatum during fixed- and free-choice tasks


Makoto Ito et al. J Neurosci.

Abstract

The striatum is a major input site of the basal ganglia, which play an essential role in decision making. Previous studies have suggested that subareas of the striatum have distinct roles: the dorsolateral striatum (DLS) functions in habitual actions, the dorsomedial striatum (DMS) in goal-directed actions, and the ventral striatum (VS) in motivation. To elucidate the distinctive functions of these subregions in decision making, we systematically investigated the information represented by phasically active neurons in DLS, DMS, and VS. Rats performed two types of choice tasks: fixed- and free-choice tasks. In both tasks, rats were required to perform a nose poke into either the left or right hole after cue-tone presentation. A food pellet was delivered probabilistically depending on the presented cue and the selected action. The reward probability was fixed in the fixed-choice task and varied in a block-wise manner in the free-choice task. We found the following: (1) when rats began the tasks, a majority of VS neurons increased their firing rates, and information regarding task type and state value was most strongly represented in VS; (2) during action selection, information about action and action values was most strongly represented in DMS; (3) action-command information (action representation before action selection) was stronger in the fixed-choice task than in the free-choice task in both DLS and DMS; and (4) action-command information was strongest in DLS, particularly when the same choice was repeated. We propose a hypothesis of hierarchical reinforcement learning in the basal ganglia to coherently explain these results.

Keywords: action value; basal ganglia; decision making; reinforcement learning; state value; striatum.
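To make the task structure concrete: the free-choice task is a two-alternative bandit whose reward probabilities change block-wise, and the analyses below interpret striatal activity through reinforcement-learning quantities such as action values. The following is a minimal Python sketch of a standard Q-learning agent on such a task; the block schedule follows Figure 1C, but the learning rate, inverse temperature, and block length are assumed illustrative values, not parameters from the paper.

    import math
    import random

    # Block-wise (P_left, P_right) reward probabilities for the choice tone (Fig. 1C).
    BLOCKS = [(0.90, 0.50), (0.50, 0.90), (0.50, 0.10), (0.10, 0.50)]
    ALPHA, BETA = 0.3, 3.0  # assumed learning rate and inverse temperature

    def softmax_choice(q_left, q_right):
        """Choose 'L' with probability sigmoid(BETA * (Q_L - Q_R))."""
        p_left = 1.0 / (1.0 + math.exp(-BETA * (q_left - q_right)))
        return 'L' if random.random() < p_left else 'R'

    q = {'L': 0.0, 'R': 0.0}
    for p_left, p_right in BLOCKS:
        for _ in range(100):  # assumed block length
            action = softmax_choice(q['L'], q['R'])
            p_reward = p_left if action == 'L' else p_right
            reward = 1.0 if random.random() < p_reward else 0.0
            # Standard Q-learning: update only the chosen action's value.
            q[action] += ALPHA * (reward - q[action])
        print(f"block ({p_left:.0%}, {p_right:.0%}): Q_L={q['L']:.2f}, Q_R={q['R']:.2f}")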


Figures

Figure 1.
Task design. A, Schematic illustration of the experimental chamber. The chamber was equipped with three holes for nose poking (L, left hole; C, center hole; R, right hole) and a pellet dish (D). B, Time sequence of the choice tasks. After a rat poked its nose into the center hole, one of three cue tones was presented. The rat had to maintain the nose poke in the center hole during presentation of the cue tone. After offset of the cue tone, the rat was required to perform a nose poke into either the left or right hole, and then either a reward tone or a no-reward tone was presented. The reward tone was followed by delivery of a sucrose pellet to the food dish. The reward probability was determined by the given cue tone and the chosen action. C, Reward probabilities for cue tones and actions. For the left tone, the reward probabilities were (left, right) = (50%, 0%); for the right tone, (left, right) = (0%, 50%). These probabilities were fixed throughout the experiments. For the choice tone, reward probabilities varied: one of four pairs of reward probabilities [(left, right) = (90%, 50%), (50%, 90%), (50%, 10%), and (10%, 50%)] was used for each block.
Figure 2.
Rat performance in fixed-choice and free-choice blocks. A, A representative example of a rat's performance. Blue, red, and orange vertical lines indicate individual choices for left, right, and choice tones, respectively. A sequence of blocks consisted of two single fixed-choice blocks (left- or right-tone trials), two double fixed-choice blocks (a mix of left- and right-tone trials), and four free-choice blocks with different reward probabilities (choice-tone trials). Bottom, Blue, red, and orange lines represent the probability of a left choice for each tone (average of the last 20 trials). When the choice frequency of the action associated with the higher reward probability reached 80%, the block was changed. "e" indicates an extinction test, consisting of five trials without reward delivery. This block sequence was repeated two or three times in one-day recording sessions. B, Average left-choice probability during extinction tests (five unrewarded trials for each cue tone) with 95% confidence intervals (shaded bands). Left-choice probabilities for left tones, right tones, and choice tones are plotted in blue, red, and orange, respectively. Left-choice probabilities for choice-tone trials were separated by the optimal action in the previous free-choice block (upper graph for left, lower graph for right). C, Averages of left-choice probabilities over the five extinction trials for the left tone (blue), choice tone (orange), and right tone (red) with 95% confidence intervals. Top and bottom orange plots represent the averages of the upper and lower orange graphs in B, respectively. ***p < 0.0001 (χ2 test). D, Decision tree for choice tones: the left-choice probability after all possible experiences in the one and two previous trials. The four types of experience in one trial [left or right choice × rewarded (1) or not rewarded (0)] are represented by different colors and line types. For instance, the left-choice probability after L1 is indicated by the right edge of the blue solid line (green arrow), and that after L1 R0 (L1 and then R0) is indicated by the right edge of the red broken line connected to the blue solid line (brown arrow). Values at trials = 0 (x-axis) represent the left-choice probability over all trials. Shaded bands indicate 95% confidence intervals. E, F, Decision trees for left tones and right tones, respectively. Conditional left-choice probabilities for left-tone (E) and right-tone (F) trials in single- and double-fixed blocks are represented in the same manner as in D. G, Accuracy of each model in predicting rat choices. Prediction accuracy was defined as the normalized likelihood of test data; the free parameters of each model were determined by maximizing the likelihood of training data. Markov-d stands for the dth-order Markov model, a standard model that predicts from the past d trials. Q, FQ, and DFQ indicate variants of reinforcement learning models. The number following each model name indicates the number of free parameters of that model. "const" means that the parameters (α1, α2, κ1, and κ2) were assumed to be constant across all sessions, and "variable" means that the parameters were allowed to vary. **p < 0.01, *p < 0.05, significant difference from the prediction accuracy of FQ-learning (variable) (paired-sample Wilcoxon signed-rank tests). H, An example of prediction of rat choices by the FQ model with time-varying parameters. Top, Green line indicates P(a(t) = L), the probability that the rat would select left at trial t, estimated from the rat's past experiences e(1), e(2), …, e(t−1). Vertical lines indicate the rat's actual choice in each trial; top and bottom lines indicate left and right choices, respectively. Black and gray represent rewarded and unrewarded trials, respectively. Middle, Estimated action values, QL and QR. Bottom, Estimated κ1 and κ2.
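Panel G compares prediction models, including the FQ (forgetting Q-learning) model whose parameters α1, α2, κ1, and κ2 appear above. As an illustration of what such an update can look like, here is one common forgetting-Q formulation: the chosen action's value moves at rate α1 toward a reward-dependent target (κ1 after reward, −κ2 after no reward), while the unchosen action's value decays toward zero at rate α2. The exact equations of the authors' FQ and DFQ models are not given in this caption, so treat this as an assumption-labeled sketch, not their implementation.

    def fq_update(q, chosen, rewarded, alpha1, alpha2, kappa1, kappa2):
        """One trial of an assumed forgetting-Q update (not verbatim from the
        paper): the chosen value moves toward kappa1 (reward) or -kappa2 (no
        reward) at rate alpha1; the unchosen value decays toward zero at alpha2."""
        target = kappa1 if rewarded else -kappa2
        for a in q:
            if a == chosen:
                q[a] = (1 - alpha1) * q[a] + alpha1 * target
            else:
                q[a] = (1 - alpha2) * q[a]
        return q

    # Example: a rewarded left choice with illustrative parameter values.
    q = fq_update({'L': 0.2, 'R': 0.5}, 'L', True,
                  alpha1=0.3, alpha2=0.1, kappa1=1.0, kappa2=0.5)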
Figure 3.
Representative activity patterns of phasically active neurons in the striatum. A, Tracks of accepted electrode bundles for all rats are indicated by rectangles. Neurons recorded from blue, green, or red rectangles were classified as DLS, DMS, or VS neurons, respectively. Each diagram represents a coronal section referenced to bregma (Paxinos and Watson, 1998). B, A raster plot showing spikes of a DLS neuron and corresponding events in free-choice and fixed-choice trials, aligned to the entry time into the center hole. Bottom, Peri-event time histogram (PETH) with 10 ms bins for this neuron. C, A corrected raster plot and an event-aligned spike histogram (EASH) with 10 ms bins, derived by linearly scaling the time intervals between task events in each trial to the average intervals across all trials; the numbers of spikes between events are preserved. D–I, EASHs for representative neurons from DLS (D, E), DMS (F, G), and VS (H, I). Top, Four different blue and red lines indicate the EASHs for the four different combinations of selected action and reward outcome. Bottom, Purple and orange lines indicate EASHs for fixed-choice and free-choice blocks, respectively. Black lines indicate averages of EASHs over all trials. All EASHs (10 ms bins) are smoothed with a Gaussian kernel (30 ms SD). D, The same neuron shown in B and C.
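The EASH construction in panel C is algorithmic enough to sketch: within each trial, spike times between consecutive task events are linearly rescaled so that every inter-event interval matches its across-trial mean, which preserves spike counts between events. A minimal Python sketch follows; the function and argument names are illustrative, not from the paper.

    import numpy as np

    def warp_spikes(spike_times, trial_events, mean_events):
        """Linearly rescale spike times so that each inter-event interval in
        this trial maps onto the mean interval across trials (EASH scheme).
        `trial_events` and `mean_events` are equal-length, sorted arrays of
        event times; spikes outside the event span are dropped."""
        warped = []
        for t in spike_times:
            # Find the inter-event interval containing this spike.
            i = np.searchsorted(trial_events, t, side='right') - 1
            if i < 0 or i >= len(trial_events) - 1:
                continue
            frac = (t - trial_events[i]) / (trial_events[i + 1] - trial_events[i])
            warped.append(mean_events[i] + frac * (mean_events[i + 1] - mean_events[i]))
        return np.array(warped)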
Figure 4.
Activity patterns in the striatum. A, Normalized activity patterns of all recorded PANs from DLS (N = 190), DMS (N = 105), and VS (N = 119). The activity pattern of each neuron was normalized so that the maximum of its EASH was 1 and is represented in pseudocolor (values from 0 to 1 run from blue to red). Neuron indices were sorted by the time at which the normalized EASH first surpassed 0.5. Seven trial phases were defined based on task events. B, Preferred trial phases for each subarea: the proportion of neurons whose normalized EASHs reached 0.5 during each trial phase. **p < 0.01, *p < 0.05 (χ2 test). C, The average activity ratio of striatal neurons for each subarea. The activity ratio (the duration for which the normalized EASH exceeded 0.5, divided by the duration of the corresponding trial phase) was calculated for each trial phase, and these ratios were then averaged over the trial phases. **p < 0.01, *p < 0.05 (Mann–Whitney U test).
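Panel A's display convention (normalize each EASH to a maximum of 1, then order neurons by when the normalized trace first exceeds 0.5) reduces to a few lines. This sketch assumes the EASHs arrive as a neurons × time-bins array and uses illustrative names.

    import numpy as np

    def sort_by_onset(eashs):
        """Normalize each row (one neuron's EASH) to max 1 and sort rows by
        the first time bin whose normalized value surpasses 0.5 (Fig. 4A)."""
        eashs = np.asarray(eashs, dtype=float)
        # Guard against all-zero rows to avoid division by zero.
        norm = eashs / np.maximum(eashs.max(axis=1, keepdims=True), 1e-12)
        onset = np.argmax(norm > 0.5, axis=1)  # first bin above 0.5 per row
        order = np.argsort(onset)
        return norm[order], order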
Figure 5.
Information coding of state, action, and reward. A, State information coded by each neuron. Mutual information (bits/s) between the firing rate in each 100 ms time bin of the EASH and state (fixed-choice blocks or free-choice blocks) is shown in pseudocolor. Neuron indices are the same as in Figure 4A. B, State information averaged over the neurons in each subarea. C, Action information (left or right hole choice), calculated using both fixed- and free-choice blocks. D, Action information averaged over the neurons in each subarea. E, Action information before execution of the action (close-up of D). F, Reward information (reward delivered or not), calculated using both fixed- and free-choice blocks. G, Reward information averaged over the neurons in each subarea. B, D, E, G, Black lines indicate thresholds for significant information (p < 0.01). Shaded bands represent SE.
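The mutual-information measure used here (between per-bin firing and a binary task variable, reported in bits/s) can be illustrated with a plug-in estimate from the empirical joint distribution. The caption does not specify the authors' estimator or any bias correction, so the following Python sketch is an assumption in those respects.

    import numpy as np

    def mutual_information_bits(counts, labels):
        """Plug-in mutual information (bits) between per-trial spike counts in
        one time bin and a binary task variable (e.g. fixed vs. free choice)."""
        counts, labels = np.asarray(counts), np.asarray(labels)
        mi = 0.0
        for c in np.unique(counts):
            for y in np.unique(labels):
                p_xy = np.mean((counts == c) & (labels == y))
                if p_xy == 0.0:
                    continue
                mi += p_xy * np.log2(p_xy / (np.mean(counts == c) * np.mean(labels == y)))
        return mi  # divide by the bin width in seconds to get bits/s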
Figure 6.
Percentages of neurons coding state, action, and reward. A, Time bins during which neuronal activities were compared for each event. B, C, Percentages of neurons that showed significant selectivity (Mann–Whitney U test, p < 0.01) for each event in fixed-choice blocks (B) and free-choice blocks (C). Action-command (AC) and action-coding neurons are defined as neurons that showed significantly different firing rates in left- and right-selected trials during the 500 ms before or after action execution (offset of center-hole poking), respectively. Reward1 and Reward2 are reward-coding neurons that showed reward selectivity during the 500 ms before and after the offset of L/R poking, respectively. D, Percentages of event-coding neurons detected using whole trials. State-coding neurons are defined as neurons that showed different firing rates in fixed- and free-choice blocks. B–D, All populations are significantly larger than chance level (binomial test, p < 0.01). **p < 0.01, *p < 0.05, significant differences in percentages between subareas (χ2 test). E, Preference tendencies of the event-coding neurons shown in D: the percentages of neurons, among each class of event-coding neurons, that showed higher activity in fixed-choice than in free-choice blocks, in left-selected than in right-selected trials, or in rewarded than in unrewarded trials, respectively. **p < 0.01, *p < 0.05, significant difference from 0 (Wilcoxon signed-rank test).
Figure 7.
Model-based analysis of action value and state value. A, A DLS neuron showing correlation with the action value for left, QL. EASHs for trials with higher and lower QL are shown with green and gray lines, respectively (top left). Blue and red rectangles represent significant neuronal correlations with each variable (p < 0.01, t test) (bottom left). For QL and QR, blue and red represent positive and negative correlations, respectively. For action, blue and red represent higher activity in left- and right-selected trials, respectively. For reward, blue and red represent higher activity in rewarded and unrewarded trials, respectively. The firing rate in the yellow time bins for each trial (gray lines) was smoothed with a Gaussian filter (black lines) (top right). QL, estimated by the FQ model, is shown with gray lines, and the smoothed trace is shown with black lines (bottom right). B, A DMS neuron showing a negative correlation with QL. C, An action-independent, value-coding (state-value-coding) neuron in VS, showing negative correlations with both QL and QR.
Figure 8.
Proportions of neurons coding action, previous action, reward, previous reward, action value, and state value. Proportions of neurons showing significant correlation with each variable (p < 0.01, t test) are shown for DLS, DMS, and VS. These neurons were detected by multiple linear regression analysis, conducted for the 500 ms before and after each of the seven trial events. Colored disks indicate proportions significantly higher than chance level (p < 0.01, binomial test). Colored dots in the upper area indicate significant differences in proportions between subareas (p < 0.05, Mann–Whitney U test). A, Action-coding neurons. B, Neurons coding the action in previous trials. C, Reward-coding neurons. D, Neurons coding the reward in previous trials. E, Action-value-coding neurons, defined as neurons showing correlation with either QL or QR. F, Action-independent value-coding (state-value-coding) neurons, defined as neurons showing correlations with both QL and QR with the same sign.
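The detection procedure described here is a per-window multiple linear regression of firing rate on trial variables, with a t test on each coefficient. A minimal sketch using statsmodels follows; the regressor coding (e.g. action and reward as 0/1) and the variable set are assumptions about, not a reproduction of, the authors' design matrix.

    import numpy as np
    import statsmodels.api as sm

    def significant_regressors(rates, action, reward, q_left, q_right, alpha=0.01):
        """Regress one neuron's firing rate in a 500 ms window on trial
        variables and return the regressors whose coefficients differ from
        zero at p < alpha (two-sided t test on the OLS coefficients)."""
        X = sm.add_constant(np.column_stack([action, reward, q_left, q_right]))
        fit = sm.OLS(np.asarray(rates, dtype=float), X).fit()
        names = ['const', 'action', 'reward', 'QL', 'QR']
        return [n for n, p in zip(names, fit.pvalues) if n != 'const' and p < alpha]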
Figure 9.
Action-command information in fixed- and free-choice blocks. Mutual information per second between action (upcoming action for Phases 2–4, executed action for Phases 5 and 6) and firing in the last 20 trials of four different blocks. A, Single fixed-choice blocks. B, Double fixed-choice blocks. C, Free-choice blocks with higher reward probabilities [(L = 90%, R = 50%) and (50%, 90%)]. D, Free-choice blocks with lower reward probabilities [(L = 50%, R = 10%) and (10%, 50%)]. Each plot of action information starts from a triangle indicating the time at which the value surpassed the significance level (p < 0.01). Action information was calculated using a sliding time window over the preceding 500 ms (step size, 50 ms) to clarify after which task event the action-command signal increased.
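The sliding-window computation described here (and reused in Figure 10) windows the preceding 500 ms in 50 ms steps and evaluates action information in each window. A self-contained Python sketch follows, with the event-alignment convention and all names being illustrative assumptions.

    import numpy as np

    def binary_mi_bits(x, y):
        """Plug-in MI (bits) between a discrete variable x and a binary y."""
        mi = 0.0
        for xv in np.unique(x):
            for yv in np.unique(y):
                p = np.mean((x == xv) & (y == yv))
                if p > 0.0:
                    mi += p * np.log2(p / (np.mean(x == xv) * np.mean(y == yv)))
        return mi

    def sliding_action_info(trial_spikes, actions, t_end, window=0.5, step=0.05):
        """Action information (bits/s) in windows covering the preceding
        500 ms, stepped by 50 ms. `trial_spikes` is a list of event-aligned
        spike-time arrays (one per trial); `actions` is 0/1 per trial."""
        ends = np.arange(window, t_end + 1e-9, step)
        actions = np.asarray(actions)
        info = []
        for t in ends:
            counts = np.array([np.sum((s >= t - window) & (s < t)) for s in trial_spikes])
            info.append(binary_mi_bits(counts, actions) / window)  # bits/s
        return ends, np.array(info)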
Figure 10.
Action-command information in DLS, DMS, and VS. A–C, Action information during single fixed-choice blocks (SF), double fixed-choice blocks (DF), free-choice blocks with higher reward probabilities (FH), and free-choice blocks with lower reward probabilities (FL), averaged for DLS (A), DMS (B), and VS (C). D–F, Action information during free-choice blocks when the action was the same as in the previous trial (stay) or was switched from the previous action (switch). Each plot of action information starts from a triangle indicating the time at which the value surpassed the significance level (p < 0.01). Action information in this figure was calculated using a sliding time window over the preceding 500 ms (step size, 50 ms) to clarify after which task event the action-command signal increased.


