Parallel Representation of Value-Based and Finite State-Based Strategies in the Ventral and Dorsal Striatum

PLoS Comput Biol. 2015 Nov 3;11(11):e1004540. doi: 10.1371/journal.pcbi.1004540. eCollection 2015 Nov.


Previous theoretical studies of animal and human behavioral learning have focused on the dichotomy of the value-based strategy using action value functions to predict rewards and the model-based strategy using internal models to predict environmental states. However, animals and humans often adopt simple procedural behaviors, such as the "win-stay, lose-switch" strategy, without explicit prediction of rewards or states. Here we consider another strategy, the finite state-based strategy, in which a subject selects an action depending on its discrete internal state and updates the state depending on the action chosen and the reward outcome. By analyzing choice behavior of rats in a free-choice task, we found that the finite state-based strategy fitted their behavioral choices more accurately than value-based and model-based strategies did. When fitted models were run autonomously on the same task, only the finite state-based strategy could reproduce the key feature of choice sequences. Analyses of neural activity recorded from the dorsolateral striatum (DLS), the dorsomedial striatum (DMS), and the ventral striatum (VS) identified significant fractions of neurons in all three subareas whose activity was correlated with individual states of the finite state-based strategy. Signals of internal states at the time of choice were found in DMS, and signals for clusters of states were found in VS. In addition, action values and state values of the value-based strategy were encoded in DMS and VS, respectively. These results suggest that both the value-based strategy and the finite state-based strategy are implemented in the striatum.
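As an illustration of the finite state-based strategy described above, the "win-stay, lose-switch" rule can be written as a two-state machine: the internal state determines the action, and the pair (action, reward outcome) determines the next state. The sketch below is a minimal, hypothetical example and is not the model fitted in the study; the state names, the deterministic policy, the transition table, and the reward probabilities are all assumptions made for illustration.

```python
import random

# Hypothetical two-state "win-stay, lose-switch" agent.
# All names and numbers here are illustrative assumptions,
# not parameters or structure taken from the study.

LEFT, RIGHT = "L", "R"

# Action emitted in each internal state (deterministic for clarity;
# a fitted model would typically use choice probabilities instead).
POLICY = {"prefer_left": LEFT, "prefer_right": RIGHT}

# (state, action, rewarded) -> next internal state
TRANSITIONS = {
    ("prefer_left", LEFT, True): "prefer_left",      # win-stay
    ("prefer_left", LEFT, False): "prefer_right",    # lose-switch
    ("prefer_right", RIGHT, True): "prefer_right",   # win-stay
    ("prefer_right", RIGHT, False): "prefer_left",   # lose-switch
}


def step(state, rewarded):
    """Return the next internal state after acting in `state`."""
    action = POLICY[state]
    return TRANSITIONS[(state, action, rewarded)]


def run(reward_prob, n_trials, seed=0):
    """Simulate the agent on a two-choice task.

    reward_prob maps each action to its (assumed) reward probability.
    Returns a list of (state, action, rewarded) tuples, one per trial.
    """
    rng = random.Random(seed)
    state = "prefer_left"
    history = []
    for _ in range(n_trials):
        action = POLICY[state]
        rewarded = rng.random() < reward_prob[action]
        history.append((state, action, rewarded))
        state = TRANSITIONS[(state, action, rewarded)]
    return history
```

For example, `run({LEFT: 0.9, RIGHT: 0.1}, 100)` returns a trial-by-trial record of states, choices, and outcomes. Note that the agent never estimates reward probabilities or predicts future states; its behavior depends only on the current discrete state, which is the property that distinguishes it from value-based and model-based strategies.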

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Choice Behavior / physiology*
  • Computational Biology
  • Corpus Striatum / physiology*
  • Learning / physiology*
  • Male
  • Markov Chains
  • Models, Neurological
  • Neurons / physiology*
  • Rats
  • Rats, Long-Evans

Grants and funding

This work was supported by MEXT KAKENHI Grant Number 23120007 (KD), MEXT KAKENHI Grant Number 26120729 (MI), and JSPS KAKENHI Grant Number 25430017 (MI). KAKENHI: these grants cover a full range of creative and pioneering research from basic to applied fields across the humanities, social sciences and natural sciences (MEXT KAKENHI; JSPS KAKENHI). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.