Habits, action sequences and reinforcement learning

Eur J Neurosci. 2012 Apr;35(7):1036-51. doi: 10.1111/j.1460-9568.2012.08050.x.


It is now widely accepted that instrumental actions can be either goal-directed or habitual; whereas the former are rapidly acquired and regulated by their outcome, the latter are reflexive, elicited by antecedent stimuli rather than their consequences. Model-based reinforcement learning (RL) provides an elegant description of goal-directed action. Through exposure to states, actions and rewards, the agent rapidly constructs a model of the world and can choose an appropriate action based on quite abstract changes in environmental and evaluative demands. This model is powerful but has a problem explaining the development of habitual actions. To account for habits, theorists have argued that another action controller is required, called model-free RL, that does not form a model of the world but rather caches action values within states allowing a state to select an action based on its reward history rather than its consequences. Nevertheless, there are persistent problems with important predictions from the model; most notably the failure of model-free RL correctly to predict the insensitivity of habitual actions to changes in the action-reward contingency. Here, we suggest that introducing model-free RL in instrumental conditioning is unnecessary, and demonstrate that reconceptualizing habits as action sequences allows model-based RL to be applied to both goal-directed and habitual actions in a manner consistent with what real animals do. This approach has significant implications for the way habits are currently investigated and generates new experimental predictions.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Goals*
  • Habits*
  • Humans
  • Learning / physiology*
  • Random Allocation
  • Reinforcement, Psychology*