Psychol Rev. 2019 Mar;126(2):292-311. doi: 10.1037/rev0000120. Epub 2019 Jan 24.

Habits Without Values


Kevin J Miller et al. Psychol Rev.

Abstract

Habits form a crucial component of behavior. In recent years, key computational models have conceptualized habits as arising from model-free reinforcement learning mechanisms, which typically select between available actions based on the future value expected to result from each. Traditionally, however, habits have been understood as behaviors that can be triggered directly by a stimulus, without requiring the animal to evaluate expected outcomes. Here, we develop a computational model instantiating this traditional view, in which habits develop through the direct strengthening of recently taken actions rather than through the encoding of outcomes. We demonstrate that this model accounts for key behavioral manifestations of habits, including insensitivity to outcome devaluation and contingency degradation, as well as the effects of reinforcement schedule on the rate of habit formation. The model also explains the prevalent observation of perseveration in repeated-choice tasks as an additional behavioral manifestation of the habit system. We suggest that mapping habitual behaviors onto value-free mechanisms provides a parsimonious account of existing behavioral and neural data. This mapping may provide a new foundation for building robust and comprehensive models of the interaction of habits with other, more goal-directed types of behaviors and help to better guide research into the neural mechanisms underlying control of instrumental behavior more generally. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
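
To make the contrast in the abstract concrete, the following minimal Python sketch (not the authors' implementation; the learning rates, action coding, and function names are illustrative assumptions) shows a value-free habit update, which strengthens whatever action was just taken, alongside a model-free value update, which additionally requires the ensuing reward.

```python
import numpy as np

def habit_update(H, action, alpha_h=0.05):
    """Value-free habit learning: the chosen action is strengthened
    regardless of the outcome it produced."""
    target = np.zeros_like(H)
    target[action] = 1.0
    return H + alpha_h * (target - H)

def model_free_update(Q, action, reward, alpha_q=0.1):
    """Model-free value learning: the chosen action's value is nudged
    toward the reward that followed it."""
    Q = Q.copy()
    Q[action] += alpha_q * (reward - Q[action])
    return Q

# Example: the same experience updates the two systems differently.
n_actions = 2
H = np.zeros(n_actions)                       # habit strengths
Q = np.zeros(n_actions)                       # action values
H = habit_update(H, action=0)                 # outcome never consulted
Q = model_free_update(Q, action=0, reward=1)  # outcome drives the update
print(H, Q)
```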

Figures

Figure 1.
Left: Traditional view of the relationship between habits and goal-directed control. Habits are viewed as stimulus-response associations that become stronger with use, while goal-directed control takes into account knowledge of action-outcome relationships as well as current goals in order to guide choice. Right: Common computational view. Habits are implemented by a model-free RL agent which learns a value function over states and actions, while goal-directed control is implemented by a model-based RL agent which learns about the structure of the environment.
Figure 2.
Schematic description of the model components and their interactions. See main text for details.
Figure 3.
A) Simulations of a reversal-learning environment: Action A is initially reinforced with higher probability (0.5) than Action B (0), but after 1000 trials, the relative dominance of the actions reverses. B) Soon after the reversal, the goal-directed system learns that Action B is more valuable. C) The habit system increasingly favors Action A the more often it is chosen and only begins to favor Action B once that action is chosen more consistently (long after reversal). D) The weight of the goal-directed controller gradually decreases as habits strengthen, then increases post-reversal as the global and goal-directed reinforcement rates diverge. E) Actions are selected on each trial by a weighted combination of the goal-directed values (Q) and the habit strengths (H) according to the weight (w).
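
As a rough illustration of the arbitration described in panel E, the sketch below draws actions from a softmax over the weighted drive w·Q + (1 − w)·H. The softmax temperature, the fixed weights, and the example Q and H values are assumptions chosen for illustration; the rule the paper uses to update w itself is not reproduced here.

```python
import numpy as np

def choose_action(Q, H, w, beta=3.0, rng=np.random.default_rng(0)):
    """Softmax choice over the weighted drive w*Q + (1-w)*H."""
    drive = w * Q + (1.0 - w) * H
    p = np.exp(beta * (drive - drive.max()))
    p /= p.sum()
    return rng.choice(len(drive), p=p), p

Q = np.array([0.1, 0.5])   # goal-directed system now favors Action B
H = np.array([0.8, 0.2])   # habit system still favors Action A
for w in (0.9, 0.1):       # strong vs. weak goal-directed control
    _, p = choose_action(Q, H, w)
    print(f"w={w}: P(A)={p[0]:.2f}, P(B)={p[1]:.2f}")
```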
Figure 4. Behavior becomes inflexible after overtraining.
Rate of pressing in a simulated instrumental conditioning task at the end of the training period (blue) as well as following omission or devaluation manipulations (orange), as a function of the duration of the training period. As this duration increases, the agent is increasingly unlikely to alter its behavior (blue and orange curves become similar). These simulations are consistent with the finding that overtraining results in behavior that is insensitive to omission and to devaluation. Error bars represent standard errors over ten simulations.
Figure 5.
Variable-Interval (VI) schedules produce more rapid habit formation than Variable-Ratio (VR) schedules. Top: Cross-sections of the state of the agent acquiring lever pressing on a VI (left) or VR (right) schedule, taken 5,000 trials into training. Solid curves indicate the rate of pellets or effort as a function of the rate of pressing. Note that in the VR schedule, pellet rate is linear in press rate, whereas in the VI schedule, the relationship is sublinear. Dashed red and green curves indicate the goal-directed system’s estimates of these quantities (R). The dashed orange curve indicates the habit strength (H) associated with each press rate. Bars give a histogram of the responses of the agent between time points 4,000 and 5,000. Bottom: Time courses of key model variables over the course of training.
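
The different feedback functions of the two schedules can be illustrated with a short sketch: under a VR schedule each press pays off with a fixed probability, so pellet rate grows linearly with press rate, whereas under a VI schedule pellets become available at most about once per interval, so pellet rate saturates. The schedule parameters below and the standard VI approximation are assumptions for illustration, not the values used in the paper's simulations.

```python
ratio = 10        # VR-10: on average one pellet per 10 presses (assumed value)
interval = 10.0   # VI-10s: a pellet is armed on average every 10 s (assumed value)

def vr_pellet_rate(press_rate):
    """VR feedback: pellet rate is linear in press rate."""
    return press_rate / ratio

def vi_pellet_rate(press_rate):
    """VI feedback (common approximation): pellet rate saturates near
    1/interval, so it is sublinear in press rate."""
    return press_rate / (1.0 + press_rate * interval)

for b in (0.1, 0.5, 2.0):  # presses per second
    print(f"press rate {b}: VR = {vr_pellet_rate(b):.3f} pellets/s, "
          f"VI = {vi_pellet_rate(b):.3f} pellets/s")
```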
Figure 6. Model Reproduces Effects of Lesions on Behavioral Flexibility.
Rate of lever pressing before (blue) and after (orange) omission (top) or devaluation (bottom rows) manipulations performed following either limited or extensive training (left and right columns). We simulated lesions by impairing the goal-directed or habitual controllers, respectively (see Methods for details). The unlesioned model responded flexibly to both manipulations following limited, but not extensive, training. Goal-directed lesions caused the model to acquire lever pressing at a much lower rate and rendered it inflexible to all manipulations, a pattern seen in rats with DMS lesions (Yin et al., 2005). Habit lesions caused the model to respond flexibly to all manipulations, a pattern seen in rats with DLS lesions (Yin et al., 2004, 2006).
Figure 7.
Left/middle: Rats performing a sequential choice task exhibit both reinforcer-seeking behavior (left) and repetition of recently chosen actions (middle), as has been observed in other species. Reinforcement and choice sensitivity are shown as a function of trial lag for one example rat (example taken from Miller et al., in prep.). Right: To compare the ability of our model and an MB/MF agent to capture key tendencies in these data, we show total reinforcement and choice sensitivity (summed over the trial lags shown in the left/middle panels) for these rats (green; mean and standard deviation), as well as for simulated model-based/perseverative agents and model-based/model-free agents. On average, the rats exhibit similar choice and reinforcement sensitivity. Our model captures this pattern with a relatively limited parameter range (blue scatter; see Table 2); across a much broader parameter range, however, MB/MF agents are unable to generate the same pattern of behavior (red scatter).
