Review
Front Psychol. 2019 Dec 11:10:2735.
doi: 10.3389/fpsyg.2019.02735. eCollection 2019.

Hierarchical Action Control: Adaptive Collaboration Between Actions and Habits

Bernard W Balleine et al. Front Psychol. 2019.

Abstract

It is now commonly accepted that instrumental actions can reflect goal-directed control; i.e., they can show sensitivity to changes in the relationship to and the value of their consequences. With overtraining, stress, neurodegeneration, psychiatric conditions, or after exposure to various drugs of abuse, goal-directed control declines and instrumental actions are performed independently of their consequences. Although this latter insensitivity has been argued to reflect the development of habitual control, the lack of a positive definition of habits has rendered this conclusion controversial. Here we consider various alternative definitions of habit, including recent suggestions that they reflect chunked action sequences, to derive criteria with which to categorize responses as habitual. We consider various theories regarding the interaction between goal-directed and habitual controllers and propose a collaborative model based on their hierarchical integration. We argue that this model is consistent with the available data, can be instantiated both at an associative level and computationally, and generates interesting predictions regarding the influence of this collaborative integration on behavior.

Keywords: action sequences; chunking; goal-directed action; habits; model-based; model-free; reinforcement learning.

Figures

Figure 1
Competition and collaboration in goal-directed and habitual action control. (A) Simple model of competition for performance with goal-directed and habitual controllers mutually inhibiting one another. (B) More sophisticated approach to competition, with goal-directed and habitual controllers competing through arbitration. (C) Behavioral evidence suggests, in contrast to competition, that habit and goal-directed processes are intimately connected and collaborate in action selection, evaluation, and execution. (D) A formal associative architecture that instantiates the collaboration between habit and goal-directed controllers through the interaction of habit memory and associative memory systems, the latter feeding back to control performance. Action selection in the habit memory is mediated by the association of S1 and R1 that feeds forward to provide both subthreshold activation of the motor output and activation of the action representation, A1, in the associative memory, provoking retrieval of the action outcome (O1) and its evaluation through the interaction of the associative and evaluative memory systems. The latter provides a promiscuous feedback (cybernetic) signal that sums with the forward excitation from the habit memory. If positively evaluated (blue lines/arrows), it provokes action execution; if negatively evaluated (red lines/arrows), it blocks performance. (E) An example of the representation of a complex habit sequence in the habit memory incorporating lever press and magazine approach responses together with a simple lever press action. Both are represented in the habit memory (the expanded sequence, the acquisition of which is supported by proprioceptive feedback from motor output) and its chunked representation in the associative memory (e.g., ALO-MA). (F) The formal associative-cybernetic model incorporating chunked action sequences and simple actions in both the habit memory and the associative memory.
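The summation described for panel (D) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear sum, the `feedback_gain` parameter, and all numeric values are assumptions chosen only to show how a subthreshold habit signal plus a signed evaluative feedback signal could gate execution.

```python
def motor_drive(habit_excitation, outcome_value, feedback_gain=1.0):
    """Sum the forward S1-R1 excitation with the evaluative feedback for O1.

    habit_excitation: forward activation from the habit memory (assumed to be
        below threshold on its own, as in the caption).
    outcome_value: signed evaluation of the retrieved outcome O1
        (positive = valued, negative = devalued). Hypothetical scale.
    feedback_gain: hypothetical weight on the feedback signal.
    """
    return habit_excitation + feedback_gain * outcome_value


def action_executed(habit_excitation, outcome_value, threshold=1.0, feedback_gain=1.0):
    """Return True when the summed drive reaches the (assumed) motor threshold."""
    return motor_drive(habit_excitation, outcome_value, feedback_gain) >= threshold


# Illustrative values only: habit excitation of 0.6 is below the threshold of 1.0.
valued = action_executed(0.6, 0.8)     # positive feedback pushes drive over threshold
devalued = action_executed(0.6, -0.8)  # negative feedback keeps drive below threshold
habit_only = action_executed(0.6, 0.0) # forward excitation alone stays subthreshold
```

Under these assumed values, only the positively evaluated case reaches threshold, which mirrors the caption's description of positive evaluation provoking execution and negative evaluation blocking performance.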
Figure 2
Evidence for hierarchical collaboration in humans and rats. (A) Two-stage task in human subjects. (B) After a rare transition (example shown) and revaluation of O2 (upper panel), an expanded action repertoire using action sequences (e.g., A1R1) can induce insensitivity to revaluation of the second stage choice (e.g., R1). (C) The influence of reward and non-reward on the tendency to stay on the same first stage choice after a common and a rare transition in human subjects. (D) Simulated (sim) second stage choices from various flat model-based and/or model-free RL models (left panel), a hierarchical RL model (center), and the human data (right panel). (E) Design of a two-stage task in rats with training conducted on a two-stage discrimination that is reversed, initially, every four trials and subsequently every eight trials. At various points in training, we included rare transitions as probe tests (sessions 40, 66, 78, 87, and 94). (F) The odds ratio of staying on the same stage 1 action after reward on the previous trial over the odds ratio after no reward. The horizontal line represents the indifference point. Each vertical line is one session. (G) Results from the probe tests. Note the comparable performance of rats and humans when rats show evidence of having acquired an accurate representation of the multistage nature of the task. (H) Rat data from second stage choices using a comparable version of the task to that used in humans. Panels (A–D,G,H) are taken directly from Dezfouli and Balleine (2013, . Panels (E,F) are redrawn from Dezfouli and Balleine (2019).
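The stay measure plotted in panel (F) can be sketched as follows. The sketch assumes the measure is the odds of repeating the previous stage 1 action after a rewarded trial divided by the same odds after an unrewarded trial, so that a value of 1 corresponds to the indifference point. The function name, the data layout, and the optional `correction` term are hypothetical and are not taken from the paper.

```python
def stay_odds_ratio(choices, rewards, correction=0.0):
    """Odds of staying after reward divided by odds of staying after no reward.

    choices: stage 1 action chosen on each trial (any hashable labels).
    rewards: 1/0 (or True/False) outcome of each trial.
    correction: optional constant added to every cell to avoid division by
        zero in sparse sessions (an assumption, not part of the source).
    """
    stay = {True: 0.0, False: 0.0}    # keyed by whether the previous trial was rewarded
    switch = {True: 0.0, False: 0.0}
    # Pair each trial with the next one; the final trial has no successor.
    for prev_choice, prev_reward, choice in zip(choices, rewards, choices[1:]):
        rewarded = bool(prev_reward)
        if choice == prev_choice:
            stay[rewarded] += 1
        else:
            switch[rewarded] += 1
    a = stay[True] + correction     # stayed after reward
    b = switch[True] + correction   # switched after reward
    c = stay[False] + correction    # stayed after no reward
    d = switch[False] + correction  # switched after no reward
    if b == 0 or c == 0:
        raise ValueError("empty cell; pass a positive correction")
    return (a * d) / (b * c)


# Made-up session: 3 stays / 1 switch after reward, 1 stay / 3 switches after no reward.
choices = ["A1", "A1", "A1", "A1", "A2", "A2", "A1", "A2", "A1"]
rewards = [1, 1, 1, 1, 0, 0, 0, 0, 0]
print(stay_odds_ratio(choices, rewards))  # 9.0
```

A ratio above 1 indicates that reward on the previous trial increased the tendency to repeat the stage 1 action; the sketch ignores transition type (common versus rare), which would need a further split of the counts.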

References

    1. Adams C. D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Q. J. Exp. Psychol. Sect. B 34, 77–98. doi: 10.1080/14640748208400878
    2. Akam T., Costa R., Dayan P. (2015). Simple plans or sophisticated habits? State, transition and learning interactions in the two-step task. PLoS Comput. Biol. 11:e1004648. doi: 10.1371/journal.pcbi.1004648
    3. Balleine B. W. (2005). Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol. Behav. 86, 717–730. doi: 10.1016/j.physbeh.2005.08.061
    4. Balleine B. W., Delgado M. R., Hikosaka O. (2007). The role of the dorsal striatum in reward and decision-making. J. Neurosci. 27, 8161–8165. doi: 10.1523/JNEUROSCI.1554-07.2007
    5. Balleine B. W., Dickinson A. (1998). Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407–419. doi: 10.1016/S0028-3908(98)00033-1
