Proc Natl Acad Sci U S A. 2018 Oct 30;115(44):E10467-E10475.
doi: 10.1073/pnas.1803839115. Epub 2018 Oct 12.

Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization

Nicolas Y Masse et al. Proc Natl Acad Sci U S A.

Abstract

Humans and most animals can learn new tasks without forgetting old ones. However, training artificial neural networks (ANNs) on new tasks typically causes them to forget previously learned tasks. This phenomenon is the result of "catastrophic forgetting," in which training an ANN disrupts connection weights that were important for solving previous tasks, degrading task performance. Several recent studies have proposed methods to stabilize connection weights of ANNs that are deemed most important for solving a task, which helps alleviate catastrophic forgetting. Here, drawing inspiration from algorithms that are believed to be implemented in vivo, we propose a complementary method: adding a context-dependent gating signal, such that only sparse, mostly nonoverlapping patterns of units are active for any one task. This method is easy to implement, requires little computational overhead, and allows ANNs to maintain high performance across large numbers of sequentially presented tasks, particularly when combined with weight stabilization. We show that this method works for both feedforward and recurrent network architectures, trained using either supervised or reinforcement-based learning. This suggests that using multiple, complementary methods, akin to what is believed to occur in the brain, can be a highly effective strategy to support continual learning.

Keywords: artificial intelligence; catastrophic forgetting; context-dependent gating; continual learning; synaptic stabilization.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Network architectures for the permuted MNIST task. The ReLU activation function was applied to all hidden units. (A) The baseline network consisted of a multilayer perceptron with two hidden layers of 2,000 units each. (B) For some networks, a context signal indicating task identity projected onto the two hidden layers. The weights between the context signal and the hidden layers were trainable. (C) Split networks (net.) consisted of five independent subnetworks, with no connections between subnetworks. Each subnetwork consisted of two hidden layers with 733 units each, so that it contained the same number of parameters as the full network described in A. Each subnetwork was trained and tested on 20% of tasks, meaning that, for every task, the activity of four of the five subnetworks was set to zero (gated). A context signal, as described in B, projected onto the two hidden layers. (D) XdG consisted of multiplying the activity of a fixed percentage of hidden units by 0 (gated), while the rest were left unchanged (not gated). The results in Fig. 2D involve gating 80% of hidden units.
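The XdG scheme in panel D can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, masks are drawn independently at random per task (so active subsets are sparse and mostly nonoverlapping), and the 80% gating fraction matches the value quoted in the caption.

```python
import numpy as np

def make_gating_masks(n_tasks, n_hidden, frac_gated=0.8, seed=0):
    """Draw one fixed binary mask per task.

    A fraction `frac_gated` of hidden units is multiplied by 0 (gated)
    for a given task; the rest are left unchanged. Independently drawn
    masks give sparse, mostly nonoverlapping active subsets across tasks.
    """
    rng = np.random.default_rng(seed)
    n_gated = int(frac_gated * n_hidden)
    masks = np.ones((n_tasks, n_hidden))
    for t in range(n_tasks):
        gated_units = rng.choice(n_hidden, size=n_gated, replace=False)
        masks[t, gated_units] = 0.0
    return masks

def gated_hidden_activity(pre_activation, task_mask):
    # ReLU activation followed by the task-specific multiplicative gate.
    return np.maximum(pre_activation, 0.0) * task_mask
```

Because each mask is fixed for the lifetime of its task, the same subset of units is reused whenever that task recurs, which is what limits interference between tasks.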
Fig. 2.
Task accuracy on the permuted MNIST benchmark task. All curves show the mean classification accuracy as a function of the number of tasks the network was trained on, where each task corresponds to a different random permutation of the input pixels. (A) The green and magenta curves represent the mean accuracy for networks with EWC and with synaptic intelligence, respectively. (B) The solid green and magenta curves represent the mean accuracy for networks with EWC and with synaptic intelligence, respectively (same as in A), and the dashed green and dashed magenta curves represent the mean accuracy for networks with a context signal combined with EWC or synaptic intelligence, respectively. (C) The dashed green and magenta curves represent the mean accuracy for networks with a context signal combined with EWC or synaptic intelligence, respectively (same as in B), and the solid green and magenta curves represent the mean accuracy for split networks with a context signal combined with EWC or synaptic intelligence, respectively. (D) The solid green and magenta curves represent the mean accuracy for networks with EWC or synaptic intelligence, respectively (same as in A), the black curve represents the mean accuracy of networks with XdG used alone, and the dashed green and magenta curves represent the mean accuracy for networks with XdG combined with EWC or synaptic intelligence, respectively. SI, synaptic intelligence.
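Each permuted MNIST task applies one fixed random shuffle of the input pixels, as the caption above describes. A minimal sketch of how such a task could be generated (the function name is illustrative, not from the paper):

```python
import numpy as np

def permuted_task_inputs(images, task_seed):
    """Apply one fixed random pixel permutation to every image.

    `images` is an (n_samples, n_pixels) array of flattened digits.
    Each task is defined by its own permutation, so the inputs differ
    across tasks while the underlying classification problem does not.
    """
    rng = np.random.default_rng(task_seed)
    perm = rng.permutation(images.shape[1])
    return images[:, perm]
```

Reusing the same `task_seed` reproduces the same permutation, so a task can be revisited at test time exactly as it was seen during training.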
Fig. 3.
Analyzing the interaction between XdG and synaptic stabilization. (A) The effect of perturbing synapses of various importance is shown for a network with synaptic intelligence that was sequentially trained on 100 MNIST permutations. Each dot represents the change in mean accuracy (y axis) after perturbing a single synapse, whose importance is indicated on the x axis. For visual clarity, we show the results from 1,000 randomly selected synapses chosen from the connection weights to the output layer. (B) Scatter plot showing the Euclidean distance in synaptic values measured before and after training on each MNIST permutation (x axis) vs. the accuracy the network achieved on each new permutation. The task number in the sequence of 100 MNIST permutations is indicated by the red to blue color progression. (C, Left) Histogram of synaptic importances from the connections between the input layer and the first hidden layer (layer 1), for networks with XdG (green curve) and without (magenta curve). (C, Right) Synaptic distance, measured before and after training on the 100th MNIST permutation, for groups of synapses binned by importance. (D) Same as C, except for the synapses connecting the first and second hidden layers (layer 2). (E) Same as C, except for the synapses connecting the second hidden layer and the output layer (layer 3).
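Both stabilization methods referenced above (EWC and synaptic intelligence) add a quadratic penalty to the task loss that discourages important synapses from drifting away from their post-training values; the histograms in C-E examine how XdG reshapes the distribution of these importances. A minimal sketch of such a penalty, under the assumption of a single anchor point per synapse (the function name and coefficient `c` are illustrative):

```python
import numpy as np

def stabilization_penalty(weights, anchor_weights, importances, c=0.1):
    """Quadratic weight-stabilization cost added to the task loss.

    Each synapse is pulled toward its value after previous training
    (`anchor_weights`), with strength proportional to its estimated
    importance; `c` trades off stability against plasticity.
    """
    return c * float(np.sum(importances * (weights - anchor_weights) ** 2))
```

Under this cost, high-importance synapses move very little during later training (small synaptic distance), while low-importance synapses remain free to change, which is the pattern the binned distances in C-E probe.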
Fig. 4.
Similar to Fig. 2, except showing the mean image classification accuracy for the ImageNet dataset split into 100 sequentially trained sections. (A) The dashed black, green, and magenta curves represent the mean accuracies for multihead networks without synaptic stabilization, with EWC, or with synaptic intelligence, respectively. The solid black, green, and magenta curves represent the mean accuracies for nonmultihead networks without synaptic stabilization, with EWC, or with synaptic intelligence, respectively. All further results involve nonmultihead networks. (B) The solid green and magenta curves represent the mean accuracies for networks with EWC or synaptic intelligence, respectively (same as in A). The dashed green and magenta curves represent the mean accuracies for networks with a context signal combined with EWC or synaptic intelligence, respectively. (C) The dashed green and magenta curves represent the mean accuracies for networks with a context signal combined with EWC or synaptic intelligence, respectively (same as in B). The solid green and magenta curves represent the mean accuracies for split networks with a context signal combined with EWC or synaptic intelligence, respectively. (D) The black curve represents the mean accuracy for networks with XdG used alone. The solid green and magenta curves represent the mean accuracies for networks with EWC or synaptic intelligence, respectively (same as in A). The dashed green and magenta curves represent the mean accuracies for networks with XdG combined with EWC or synaptic intelligence, respectively. SI, synaptic intelligence.
Fig. 5.
Task accuracy of recurrent networks sequentially trained on 20 cognitive-based tasks. (A) Schematics of the first four tasks. All trials involve a motion direction stimulus (represented by the white dot pattern and green motion direction arrow), a fixation cue (represented by the white centrally located dot), and an action response using an eye saccade (represented by a magenta arrow). (B) Green dots represent the mean accuracy for networks with stabilization (synaptic intelligence) combined with a rule cue, trained by using supervised learning. Magenta dots represent the mean accuracy for networks with stabilization (synaptic intelligence) combined with XdG, trained by using supervised learning. Black dots represent the mean accuracy for networks with stabilization (synaptic intelligence) combined with XdG, trained by using reinforcement learning.

