Cognition. 2021 Aug;213:104621. doi: 10.1016/j.cognition.2021.104621. Epub 2021 Feb 17.

When forgetting fosters learning: A neural network model for statistical learning

Ansgar D Endress et al. Cognition. 2021 Aug.


Abstract

Learning often requires splitting continuous signals into recurring units, such as the discrete words constituting fluent speech; these units then need to be encoded in memory. A prominent candidate mechanism involves statistical learning of co-occurrence statistics like transitional probabilities (TPs), reflecting the idea that items from the same unit (e.g., syllables within a word) predict each other better than items from different units. TP computations are surprisingly flexible and sophisticated. Humans are sensitive to forward and backward TPs, compute TPs between adjacent items and longer-distance items, and even recognize TPs in novel units. We explain these hallmarks of statistical learning with a simple model with tunable, Hebbian excitatory connections and inhibitory interactions controlling the overall activation. With weak forgetting, activations are long-lasting, yielding associations among all items; with strong forgetting, no associations ensue as activations do not outlast stimuli; with intermediate forgetting, the network reproduces the hallmarks above. Forgetting thus is a key determinant of these sophisticated learning abilities. Further, in line with earlier dissociations between statistical learning and memory encoding, our model reproduces the hallmarks of statistical learning in the absence of a memory store in which items could be placed.
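
To make this mechanism concrete, the sketch below is a minimal, hypothetical implementation of the ingredients named above (Hebbian excitatory connections, fixed inhibition, and a forgetting rate controlling how quickly activations decay). It is not the authors’ published model code, and all names and parameter values are illustrative assumptions.

    # Minimal sketch (not the authors' implementation): activations decay at a
    # "forgetting" rate, co-active units strengthen Hebbian connections, and
    # fixed-weight inhibition keeps the overall activation in check.
    import numpy as np

    def familiarize(stream, n_units, forgetting=0.4, beta=0.1, lr=0.05):
        """Present a stream of item indices; return the learned excitatory weights."""
        w = np.zeros((n_units, n_units))       # tunable excitatory connections
        a = np.zeros(n_units)                  # current activations
        for item in stream:
            ext = np.zeros(n_units)
            ext[item] = 1.0                    # external stimulation of the current item
            recurrent = w @ a                  # excitatory input from associated units
            inhibition = beta * (a.sum() - a)  # fixed inhibition from all other units
            a = np.clip((1.0 - forgetting) * a + ext + recurrent - inhibition, 0.0, 1.0)
            w += lr * np.outer(a, a)           # Hebbian strengthening of co-active units
            np.fill_diagonal(w, 0.0)           # no self-connections
        return w

    # forgetting = 0 eventually associates everything with everything, forgetting = 1
    # associates nothing; intermediate values favor associations within recurring units.
    rng = np.random.default_rng(0)
    word = [0, 1, 2]                           # a recurring "word" ABC
    stream = []
    for _ in range(50):
        stream += word + list(rng.permutation([3, 4, 5]))
    print(np.round(familiarize(stream, n_units=6), 2))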

Keywords: Chunking; Implicit learning; Neural networks; Statistical learning; Transitional probabilities.


Figures

Figure 1.
Schematic representation of the network architecture with three units A, B and C (e.g., representing syllables). All units inhibit each other with a fixed weight of β. They also have tunable excitatory connections. For example, unit A sends excitatory input to unit B with a weight of wBA and sends excitatory input to unit C with a weight of wCA. In addition to excitation and inhibition, all units undergo forgetting.
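
As a concrete, hypothetical rendering of this connectivity (with indices 0, 1, 2 standing for A, B, C; the variable names and the value of β are illustrative assumptions, not taken from the paper):

    import numpy as np

    beta = 0.1
    n_units = 3
    # Fixed, all-to-all inhibition of strength beta (no self-inhibition).
    inhibition = beta * (np.ones((n_units, n_units)) - np.eye(n_units))
    # Tunable excitatory connections, initially zero; excitation[1, 0] plays the
    # role of wBA (A -> B) and excitation[2, 0] the role of wCA (A -> C).
    excitation = np.zeros((n_units, n_units))
    print(inhibition)
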
Figure 2.
Illustration of the computational principles of the simulations. We plot the network activation when it is stimulated by a recurring unit ABC. (a) On the first occurrence of the unit, no associations have been formed yet. Hence, when A is presented, A (but no other item) becomes active and then decays, though some activation persists even while C is presented. Likewise, B and C become active upon presentation and then decay. The initial activation is weaker for B and C than for A because of the inhibitory interactions: when A is presented, no other potentially inhibiting representations are active yet, whereas already-activated items (e.g., A) provide inhibitory input to B and C. (b) On the last occurrence of a unit, associations between the items have been formed. When the network is externally stimulated with a unit such as ABC, the activation of B and C is greater than that of A when the corresponding items are stimulated, because B and C (but not A) receive excitatory input from the strongly associated, preceding items. (c) Weights at the end of the familiarization phase. The connection weights between adjacent items are stronger than those between non-adjacent items (i.e., between A and C).
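
Continuing the hypothetical familiarize() sketch given after the abstract (this snippet assumes that sketch and its stream variable have been run), panel (c)’s pattern can be checked directly: after familiarization with a recurring ABC, the adjacent weights should exceed the non-adjacent A-to-C weight.

    w = familiarize(stream, n_units=6)         # sketch and stream defined after the abstract
    print("wBA =", round(w[1, 0], 2),
          "wCB =", round(w[2, 1], 2),
          "wCA =", round(w[2, 0], 2))          # expected: wBA and wCB larger than wCA
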
Figure 3.
Results for items presented in backward order, different forgetting rates (0, 0.2, 0.4, 0.6, 0.8 and 1), and the different comparisons (Unit vs. Part-Unit: ABC vs. BC:D and ABC vs. C:DE; Rule-Unit vs. Class-Unit: AGC vs. AGF and AXC vs. AXF). (a) Difference scores. The scores are calculated based on the global activation as a measure of the network’s familiarity with the items. Significance is assessed with Wilcoxon tests against the chance level of zero. (b) Percentage of simulations with a preference for the target items. The simulations are assessed based on the global activation in the network. The dashed line shows the minimum percentage of simulations that reaches significance according to a binomial test.
Figure 4.
Results for items presented in backward order, different forgetting rates (0, 0.2, 0.4, 0.6, 0.8 and 1), and the different comparisons (Unit vs. Part-Unit: ABC vs. BC:D and ABC vs. C:DE; Rule-Unit vs. Class-Unit: AGC vs. AGF and AXC vs. AXF). (a) Difference scores. The scores are calculated based on the global activation as a measure of the network’s familiarity with the items. Significance is assessed with Wilcoxon tests against the chance level of zero. (b) Percentage of simulations with a preference for the target items. The simulations are assessed based on the global activation in the network. The dashed line shows the minimum percentage of simulations that reaches significance according to a binomial test.
Figure 5.
Results of the simulations comprising phantom-units, for items presented in forward order, different forgetting rates (0, 0.2, 0.4, 0.6, 0.8 and 1), and the different comparisons (Unit vs. Part-Unit: ABC vs. BC:D and ABC vs. C:DE; Phantom-Unit vs. Part-Unit: Phantom-Unit vs. BC:D and Phantom-Unit vs. C:DE; Unit vs. Phantom-Unit). (a) Difference scores. The scores are calculated based on the global activation as a measure of the network’s familiarity with the items. Significance is assessed with Wilcoxon tests against the chance level of zero. (b) Percentage of simulations with a preference for the target items. The simulations are assessed based on the global activation. The dashed line shows the minimum percentage of simulations that reaches significance according to a binomial test.
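
The tests named in these captions can be illustrated with a short, hypothetical analysis sketch. The difference scores below are simulated placeholders, not the paper’s data; the point is only the shape of the analysis: a Wilcoxon signed-rank test of the difference scores against zero, and a binomial test on the proportion of simulations preferring the target items.

    import numpy as np
    from scipy.stats import wilcoxon, binomtest

    rng = np.random.default_rng(0)
    diff_scores = rng.normal(loc=0.1, scale=0.3, size=100)   # placeholder difference scores

    stat, p_wilcoxon = wilcoxon(diff_scores)                  # H0: median difference is zero
    n_prefer = int((diff_scores > 0).sum())
    p_binom = binomtest(n_prefer, n=len(diff_scores), p=0.5, alternative="greater").pvalue

    print(f"Wilcoxon p = {p_wilcoxon:.3f}; {n_prefer}/{len(diff_scores)} simulations "
          f"prefer the target, binomial p = {p_binom:.3f}")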
