A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice
- PMID: 24255115
- PMCID: PMC3856837
- DOI: 10.1073/pnas.1312125110
Abstract
Both in humans and in animals, different individuals may learn the same task with strikingly different speeds; however, the sources of this variability remain elusive. In standard learning models, interindividual variability is often explained by variations of the learning rate, a parameter indicating how much synapses are updated on each learning event. Here, we theoretically show that the initial connectivity between the neurons involved in learning a task is also a strong determinant of how quickly the task is learned, provided that connections are updated in a multiplicative manner. To experimentally test this idea, we trained mice to perform an auditory Go/NoGo discrimination task followed by a reversal to compare learning speed when starting from naive or already trained synaptic connections. All mice learned the initial task, but often displayed sigmoid-like learning curves, with a variable delay period followed by a steep increase in performance, as often observed in operant conditioning. For all mice, learning was much faster in the subsequent reversal training. An accurate fit of all learning curves could be obtained with a reinforcement learning model endowed with a multiplicative learning rule, but not with an additive rule. Surprisingly, the multiplicative model could explain a large fraction of the interindividual variability by variations in the initial synaptic weights. Altogether, these results demonstrate the power of multiplicative learning rules to account for the full dynamics of biological learning and suggest an important role of initial wiring in the brain for predispositions to different tasks.
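As a concrete illustration of the additive/multiplicative distinction drawn in the abstract, the sketch below contrasts the two update rules on a single weight driven by a reward-prediction error. It is a minimal toy example, not the authors' published model; the function names, the learning rate of 0.05, and the target value of 1.0 are illustrative assumptions.

```python
def additive_update(w, lr, err):
    """Additive rule: the step size does not depend on the current weight."""
    return w + lr * err

def multiplicative_update(w, lr, err):
    """Multiplicative rule: the step is proportional to the current weight,
    so weak initial connections change slowly at first."""
    return w * (1.0 + lr * err)

# Toy comparison: one weight chasing a target of 1.0, with (1 - w) standing in
# for a reward-prediction error.
w_add = w_mul = 0.01                      # deliberately small initial weight
curve_add, curve_mul = [], []
for trial in range(300):
    w_add = additive_update(w_add, 0.05, 1.0 - w_add)
    w_mul = multiplicative_update(w_mul, 0.05, 1.0 - w_mul)
    curve_add.append(w_add)
    curve_mul.append(w_mul)

# curve_add rises from the first trials; curve_mul stays near zero for many
# trials and then climbs steeply, a delay-then-jump shape like the sigmoid
# learning curves described for the mice.
```

Because the multiplicative step scales with the current weight, the starting weight strongly shapes the delay before performance rises, which is why initial connectivity can act as a determinant of learning speed in this class of models.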
Keywords: behavior; cue competition; memory; savings.
Conflict of interest statement
The authors declare no conflict of interest.
Figures
Figure caption (inline weight symbols omitted): balanced start, with the relevant weight initially equal to 1, versus (Bottom) a strongly unbalanced start, in which the model initially responds only with lick decisions (arrow) until the dominant weight decreases. (C) Learning curves of the model with the multiplicative learning rule for the three initial conditions sketched in the Insets: (Top) all initial weights large; (Middle) the synaptic weights between the sound units and the decision circuit 10-fold smaller, which initially slows discrimination learning; (Bottom) the same weights 100-fold smaller. Red and blue lines: probability of correct performance for the rewarded and the nonrewarded sound, respectively. Black line: overall performance. A further panel covers the case in which the expectation error function of the multiplicative model is symmetrical.
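In the same spirit as the simulated learning curves summarized above, the sketch below runs a highly simplified Go/NoGo agent with one weight per sound and a multiplicative reward-modulated update, comparing initial weights that differ by factors of 10 and 100. The agent, the lick-probability mapping p = w / (1 + w), and all parameter values are illustrative assumptions, not the model used in the paper.

```python
import random

def simulate(w_go, w_nogo, lr=0.2, trials=400, seed=0):
    """Toy Go/NoGo agent: each sound drives the 'lick' decision through its own
    weight; licking to the Go sound is rewarded (+1), licking to the NoGo sound
    is punished (-1). The active weight is updated multiplicatively."""
    rng = random.Random(seed)
    correct = []
    for _ in range(trials):
        go_trial = rng.random() < 0.5
        w = w_go if go_trial else w_nogo
        p_lick = w / (1.0 + w)                 # bounded lick probability
        lick = rng.random() < p_lick
        outcome = (1.0 if go_trial else -1.0) if lick else 0.0
        new_w = w * (1.0 + lr * outcome)       # multiplicative update of the active weight
        if go_trial:
            w_go = new_w
        else:
            w_nogo = new_w
        correct.append(lick if go_trial else not lick)
    return sum(correct[-100:]) / 100.0         # performance over the last 100 trials

# Three initial conditions, loosely echoing the figure: large, 10-fold smaller,
# and 100-fold smaller sound-to-decision weights.
for w0 in (1.0, 0.1, 0.01):
    print(w0, simulate(w_go=w0, w_nogo=w0))
```

With the 100-fold smaller starting weights the agent rarely licks, so its weights are rarely updated and overall performance stays near chance within the simulated session, whereas the larger starting weights reach high performance much earlier; this reproduces, in caricature, the slowdown of discrimination learning that the legend attributes to low initial synaptic weights.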
Similar articles
- Cortical recruitment determines learning dynamics and strategy. Nat Commun. 2019;10(1):1479. doi: 10.1038/s41467-019-09450-0. PMID: 30931939.
- A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience. 1999;91(3):871–890. doi: 10.1016/s0306-4522(98)00697-6. PMID: 10391468.
- High-order feature-based mixture models of classification learning predict individual learning curves and enable personalized teaching. Proc Natl Acad Sci U S A. 2013;110(2):684–689. doi: 10.1073/pnas.1211606110. PMID: 23269833.
- Reward-dependent learning in neuronal networks for planning and decision making. Prog Brain Res. 2000;126:217–229. doi: 10.1016/S0079-6123(00)26016-0. PMID: 11105649.
- Neurocomputational mechanisms of reinforcement-guided learning in humans: a review. Cogn Affect Behav Neurosci. 2008;8(2):113–125. doi: 10.3758/cabn.8.2.113. PMID: 18589502.
Cited by
- CoBeL-RL: A neuroscience-oriented simulation framework for complex behavior and learning. Front Neuroinform. 2023;17:1134405. doi: 10.3389/fninf.2023.1134405. PMID: 36970657.
- Cortical recruitment determines learning dynamics and strategy. Nat Commun. 2019;10(1):1479. doi: 10.1038/s41467-019-09450-0. PMID: 30931939.
- Structural and Functional Brain Remodeling during Pregnancy with Diffusion Tensor MRI and Resting-State Functional MRI. PLoS One. 2015;10(12):e0144328. doi: 10.1371/journal.pone.0144328. PMID: 26658306.
- Dissociating task acquisition from expression during learning reveals latent knowledge. Nat Commun. 2019;10(1):2151. doi: 10.1038/s41467-019-10089-0. PMID: 31089133.
- Temporal chunking as a mechanism for unsupervised learning of task-sets. Elife. 2020;9:e50469. doi: 10.7554/eLife.50469. PMID: 32149602.
