Proc Natl Acad Sci U S A. 2013 Dec 3;110(49):19950-5.
doi: 10.1073/pnas.1312125110. Epub 2013 Nov 19.

A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice

Brice Bathellier et al. Proc Natl Acad Sci U S A.

Abstract

Both in humans and in animals, different individuals may learn the same task with strikingly different speeds; however, the sources of this variability remain elusive. In standard learning models, interindividual variability is often explained by variations of the learning rate, a parameter indicating how much synapses are updated on each learning event. Here, we theoretically show that the initial connectivity between the neurons involved in learning a task is also a strong determinant of how quickly the task is learned, provided that connections are updated in a multiplicative manner. To experimentally test this idea, we trained mice to perform an auditory Go/NoGo discrimination task followed by a reversal to compare learning speed when starting from naive or already trained synaptic connections. All mice learned the initial task, but often displayed sigmoid-like learning curves, with a variable delay period followed by a steep increase in performance, as often observed in operant conditioning. For all mice, learning was much faster in the subsequent reversal training. An accurate fit of all learning curves could be obtained with a reinforcement learning model endowed with a multiplicative learning rule, but not with an additive rule. Surprisingly, the multiplicative model could explain a large fraction of the interindividual variability by variations in the initial synaptic weights. Altogether, these results demonstrate the power of multiplicative learning rules to account for the full dynamics of biological learning and suggest an important role of initial wiring in the brain for predispositions to different tasks.
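
The distinction between the two rules can be stated compactly: with an additive rule the weight change depends only on the error signal, whereas with a multiplicative rule it also scales with the current weight, so small initial weights produce a slow, delayed start followed by acceleration. A minimal Python sketch of this contrast follows; the variable names (w, delta, alpha) and the exact functional form are illustrative assumptions, not the equations fitted in the paper.

    # Minimal sketch (illustrative, not the paper's fitted equations):
    # the same error signal `delta` updates a weight `w` either additively
    # or multiplicatively with learning rate `alpha`.

    def additive_update(w, delta, alpha=0.05):
        # Step size independent of the current weight.
        return w + alpha * delta

    def multiplicative_update(w, delta, alpha=0.05):
        # Step size proportional to the current weight: small weights move
        # slowly at first, then updates accelerate as the weight grows.
        return w + alpha * w * delta

    # Starting from a small weight, the multiplicative rule lags at first
    # and then overtakes the additive rule.
    w_add = w_mul = 0.01
    for _ in range(200):
        w_add = additive_update(w_add, delta=1.0)
        w_mul = multiplicative_update(w_mul, delta=1.0)
    print(w_add, w_mul)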

Keywords: behavior; cue competition; memory; savings.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Interindividual variability in auditory discrimination learning. (A) Sketch of the Go/NoGo behavioral task. For the sound spectrograms, the time and frequency axes range from 0 ms to 70 ms and from 1 kHz to 100 kHz (logarithmic scale), respectively. (B) Population learning curve for the overall performance of 15 mice. Binning: 100 trials. (C) Cumulative distribution of the number of trials necessary for each mouse to reach 80% correct performance measured from sigmoid functions fitted to the individual learning curves.
Fig. 2.
Discrimination performance increases following variable delays. (A) Individual learning curves for four mice. The red and blue lines represent the probability of correct performance for the rewarded sound (S+) and the nonrewarded sound (S−), respectively. The black line is the overall performance (average of the red and blue lines). Binning: 180 trials. (B) Overall performance for a fifth mouse (black line) fitted with a sigmoid function (red dashed line; the black dashed lines indicate the asymptotic values). The sigmoid function is used to evaluate the delay until behavioral performance reaches 20% of the asymptotic performance increase (delay = t20%) and the additional number of trials needed to reach 80% of the asymptotic performance increase (rise = t80% − t20%). (C) Plot of t80% − t20% against t20% for 15 mice with the best linear fit to the data (black line). The magenta circle corresponds to the measurements in B.
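
A sketch of how these quantities can be extracted is given below: a four-parameter sigmoid is fitted to a binned performance curve, and the trials at which the fit reaches 20% and 80% of its asymptotic increase are obtained by inverting the sigmoid. The parameterization, the synthetic data, and the use of scipy.optimize.curve_fit are assumptions made for illustration; the paper's fitting procedure may differ in detail.

    # Sketch: fit a sigmoid to a binned performance curve and read off the
    # delay (t20%) and rise (t80% - t20%). Synthetic data for illustration only.
    import numpy as np
    from scipy.optimize import curve_fit

    def sigmoid(t, base, amp, t_half, slope):
        # Performance rising from `base` to `base + amp` around trial `t_half`.
        return base + amp / (1.0 + np.exp(-(t - t_half) / slope))

    trials = np.arange(0, 3000, 180)                     # bin centers (180 trials per bin)
    perf = sigmoid(trials, 0.5, 0.45, 1500, 200)         # synthetic learning curve
    perf += np.random.default_rng(0).normal(0, 0.02, perf.size)

    popt, _ = curve_fit(sigmoid, trials, perf, p0=(0.5, 0.4, 1000, 100))
    _, _, t_half, slope = popt

    # Trials at which the fitted curve reaches 20% and 80% of the asymptotic
    # performance increase, obtained by inverting the sigmoid.
    t20 = t_half + slope * np.log(0.2 / 0.8)
    t80 = t_half + slope * np.log(0.8 / 0.2)
    print(f"delay t20% ~ {t20:.0f} trials, rise t80% - t20% ~ {t80 - t20:.0f} trials")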
Fig. 3.
Additive vs. multiplicative rule in a reinforcement learning model. (A) Sketch of the reinforcement learning (RL) model. (B) Learning curves for a model based on an additive learning rule for three different initial conditions sketched in Insets: (Top) balanced start, all initial weights are equal; (Middle) slightly unbalanced start, in which, for the “port” unit, the difference between the synaptic weights to the decision unit and to the inhibitory unit is initially equal to 1; and (Bottom) strongly unbalanced start, in which this difference is initially much larger. In the latter situation, the model initially responds only with lick decisions (arrow) until the weight difference decreases. (C) Learning curves for a model based on a multiplicative learning rule for three different initial conditions sketched in Insets: (Top) all weights initially large; (Middle) the synaptic weights between the sound units and the decision circuit are 10-fold smaller than in the Top, and these low synaptic weights initially slow down discrimination learning; and (Bottom) these weights are 100-fold smaller. Red and blue lines: probability of correct performance for the rewarded and the nonrewarded sound, respectively. Black line: overall performance.
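
To make the role of the initial weights concrete, here is a schematic simulation in the spirit of the model sketched in A: a task-irrelevant "port" cue and one of two sound cues drive a lick pathway and a no-lick pathway through positive weights, and the weights of the chosen pathway are updated either additively or multiplicatively. The architecture, parameter values, and update equation are simplified assumptions for illustration, not the paper's fitted model.

    # Schematic simulation (simplified assumptions, not the paper's fitted model):
    # a "port" cue and one sound cue drive a lick pathway and a no-lick pathway
    # through positive weights; the chosen pathway's weights are updated by an
    # outcome-minus-expectation error, additively or multiplicatively.
    import numpy as np

    def simulate(rule="multiplicative", sound_w0=0.05, port_w0=1.0,
                 n_trials=6000, alpha=0.02, seed=1):
        rng = np.random.default_rng(seed)
        cues = ["port", "s_plus", "s_minus"]
        w_lick = {c: (port_w0 if c == "port" else sound_w0) for c in cues}
        w_nolick = {c: (port_w0 if c == "port" else sound_w0) for c in cues}
        correct = np.zeros(n_trials)
        for t in range(n_trials):
            sound = "s_plus" if rng.integers(2) == 0 else "s_minus"
            active = ["port", sound]                     # cues present on this trial
            drive_lick = sum(w_lick[c] for c in active)
            drive_nolick = sum(w_nolick[c] for c in active)
            p_lick = drive_lick / (drive_lick + drive_nolick)
            lick = rng.random() < p_lick
            correct[t] = (lick and sound == "s_plus") or (not lick and sound == "s_minus")
            err = correct[t] - (p_lick if lick else 1.0 - p_lick)
            w = w_lick if lick else w_nolick             # update the chosen pathway
            for c in active:
                step = alpha * (w[c] if rule == "multiplicative" else 1.0) * err
                w[c] = max(w[c] + step, 1e-6)            # keep weights positive
        return correct

    # With the multiplicative rule, smaller initial sound weights delay learning,
    # because the port weights initially dominate the decision.
    for w0 in (1.0, 0.1, 0.01):
        curve = simulate(sound_w0=w0)
        print(f"sound_w0={w0}: performance over trials 2000-3000 ~ {curve[2000:3000].mean():.2f}")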
Fig. 4.
Faster reversal learning is explained by the multiplicative learning rule. (A) Illustration of reversal learning, in which the sound spectrograms depict the rewarded and nonrewarded stimuli. (B) Population learning curves for the overall performance of 15 mice (dashed lines) during initial (black) and reversal (purple) training. Binning: 180 trials. (C) Number of trials for individual mice to reach 80% behavioral performance in the initial training vs. the reversal training. (D) Population learning curves for the overall performance of 15 mice (dashed lines) and for the fitted reinforcement learning models endowed with a multiplicative learning rule (solid lines) during initial (black) and reversal (purple) training. (E) Population learning curves for the overall performance of 15 mice (dashed lines) and for the fitted reinforcement learning models endowed with an additive learning rule (solid lines) during initial (black) and reversal (purple) training. Unlike the behavior observed in mice, the additive model learns more slowly during reversal training. (F) Number of trials to reach 80% behavioral performance in the initial training vs. the reversal training, for the additive (white circles) and the multiplicative (gray circles) models. (G) Population learning curves as in D, but including the performance for rewarded and nonrewarded trials. (H) Percentage of variance unexplained by the fitted models for both initial and reversal training when the additive learning rule is used (Additive), when the multiplicative learning rule is used (Multiplicative), when the synaptic diffusion term is omitted from the multiplicative model, and when the expectation error function in the multiplicative model is made symmetrical.
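
The model comparison in H relies on a percentage-of-variance-unexplained measure. Below is a small sketch of one common way to compute such a measure (residual sum of squares divided by the total variance of the behavioral curve); the paper's exact definition may differ, and the arrays here are purely illustrative.

    # Sketch of a percentage-of-variance-unexplained measure (one common
    # definition; the paper's exact computation may differ).
    import numpy as np

    def percent_variance_unexplained(behavior, model):
        behavior, model = np.asarray(behavior, float), np.asarray(model, float)
        ss_res = np.sum((behavior - model) ** 2)             # residuals of the model fit
        ss_tot = np.sum((behavior - behavior.mean()) ** 2)   # total variance of the data
        return 100.0 * ss_res / ss_tot

    behavior = np.array([0.50, 0.52, 0.60, 0.75, 0.85, 0.90])  # binned performance (illustrative)
    model    = np.array([0.50, 0.53, 0.62, 0.72, 0.84, 0.91])  # fitted model output (illustrative)
    print(f"{percent_variance_unexplained(behavior, model):.1f}% of variance unexplained")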
Fig. 5.
Initial connectivity can explain a large fraction of the interindividual variability. (A) Behavioral learning curves (dashed lines) and the fit obtained with the multiplicative model (solid lines) when only the initial connectivity parameters are allowed to vary across the 15 animals. Binning: 180 trials. (B and C) Two examples of single-mouse learning curves and of the fit obtained when only the initial connectivity parameters are allowed to vary across the 15 animals. Strikingly, strong discrepancies in the reversal learning curves (arrow) can be accounted for by different initial conditions.
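
The logic of this fit can be sketched as follows: all animals share the same learning parameters, and only an initial-weight parameter is allowed to differ per mouse, which shifts the delay of an otherwise identical learning curve. The toy curve model, the optimizer choice (scipy.optimize.minimize), and the synthetic data below are illustrative assumptions, not the paper's actual model or fitting procedure.

    # Toy illustration of fitting shared learning parameters plus per-mouse
    # initial weights (not the paper's actual model or fitting procedure).
    import numpy as np
    from scipy.optimize import minimize

    def learning_curve(trials, w0, rate):
        # A weight grows multiplicatively as w0 * exp(rate * t); performance
        # saturates as the weight becomes large, so a smaller w0 means a longer delay.
        w = w0 * np.exp(rate * trials)
        return 0.5 + 0.5 * w / (1.0 + w)

    def loss(params, trials, curves):
        rate, log_w0s = params[0], params[1:]          # shared rate, per-mouse log w0
        return sum(np.sum((learning_curve(trials, np.exp(lw), rate) - c) ** 2)
                   for lw, c in zip(log_w0s, curves))

    trials = np.arange(0, 6000, 180)
    # Synthetic "mice": identical rate, different initial weights (different delays).
    curves = [learning_curve(trials, w0, 2e-3) for w0 in (1e-2, 1e-3, 1e-4)]
    x0 = np.concatenate([[1e-3], np.log(np.full(3, 1e-3))])
    fit = minimize(loss, x0, args=(trials, curves), method="Nelder-Mead")
    print("shared rate:", fit.x[0], "per-mouse w0:", np.exp(fit.x[1:]))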

