2010 Jun 17:4:17.
doi: 10.3389/fncom.2010.00017. eCollection 2010.

Synaptic theory of replicator-like melioration

Yonatan Loewenstein. Front Comput Neurosci. 2010.

Abstract

According to the theory of Melioration, organisms in repeated choice settings shift their choice preference in favor of the alternative that provides the highest return. The goal of this paper is to explain how this learning behavior can emerge from microscopic changes in the efficacies of synapses, in the context of a two-alternative repeated-choice experiment. I consider a large family of synaptic plasticity rules in which changes in synaptic efficacies are driven by the covariance between reward and neural activity. I construct a general framework that predicts the learning dynamics of any decision-making neural network that implements this synaptic plasticity rule and show that melioration naturally emerges in such networks. Moreover, the resultant learning dynamics follows the Replicator equation which is commonly used to phenomenologically describe changes in behavior in operant conditioning experiments. Several examples demonstrate how the learning rate of the network is affected by its properties and by the specifics of the plasticity rule. These results help bridge the gap between cellular physiology and learning behavior.

Keywords: operant conditioning; reinforcement learning; synaptic plasticity.
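
As a rough illustration of the two ideas named in the abstract, the covariance-driven plasticity rule and the Replicator equation can be written in the commonly used forms below. The symbols (a plasticity rate \eta, an effective learning rate \eta', reward R, neural activity N, chosen alternative A, and choice probability p_1) are notational assumptions made for this sketch and do not appear in the abstract itself.

```latex
% Covariance-based plasticity: the expected change in a synaptic efficacy W
% is proportional to the covariance between reward R and neural activity N.
\mathrm{E}[\Delta W] = \eta \, \mathrm{Cov}(R, N)

% Replicator equation for two alternatives: the probability p_1 of choosing
% alternative 1 grows when its expected return exceeds that of alternative 2.
\frac{dp_1}{dt} = \eta' \, p_1 (1 - p_1)
    \left( \mathrm{E}[R \mid A = 1] - \mathrm{E}[R \mid A = 2] \right)
```

In this form melioration is visible directly: the sign of dp_1/dt matches the sign of the difference in returns, so preference shifts toward the alternative with the higher return.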

Figures

Figure 1
Relating synaptic efficacies to choice behavior. (A) A schematic description of a decision-making network composed of six neurons that are connected by eight synapses. The probabilities that the network will “choose” alternatives 1 and 2 (p_1 and p_2, respectively) depend on the efficacies of the eight synapses (denoted by W_i). (B) Changes in the efficacies of the synapses (left) result in a change in the probabilities of choice (right).
Figure 2
The decision-making network model. The network consists of two populations of sensory neurons, each denoted by S_{a,i}, and two populations of premotor neurons, M_a. The strength of the synaptic connection between sensory neuron S_{a,i} and the corresponding premotor population M_a is denoted by W_{a,i}. The decision is mediated via competition between the premotor populations (see text).
Figure 3
Covariance-based synaptic plasticity and learning. Learning behavior in a two-armed bandit reward schedule in which alternatives 1 and 2 provided a binary reward with probabilities of 0.75 and 0.25, respectively. (A–D) Circles, fraction of trials in which alternative 1 was chosen in 1,000 simulations of the stochastic dynamics; red line, the average velocity approximation. (A) tWTA model; (B,C) population coding model. (B) Black circles, postsynaptic activity-dependent plasticity; blue circles, Hebbian plasticity. (C) Presynaptic activity-dependent plasticity. (D) Dynamic competition model. The plasticity rate in all examples was chosen such that the probability of choosing alternative 1 after 200 trials, as estimated by the average velocity approximation, is 0.75. See Section “Materials and Methods” for parameters of the simulations.
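
The kind of experiment described in this caption can be sketched with a toy simulation. The code below is not one of the paper's network models (tWTA, population coding, or dynamic competition). It assumes a hypothetical two-weight decision rule in which the probability of choosing alternative 1 is a logistic function of the weight difference, and it uses one form of covariance rule, \Delta W_a = \eta R (N_a - E[N_a]), whose expected value equals \eta Cov(R, N_a). Only the reward probabilities (0.75 and 0.25) and the 200-trial horizon are taken from the caption; the function names, the plasticity rate, and the seeds are illustrative.

```python
import math
import random


def simulate_covariance_learning(n_trials=200, eta=0.1,
                                 reward_probs=(0.75, 0.25), seed=0):
    """Toy two-armed bandit learner with a covariance-style plasticity rule.

    Returns the probability of choosing alternative 1 at the start of each
    trial. The logistic choice rule is an assumption of this sketch.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]  # one hypothetical "synaptic efficacy" per alternative
    history = []
    for _ in range(n_trials):
        # Choice probability depends only on the difference in efficacies.
        p1 = 1.0 / (1.0 + math.exp(-(w[0] - w[1])))
        history.append(p1)
        probs = (p1, 1.0 - p1)

        # Stochastic choice, then binary reward from the chosen alternative.
        choice = 0 if rng.random() < p1 else 1
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0

        # Covariance-style update: reward times the deviation of "activity"
        # (1 if chosen, 0 otherwise) from its mean (the choice probability).
        for a in range(2):
            activity = 1.0 if a == choice else 0.0
            w[a] += eta * reward * (activity - probs[a])
    return history


def mean_final_p1(n_runs=100, **kwargs):
    """Average the last recorded choice probability over independent runs."""
    finals = [simulate_covariance_learning(seed=s, **kwargs)[-1]
              for s in range(n_runs)]
    return sum(finals) / n_runs


if __name__ == "__main__":
    print("mean p1 near trial 200:", mean_final_p1())
```

In this toy model the expected change in the weight difference per trial is 2\eta p_1 (1 - p_1)(0.75 - 0.25), so preference drifts toward the richer alternative, which is the melioration-like behavior the caption illustrates. Matching the caption's calibration (p_1 = 0.75 after 200 trials) would require tuning \eta, which this sketch does not attempt.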
