J Neurosci. 2013 Jan 23;33(4):1521-34.
doi: 10.1523/JNEUROSCI.2068-12.2013.

Covariance-based synaptic plasticity in an attractor network model accounts for fast adaptation in free operant learning

Tal Neiman et al. J Neurosci.

Abstract

In free operant experiments, subjects alternate at will between targets that yield rewards stochastically. Behavior in these experiments is typically characterized by (1) an exponential distribution of stay durations, (2) matching of the relative time spent at a target to its relative share of the total number of rewards, and (3) adaptation after a change in the reward rates that can be very fast. The neural mechanism underlying these regularities is largely unknown. Moreover, current decision-making neural network models typically aim at explaining behavior in discrete-time experiments in which a single decision is made once in every trial, making these models hard to extend to the more natural case of free operant decisions. Here we show that a model based on attractor dynamics, in which transitions are induced by noise and preference is formed via covariance-based synaptic plasticity, can account for the characteristics of behavior in free operant experiments. We compare a specific instance of such a model, in which two recurrently excited populations of neurons compete for higher activity, to the behavior of rats responding on two levers for rewarding brain stimulation on a concurrent variable interval reward schedule (Gallistel et al., 2001). We show that the model is consistent with the rats' behavior, and in particular, with the observed fast adaptation to matching behavior. Further, we show that the neural model can be reduced to a behavioral model, and we use this model to deduce a novel "conservation law," which is consistent with the behavior of the rats.
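
The abstract names covariance-based synaptic plasticity but does not state the rule. As a rough illustration only, the generic covariance rule changes a synaptic weight in proportion to the covariance between reward R and neural activity N, Δw = η·E[(R − E[R])(N − E[N])]. The Python sketch below computes that batch-averaged update; the helper name `covariance_update` and the learning rate `eta` are hypothetical, and the model's own update equations are not reproduced here.

```python
import numpy as np


def covariance_update(rewards, activity, eta=0.1):
    """Average weight change under a generic covariance rule (hypothetical helper).

    Returns eta * mean((R - mean(R)) * (N - mean(N))) over a batch of samples.
    This is the textbook form of the rule, not the paper's specific equations.
    """
    r = np.asarray(rewards, dtype=float)
    n = np.asarray(activity, dtype=float)
    # Centering both signals makes the update depend only on their co-fluctuations.
    return eta * float(np.mean((r - r.mean()) * (n - n.mean())))


# Reward tends to arrive when population 1 is active, so the update is positive.
activity_1 = np.array([1, 1, 0, 0, 1, 0], dtype=float)
reward = np.array([1, 0, 0, 0, 1, 0], dtype=float)
print(covariance_update(reward, activity_1))
```

Because the reward is centered on its mean, a reward that does not covary with activity (a constant reward, for example) produces no average weight change.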

Figures

Figure 1.
The two-population network model. A, A schematic description of the network architecture. Curved arrows indicate excitatory connections within the populations; circle-headed lines, inhibitory connections between the populations; and vertical arrows, external inputs. B, The activity of the two neuronal populations during a 10 s simulation of the model, using g1 = g2 = 0 and σ = 3.25. C, The dynamics of the difference in the activity of the two populations follow those of a particle in a double-well potential subject to white noise. D, The shape of the double-well potential depends on the value of δg, depicted here for three different values of the external input.
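
To make panel C concrete, the following Python sketch integrates an overdamped particle in a tilted double-well potential with the Euler-Maruyama method. The quartic form V(x) = x⁴/4 − x²/2 − δg·x, the step size, and the function names are assumptions chosen for illustration; the potential in the figure is derived from the network equations and need not have this shape. With δg > 0 the right well is deeper, so the particle spends most of its time there, loosely mirroring how a biased external input favors one population.

```python
import numpy as np


def simulate_double_well(delta_g, sigma, dt=0.01, n_steps=100_000, x0=0.0, seed=0):
    """Euler-Maruyama integration of dx = -V'(x) dt + sigma dW.

    Hypothetical tilted quartic potential (an assumption, not the paper's):
        V(x) = x**4 / 4 - x**2 / 2 - delta_g * x,  so  V'(x) = x**3 - x - delta_g.
    """
    rng = np.random.default_rng(seed)
    # Independent Gaussian increments with variance sigma**2 * dt.
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        drift = -(x[i] ** 3 - x[i] - delta_g)  # this is -V'(x)
        x[i + 1] = x[i] + drift * dt + noise[i]
    return x


def fraction_right(x):
    """Fraction of samples with x > 0, a rough proxy for time in the right well."""
    return float(np.mean(x > 0))


for dg in (-0.2, 0.0, 0.2):
    print(dg, fraction_right(simulate_double_well(dg, sigma=0.5)))
```

Noise-driven hops between the wells play the role of switches between targets; the tilt δg only biases how long each well is occupied.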
Figure 2.
The analytical approximations. A, Escape time as a function of δg. Blue and red dots represent the mean ± SEM escape time from targets 1 and 2, respectively, generated by simulations of Equations 1 and 2. The solid lines indicate the predicted mean escape times based on the double-well potential approximation (Eq. 16), and the dashed lines, the predicted mean escape times based on the parabolic approximation (Eq. 20). B, Escape time as a function of the magnitude of the noise for g1 = g2 = 0. Blue dots are mean ± SEM stay durations generated by simulations of Equations 1 and 2. The black line indicates the predicted mean escape time based on the double-well potential approximation (Eq. 16). A, B, Each dot is based on 10⁴ stays.
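
Eq. 16 itself is not shown here. As a hedged stand-in, the standard Kramers formula for dx = −V′(x)dt + σdW gives a mean escape time τ ≈ 2π/√(V″(x_min)|V″(x_max)|)·exp(2ΔV/σ²), where ΔV is the barrier height seen from the well at x_min. The sketch below evaluates it for the hypothetical quartic potential V(x) = x⁴/4 − x²/2 − δg·x; the potential and function name are assumptions, not the paper's.

```python
import numpy as np


def kramers_escape_times(delta_g, sigma):
    """Kramers estimate of mean escape times from the left and right wells.

    Assumes V(x) = x**4/4 - x**2/2 - delta_g*x and dx = -V'(x) dt + sigma dW,
    i.e. noise intensity D = sigma**2 / 2. Valid only while both wells exist
    (|delta_g| < 2 / (3*sqrt(3))) and the barriers are large compared with D.
    """
    # Critical points are the zeros of V'(x) = x**3 - x - delta_g.
    roots = np.roots([1.0, 0.0, -1.0, -delta_g])
    if np.any(np.abs(roots.imag) > 1e-9):
        raise ValueError("potential has a single well for this delta_g")
    x_left, x_max, x_right = np.sort(roots.real)

    def V(x):
        return x**4 / 4 - x**2 / 2 - delta_g * x

    def V2(x):  # second derivative V''(x)
        return 3 * x**2 - 1

    D = sigma**2 / 2
    taus = []
    for x_min in (x_left, x_right):
        barrier = V(x_max) - V(x_min)
        prefactor = 2 * np.pi / np.sqrt(V2(x_min) * abs(V2(x_max)))
        taus.append(float(prefactor * np.exp(barrier / D)))
    return tuple(taus)  # (escape from left well, escape from right well)


print(kramers_escape_times(0.0, 0.5))  # symmetric wells: equal escape times
print(kramers_escape_times(0.2, 0.5))  # deeper right well: slower escape from it
```

In this approximation the escape time depends exponentially on 2ΔV/σ², which is why small changes in the tilt or in the noise magnitude can change mean stay durations substantially.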
Figure 3.
The behavior of the rats. A, B, Histograms of the stay durations of a single subject from all the stationary sections in which the baiting rate ratio was 9:1. A, Distribution of stay durations at the rich target. B, Distribution of stay durations at the lean target. C, Fractional investment as a function of fractional income. Each dot corresponds to one stationary section in which the baiting rates were kept constant. Colors represent the different subjects, and the different markers indicate the different baiting rate ratios (triangles, 1:9; circles, 1:3; squares, 1:1; diamonds, 3:1; inverted triangles, 9:1). The diagonal solid line indicates the behavior predicted from the matching law. D, Example of instantaneous estimates of fractional income (red) and fractional investment (blue) in a single experimental session of the subject depicted in cyan. At time t = 99.52 min, the baiting rate ratio was changed from 9:1 to 1:9 (vertical dashed line). The adaptation time is defined as the time interval between the change in the baiting rates and the time at which the instantaneous fractional investment reached halfway between the fractional investments in the stationary sections before and after the change (dotted vertical line; see Materials and Methods). This experimental session is the same as in Gallistel et al. (2001, their Fig. 6, top right).
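
The adaptation-time definition in panel D amounts to finding the first sample after the change at which the instantaneous fractional investment has reached the midpoint between the two stationary levels. How the instantaneous estimates are computed is described in Materials and Methods and is not reproduced here; the function below simply takes them as input, and its name and arguments are hypothetical.

```python
import numpy as np


def adaptation_time(t, f, t_change, f_before, f_after):
    """Time from the baiting-rate change until f first reaches the halfway point.

    t, f              : sample times and instantaneous fractional-investment estimates
    t_change          : time of the change in the baiting rates
    f_before, f_after : stationary fractional investments before and after the change
    Returns nan if the halfway point is never reached.
    """
    t = np.asarray(t, dtype=float)
    f = np.asarray(f, dtype=float)
    half = (f_before + f_after) / 2
    direction = np.sign(f_after - f_before)
    # A sample counts once it is at or past the midpoint in the direction of change.
    reached = (t >= t_change) & ((f - half) * direction >= 0)
    idx = np.flatnonzero(reached)
    return float(t[idx[0]] - t_change) if idx.size else float("nan")


# Synthetic check: exponential relaxation from 0.9 to 0.1 with a 2 s time constant.
t = np.arange(0.0, 30.0, 0.01)
f = np.where(t < 10.0, 0.9, 0.1 + 0.8 * np.exp(-(t - 10.0) / 2.0))
print(adaptation_time(t, f, 10.0, 0.9, 0.1))  # close to 2 * ln 2, about 1.39 s
```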
Figure 4.
Simulations of the model (Eqs. 1–4). A, B, Histogram of the stay durations of simulations of the same sessions as in Figure 3A,B. C, Fractional investment as a function of fractional income. Same as in Figure 3C. D, Example of instantaneous estimates of fractional income (red) and fractional investment (blue) in a simulation of the experimental session depicted in Figure 3D.
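
Both the rats' data and the simulations are summarized by plotting fractional investment against fractional income. A minimal sketch of the two quantities for one stationary section, assuming per-stay records of duration, target (coded 1 or 2), and number of rewards collected; the record layout and function name are hypothetical. Under the matching law the two fractions are equal, so a section that matches falls on the diagonal.

```python
import numpy as np


def fractional_investment_and_income(stay_durations, stay_targets, rewards_per_stay):
    """Fractional investment and fractional income at target 1 for one section.

    Investment: share of the total time spent at target 1.
    Income: share of the total rewards obtained at target 1.
    """
    d = np.asarray(stay_durations, dtype=float)
    k = np.asarray(stay_targets)
    r = np.asarray(rewards_per_stay, dtype=float)
    investment = d[k == 1].sum() / d.sum()
    income = r[k == 1].sum() / r.sum()
    return float(investment), float(income)


print(fractional_investment_and_income([3, 1, 3, 1], [1, 2, 1, 2], [2, 1, 1, 0]))
# (0.75, 0.75): this toy section lies exactly on the matching diagonal
```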
Figure 5.
Predictions of the behavioral model. A, The sum of the logarithms of the transition rates after the change in the baiting rates as a function of that sum before the change. Each dot corresponds to a single session. Different colors denote different subjects. Sessions in which the baiting rates in the second section were a mirror image of those in the first section (ratio x:y in the first section changed to ratio y:x in the second section) are marked with circles. The diagonal black line indicates the prediction of the behavioral model. B, Mean VC as a function of the fractional investment. Each dot indicates the VC in one stationary section as a function of the fractional investment at target 1 in that section. The three colors correspond to three subjects: the subject shown in red had the shortest mean VC, the subject shown in green had the longest, and the subject shown in blue had an intermediate mean VC. The solid lines are the predictions of the behavioral model (Eq. 31). The values of λ̃ used in the prediction were the geometric means of the transition rates, averaged over all sessions: λ̃ = 0.64 s⁻¹, 0.26 s⁻¹, and 0.37 s⁻¹ for the red, green, and blue subjects, respectively.
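
Panel A tests a conservation law: log λ1 + log λ2 is the same before and after the change, i.e., the product λ1λ2 is conserved. If, as an assumption here, the transition rate out of target i is the reciprocal of the mean stay duration τi there, the mean VC is τ1 + τ2, and p = τ1/(τ1 + τ2) is the fractional investment, then λ1λ2 = 1/(τ1τ2) = 1/(p(1 − p)VC²) = λ̃² gives VC = 1/(λ̃√(p(1 − p))). This is a reconstruction under those assumptions and is not necessarily the form of Eq. 31; the function names are hypothetical.

```python
import numpy as np


def log_rate_sum(mean_stay_1, mean_stay_2):
    """log(lambda_1) + log(lambda_2), with lambda_i = 1 / mean stay at target i."""
    return float(np.log(1.0 / mean_stay_1) + np.log(1.0 / mean_stay_2))


def predicted_vc(p, lam_tilde):
    """VC = 1 / (lam_tilde * sqrt(p * (1 - p))).

    Assumes VC = tau_1 + tau_2, p = tau_1 / VC, and lambda_1 * lambda_2 = lam_tilde**2.
    """
    p = np.asarray(p, dtype=float)
    return 1.0 / (lam_tilde * np.sqrt(p * (1.0 - p)))


# Mean stays (s) before and after a change, with their product fixed at 6 s^2,
# so the log-rate sum is unchanged.
print(np.isclose(log_rate_sum(2.0, 3.0), log_rate_sum(1.0, 6.0)))  # True
print(predicted_vc(0.5, 0.64))  # minimum VC, at p = 0.5; close to 3.125 s
```

Under these assumptions the predicted VC is U-shaped in p, shortest for balanced investment, and scales as 1/λ̃, consistent with the red subject (largest λ̃) having the shortest mean VC.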
Figure 6.
The effect of single rewards on stay duration. A, The effect of rewards on the duration of the rewarded stay. Green, survival plot of the distribution of rewarded stays; black, control survival plot. The analysis was repeated for surrogate data in which the rewards were redistributed among the stays according to a concurrent VI schedule, using the same schedule parameters as in the experiment. The dashed vertical line indicates time t = 2.5 s; 64% of the subjects' rewarded stays were longer than 2.5 s (upper dashed line), compared with only 38% of the rewarded stays in the surrogate data (lower dashed line). This sets a lower limit on the fraction of rewarded stays that were prolonged as a result of the reward. B, The effect of rewards across stays: mean ± SEM stay duration as a function of the number of stays elapsed since a rewarded stay at that target. Black, the same analysis for the surrogate data. A, B, Stays were pooled across all subjects and taken only from the stationary sections in which the baiting rate ratio was 1:1.
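
The comparison in panel A rests on two ingredients: an empirical survival function (the fraction of stays longer than t) and surrogate data in which rewards are reassigned to the observed stays by a concurrent VI schedule. The sketch below is a hypothetical reconstruction of both. It assumes Poisson baiting at each target, with a set bait held (and its timer paused) until the subject next visits that target; the exact surrogate procedure is not given in this caption, and all names are illustrative.

```python
import numpy as np


def survival(durations, t):
    """Empirical survival function: fraction of stays longer than t."""
    return float(np.mean(np.asarray(durations, dtype=float) > t))


def vi_surrogate_rewards(durations, targets, rates, seed=0):
    """Hypothetical surrogate: which stays collect a reward under a concurrent VI schedule.

    durations : stay durations (s), in temporal order
    targets   : target index (0 or 1) of each stay
    rates     : baiting rate of each target (1/s)
    Assumes exponential baiting intervals; a bait set at an unvisited target is
    held until that target is visited, and a bait at the visited target is
    collected as soon as it is set.
    """
    rng = np.random.default_rng(seed)
    next_bait = [rng.exponential(1.0 / r) for r in rates]
    held = [False] * len(rates)
    t = 0.0
    rewarded = []
    for d, k in zip(durations, targets):
        start, end = t, t + d
        got = False
        for j in range(len(rates)):
            if j == k:
                if held[j]:  # bait waiting since a stay at the other target
                    got, held[j] = True, False
                    next_bait[j] = start + rng.exponential(1.0 / rates[j])
                while next_bait[j] <= end:  # baits set and collected during this stay
                    got = True
                    next_bait[j] += rng.exponential(1.0 / rates[j])
            elif not held[j] and next_bait[j] <= end:
                held[j] = True  # bait set elsewhere; timer pauses until collected
        rewarded.append(got)
        t = end
    return rewarded


stays = [1.0, 4.0, 2.0, 3.5, 0.5, 5.0]
sides = [0, 1, 0, 1, 0, 1]
flags = vi_surrogate_rewards(stays, sides, rates=[0.2, 0.2])
surrogate_rewarded = [d for d, hit in zip(stays, flags) if hit]
if surrogate_rewarded:
    print(survival(surrogate_rewarded, 2.5))
```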
Figure 7.
Generalization of the model to a network with three populations. A, A schematic description of the network architecture, as in Figure 1A. B, The activity of the three neuronal populations during a 10 s simulation of the model, using g1 = g2 = g3 = −0.65 and σ = 0.26. C, Fractional investment as a function of fractional income in 116 simulations. The duration of each simulation, the overall reward rate, and the time of the change in the baiting rates were as in the 116 experimental sessions used in Gallistel et al. (2001). The baiting rate ratios used in the simulations were 1:3:9, 1:9:9, 1:1:9, 1:3:3, 1:1:3, and 1:1:1. Because each target could be assigned any of the baiting rates dictated by these ratios, there were a total of 19 possible schedules. The schedule for the first section was chosen uniformly at random from the 19 possible schedules. The schedule for the second section was also chosen uniformly at random, with the constraint that it was neither identical to nor a permutation of the first schedule. Each dot corresponds to one stationary section in which the baiting rates were kept constant. Colors represent the different “subjects.” The diagonal solid line indicates the behavior predicted from the matching law. D, The sum of the logarithms of the transition rates after the change in the baiting rates as a function of that sum before the change, for the simulations of the three-population network. Each dot corresponds to a single session. Different colors denote different “subjects.” The diagonal black line indicates conservation of the product of the transition rates.
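
The count of 19 schedules follows from the distinct orderings of each ratio across the three targets: 6 for 1:3:9, 3 each for 1:9:9, 1:1:9, 1:3:3, and 1:1:3, and 1 for 1:1:1. A short sketch that enumerates them and draws a session under the stated constraint; the function names and the use of Python's `random` module are illustrative assumptions.

```python
import random
from itertools import permutations

RATIOS = [(1, 3, 9), (1, 9, 9), (1, 1, 9), (1, 3, 3), (1, 1, 3), (1, 1, 1)]


def all_schedules(ratios=RATIOS):
    """Every distinct assignment of a ratio's baiting rates to the three targets."""
    return sorted({p for r in ratios for p in permutations(r)})


def draw_session(schedules, rng):
    """First schedule uniform at random; second uniform among schedules that are
    neither identical to nor a permutation of the first."""
    first = rng.choice(schedules)
    allowed = [s for s in schedules if sorted(s) != sorted(first)]
    return first, rng.choice(allowed)


schedules = all_schedules()
print(len(schedules))  # 19
print(draw_session(schedules, random.Random(0)))
```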
