J Neurosci. 2010 Jun 23;30(25):8400-10. doi: 10.1523/JNEUROSCI.4284-09.2010.

A reward-modulated Hebbian learning rule can explain experimentally observed network reorganization in a brain control task

Robert Legenstein et al. J Neurosci. 2010.

Abstract

It has recently been shown in a brain-computer interface experiment that motor cortical neurons change their tuning properties selectively to compensate for errors induced by displaced decoding parameters. In particular, it was shown that the three-dimensional tuning curves of neurons whose decoding parameters were reassigned changed more than those of neurons whose decoding parameters had not been reassigned. In this article, we propose a simple learning rule that can reproduce this effect. Our learning rule uses Hebbian weight updates driven by a global reward signal and neuronal noise. In contrast to most previously proposed learning rules, this approach does not require extrinsic information to separate noise from signal. The learning rule is able to optimize the performance of a model system within biologically realistic periods of time under high noise levels. Furthermore, when the model parameters are matched to data recorded during the brain-computer interface learning experiments described above, the model produces learning effects strikingly similar to those found in the experiments.
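The abstract does not spell out the rule, but it names the ingredients: Hebbian weight updates, intrinsic neuronal noise as the source of exploration, and a global reward signal. A minimal Python sketch of an update of this kind follows; the variable names, the running averages, and the learning rate are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def eh_update(w, x, a_noisy, a_bar, R, R_bar, eta=1e-3):
    """One reward-modulated Hebbian update (sketch; names and form assumed).

    w        : (n_post, n_pre) weight matrix
    x        : (n_pre,) presynaptic activities
    a_noisy  : (n_post,) postsynaptic activations, including exploratory noise
    a_bar    : (n_post,) running averages of the activations
    R, R_bar : current reward and its running average (global scalar signal)
    """
    # Hebbian term: presynaptic activity times the postsynaptic deviation
    # from its recent average, gated by the reward deviation. Because both
    # deviations are taken against running averages, the rule needs no
    # extrinsic separation of signal from noise.
    return w + eta * (R - R_bar) * np.outer(a_noisy - a_bar, x)
```

A synapse is thus strengthened when above-average activity of its postsynaptic neuron coincides with above-average reward, which is exactly the property the abstract highlights: the noise that drives exploration is never explicitly identified, only correlated with reward.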


Figures

Figure 1.
Description of the 3D cursor control task and network model for cursor control. A, The task was to move the cursor from the center of an imaginary cube to one of its eight corners. The target direction y*(t) was given by the direction of the straight line from the current cursor position to the target position. B, Schematic of the network model used for the cursor control task. A set of m neurons projects to n_total noisy neurons in motor cortex. The monkey arm movement was modeled by a fixed linear mapping from the activities of the modeled motor cortex neurons to the 3D velocity vector of the monkey arm. A subset of n neurons in the simulated motor cortex was recorded for cursor control. The velocity of the cursor movement at time t was given by the population vector, i.e., the vector sum of the decoding PDs (preferred directions) of the recorded neurons, weighted by their normalized activities.
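For concreteness, here is a minimal sketch of the population-vector decoding step described in B. The caption says only "normalized activities"; the centering-and-scaling normalization below is a common convention in such decoders and an assumption here.

```python
import numpy as np

def population_vector(rates, decoding_pds, baseline, modulation, speed=1.0):
    """Cursor velocity from recorded activities via the population vector.

    rates        : (n,) spike rates of the n recorded neurons
    decoding_pds : (n, 3) unit-length decoding preferred directions
    baseline     : (n,) mean rates used for centering   (assumed scheme)
    modulation   : (n,) rate ranges used for scaling    (assumed scheme)
    """
    normalized = (rates - baseline) / modulation   # "normalized activities"
    return speed * normalized @ decoding_pds       # weighted vector sum of PDs
```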
Figure 2.
Experimentally observed variances of the spike count in a 200 ms window for motor cortex neurons, and noise in the neuron model. For a given target direction, the variance of the spike count of a unit scales approximately linearly with the mean activity of the unit in that direction, with a bend at ∼5 spikes (20 Hz), in a monkey experiment (gray dashed line; data smoothed, see Materials and Methods). Similar behavior can be obtained with an appropriate noise distribution in our neuron model with a non-negative linear activation function (black line).
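A minimal sketch of a noise model with this mean-variance relation follows; only the approximately linear scaling with a bend near 5 spikes is taken from the caption, while the Gaussian form and the two slopes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_spike_count(mean_count, bend=5.0, slope_low=1.0, slope_high=2.0):
    """Spike count whose variance grows piecewise-linearly with its mean.

    Only the linear scaling with a bend near 5 spikes comes from the
    caption; the Gaussian form and the two slopes are assumptions.
    """
    if mean_count <= bend:
        var = slope_low * mean_count
    else:
        var = slope_low * bend + slope_high * (mean_count - bend)
    count = mean_count + rng.normal(0.0, np.sqrt(var))
    return max(count, 0.0)  # non-negative, like the rectified activation
```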
Figure 3.
One example simulation of the 50% perturbation experiment with the EH rule and data-derived network parameters. A, Angular match R_ang as a function of learning time. Every 100th time point is plotted. B, PD shifts projected onto the rotation plane (the rotation axis points toward the reader) for rotated (red) and nonrotated (black) neurons, from their initial values (light colors) to their values after training (intense colors; these PDs are connected by the shortest path on the unit sphere; axes in arbitrary units). The PDs of rotated neurons are consistently rotated counter-clockwise to compensate for the perturbation. C, Tuning of an example rotated neuron to target directions of angle Φ in the rotation plane (the y–z plane) before (gray) and after (black) training. The target direction for a given Φ was defined as y*(Φ) = (1/2)(1, cos(Φ), sin(Φ))^T. Circles on the x-axis indicate the projected preferred directions of the neuron before (gray) and after (black) training.
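The target-direction formula from C translates directly into code, together with one plausible reading of the angular match R_ang as the cosine of the angle between the produced movement and the target direction; the page does not define R_ang, so that part is an assumption.

```python
import numpy as np

def target_direction(phi):
    """y*(phi) = (1/2) * (1, cos(phi), sin(phi))^T, as given in the caption."""
    return 0.5 * np.array([1.0, np.cos(phi), np.sin(phi)])

def angular_match(movement, target):
    """Cosine of the angle between movement and target direction
    (an assumed definition of R_ang; 1 means a perfectly aimed movement)."""
    return (movement @ target) / (np.linalg.norm(movement) * np.linalg.norm(target))
```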
Figure 4.
PD shifts in simulated perturbation sessions are in good agreement with experimental data [compare to Jarosiewicz et al. (2008), their Fig. 3A,B]. Shift in the PDs measured after simulated perturbation sessions, relative to initial PDs, for all units in 20 simulated experiments where 25% (A) or 50% (B) of the units were rotated. Dots represent individual data points, and black-circled dots represent the means of the rotated (red) and nonrotated (blue) units.
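One natural way to obtain the plotted PD shifts is to project the initial and final PDs into the plane orthogonal to the rotation axis and take the signed angle between the projections. The sketch below implements that measurement; it is a plausible reading, not necessarily the paper's exact procedure.

```python
import numpy as np

def pd_shift_in_rotation_plane(pd_before, pd_after, rotation_axis):
    """Signed PD shift (degrees) in the plane orthogonal to the rotation axis.

    Positive values are counter-clockwise around the axis. One plausible
    measurement, not necessarily the paper's exact procedure.
    """
    axis = rotation_axis / np.linalg.norm(rotation_axis)
    # project both PDs into the rotation plane and renormalize
    p0 = pd_before - (pd_before @ axis) * axis
    p1 = pd_after - (pd_after @ axis) * axis
    p0 = p0 / np.linalg.norm(p0)
    p1 = p1 / np.linalg.norm(p1)
    # signed angle between the projections
    return np.degrees(np.arctan2(axis @ np.cross(p0, p1), p0 @ p1))
```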
Figure 5.
PD shifts in simulated 50% perturbation sessions with the learning rules in Equations 18 (A) and 19 (B). Dots represent individual data points, and black-circled dots represent the means of the rotated (red) and nonrotated (blue) units. No credit-assignment effect can be observed for these rules.
Figure 6.
PD shifts in simulated washout sessions. A–D, Shift in the PDs (mean over 20 trials) for rotated neurons (gray) and nonrotated neurons (black) relative to the PDs of the control session, as a function of the number of targets presented, for 25% perturbation (A, C) and 50% perturbation (B, D). A, B, Simulations with the same learning rate as in the simulated perturbation session. C, D, Simulations with a fivefold larger learning rate.
Figure 7.
Comparison of network performance before and after learning for 50% perturbation. Angular match R_ang(t) of the cursor movements in one reaching trial before (gray) and after (black) learning, as a function of the time since the target was first made visible. The black curve ends earlier because the target is reached faster after learning. Note the reduced temporal jitter of the performance after learning, indicating reduced sensitivity to the neuronal noise.
Figure 8.
Generalization of network performance for a 50% perturbation experiment with cursor movements to a single target location during training. Twenty independent simulations with randomly drawn target positions (from the corners of the unit cube) and rotation axes (either the x-, y-, or z-axis) were performed. In each simulation, the network model was first tested on 100 random target directions, then trained for 320 trials, and then tested again on 100 random target directions. Angular matches of the test trials before (gray) and after (black) training are plotted against the angle between the target direction vector in the test and the vector from the origin to the training target location. Shown are the mean and SD of the angular match R_ang over movements with an angle to the training direction in [0, 20]°, (20, 40]°, (40, 60]°, etc. For clarity, the SD for movements before learning is not shown; it was nearly constant across all angles, with a mean of 0.15.
Figure 9.
Behavior of the EH rule in simulated perturbation sessions (50% perturbed neurons) for different parameter settings. All plotted values are means over 10 independent simulations. Logarithms are to base 2. The black circle indicates the parameter setting used in Results. A, Dependence of network performance, measured as the mean number of targets reached per time step, on the learning rate η and the exploration level υ. Performance deteriorates for high learning rates and exploration levels. B, Mean PD shifts in the rotation direction for rotated neurons. C, Mean PD shifts in the rotation direction for nonrotated neurons. In comparison to rotated neurons, the PD shifts of nonrotated neurons are small, especially for larger exploration levels.
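A hedged sketch of the parameter scan behind panel A: base-2 logarithmic grids over the learning rate η and the exploration level υ, with performance averaged over 10 independent simulations per grid point. The grid ranges and the run_session stand-in are assumptions; the full closed-loop simulation is not specified on this page.

```python
import numpy as np

def run_session(eta, upsilon, seed):
    """Placeholder for one full closed-loop simulation; would return the
    mean number of targets reached per time step. Not specified on this page."""
    rng = np.random.default_rng(seed)
    return rng.random()  # stand-in value

# base-2 logarithmic grids over learning rate and exploration level
etas = 2.0 ** np.arange(-12, -4)        # assumed range
upsilons = 2.0 ** np.arange(-6, 2)      # assumed range

# mean performance over 10 independent simulations per grid point (as in Fig. 9A)
performance = np.array([
    [np.mean([run_session(eta, ups, seed) for seed in range(10)])
     for ups in upsilons]
    for eta in etas
])
```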
