PLoS Comput Biol. 2014 Jan;10(1):e1003377. doi: 10.1371/journal.pcbi.1003377. Epub 2014 Jan 9.

Interference and shaping in sensorimotor adaptations with rewards


Ran Darshan et al. PLoS Comput Biol. 2014 Jan.

Abstract

When a perturbation is applied in a sensorimotor transformation task, subjects can adapt and maintain performance by relying either on sensory feedback or, in the absence of such feedback, on information provided by rewards. For example, in a classical rotation task where movement endpoints must be rotated to reach a fixed target, human subjects can successfully adapt their reaching movements solely on the basis of binary rewards, although this proves much more difficult than with visual feedback. Here, we investigate such a reward-driven sensorimotor adaptation process in a minimal computational model of the task. The key assumption of the model is that synaptic plasticity is gated by the reward. We study how the learning dynamics depend on the target size, the movement variability, the rotation angle and the number of targets. We show that when the movement is perturbed for multiple targets, the adaptation processes for the different targets can interfere destructively or constructively, depending on the similarity between the sensory stimuli (the targets) and the overlap in their neuronal representations. Destructive interference can drastically slow down adaptation, and as a result of interference the time to adapt varies non-linearly with the number of targets. Our analysis shows that interference is weaker if the reward varies smoothly with the subject's performance instead of being binary. We demonstrate how shaping the reward or shaping the task can dramatically accelerate adaptation by reducing destructive interference. We argue that experimentally investigating the dynamics of reward-driven sensorimotor adaptation for more than one sensory stimulus can shed light on the underlying learning rules.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic description of the sensorimotor adaptation task and the model.
A. The rotation task. From left to right: 1) A circular target (red circle) of radius [formula] appears on the screen at direction [formula] (here [formula]) to instruct the subject where to move the cursor. 2) The subject moves the cursor, which is invisible to them, toward the target (blue arrow). The only information available to the subject about their performance is the reward, delivered only if the cursor falls within the target. 3) A perturbation is introduced: the cursor is rotated by an angle γ with respect to the direction of the subject's hand movement (black arrow). 4) A learning phase follows, in which the subject progressively adapts to the perturbation, reducing the distance between the cursor endpoint and the target. B. Schematic description of the model. When the target appears, the activity profile of the input layer (red neurons) peaks around the target direction. The parameter ρ controls the width of the activity profile. The connectivity matrix between the input layer and the output layer (blue neurons) is denoted by [formula]. Gaussian noise with zero mean and standard deviation [formula] is added to the output layer of the network. The two-dimensional output vector, rotated by the matrix [formula], represents the cursor endpoint. A reward is delivered if the distance between the cursor endpoint and the center of the target is smaller than [formula]. The connectivity matrix [formula] is then updated according to a reward-modulated plasticity rule (see Eq. (8)).
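The model in panel B fits in a few lines of code. The sketch below is a minimal Python illustration, assuming a bell-shaped input profile and a REINFORCE-like reward-gated update as a stand-in for Eq. (8); the values of the tuning width, noise level, learning rate, rotation angle and target size are illustrative assumptions, not the paper's.

```python
import numpy as np

# Minimal sketch of the Figure 1 model. All parameter values are
# illustrative assumptions, not taken from the paper.
rng = np.random.default_rng(0)

N = 100                                    # input neurons with preferred directions
pref = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
rho = 1.0                                  # width of the input activity profile (assumed)
sigma = 0.1                                # std of the Gaussian output noise (assumed)
eta = 0.5                                  # learning rate (assumed)
gamma = np.deg2rad(15.0)                   # rotation perturbation (assumed)
target = np.array([1.0, 0.0])              # unit-distance target at direction 0
radius = 0.15                              # target size (assumed)

R = np.array([[np.cos(gamma), -np.sin(gamma)],
              [np.sin(gamma),  np.cos(gamma)]])   # cursor rotation

def input_activity(theta):
    """Activity profile of the input layer, peaked at the target direction."""
    return np.exp((np.cos(pref - theta) - 1.0) / rho**2)

# Initialize W so that, before the perturbation, the unrotated output
# points exactly at the target (a convenient assumption).
r = input_activity(0.0)
W = np.outer(target, r) / (r @ r)

for trial in range(20000):
    xi = sigma * rng.standard_normal(2)    # exploratory noise in the output layer
    cursor = R @ (W @ r + xi)              # rotated, noisy cursor endpoint
    reward = np.linalg.norm(cursor - target) < radius
    if reward:
        # Reward-gated update: move the output toward the rewarded
        # perturbation (a REINFORCE-like stand-in for Eq. (8)).
        W += eta * np.outer(xi, r) / (r @ r)

print("noiseless error:", np.linalg.norm(R @ (W @ r) - target))
```

Because the weight change correlates the exploratory noise with the input activity only on rewarded trials, the rotated output drifts toward the target and the noiseless error shrinks below the target size.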
Figure 2. Learning dynamics when the network adapts to the rotation for one target.
A. An example of a learning curve for [formula]. Left: the error, calculated as the squared distance between the cursor endpoint and the target (see Eq. (5)), is plotted as a function of the trial number. The rotation perturbation is applied on trials following t = 0. For display purposes, only one in four trials is shown. The solid line represents the error smoothed with a 100-trial sliding median window. The final error is [formula] (mean ± SE, computed as explained in Materials and Methods). Dashed purple line: target size. Right: as on the left, but only the directional part of the error is plotted against the trial number. The shaded area corresponds to the target size. B. Same as the left panel of A, but with [formula] and a corresponding final error of [formula]. C. Same as the left panel of A, but with [formula] and a corresponding final error of [formula]. D. Probability density function (p.d.f.) of the logarithm of the learning duration. The learning duration ([formula]) is defined as the number of trials it takes to learn the task (see Materials and Methods). Target size is [formula]. E. Trade-off between learning duration and final error. The average of the [formula] distribution (green) and the final error (blue) are plotted against the target size. The shaded area around the averages corresponds to half the SD of the distributions. Solid lines: [formula]. Dashed lines: [formula]. F. The probability of getting the first reward, [formula] (see Eq. (10)), vs. the noise level [formula], for two values of the target size. In all panels: [formula].
Figure 3. Performance and noiseless performance after learning depend on the learning rate.
A. An example of the variation of the error (blue) and the noiseless error (red) with the number of trials, for [formula] (purple dashed line), [formula] and a normalized learning rate ([formula], see Eq. (12)) of 0.3. For display purposes, only one in four trials is shown. B. The performance (blue), i.e., the probability that [formula], and the noiseless performance (red), i.e., the probability that [formula], are plotted against the normalized learning rate. These quantities were estimated from simulations of [formula] trials, excluding the transient learning phase. Note that for [formula] the noiseless performance is perfect. The standard error of the mean is too small to be visible. C. Distribution of the noiseless error, [formula], at the end of the learning phase. For [formula], the support of the distribution is bounded by [formula]. For [formula], the distribution is uniform for [formula] and zero otherwise. For [formula], the support of the distribution is bounded but extends beyond [formula]. In B and C: [formula]; [formula].
Figure 4. Shaping the task allows the network to adapt to a large rotation angle (here [formula]) even when the target size and the noise level are extremely small.
A. Shaping by decreasing the target size, as explained in the text. Parameters: [formula]; [formula]; [formula]; [formula]. Blue: the error, sampled every 3 trials (dots) and smoothed with a 50-trial sliding median window (line), vs. the number of trials. Purple: the size of the target. B. Reach angle (in degrees) as a function of the trial number when the rotation angle is progressively increased (see Results). The target size is fixed: [formula]. At [formula], [formula]. The rotation angle is increased by [formula] every 25 trials, up to [formula]. The shaded area corresponds to the target size ([formula] around the target center). Inset: for a different realization of the noise with the same parameters, the network is unable to follow the gradual rotation. In both panels: [formula].
Figure 5. Shaping the reward function accelerates adaptation without impairing performance.
A. The reward is given by [formula]. Top: learning curve for a reward function that changes abruptly around the target size ([formula]). Bottom, main panel: learning curve for a gradual reward function ([formula]). Note the change in the abscissa scale. Inset: the reward function vs. the error; the dashed purple line marks the target size. B. The learning duration and the performance vs. the smoothing parameter T. Solid lines: deterministic smooth reward function, as in A. Dashed lines: stochastic binary reward delivered with a probability that depends on [formula] (see Results). In A and B: [formula]; [formula].
Figure 6. The generalization error ([formula]) for a new target (the test target), presented after the network has adapted to one target.
[formula] is plotted as a function of the angle of the test target after adaptation to a target in direction [formula]. Generalization is perfect when [formula]. Lines: analytical result for [formula] (see Eq. (19)). Circles: simulation results for [formula]. For clarity, the results are displayed for test targets sampled every 15 degrees. The generalization error was averaged over 200 realizations of the noise. The shaded area represents one [formula] around the averages. Gray line: zero [formula]. The mapping between [formula] and the half-bandwidth, [formula], is given in Eq. (3). For instance, [formula] corresponds to [formula] and [formula] to [formula]. Parameters: [formula]; [formula].
Figure 7. Delayed learning for two targets in opposite directions.
A. Learning curves plotted against the number of trials for each of the targets, sampled every 10 trials. The curve for the target that is learned first (resp. second) is plotted in blue (resp. green). Top: [formula]. Middle: [formula]. Bottom: [formula]. B. Distribution of the learning duration for two opposite targets at different noise levels. Solid lines: the probability density functions of [formula] (blue) and [formula] (green), where [formula] (resp. [formula]) is the learning duration for the target that is learned first (resp. second). Dashed lines: the distributions of [formula] and [formula] assuming that [formula] and [formula] are independent random variables. The distributions were estimated over [formula] realizations of the noise. Simulations were long enough for the network to eventually adapt to both targets. Top: [formula]. Bottom: [formula]. C. The average and the SD of the distributions of [formula] (blue) and [formula] (green) vs. the noise level. D. The distribution of the ratio [formula] for the two noise levels in B.
Figure 8. Geometric intuition for the destructive and constructive interferences.
Following the perturbation, the cursor is rotated with respect to the output of the network, inducing a large noiseless error (black vector in panel 1). The noise in the output layer (green vector in panel 2) helps the network explore the 2D environment until the cursor falls inside one of the targets (panel 2). This trial is rewarded, and the connectivity matrix is therefore updated, affecting the output of the network on subsequent trials. The update decreases the noiseless error for the target whose trial was rewarded, as the rotated output of the network is now closer to it (by adding the vector [formula], panel 3). The same update, however, moves the rotated output for the target in the opposite direction away from that target, since the vector [formula] points away from it. The result is an increase in the error, referred to as destructive interference. The probability of a rewarded trial for that target is now substantially reduced, delaying its learning. A similar effect occurs when the two targets are sufficiently far apart. When they are close (panel 5), however, the interference becomes constructive: after the update of the matrix, the rotated output gets closer to both targets. Note that the overlap, [formula], depends on the width of the tuning curves (see Materials and Methods).
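The geometric argument can be checked numerically. In the sketch below, a rewarded update fully corrects the rotated output for target 1; the same weight change shifts the output for a second target by c times that correction, where c stands in for the (signed, here assumed positive) overlap of the two input profiles. All values are illustrative.

```python
import numpy as np

gamma = np.deg2rad(15.0)                   # rotation angle (assumed)
R = np.array([[np.cos(gamma), -np.sin(gamma)],
              [np.sin(gamma),  np.cos(gamma)]])

def unit(theta):
    return np.array([np.cos(theta), np.sin(theta)])

def err(output, target):
    """Noiseless error: distance from the rotated output to the target."""
    return np.linalg.norm(R @ output - target)

t1 = unit(0.0)
dv = np.linalg.solve(R, t1) - t1           # correction that perfectly fixes target 1
c = 0.3                                    # overlap of the input profiles (assumed)

for deg in (30.0, 180.0):                  # a close and an opposite second target
    t2 = unit(np.deg2rad(deg))
    before = err(t2, t2)                   # output initially points at the target
    after = err(t2 + c * dv, t2)           # same update, scaled by the overlap
    kind = "constructive" if after < before else "destructive"
    print(f"target 2 at {deg:5.1f} deg: error {before:.3f} -> {after:.3f} ({kind})")
```

For the close target the shared correction reduces the error (constructive interference); for the opposite target the same positive overlap pushes the rotated output away from it (destructive interference), matching panels 3–5.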
Figure 9. Destructive and constructive interferences depend on the model parameters.
The correlation coefficient, [formula], characterizes the strength and the nature of the interference during learning of the rotation task for two targets. A. [formula] for different values of the angular distance between the targets. The interference becomes constructive as [formula] decreases. B. The extremum of [formula] over t, [formula], plotted against [formula] for different values of [formula]. Purple: [formula]. Blue: [formula]. Green: [formula]. The width of each curve corresponds to the [formula] of [formula], estimated by bootstrap. Note the slight non-monotonicity for [formula]. Inset: [formula] for [formula], [formula], [formula] for [formula] (same color code as for the dots in the main panel). Parameters: [formula]. C–F. [formula] is plotted for different values of [formula] (C), [formula] (D), [formula] (E) and [formula] (F). In all these panels, [formula] was calculated over [formula] repetitions. The result was low-pass filtered to suppress fast trial-to-trial fluctuations, for the sake of clarity. Consequently, there is a causality artifact around [formula], and [formula] is nonzero although it should be zero. The standard errors, estimated by bootstrap, are small and are not plotted.
Figure 10. Shaping the task or the reward reduces the delayed-learning effect.
A. Learning curves for two targets in opposite directions, with the task shaped by reducing the target size. Parameters: [formula]; [formula]; [formula]. The running averages of the reward were monitored separately for the two targets; when both averages reached a steady state, the target size was decreased by [formula]. The error was sampled every 3 trials. B. Adaptation with a smooth reward function, Eq. (1). Top: [formula]. Middle: [formula]. Bottom: [formula]. Parameters: [formula]; [formula]. The error was sampled every 10 trials.
Figure 11. Adaptation to multiple targets.
A. Average total number of target presentations required to learn the entire task vs. the number of presented targets, m. The targets are evenly distributed (between [formula] and [formula]). Black: [formula]. Blue: [formula]. Purple: [formula]. Green: [formula]. The dashed black line corresponds to learning the targets independently, computed from the p.d.f. of [formula] estimated from adaptation to one target. B. Examples of the noiseless error during learning, plotted vs. the number of rewarded trials. The target direction is color-coded. Dashed gray lines: the initial noiseless error for [formula]. B.1 and B.2: examples of the noiseless error for narrow tuning curves ([formula]) with 3 and 6 targets, respectively. The plateaus in the noiseless errors indicate that there is no interference between the targets. B.3 and B.4: examples of the noiseless error for wider tuning curves ([formula]) with 3 and 6 targets, respectively. The increase in the noiseless error above the initial error for some of the targets results from destructive interference between far targets. C. The fraction of ordered realizations when [formula], as a function of [formula]. Chance level is [formula]. An ordered realization is defined as one in which the targets are learned in close-to-far order, as in the example in B.4. The statistics were calculated over [formula] realizations. For all results presented in this figure: [formula].
Figure 12. Generalization error ([formula]) and performance when adapting to multiple targets.
A. The generalization error vs. the location of the test targets, estimated from simulations as in Figure 6. The shaded area represents one [formula] around the averages. Tuning width: [formula]. B. The noiseless performance (see Eq. (25)), averaged over all tested targets ([formula]), is plotted vs. the number of trained targets. See Materials and Methods for details on how this quantity was estimated. Blue: [formula]. Green: [formula]. Black: [formula]. Dashed gray: zero [formula]. Parameters: [formula].


Grants and funding

This work was carried out within the framework of the France-Israel Laboratory of Neuroscience (LEA-FILNe) and supported by a grant from the France-Israel High Council for Scientific and Technological Cooperation. The funders had no role in study design, data collection and analysis, the decision to publish, or the preparation of the manuscript.