Interference and shaping in sensorimotor adaptations with rewards
- PMID: 24415925
- PMCID: PMC3886885
- DOI: 10.1371/journal.pcbi.1003377
Abstract
When a perturbation is applied in a sensorimotor transformation task, subjects can adapt and maintain performance by either relying on sensory feedback, or, in the absence of such feedback, on information provided by rewards. For example, in a classical rotation task where movement endpoints must be rotated to reach a fixed target, human subjects can successfully adapt their reaching movements solely on the basis of binary rewards, although this proves much more difficult than with visual feedback. Here, we investigate such a reward-driven sensorimotor adaptation process in a minimal computational model of the task. The key assumption of the model is that synaptic plasticity is gated by the reward. We study how the learning dynamics depend on the target size, the movement variability, the rotation angle and the number of targets. We show that when the movement is perturbed for multiple targets, the adaptation process for the different targets can interfere destructively or constructively depending on the similarities between the sensory stimuli (the targets) and the overlap in their neuronal representations. Destructive interferences can result in a drastic slowdown of the adaptation. As a result of interference, the time to adapt varies non-linearly with the number of targets. Our analysis shows that these interferences are weaker if the reward varies smoothly with the subject's performance instead of being binary. We demonstrate how shaping the reward or shaping the task can accelerate the adaptation dramatically by reducing the destructive interferences. We argue that experimentally investigating the dynamics of reward-driven sensorimotor adaptation for more than one sensory stimulus can shed light on the underlying learning rules.
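The model assumption highlighted in the abstract, that synaptic plasticity is gated by the reward, can be illustrated with a minimal simulation sketch in Python. The circular-Gaussian tuning of the input layer, the specific update rule (reinforcing the exploratory output noise only on rewarded trials), and all parameter values below are illustrative assumptions and need not match the paper's Eq. (8) or its parameter choices.

import numpy as np

rng = np.random.default_rng(0)

# Input layer: neurons with preferred directions tiling the circle.
N = 100
pref = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
rho = 0.5                 # tuning-curve width (assumed value)
sigma = 0.15              # SD of the Gaussian output noise (assumed value)
eta = 0.3                 # learning rate (assumed value)
target_radius = 0.3       # reward if the cursor lands within this distance of the target
gamma = np.deg2rad(30.0)  # rotation perturbation
rot = np.array([[np.cos(gamma), -np.sin(gamma)],
                [np.sin(gamma),  np.cos(gamma)]])

def input_activity(theta):
    """Activity bump peaked at the target direction (assumed circular Gaussian)."""
    d = np.angle(np.exp(1j * (pref - theta)))  # wrapped angular distance
    return np.exp(-d ** 2 / (2.0 * rho ** 2))

# One target; the initial weights reproduce it exactly before the perturbation.
theta_t = np.pi / 2
target_xy = np.array([np.cos(theta_t), np.sin(theta_t)])
x0 = input_activity(theta_t)
W = np.outer(target_xy, x0) / (x0 @ x0)  # so that W @ x0 == target_xy

for trial in range(20000):
    x = input_activity(theta_t)
    xi = sigma * rng.standard_normal(2)  # exploratory noise added to the output
    cursor = rot @ (W @ x + xi)          # rotated cursor endpoint
    reward = float(np.linalg.norm(cursor - target_xy) < target_radius)
    # Reward-gated plasticity: consolidate the noise-driven change only when rewarded.
    W += eta * reward * np.outer(xi, x) / (x @ x)

final_error = np.linalg.norm(rot @ (W @ x0) - target_xy)
print(f"residual (noiseless) error after adaptation: {final_error:.3f}")

In this sketch the noiseless cursor drifts toward the target only through rewarded trials, so adaptation is slow when the target is small relative to the output noise, in line with the dependence on target size and movement variability studied in the paper.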
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
(Inline equations, symbols, and parameter values from the original figure captions were not recoverable; the remaining caption text is summarized below.)
- Schematic of the rotation task (A) and of the model (B). A. A target appears on the screen to instruct the subject where to move the cursor; the subject moves the cursor, which is invisible to him, toward the target; the only information available to the subject about his performance is the reward, delivered only if the cursor falls within the target; a perturbation is then introduced in which the cursor is rotated by an angle γ with respect to the direction of the subject's hand movement; in the subsequent learning phase the subject progressively adapts to the perturbation, reducing the distance between the cursor endpoint and the target. B. When the target appears, the activity profile of the input layer peaks around the target direction, with a width controlled by the parameter ρ; a connectivity matrix links the input layer to the output layer; Gaussian noise with zero mean is added to the output layer; the two-dimensional output vector, rotated by the rotation matrix, represents the cursor endpoint; a reward is delivered if the distance between the cursor endpoint and the center of the target is smaller than the target size; the connectivity matrix is then updated according to a reward-modulated plasticity rule (Eq. (8)).
- Learning dynamics for a single target. A. Left: the error, calculated as the squared distance between the cursor endpoint and the target (Eq. (5)), versus the trial number; the rotation perturbation is applied on trials following t = 0; for display purposes only one in four trials is shown, and the solid line is the error smoothed with a 100-trial sliding median window; dashed purple line: target size. Right: the directional part of the error versus the trial number, with the shaded area corresponding to the target size. B, C. As in the left panel of A for other parameter values, with the corresponding final errors. D. Probability density function of the logarithm of the learning duration, defined as the number of trials it takes to learn the task (see Materials and Methods). E. Trade-off between learning duration and final error: the average learning duration (green) and the final error (blue) versus the target size, with shaded areas of half an SD around the averages. F. The probability of obtaining the first reward (Eq. (10)) versus the noise level for two values of the target size.
- Dependence on the learning rate. A. Example with a normalized learning rate (Eq. (12)) of 0.3; only one in four trials is displayed; dashed purple line: target size. B. The performance (blue) and the noiseless performance (red) versus the normalized learning rate, estimated from long simulations excluding the transient learning phase; in a range of learning rates the noiseless performance is perfect, and the standard error of the mean is too small to notice. C. Distribution of the noiseless error at the end of the learning phase: depending on the learning rate, the support of the distribution is bounded by the target size, uniform within it, or extends beyond it.
- Error and reach angle when the rotation angle is progressively increased. A. The error, sampled every 3 trials (dots) and smoothed with a 50-trial sliding median window (line), versus the trial number; purple: target size. B. Reach angle (in degrees) as a function of the trial number when the rotation angle is progressively increased (see Results); the target size is fixed; the rotation angle is increased in steps every 25 trials up to its final value; the shaded area corresponds to the target size around the target center. Inset: for a different realization of the noise with the same parameters, the network is unable to follow the gradual rotation.
- Shaping the reward. A. Top: learning curve for a reward function that changes abruptly around the target size; bottom: learning curve for a gradual reward function (note the change in the abscissa scale); inset: the reward function versus the error, with the target size marked by a dashed purple line. B. The learning duration and the performance versus the smoothing parameter T; solid lines: deterministic smooth reward function as in A; dashed lines: stochastic binary reward delivered with an error-dependent probability (see Results).
- Generalization after adaptation to a single target. The generalization error is plotted as a function of the angle of the test target after adaptation to a target in a fixed direction; perfect generalization corresponds to zero generalization error (gray line). Lines: analytical result (Eq. (19)); circles: simulation results, shown for test targets sampled every 15 degrees and averaged over 200 realizations of the noise, with the shaded area representing the variability around the averages. The mapping between the tuning-width parameter and the half-bandwidth is given in Eq. (3).
- Adaptation to two opposite targets at different noise levels. B. Distributions of the learning duration for the two targets: solid lines are the probability density functions for the target that is learned first (blue) and the target that is learned second (green); dashed lines are the corresponding distributions under the assumption that the two durations are independent random variables; simulations were long enough for the network to eventually adapt to both targets, and the top and bottom panels correspond to two noise levels. C. The average and SD of the two distributions versus the noise level. D. The distribution of the ratio of the two learning durations for the two noise levels in B.
- Schematic of the interference between two targets. A rewarded update for one target moves the rotated output for the target in the opposite direction away from it; the resulting increase in the error is referred to as destructive interference, and the probability of a rewarded trial for that target is substantially reduced, delaying its learning. A similar effect occurs when the two targets are sufficiently far apart. When the targets are close, however, the interference becomes constructive, since after the update of the connectivity matrix the rotated output gets closer to both targets. The overlap between the target representations depends on the width of the tuning curves (see Materials and Methods).
- A quantity characterizing the strength and the nature of the interference during learning of the rotation task for two targets. A. The quantity for different values of the angular distance between the targets; the interference becomes constructive when the angular distance decreases. B. The extremum of the quantity over trials versus the angular distance for different parameter values (purple, blue, green); the width of each curve corresponds to a confidence interval estimated by bootstrap; note the slight non-monotonicity of one of the curves. Inset: the quantity for three parameter values (same color code as the dots in the main panel). C–F. The quantity plotted for different values of four model parameters; it was calculated over many repetitions and low-pass filtered to suppress fast trial-to-trial fluctuations, which introduces a small causality artifact; the standard errors estimated by bootstrap are small and are not plotted.
- Accelerating adaptation to two opposite targets. A. Shaping the task: the running averages of the reward were monitored for the two targets separately, and when both averages reached a steady state the target size was decreased; the error was sampled every 3 trials. B. Adaptation with a smooth reward function (Eq. (1)); the error was sampled every 10 trials.
- Adaptation to multiple targets. A. Different conditions are color coded (black, blue, purple, green); the dashed black line corresponds to learning the targets independently, based on the probability density function of the learning duration estimated from adaptation to a single target. B. Examples of the noiseless error during learning, plotted versus the number of rewarded trials, with the target direction color coded and dashed gray lines marking the initial noiseless error. B.1 and B.2: narrow tuning curves with 3 and 6 targets, respectively; the plateaus in the noiseless errors indicate that there is no interference between the targets. B.3 and B.4: wider tuning curves with 3 and 6 targets, respectively; the increase of the noiseless error above its initial value for some targets results from the destructive interference between far targets. C. The fraction of ordered realizations, where an ordered realization is defined as learning the targets in a close-to-far order, as in the example in B.4; the chance level is indicated.
- Generalization after adaptation to multiple targets. A. Shaded areas correspond to the variability around the averages. B. The noiseless performance (Eq. (25)), averaged over all tested targets, versus the number of trained targets (see Materials and Methods for how this quantity was estimated), shown for several tuning widths; the dashed gray line marks zero.
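As the captions above indicate, whether adaptation to two targets interferes constructively or destructively depends on the overlap between their input-layer representations, which is set by the width of the tuning curves. The short Python sketch below computes such an overlap for an assumed circular-Gaussian tuning profile; the paper's exact definition of the overlap and of the width parameter (related to a half-bandwidth through Eq. (3)) may differ.

import numpy as np

# Preferred directions of the input-layer neurons.
N = 360
pref = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

def input_activity(theta, rho):
    """Assumed circular-Gaussian activity profile with width rho."""
    d = np.angle(np.exp(1j * (pref - theta)))  # wrapped angular distance
    return np.exp(-d ** 2 / (2.0 * rho ** 2))

def overlap(delta_deg, rho):
    """Normalized dot product between the representations of two targets."""
    x1 = input_activity(0.0, rho)
    x2 = input_activity(np.deg2rad(delta_deg), rho)
    return (x1 @ x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))

for rho in (0.25, 0.5, 1.0):  # narrow to wide tuning (assumed values)
    row = ", ".join(f"{sep} deg: {overlap(sep, rho):.2f}" for sep in (30, 90, 180))
    print(f"rho = {rho:.2f} -> overlap at {row}")

With narrow tuning the overlap at large separations is close to zero, so far targets are learned almost independently, whereas wider tuning produces substantial overlap and hence stronger interference, consistent with the multi-target examples summarized in the captions above.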
Similar articles
- Task errors contribute to implicit aftereffects in sensorimotor adaptation. Eur J Neurosci. 2018 Dec;48(11):3397-3409. doi: 10.1111/ejn.14213. Epub 2018 Nov 9. PMID: 30339299. Clinical Trial.
- Explicit learning based on reward prediction error facilitates agile motor adaptations. PLoS One. 2023 Dec 6;18(12):e0295274. doi: 10.1371/journal.pone.0295274. eCollection 2023. PMID: 38055714. Free PMC article.
- Neural signatures of reward and sensory error feedback processing in motor learning. J Neurophysiol. 2019 Apr 1;121(4):1561-1574. doi: 10.1152/jn.00792.2018. Epub 2019 Feb 27. PMID: 30811259. Free PMC article.
- Human sensorimotor learning: adaptation, skill, and beyond. Curr Opin Neurobiol. 2011 Aug;21(4):636-44. doi: 10.1016/j.conb.2011.06.012. Epub 2011 Jul 20. PMID: 21764294. Review.
- Computational mechanisms of sensorimotor control. Neuron. 2011 Nov 3;72(3):425-42. doi: 10.1016/j.neuron.2011.10.006. PMID: 22078503. Review.
Cited by
- Vocal generalization depends on gesture identity and sequence. J Neurosci. 2014 Apr 16;34(16):5564-74. doi: 10.1523/JNEUROSCI.5169-13.2014. PMID: 24741046. Free PMC article.
- Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning. J Neurosci. 2016 Nov 16;36(46):11682-11692. doi: 10.1523/JNEUROSCI.1767-16.2016. PMID: 27852776. Free PMC article.
- Learning to Produce Syllabic Speech Sounds via Reward-Modulated Neural Plasticity. PLoS One. 2016 Jan 25;11(1):e0145096. doi: 10.1371/journal.pone.0145096. eCollection 2016. PMID: 26808148. Free PMC article.
- Formation of model-free motor memories during motor adaptation depends on perturbation schedule. J Neurophysiol. 2015 Apr 1;113(7):2733-41. doi: 10.1152/jn.00673.2014. Epub 2015 Feb 11. PMID: 25673736. Free PMC article.
- Striatal action-value neurons reconsidered. eLife. 2018 May 31;7:e34248. doi: 10.7554/eLife.34248. PMID: 29848442. Free PMC article.
