Adaptive properties of differential learning rates for positive and negative outcomes
- PMID: 24085507
- DOI: 10.1007/s00422-013-0571-5
Abstract
The concept of the reward prediction error (the difference between reward obtained and reward predicted) continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static "bandit" choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
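To make the core idea concrete, the sketch below implements a standard delta-rule learner on a static two-armed Bernoulli bandit in which the learning rate depends on the sign of the prediction error. This is a minimal illustration, not the paper's algorithm: the softmax choice rule, the parameter values, and the specific reward probabilities are assumptions chosen only to show how asymmetric updates separate the learned value estimates.

```python
import numpy as np

def run_bandit(p_reward=(0.8, 0.6), alpha_pos=0.1, alpha_neg=0.3,
               beta=5.0, n_trials=1000, seed=0):
    """Delta-rule learning with separate rates for positive and negative
    prediction errors on a static two-armed Bernoulli bandit.

    Illustrative only; parameters and the softmax rule are assumptions,
    not values reported in the paper.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(len(p_reward))              # learned reward estimates
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # softmax action selection over current estimates
        logits = beta * q
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(len(q), p=probs)
        r = float(rng.random() < p_reward[a])  # Bernoulli reward
        delta = r - q[a]                       # reward prediction error
        # asymmetric update: learning rate depends on the sign of delta
        alpha = alpha_pos if delta > 0 else alpha_neg
        q[a] += alpha * delta
        choices[t] = a
    return q, choices

if __name__ == "__main__":
    q, choices = run_bandit()
    print("final estimates:", q)
    print("fraction best arm chosen:", (choices == 0).mean())
```

With a larger negative than positive learning rate, both estimates are pushed below their true means, but in reward-rich environments this can widen the gap between the options' learned values and thereby improve choice accuracy; reversing the asymmetry illustrates the dependence on the reward distribution noted in the abstract.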
Similar articles
- Do learning rates adapt to the distribution of rewards? Psychon Bull Rev. 2015 Oct;22(5):1320-7. doi: 10.3758/s13423-014-0790-3. PMID: 25582684
- Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Netw. 2006 Oct;19(8):1242-54. doi: 10.1016/j.neunet.2006.06.007. Epub 2006 Sep 20. PMID: 16987637
- Neural correlates of risk prediction error during reinforcement learning in humans. Neuroimage. 2009 Oct 1;47(4):1929-39. doi: 10.1016/j.neuroimage.2009.04.096. Epub 2009 May 13. PMID: 19442744
- Mechanisms of reinforcement learning and decision making in the primate dorsolateral prefrontal cortex. Ann N Y Acad Sci. 2007 May;1104:108-22. doi: 10.1196/annals.1390.007. Epub 2007 Mar 8. PMID: 17347332. Review.
- Reward-dependent learning in neuronal networks for planning and decision making. Prog Brain Res. 2000;126:217-29. doi: 10.1016/S0079-6123(00)26016-0. PMID: 11105649. Review.
Cited by
- The interpretation of computational model parameters depends on the context. Elife. 2022 Nov 4;11:e75474. doi: 10.7554/eLife.75474. PMID: 36331872. Free PMC article.
- Valence biases in reinforcement learning shift across adolescence and modulate subsequent memory. Elife. 2022 Jan 24;11:e64620. doi: 10.7554/eLife.64620. PMID: 35072624. Free PMC article.
- A Competition of Critics in Human Decision-Making. Comput Psychiatr. 2021 Aug 12;5(1):81-101. doi: 10.5334/cpsy.64. eCollection 2021. PMID: 38773993. Free PMC article.
- Neural correlates of proactive avoidance deficits and alcohol use motives in problem drinking. Transl Psychiatry. 2024 Aug 21;14(1):336. doi: 10.1038/s41398-024-03039-y. PMID: 39168986. Free PMC article.
- Three heads are better than two: Comparing learning properties and performances across individuals, dyads, and triads through a computational approach. PLoS One. 2021 Jun 17;16(6):e0252122. doi: 10.1371/journal.pone.0252122. eCollection 2021. PMID: 34138907. Free PMC article.