Primate Orbitofrontal Cortex Codes Information Relevant for Managing Explore-Exploit Tradeoffs
- PMID: 32060169
- PMCID: PMC7083541
- DOI: 10.1523/JNEUROSCI.2355-19.2020
Erratum in
- Erratum: Costa et al., "Primate Orbitofrontal Cortex Codes Information Relevant for Managing Explore-Exploit Tradeoffs". J Neurosci. 2020 Jul 29;40(31):6098. doi: 10.1523/JNEUROSCI.1539-20.2020. Epub 2020 Jul 21. PMID: 32719162 Free PMC article. No abstract available.
Abstract
Reinforcement learning (RL) refers to the behavioral process of learning to obtain reward and avoid punishment. An important component of RL is managing explore-exploit tradeoffs, which refers to the problem of choosing between exploiting options with known values and exploring unfamiliar options. We examined correlates of this tradeoff, as well as other RL-related variables, in orbitofrontal cortex (OFC) while three male monkeys performed a three-armed bandit learning task. During the task, novel choice options periodically replaced familiar options. The values of the novel options were unknown, and the monkeys had to explore them to see if they were better than other currently available options. The identity of the chosen stimulus and the reward outcome were strongly encoded in the responses of single OFC neurons. These two variables define the states and state transitions in our model that are relevant to decision-making. The chosen value of the option and the relative value of exploring that option were encoded at intermediate levels. We also found that OFC value coding was stimulus-specific, as opposed to coding value independent of the identity of the option. The location of the option and the value of the current environment were encoded at low levels. Therefore, we found encoding of the variables relevant to learning and managing explore-exploit tradeoffs in OFC. These results are consistent with findings in the ventral striatum and amygdala and show that this monosynaptically connected network plays an important role in learning based on the immediate and future consequences of choices.
Significance Statement
Orbitofrontal cortex (OFC) has been implicated in representing the expected values of choices. Here we extend these results and show that OFC also encodes information relevant to managing explore-exploit tradeoffs. Specifically, OFC encodes an exploration bonus, which characterizes the relative value of exploring novel choice options. OFC also strongly encodes the identity of the chosen stimulus and reward outcomes, which are necessary for computing the value of novel and familiar options.
Keywords: decision-making; explore–exploit; monkey; orbitofrontal cortex; reinforcement learning.
Copyright © 2020 the authors.
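The task structure described in the abstract (a three-armed bandit in which a novel option periodically replaces a familiar one, with choices driven by value plus an exploration bonus) can be illustrated with a minimal simulation. This is only a sketch under stated assumptions and is not the model used in the paper: the reward probabilities, the replacement schedule (`swap_every`), the Beta-Bernoulli value estimate, and the uncertainty-based bonus in the hypothetical helper `exploration_score` are all illustrative choices.

```python
import random

REWARD_LEVELS = (0.2, 0.5, 0.8)  # assumed reward probabilities, not taken from the paper


def exploration_score(wins, plays, bonus_weight):
    """Posterior mean of a Beta(wins + 1, losses + 1) belief plus a bonus scaled by its std."""
    mean = (wins + 1) / (plays + 2)
    variance = mean * (1 - mean) / (plays + 3)
    return mean + bonus_weight * variance ** 0.5


def simulate(n_trials=300, swap_every=20, bonus_weight=1.0, seed=0):
    """Run a three-armed bandit in which a novel option periodically replaces a familiar one."""
    rng = random.Random(seed)
    probs = [rng.choice(REWARD_LEVELS) for _ in range(3)]
    wins = [0, 0, 0]
    plays = [0, 0, 0]
    total_reward = 0
    swaps = 0
    for t in range(n_trials):
        # Assumed schedule: every `swap_every` trials one option is replaced by a
        # novel option whose value is unknown, so its choice history is reset.
        if t > 0 and t % swap_every == 0:
            slot = rng.randrange(3)
            probs[slot] = rng.choice(REWARD_LEVELS)
            wins[slot] = 0
            plays[slot] = 0
            swaps += 1
        # Choose the option with the highest estimated value plus exploration bonus.
        scores = [exploration_score(w, n, bonus_weight) for w, n in zip(wins, plays)]
        choice = max(range(3), key=lambda i: scores[i])
        reward = 1 if rng.random() < probs[choice] else 0
        wins[choice] += reward
        plays[choice] += 1
        total_reward += reward
    return {"total_reward": total_reward, "swaps": swaps}


if __name__ == "__main__":
    # Compare a purely value-driven agent (no bonus) with one that values exploration.
    for weight in (0.0, 1.0):
        print(weight, simulate(bonus_weight=weight))
```

In this sketch an unsampled option receives a larger bonus than a well-sampled option with the same estimated value, which is one simple way to express the relative value of exploring a novel option. Setting `bonus_weight` to 0 gives an agent that only exploits its current estimates.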
Similar articles
- Differential coding of goals and actions in ventral and dorsal corticostriatal circuits during goal-directed behavior. Cell Rep. 2022 Jan 4;38(1):110198. doi: 10.1016/j.celrep.2021.110198. PMID: 34986350 Free PMC article.
- Amygdala Contributions to Stimulus-Reward Encoding in the Macaque Medial and Orbital Frontal Cortex during Learning. J Neurosci. 2017 Feb 22;37(8):2186-2202. doi: 10.1523/JNEUROSCI.0933-16.2017. Epub 2017 Jan 25. PMID: 28123082 Free PMC article.
- The neurocomputational bases of explore-exploit decision-making. Neuron. 2022 Jun 1;110(11):1869-1879.e5. doi: 10.1016/j.neuron.2022.03.014. Epub 2022 Apr 6. PMID: 35390278 Free PMC article.
- Specializations for reward-guided decision-making in the primate ventral prefrontal cortex. Nat Rev Neurosci. 2018 Jul;19(7):404-417. doi: 10.1038/s41583-018-0013-4. PMID: 29795133 Free PMC article. Review.
- Functional Heterogeneity within Rat Orbitofrontal Cortex in Reward Learning and Decision Making. J Neurosci. 2017 Nov 1;37(44):10529-10540. doi: 10.1523/JNEUROSCI.1678-17.2017. PMID: 29093055 Free PMC article. Review.
Cited by
- Value representations in the rodent orbitofrontal cortex drive learning, not choice. Elife. 2022 Aug 17;11:e64575. doi: 10.7554/eLife.64575. PMID: 35975792 Free PMC article.
- Differential coding of goals and actions in ventral and dorsal corticostriatal circuits during goal-directed behavior. Cell Rep. 2022 Jan 4;38(1):110198. doi: 10.1016/j.celrep.2021.110198. PMID: 34986350 Free PMC article.
- Pupil size predicts the onset of exploration in brain and behavior. bioRxiv [Preprint]. 2023 May 24:2023.05.24.541981. doi: 10.1101/2023.05.24.541981. PMID: 37292773 Free PMC article. Preprint.
- The orbitofrontal cartographer. Behav Neurosci. 2021 Apr;135(2):267-276. doi: 10.1037/bne0000463. PMID: 34060879 Free PMC article. Review.
- Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology. 2022 Jan;47(1):147-162. doi: 10.1038/s41386-021-01108-0. Epub 2021 Aug 5. PMID: 34354249 Free PMC article. Review.