Neuron. 2014 Aug 6;83(3):551-7.
doi: 10.1016/j.neuron.2014.06.035. Epub 2014 Jul 24.

A reinforcement learning mechanism responsible for the valuation of free choice

Jeffrey Cockburn et al. Neuron.

Abstract

Humans exhibit a preference for options they have freely chosen over equally valued options they have not; however, the neural mechanism that drives this bias and its functional significance have yet to be identified. Here, we propose a model in which choice biases arise due to amplified positive reward prediction errors associated with free choice. Using a novel variant of a probabilistic learning task, we show that choice biases are selective to options that are predominantly associated with positive outcomes. A polymorphism in DARPP-32, a gene linked to dopaminergic striatal plasticity and individual differences in reinforcement learning, was found to predict the effect of choice as a function of value. We propose that these choice biases are the behavioral byproduct of a credit assignment mechanism responsible for ensuring the effective delivery of dopaminergic reinforcement learning signals broadcast to the striatum.
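
As a rough illustration of the proposal above, the following Python sketch applies a larger learning rate to positive reward prediction errors (RPEs) on free-choice trials. It is a minimal delta-rule value learner written under stated assumptions, not the authors' model: the function names, the parameters `alpha` and `alpha_fc_pos`, and their default values are all hypothetical. It illustrates only the amplification step and is not expected to reproduce the value-dependent pattern described in the abstract.

```python
def update_value(q, reward, free_choice, alpha=0.1, alpha_fc_pos=0.2):
    """One delta-rule update of a learned value q (all parameters hypothetical)."""
    rpe = reward - q  # reward prediction error
    # Assumption: only positive RPEs on free-choice trials get the larger rate.
    rate = alpha_fc_pos if (free_choice and rpe > 0) else alpha
    return q + rate * rpe


def asymptotic_value(p_reward, free_choice, alpha=0.1, alpha_fc_pos=0.2):
    """Value at which the expected update is zero, for binary 0/1 rewards."""
    pos_rate = alpha_fc_pos if free_choice else alpha
    gain = p_reward * pos_rate        # expected pull toward 1 after a reward
    loss = (1.0 - p_reward) * alpha   # expected pull toward 0 after no reward
    return gain / (gain + loss)


if __name__ == "__main__":
    # Same outcome, different update depending on how the option was selected.
    print(update_value(0.5, 1.0, free_choice=True))
    print(update_value(0.5, 1.0, free_choice=False))
    # Long-run learned value of a 50% option under each condition.
    print(asymptotic_value(0.5, free_choice=True))
    print(asymptotic_value(0.5, free_choice=False))
```

With these illustrative defaults, a freely chosen option settles at a higher learned value than a forced option with the same reward probability, which is the kind of bias the abstract attributes to amplified positive RPEs.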

Figures

Figure 1. Experimental task design
(A) Example free-choice (fc) and no-choice (nc) stimuli used in the task, with associated reward probabilities shown. (B) Training phase: one stimulus pair was presented per trial, and participants were asked to select one of the two available options. Participants were alerted to the free-choice (Choose) or no-choice (Match) condition prior to stimulus presentation. On free-choice trials, participants were free to choose either option; on no-choice trials, they were required to select the framed stimulus. Probabilistic feedback followed option selection. (C) Test phase: participants were repeatedly asked to choose the best option among all possible option pairings. They were free to choose either stimulus on all trials, but no feedback was provided. Choice bias was quantified according to performance on trials where equally rewarded free-choice and no-choice options were paired.
Figure 2. Positive RPE amplification mechanism and choice bias patterns
(A) A simplified diagram of the feedback circuitry between the basal ganglia (BG) and the substantia nigra pars compacta (SNc). Sensory and motor information is projected to the BG via cortico-striatal projections, where it is channeled through both the direct Go (green circles) and indirect NoGo (red circles) pathways, providing positive and negative evidence for each action, respectively, before converging at the substantia nigra pars reticulata (SNr). The activity pattern depicted here illustrates a case of balanced Go activity for two candidate actions but differential NoGo activity, leading to gating of the right-most action to the thalamus (the vertical bar indicates the gated action). The same disinhibitory mechanism that gates thalamocortical actions also disinhibits SNc dopaminergic signals via SNr-SNc projections, thereby allowing reinforcement signals to be amplified when the BG gates an action. The degree of free-choice amplification due to this mechanism is captured by αfc+. (B) Model-generated choice bias for a range of αfc+ values as a function of reward contingency, computed as the percentage of trials where the free-choice (fc) option was selected. (C) Participant preferences on choice bias trials as a function of reward contingency, calculated as the percentage of choice bias trials where the free-choice (fc) option was selected. Error bars indicate standard error of the mean.
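
The choice bias measure described in panels (B) and (C) is a simple percentage, which can be sketched in Python as follows. This is an illustrative helper only: the function name `choice_bias` and the boolean trial encoding are assumptions, not the authors' analysis code.

```python
def choice_bias(chose_free_choice):
    """Percentage of choice-bias trials on which the fc option was selected.

    `chose_free_choice` is a sequence with one truthy/falsy entry per trial
    that paired an fc option with an equally rewarded nc option.
    """
    trials = list(chose_free_choice)
    if not trials:
        raise ValueError("need at least one choice-bias trial")
    return 100.0 * sum(bool(c) for c in trials) / len(trials)


if __name__ == "__main__":
    # Hypothetical data: the fc option was picked on 3 of 4 trials.
    print(choice_bias([True, True, False, True]))
```

Under this encoding, a value of 50 corresponds to no preference between the free-choice and no-choice options, and values above 50 indicate a preference for the free-choice option.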
Figure 3. Derived value structure and implied preference patterns
(A) The option value structure derived from the empirically quantified choice bias. No-choice options (nc) take on their true expected values. Free-choice options (fc) take on their true expected values adjusted according to the choice bias for each option. (B) Percent correct (choice of the more rewarding option) across trials involving Afc or Anc. (C) Percent correct across trials involving Bfc or Bnc. All error bars represent standard error of the mean.
Figure 4. Effects of positive RPE amplification on actor weights, and its interaction with learning asymmetries
(A) The effect of amplified positive RPEs on Go (Qg) and NoGo (Qng) weights. Go weights for the most rewarding options are preferentially amplified, increasing the model's propensity to select those options in accordance with the degree of amplification (Afc > Cfc > Efc). NoGo weights for the least rewarding options are preferentially dampened, decreasing the model's propensity to avoid those options in accordance with the degree of dampening (Afc < Cfc < Efc). (B) The interaction between αfc+ and the αgn asymmetry. (C) Choice bias according to DARPP-32 gene groups (C or TT) as a function of expected value. Bars represent behavioral data, and points represent option preferences recovered from the best-fitting model. Error bars indicate standard error of the mean.
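
To make the direction of the effects in panel (A) concrete, the sketch below shows one way an amplified positive RPE could act on separate Go and NoGo weights. The update rule is an assumption, loosely in the spirit of opponent actor models in which each weight changes in proportion to its current size; it is not taken from this page and should not be read as the authors' equations. The names `actor_update`, `alpha_go`, `alpha_nogo`, and `gain_fc_pos` are hypothetical.

```python
def actor_update(go, nogo, rpe, free_choice,
                 alpha_go=0.1, alpha_nogo=0.1, gain_fc_pos=2.0):
    """One update of Go/NoGo weights for the selected option (illustrative only)."""
    # Assumption: positive RPEs are scaled up when the option was freely chosen.
    if free_choice and rpe > 0:
        rpe = gain_fc_pos * rpe
    # Assumption: multiplicative updates, so larger weights change more.
    new_go = go + alpha_go * go * rpe           # positive RPE strengthens Go
    new_nogo = nogo - alpha_nogo * nogo * rpe   # positive RPE weakens NoGo
    return new_go, new_nogo


if __name__ == "__main__":
    # Same positive RPE, free-choice versus no-choice trial.
    print(actor_update(1.0, 1.0, 0.5, free_choice=True))
    print(actor_update(1.0, 1.0, 0.5, free_choice=False))
```

In this sketch, a positive RPE on a free-choice trial raises the Go weight and lowers the NoGo weight by more than the same RPE on a no-choice trial, while negative RPEs are treated identically in both conditions.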
