Neuroimage. 2012 Aug 1;62(1):154-66.
doi: 10.1016/j.neuroimage.2012.04.024. Epub 2012 Apr 21.

Go and no-go learning in reward and punishment: interactions between affect and effect

Marc Guitart-Masip et al. Neuroimage. 2012.

Abstract

Decision-making invokes two fundamental axes of control: affect or valence, spanning reward and punishment, and effect or action, spanning invigoration and inhibition. We studied the acquisition of instrumental responding in healthy human volunteers in a task in which we orthogonalized action requirements and outcome valence. Subjects were much more successful in learning active choices in rewarded conditions, and passive choices in punished conditions. Using computational reinforcement-learning models, we teased apart contributions from putatively instrumental and Pavlovian components in the generation of the observed asymmetry during learning. Moreover, using model-based fMRI, we showed that BOLD signals in striatum and substantia nigra/ventral tegmental area (SN/VTA) correlated with instrumentally learnt action values, but with opposite signs for go and no-go choices. Finally, we showed that successful instrumental learning depends on engagement of bilateral inferior frontal gyrus. Our behavioral and computational data showed that instrumental learning is contingent on overcoming inherent and plastic Pavlovian biases, while our neuronal data showed this learning is linked to unique patterns of brain activity in regions implicated in action and inhibition respectively.

Figures

Fig. 1
Experimental paradigm. On each trial, one of four possible fractal images indicated the combination of action (making a button press in go trials or withholding a button press in no-go trials) and valence at outcome (win or lose). Actions were required in response to a circle that followed the fractal image after a variable delay. On go trials, subjects indicated via a button press on which side of the screen the circle appeared. On no-go trials, they withheld a response. After a brief delay, the outcome was presented: a green upward arrow indicated a win of £1, and a red downward arrow a loss of £1. A horizontal bar indicated the absence of a win or a loss. On go to win trials, a correct button press was rewarded; on go to avoid losing trials, a correct button press avoided punishment; on no-go to win trials, correctly withholding a button press led to reward; and on no-go to avoid losing trials, correctly withholding a button press avoided punishment. The schematics at the bottom show, for each trial type, the nomenclature on the left, the possible outcomes and their probabilities after a correct response to the target (go choice) in the middle, and the possible outcomes and their probabilities after withholding a response to the target (no-go choice) on the right.
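The 2 x 2 design described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch: the condition names, the `outcome` function, and in particular the default probability `p_good=0.8` are assumptions made here, since the caption refers to outcome probabilities in the schematic without stating them.

```python
import random

# Hypothetical labels for the four trial types: (required action, valence).
CONDITIONS = {
    "go_to_win": ("go", "win"),
    "go_to_avoid_losing": ("go", "lose"),
    "nogo_to_win": ("nogo", "win"),
    "nogo_to_avoid_losing": ("nogo", "lose"),
}

def outcome(condition, action, p_good=0.8, rng=random):
    """Return the monetary outcome (+1, 0, or -1) of one trial.

    p_good is an ASSUMED probability that the required action yields the
    better of the two possible outcomes; it is not taken from the caption.
    """
    required, valence = CONDITIONS[condition]
    # The required action gets the better outcome with probability p_good,
    # the other action with probability 1 - p_good.
    p = p_good if action == required else 1.0 - p_good
    good = rng.random() < p
    if valence == "win":
        return 1 if good else 0   # win of 1 versus neutral outcome
    return 0 if good else -1      # neutral outcome versus loss of 1

if __name__ == "__main__":
    for name in CONDITIONS:
        print(name, outcome(name, "go"), outcome(name, "nogo"))
```

Setting `p_good=1.0` makes the mapping deterministic, which is a convenient way to check that each condition rewards or spares the intended action.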
Fig. 2
Observed and modeled behavioral performance. (A–D) Learning time courses for all four conditions. Each row of the raster images shows the choices of one of the 47 subjects in each of the four conditions. Go responses are depicted in white and no-go responses are depicted in grey. The overlaid black lines depict the time-varying probabilities, across subjects, of making a go response. The colored lines show the same time-varying probabilities, but evaluated on choices sampled from the model (see Materials and methods). (E) Mean percentage of correct responses in each of the four conditions. Green error bars depict the 95% confidence interval (CI) and the red error bars depict standard error of the mean (SEM). Post hoc comparisons were implemented by means of repeated-measures t-tests: *p < 0.005. (F) Integrated Bayesian Information Criterion (BIC) score for all models tested. All models are modified Q-learning models with two pairs of action-values (go and no-go) for each state (fractal image). The winning model includes as free parameters a learning rate, a slope of the softmax rule, irreducible noise, a constant bias factor added to the action-value for go, and a Pavlovian factor that adds a fraction of the current state value to the action-value for go.
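To make the five free parameters of the winning model concrete, the following Python sketch shows one way such a choice rule and value update could be written. The exact functional form is an assumption made here, not the authors' specification: the caption names the parameters but not how they enter the equations, and the function and argument names (`p_go`, `update`, `beta`, `noise`, `go_bias`, `pav`) are hypothetical.

```python
import math

def p_go(q_go, q_nogo, v, beta, noise, go_bias, pav):
    """Probability of a go response in one state (assumed form).

    q_go, q_nogo : instrumental action-values for go and no-go
    v            : current state value (Pavlovian component)
    beta         : slope of the softmax rule
    noise        : irreducible noise, mixing in a uniform random choice
    go_bias      : constant bias added to the action-value for go
    pav          : fraction of the state value added to the go action-value
    """
    w_go = q_go + go_bias + pav * v   # biased weight for go
    w_nogo = q_nogo                   # no-go weight is left unchanged
    # Two-option softmax written as a logistic function of the difference.
    soft = 1.0 / (1.0 + math.exp(-beta * (w_go - w_nogo)))
    return (1.0 - noise) * soft + noise / 2.0

def update(value, reward, lr):
    """Delta-rule update, usable for both action-values and state values."""
    return value + lr * (reward - value)

if __name__ == "__main__":
    # With equal action-values, a positive state value pushes toward go
    # and a negative state value pushes toward no-go when pav > 0.
    print(p_go(0.0, 0.0, 0.5, beta=3.0, noise=0.05, go_bias=0.0, pav=0.5))
    print(p_go(0.0, 0.0, -0.5, beta=3.0, noise=0.05, go_bias=0.0, pav=0.5))
```

Under these assumptions, the Pavlovian term reproduces the qualitative asymmetry reported in the abstract: in rewarded states (positive state value) go is favored, and in punished states (negative state value) no-go is favored, independently of what has been learned instrumentally.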
Fig. 3
Action value representation in the striatum and SN/VTA. (A–B) The striatum (A) and the SN/VTA (B) show higher representation of Qgo when compared to Qno-go (p < 0.001 uncorrected; p < 0.05 SVC). The color scale indicates T values. (C–D) Parameter estimates of the four parametric regressors at the peak coordinate in the left putamen (C) and SN/VTA (D), showing that BOLD signal increased as the value of the go choice (Qgo) increased in both win and lose trials. In contrast, BOLD signal decreased as the value of the no-go choice (Qno-go) increased (note that these parameter estimates were not used for statistical inference).
Fig. 4
Action anticipation in learners and comparison to non-learners. (A) In learners, stimuli indicating go trials elicited greater left lateral substantia nigra/ventral tegmental area (SN/VTA) activity than stimuli indicating no-go trials (p < 0.001 uncorrected; p < 0.05 SVC). The color scale indicates T values. (B) Parameter estimates at the peak coordinates in the left lateral SN/VTA show that activation at this location signals anticipation of action regardless of outcome valence (reward or punishment avoidance). Coordinates are given in MNI space. Error bars indicate SEM (note that these parameter estimates were not used for statistical inference). (C) In an independent comparison, left lateral SN/VTA distinguishes learners from non-learners in the magnitude of the contrast go versus no-go (p < 0.001 uncorrected; p = 0.05 SVC). The color scale indicates T values. (D) Parameter estimates at the peak coordinates in the left lateral SN/VTA show that only in subjects that learned the task, fractal images indicating go trials elicited higher BOLD activity than fractal images indicating no-go trials. Coordinates are given in MNI space. Error bars indicate SEM (note that these parameter estimates were not used for statistical inference).
Fig. 5
Inhibition anticipation in learners and comparison to non-learners. (A) In learners, stimuli indicating no-go trials elicited greater bilateral inferior frontal gyrus (IFG) activity than stimuli indicating go trials (p < 0.001 uncorrected; p < 0.05 SVC). The color scale indicates T values. (B) Parameter estimates at the peak coordinates in both IFG clusters show that activation at these locations signals a requirement for inhibition regardless of the trial outcome valence (reward or punishment avoidance). Coordinates are given in MNI space. Error bars indicate SEM (note that these parameter estimates were not used for statistical inference). (C) In an independent comparison, bilateral IFG distinguishes learners from non-learners in the magnitude of the contrast no-go versus go (p < 0.001 uncorrected; p < 0.05 SVC). The color scale indicates T values. (D) Parameter estimates at the peak coordinates in the clusters depicted in C show that only in subjects that learned the task, fractal images indicating no-go trials elicited higher BOLD activity than fractal images indicating go trials. Coordinates are given in MNI space. Error bars indicate SEM (note that these parameter estimates were not used for statistical inference).
