J Neurophysiol. 2010 Aug;104(2):1068-76.
doi: 10.1152/jn.00158.2010. Epub 2010 Jun 10.

A Pallidus-Habenula-Dopamine Pathway Signals Inferred Stimulus Values

Ethan S Bromberg-Martin et al. J Neurophysiol. 2010.

Abstract

The reward value of a stimulus can be learned through two distinct mechanisms: reinforcement learning through repeated stimulus-reward pairings and abstract inference based on knowledge of the task at hand. The reinforcement mechanism is often identified with midbrain dopamine neurons. Here we show that a neural pathway controlling the dopamine system does not rely exclusively on either stimulus-reward pairings or abstract inference but instead uses a combination of the two. We trained monkeys to perform a reward-biased saccade task in which the reward values of two saccade targets were related in a systematic manner. Animals used each trial's reward outcome to learn the values of both targets: the target that had been presented and whose reward outcome had been experienced (experienced value) and the target that had not been presented but whose value could be inferred from the reward statistics of the task (inferred value). We then recorded from three populations of reward-coding neurons: substantia nigra dopamine neurons; a major input to dopamine neurons, the lateral habenula; and neurons that project to the lateral habenula, located in the globus pallidus. All three populations encoded both experienced values and inferred values. In some animals, neurons encoded experienced values more strongly than inferred values, and the animals showed behavioral evidence of learning faster from experience than from inference. Our data indicate that the pallidus-habenula-dopamine pathway signals reward values estimated through both experience and inference.

Figures

Fig. 1.
Reward-biased saccade task. A: task diagram. The monkey fixated a central spot for 1.2 s. The spot disappeared and simultaneously a visual target appeared on the left or right side of the screen. The monkey was required to saccade to the target. In 1 block of 24 trials, left saccades were rewarded and right saccades were unrewarded (block 1); in the next block, the reward values were reversed without notice to the animal (block 2). B: example sequence of events after a block change. In the 1st trial of the new block, the monkey receives an unexpected reward outcome (trial 1: right target, reward). The 2nd trial of the block could present the same target, whose new reward value had just been experienced (trial 2: same target, experienced value), or it could present the other target, which had been absent on the previous trial and whose new reward value had to be inferred based on the reversal rule of the task (trial 2: other target, inferred value). C: 2 ways to learn stimulus values from the pairing right target → reward. Left: if the animal learned through experience alone, the right target value would be increased but the left target value would remain unchanged. In trial 2, the animal would show no preference between the targets. Right: if the animal learned through inference, the animal would additionally infer that the block had changed to block 2, and hence the left target value had decreased. The animal's preference would switch from the left target to the right target.
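The two learning schemes in panel C can be made concrete with a toy value-update model. This is a minimal sketch under stated assumptions, not the model used in the paper: target values are coded as 1 (rewarded) or 0 (unrewarded), the presented target is updated toward the outcome with a learning rate `alpha`, and a hypothetical inference weight `w_inf` controls how far the absent target moves toward the complement of the presented target's value (the task's reversal rule). Setting `w_inf = 0` reproduces "experience alone" (Fig. 1C, left), `w_inf = 1` reproduces full inference (Fig. 1C, right), and intermediate values give a partial reversal. The names `update_values`, `preference`, `alpha`, and `w_inf` are illustrative only.

```python
def update_values(values, shown, reward, alpha=1.0, w_inf=0.0):
    """Return updated target values after one trial (hypothetical toy model).

    values: dict mapping "left"/"right" to an estimated value in [0, 1].
    shown: the target presented on this trial.
    reward: 1 if the trial was rewarded, 0 otherwise.
    alpha: learning rate for the presented (experienced) target.
    w_inf: assumed inference weight; 0 = experience only, 1 = full use of
        the rule that the two targets always have opposite values.
    """
    new = dict(values)
    # Experience: move the presented target's value toward the outcome.
    new[shown] += alpha * (reward - new[shown])
    # Inference: move the absent target toward the complement of the
    # presented target's new value, scaled by w_inf.
    other = "left" if shown == "right" else "right"
    new[other] += w_inf * ((1.0 - new[shown]) - new[other])
    return new


def preference(values):
    """Right-minus-left value: > 0 prefers right, < 0 prefers left."""
    return values["right"] - values["left"]


# Block 1: left rewarded, right unrewarded (values assumed fully learned).
block1 = {"left": 1.0, "right": 0.0}
# Trial 1 of block 2: right target presented and unexpectedly rewarded.
for w in (0.0, 0.5, 1.0):
    after = update_values(block1, "right", 1, w_inf=w)
    print(w, preference(after))
# 0.0 0.0   (experience only: no preference on trial 2)
# 0.5 0.5   (partial inference: partial reversal)
# 1.0 1.0   (full inference: preference switches to the right target)
```

The update rule for the presented target is a standard delta rule; treating inference as a weighted pull toward the complementary value is one simple way to express "a combination of the two" mechanisms, chosen here for clarity rather than taken from the source.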
Fig. 2.
Combination of experienced and inferred stimulus values in neural activity and behavior in monkeys E and L. The rows represent (A) lateral habenula neurons, (B) dopamine neurons, and (C) behavioral reaction times. Left 3 columns: data for the 1st trial of the block (trial 1), for the 2nd trial of the block when the target was different from the 1st trial (trial 2, Other Target), and for the 2nd trial of the block when the target was the same as on the 1st trial (trial 2, Same Target). Data are shown separately for the target that was rewarded in the previous block and unrewarded in the current block (old R, new U, blue) and for the target that was unrewarded in the previous block and rewarded in the current block (old U, new R, red). Neural firing rates were smoothed with a Gaussian kernel (σ = 15 ms) and averaged over neurons. Shaded areas and error bars are ±SE. Gray bars along the time axis indicate the response window for calculation of reversal indexes. Note that each red or blue curve in A and B includes only data from neurons that had at least 1 trial in which the appropriate current-trial and past-trial targets were presented (n = 42–63 for each curve). Right 3 columns: reversal index on the 2nd trial of the block, calculated using all data (1st column), using data from monkey L (2nd column), and using data from monkey E (3rd column). Reversal indexes were calculated separately for other-target trials, when the value of the target had to be inferred (white bars, Inf), and for same-target trials, when the value of the target had already been experienced on the 1st trial of the block (gray bars, Exp). Numbers at the bottom of each bar indicate the number of neural recording sessions for that bar. Symbols indicate statistical significance measured using a shuffling procedure (*P < 0.05; +P ≤ 0.06; ns P > 0.06). Error bars are ±SE. Neural and behavioral measures of stimulus values reversed on both trial types but reversed less fully on inferred-value trials.
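The caption states that firing rates were smoothed with a Gaussian kernel (σ = 15 ms) and averaged over neurons. A minimal sketch of that kind of smoothing is shown below, assuming spike times in seconds aligned to a task event; it is not the authors' code. Each spike contributes a unit-area Gaussian, so the summed curve is in spikes/s. The order of averaging (trials within each neuron first, then across neurons) is an assumption, as are the function names `smoothed_rate` and `population_average`.

```python
import numpy as np


def smoothed_rate(spike_times_s, t_grid_s, sigma_s=0.015):
    """Firing-rate estimate (spikes/s) from a sum of Gaussian kernels.

    spike_times_s: spike times (s) relative to an alignment event.
    t_grid_s: times (s) at which to evaluate the rate.
    sigma_s: kernel SD in seconds; 0.015 s matches the 15-ms sigma above.
    """
    spikes = np.asarray(spike_times_s, dtype=float)[:, None]  # (n_spikes, 1)
    t = np.asarray(t_grid_s, dtype=float)[None, :]            # (1, n_times)
    # Unit-area Gaussian centered on each spike; summing over spikes gives spikes/s.
    norm = 1.0 / (sigma_s * np.sqrt(2.0 * np.pi))
    kernels = norm * np.exp(-0.5 * ((t - spikes) / sigma_s) ** 2)
    return kernels.sum(axis=0)  # an empty spike train yields all zeros


def population_average(trials_by_neuron, t_grid_s, sigma_s=0.015):
    """Average smoothed rates over trials within each neuron, then over neurons.

    trials_by_neuron: list (one entry per neuron) of lists of spike-time arrays.
    """
    per_neuron = [
        np.mean([smoothed_rate(tr, t_grid_s, sigma_s) for tr in trials], axis=0)
        for trials in trials_by_neuron
    ]
    return np.mean(per_neuron, axis=0)
```

For example, a single spike at 0 s evaluated on a 1-ms grid gives a peak of about 26.6 spikes/s (that is, 1 / (0.015 · √(2π))), and the curve integrates to one spike.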
Fig. 3.
Neural responses to outcome delivery in monkeys E and L. The rows represent (A) lateral habenula neurons and (B) dopamine neurons. Same format as the left 3 columns of Fig. 2. Data are plotted from the same neurons and trials as in Fig. 2, A and B, but aligned on outcome delivery. Gray bars along the time axis indicate the time window for measuring the outcome response. On the 1st trial of each block, when an unexpected outcome was delivered, lateral habenula and dopamine neurons had a strong outcome response (left column). On inferred-value trials, lateral habenula neurons tended to show a small residual outcome response (middle column).
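One way to see why a residual outcome response could appear on inferred-value trials is to compute a reward prediction error (obtained reward minus the value expected for the presented target). This is a hypothetical sketch, assuming that the outcome response scales with such a prediction error and that inference only partially updates the absent target; neither assumption is stated in this caption. It covers the case where the right target is rewarded on trial 1 of block 2, so the other target on trial 2 is the now-unrewarded left target. The function name `outcome_prediction_errors` and the parameter `w_inf` are illustrative.

```python
def outcome_prediction_errors(w_inf, alpha=1.0):
    """Hypothetical prediction errors (reward - value) around a block change.

    Block 1: left rewarded (value 1), right unrewarded (value 0).
    Trial 1 of block 2: right target presented and rewarded.
    w_inf: assumed fraction of the reversal applied to the absent target.
    Returns (trial 1, trial 2 same target, trial 2 other target).
    """
    left, right = 1.0, 0.0
    # Trial 1: reward on a target that was previously unrewarded.
    delta_trial1 = 1.0 - right
    right += alpha * delta_trial1
    # Partial inference: the absent left target moves toward 1 - right.
    left += w_inf * ((1.0 - right) - left)
    # Trial 2, same target (right, rewarded): value already experienced.
    delta_same = 1.0 - right
    # Trial 2, other target (left, unrewarded): value had to be inferred.
    delta_other = 0.0 - left
    return delta_trial1, delta_same, delta_other


print(outcome_prediction_errors(w_inf=0.75))  # (1.0, 0.0, -0.25)
```

With full inference (`w_inf = 1`) the trial-2 prediction error is zero for both targets; with incomplete inference a residual error remains only on the other-target (inferred-value) trial, which is qualitatively the pattern described for lateral habenula neurons.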
Fig. 4.
Experienced and inferred stimulus values in neural activity and behavior in monkeys N and D. Same format as Fig. 2. The rows represent (A) GPiLHb-negative neurons, (B) lateral habenula multiunit activity, and (C) behavioral reaction times. Note that each red or blue curve in A and B includes only data from neurons that had at least 1 trial in which the appropriate current-trial and past-trial targets were presented (n = 24–37 for each curve). In monkey D, neural and behavioral measures of stimulus values reversed similarly on both experienced-value and inferred-value trials (right column).
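The captions report significance from "a shuffling procedure" without giving its details. As a generic illustration only, and not necessarily the authors' procedure, a two-sided permutation test comparing per-session index values on experienced-value (Exp) and inferred-value (Inf) trials could be written as follows. The unpaired design, the add-one p-value correction, and the name `shuffle_test` are assumptions; a paired design (shuffling labels within each session) would be more appropriate if both indexes come from the same sessions.

```python
import numpy as np


def shuffle_test(exp_idx, inf_idx, n_shuffles=10000, seed=0):
    """Two-sided permutation test for a difference in mean index (Exp - Inf).

    exp_idx, inf_idx: per-session index values (hypothetical inputs).
    Returns (observed difference in means, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    exp_idx = np.asarray(exp_idx, dtype=float)
    inf_idx = np.asarray(inf_idx, dtype=float)
    pooled = np.concatenate([exp_idx, inf_idx])
    n_exp = exp_idx.size
    observed = exp_idx.mean() - inf_idx.mean()
    count = 0
    for _ in range(n_shuffles):
        # Randomly reassign the Exp/Inf labels and recompute the difference.
        perm = rng.permutation(pooled)
        diff = perm[:n_exp].mean() - perm[n_exp:].mean()
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    p = (count + 1) / (n_shuffles + 1)
    return observed, p
```

A small p-value would indicate that the Exp and Inf indexes differ by more than expected from random relabeling; a large one, as in the monkey D comparison described above, is consistent with similar reversal on both trial types.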
