Review

Navigating for reward

Marielena Sosa et al. Nat Rev Neurosci. 2021 Aug;22(8):472-487.
doi: 10.1038/s41583-021-00479-z. Epub 2021 Jul 6.

Abstract

An organism's survival can depend on its ability to recall and navigate to spatial locations associated with rewards, such as food or a home. Accumulating research has revealed that computations of reward and its prediction occur on multiple levels across a complex set of interacting brain regions, including those that support memory and navigation. However, how the brain coordinates the encoding, recall and use of reward information to guide navigation remains incompletely understood. In this Review, we propose that the brain's classical navigation centres - the hippocampus and the entorhinal cortex - are ideally suited to coordinate this larger network by representing both physical and mental space as a series of states. These states may be linked to reward via neuromodulatory inputs to the hippocampus-entorhinal cortex system. Hippocampal outputs can then broadcast sequences of states to the rest of the brain to store reward associations or to facilitate decision-making, potentially engaging additional value signals downstream. This proposal is supported by recent advances in both experimental and theoretical neuroscience. By discussing the neural systems traditionally tied to navigation and reward at their intersection, we aim to offer an integrated framework for understanding navigation to reward as a fundamental feature of many cognitive processes.


Figures

Fig. 1 ∣. Modulations of hippocampal-entorhinal activity at reward-related behavioural timepoints.
a ∣ Timepoints and associated behaviours surrounding reward acquisition. The magenta star indicates the goal location throughout the figure. Warmer colours indicate higher firing rates except where noted. In example spike plots, grey points indicate positions of the animal; coloured points indicate spikes. b ∣ Example hippocampal cell firing pattern during goal approach to the east reward well in a 2D environment. c ∣ Place-field clustering in CA1 near three goal locations (white dots). Left: place maps for an example cell before learning (pre), at the end of learning and during a probe session (post). Right: density of population place-field centres (scale indicates proportion of cells). Because this overrepresentation of goals is characterized as a change in the time-averaged hippocampal activity over the course of a session, its specificity to goal approach versus goal arrival is not clear. d ∣ Example CA1 or subiculum cells showing reward-specific firing (right) or place firing (left). Red lines indicate reward locations. ‘A’ and ‘B’ denote distinct virtual environments. Each plot shows mean calcium activity across trials. e ∣ Left: continuous T-maze task, in which a rodent must choose between left and right goals that have different probabilities of reward. Right: example CA1 cell showing increased firing rate based on reward history at the right goal (R+) compared with unrewarded times (R−) and left goals (L−, L+). Top: spike raster for each outcome. Middle: total occupancy of each spatial bin. Bottom: average firing rate for each outcome. f ∣ Goal-approach activity in an example medial entorhinal cortex grid cell. Firing patterns in a 2D environment and continuous T-maze are shown. The cell exhibited a higher firing rate on the centre stem on right-choice trials. g ∣ Increase in grid cell firing rates near a hidden goal zone (red box) when food is delivered inside the zone (right) versus during random foraging for scattered food (left). h ∣ Shift of grid cell fields toward three reward locations (black dots) (similar format to part c). Red circle highlights the field that moves the most across learning. Part b adapted from ref. [64], with permission. Part c adapted from ref. [32], with permission. Part d adapted from ref. [89], with permission. Part e adapted from ref. [71], with permission. Part f adapted from ref. [114], with permission. Part g adapted from ref. [117], with permission. Part h adapted from ref. [118], with permission.
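
The occupancy and rate panels described in parts c and e are standard occupancy-normalised rate maps: spike counts in each spatial bin divided by the time spent there. The sketch below illustrates that computation only; the array names, sampling rate and bin settings are hypothetical and it is not the analysis code of the cited studies.

    import numpy as np

    def rate_map(pos, spike_pos, fs=30.0, n_bins=20, arena_size=1.0):
        """Occupancy-normalised rate map: spikes per bin / seconds spent per bin."""
        edges = np.linspace(0.0, arena_size, n_bins + 1)
        # Time spent in each spatial bin (position samples / tracking rate in Hz).
        occ, _, _ = np.histogram2d(pos[:, 0], pos[:, 1], bins=[edges, edges])
        occupancy_s = occ / fs
        # Spike counts per bin, using the animal's position at each spike time.
        spk, _, _ = np.histogram2d(spike_pos[:, 0], spike_pos[:, 1], bins=[edges, edges])
        with np.errstate(invalid="ignore", divide="ignore"):
            return np.where(occupancy_s > 0, spk / occupancy_s, np.nan)  # Hz; NaN if unvisited

Plotting the returned map with a warm colour scale reproduces the convention used throughout the figure (warmer colours, higher firing rates).
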
Fig. 2 ∣. Dopaminergic signalling and innervation of the hippocampus.
a ∣ Reward prediction error (RPE) signalling. Dopaminergic neurons of the ventral tegmental area (VTA), which typically maintain a tonic firing rate, fire phasically in response to unexpected reward (positive RPE). As the reward becomes more predictable over learning, firing decreases for reward and increases for the reward-predictive cue, scaling with the degree of expectation and the value predicted. After extended learning, firing is suppressed if the expected reward is omitted (negative RPE) and increased if the reward is larger than expected. b ∣ Cartoon of value or motivation signalling in the nucleus accumbens (NAc), similar between dopamine concentration and VTA axon activity (putative time course based on refs ,,). The example task here involves a movement to initiate the trial, such as a nosepoke, followed by a reward-predictive cue just before reward delivery, such as a feeder click. Phasic and ramping signals before reward delivery scale with the recent rate of reward, which approximates value and increases motivation to perform the task. Note that RPE signals layer on top of this value signal, but here the reward delivered is as expected. c ∣ Distribution of VTA and locus coeruleus (LC) axons in the hippocampus. Darker yellow shading indicates greater LC axon density in CA3. d ∣ Summarized effects of dopaminergic input inactivation or activation on four hippocampal place cells (coloured blobs). Left: LC or VTA axon inhibition (colours as in part c), or dopamine antagonism in the hippocampus, destabilizes place fields in sequential exposures to the same square environment. Right: LC or VTA axon activation with optogenetics (shown as a blue light) promotes the shift of place fields toward a goal location (magenta star). SLM, stratum lacunosum-moleculare; SO, stratum oriens; SR, stratum radiatum; Sub, subiculum.
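
The RPE dynamics summarized in part a follow the standard temporal-difference account, in which the error is δ = r + γ·V(next state) − V(current state). The following minimal simulation reproduces those dynamics under assumed parameters; the trial structure, learning rate and discount factor are illustrative choices, not values from the Review.

    import numpy as np

    # States within a trial: 0 = inter-trial interval (the cue arrives unpredictably, so its
    # value is clamped at 0), 1 = reward-predictive cue, 2-3 = delay, 4 = reward delivery,
    # 5 = post-reward. V[6] acts as a terminal state with value 0.
    n_states, reward_state = 6, 4
    gamma, alpha = 1.0, 0.1
    V = np.zeros(n_states + 1)

    def run_trial(reward_delivered=True):
        """One trial of TD(0) learning; returns the RPE (delta) at every time step."""
        deltas = np.zeros(n_states)
        for t in range(n_states):
            r = 1.0 if (t == reward_state and reward_delivered) else 0.0
            deltas[t] = r + gamma * V[t + 1] - V[t]   # reward prediction error
            if t > 0:                                 # the unpredictable ITI keeps value 0
                V[t] += alpha * deltas[t]
        return deltas

    first = run_trial()                           # positive RPE only at reward delivery
    for _ in range(300):
        run_trial()                               # learning transfers the RPE to cue onset
    learned = run_trial()
    omitted = run_trial(reward_delivered=False)   # negative RPE at the expected reward time
    print(np.round(first, 2), np.round(learned, 2), np.round(omitted, 2))
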
Fig. 3 ∣. Hippocampal theta sequences and replay.
a ∣ A rodent running in a linear environment engages theta sequences. An example theta trace (local field potential filtered at 5–11 Hz) is shown below the track. Place cells with overlapping fields spanning just behind to just ahead of the animal’s position spike sequentially within each theta cycle (spikes are shown as vertical ticks; theta cycles are separated by dashed vertical lines). Early phases of theta (0 to π radians) contain spikes corresponding to past and present positions, whereas late phases (π to 2π) contain more spikes corresponding to future positions. b ∣ A ‘W-maze’ alternation task (for example as in refs ,,) illustrating right and left choices represented as single spikes of place cells (green and yellow fields) on alternating theta cycles. Note that spikes occur on the late phases of opposite theta cycles (same example theta trace as in part a). On the W-maze, the animal is rewarded for visiting the opposite side arm from the previously visited arm when coming from the centre. Thus, theta alternation could act as a mode of deliberation, with retrieval of information relevant to future experience taking place in the second half of the cycle. c ∣ In periods of immobility, such as during food consumption, sequences of place cells replay during sharp-wave ripples (SWRs). The same example SWR (local field potential filtered at 150–250 Hz) is shown to illustrate both forward and reverse replay events. d ∣ In the same W-maze task shown in part b, a rodent exhibits forward replay of both alternate trajectories while immobile, before beginning a run. Separate replay events (same SWR used for illustration purposes) are shown, displaying replay of leftward and rightward place cell sequences, putatively allowing the animal to evaluate possible future outcomes.
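
The theta and ripple traces referred to in this legend are band-pass-filtered local field potentials (5–11 Hz and 150–250 Hz, respectively). A minimal filtering sketch follows; the sampling rate, filter order and synthetic signal are assumptions for illustration, not parameters from the cited recordings.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def bandpass(lfp, low_hz, high_hz, fs, order=3):
        """Zero-phase Butterworth band-pass filter of a raw LFP trace."""
        b, a = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs)
        return filtfilt(b, a, lfp)

    fs = 1500.0                                   # assumed LFP sampling rate (Hz)
    t = np.arange(0, 2.0, 1.0 / fs)
    # Synthetic trace: 8 Hz theta, a brief 200 Hz ripple-like burst near t = 1 s, plus noise.
    lfp = np.sin(2 * np.pi * 8 * t)
    lfp += 0.5 * np.sin(2 * np.pi * 200 * t) * np.exp(-((t - 1.0) ** 2) / 0.002)
    lfp += 0.2 * np.random.randn(t.size)

    theta = bandpass(lfp, 5, 11, fs)      # theta-band trace, as drawn below the track in part a
    ripple = bandpass(lfp, 150, 250, fs)  # ripple-band trace used to identify SWRs in parts c-d
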
Fig. 4 ∣. Hypothesized interactions between brain systems in navigating to reward.
a ∣ A sequence of hippocampal place fields interpreted as a sequence of 5 states (s1–s5) that discretize forward movement on a linear track, with expected reward in each state (r1–r5). b ∣ The successor representation (SR) matrix for the 5 states depicted in part a. Hypothetical transition probabilities arise from the assumption that the hippocampal representation is mostly unidirectional on the linear track (that is, states in this sequence predict past states with very low probability and future states with high probability that decays with increased distance). Purple arrows indicate the firing field for hippocampus (HPC) cell 1 (column 1) and its SR (row 1). c ∣ Left to right, first: the successor representation vector M(si,:) for all states given trajectories initiated in state i, for i = 1–5 (rows of the SR matrix). Darker colours indicate higher predicted occupancy. Second: the firing rates of each hippocampal cell in 5 spatial bins (that is, the 5 states), derived from the columns of the SR matrix. Darker colours indicate higher firing rates. Third: each hippocampal cell is hypothetically coupled with a reward function that provides the expected reward in each state, here shown as a ramp of dopamine release peaking at the reward location. This coupling could occur via dopaminergic innervation of the HPC, or via spike coupling of HPC cells with nucleus accumbens (NAc) neurons, for example, which receive ramping dopamine. Fourth: the SR and reward are multiplied to estimate the value function for each state (combined colours). d ∣ In this simplified hypothesis, dopaminergic and other neuromodulatory systems convey reward prediction information to the HPC–entorhinal cortex (EC) system, which helps to assign these values to discrete states that compose an experience. ‘States’ here are synonymous with the spatial representations of the HPC–EC system. State representations are sent to downstream areas (yellow), which layer additional information onto these states, such as task requirements and sensory features. No reciprocal arrow is shown for the basal ganglia because there is no known direct return projection, but the basal ganglia (including the NAc) help to use state values for action invigoration. The HPC–EC, frontal cortices and basal ganglia each project back to the dopaminergic system directly or indirectly, putatively providing updates about predicted outcomes and value changes to individual states. Interactions in this network contribute to memory storage, decision-making and action generation.
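
Parts b and c describe a standard successor representation calculation: the SR matrix is the expected discounted future occupancy of every state from every starting state, M = (I − γT)⁻¹ for a one-step transition matrix T, and multiplying M by an expected-reward vector yields state values. The sketch below works this through for the five linear-track states; the transition probabilities, discount factor and reward vector are illustrative assumptions, not values from the Review.

    import numpy as np

    n = 5                              # the five states s1-s5 along the linear track
    gamma = 0.7                        # discount factor (assumed)

    # T[i, j] = probability of stepping from state i to state j; mostly forward movement,
    # mirroring the near-unidirectional transitions assumed in part b.
    T = np.zeros((n, n))
    for i in range(n - 1):
        T[i, i + 1] = 0.9
        T[i, i] = 0.1
    T[n - 1, n - 1] = 1.0              # the end of the track absorbs

    # SR matrix: M = sum_k gamma^k T^k = (I - gamma * T)^-1.
    M = np.linalg.inv(np.eye(n) - gamma * T)

    r = np.array([0.0, 0.0, 0.0, 0.0, 1.0])   # expected reward only at the goal state s5
    V = M @ r                                 # value of each state = its SR row times reward

    print(np.round(M, 2))              # rows correspond to M(si,:); columns to the cells' fields
    print(np.round(V, 2))              # value ramps up toward the goal, as in part c
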
