Supporting generalization in non-human primate behavior by tapping into structural knowledge: Examples from sensorimotor mappings, inference, and decision-making

Jean-Paul Noel et al. Prog Neurobiol. 2021 Jun;201:101996.
doi: 10.1016/j.pneurobio.2021.101996. Epub 2021 Jan 14.

Abstract

The complex behaviors we ultimately wish to understand are far from those currently used in systems neuroscience laboratories. A salient difference is the closed loop between action and perception that is prominent in natural but not laboratory behaviors. The framework of reinforcement learning and control naturally spans action and perception, and is thus poised to inform the neurosciences of tomorrow, not only as a data-analysis and modeling framework, but also in guiding experimental design. We argue that this theoretical framework emphasizes active sensing, dynamical planning, and the leveraging of structural regularities as key operations for intelligent behavior within uncertain, time-varying environments. Similarly, we argue that we may study natural task strategies and their neural circuits without over-training animals when the tasks we use tap into our animals' structural knowledge. As proof of principle, we teach animals to navigate through a virtual environment - i.e., to explore a well-defined and repetitive structure governed by the laws of physics - using a joystick. Once these animals have learned to 'drive', without further training they naturally (i) show zero- or one-shot learning of novel sensorimotor contingencies, (ii) infer the evolving path of dynamically changing latent variables, and (iii) make decisions consistent with maximizing reward rate. Such task designs allow for the study of flexible and generalizable, yet controlled, behaviors. In turn, they allow for the exploitation of pillars of intelligence - flexibility, prediction, and generalization - properties whose neural underpinnings have remained elusive.

Keywords: Cognitive map; Flexibility; Generalization; Learning set; Natural behavior; Reinforcement learning.

Figures

Figure 1:
Conceptual Framework. (A) In standard two-alternative forced-choice protocols, animals are asked to fixate and are then presented with sensory stimuli they cannot control or modify. Next, at a different time, they are allowed to make a decision via motor output within an impoverished state space (e.g., only two choices available). (B) In the real world, we must first decide whether or not to engage in a particular task (e.g., catching a firefly), and then we must operate within a sensory-motor loop, where actions may change the landscape of sensory evidence, and this evidence updates our internal models, which in turn may or may not change our actions.
Figure 2:
Path Integration Toward an Unseen Target in Virtual Reality - Teaching Non-Human Primates How to Report. (A) The animal has linear and angular velocity control via a joystick, allowing it to navigate and explore a two-dimensional space made up of a target (the firefly - black disk) and triangular elements that are presented only briefly and thus serve as optic flow. (B) Example trial where the monkey navigates straight to the target and is rewarded if it stops within a reward zone (the reward zone is never explicitly shown to the animals). (C) A collection of example trajectories.
Figure 3:
Gain Manipulation. (A) Animals were trained with a single sensorimotor mapping (Trained Gain, 1x) and then experienced gains that changed approximately every 50 trials and tiled the space from 1x (trained, red) to 2x (black). We compared the true error subjects made (lower panel, black; difference between the target distance r and the response r~) vs. the error they would have made had they kept using the gain they were accustomed to (lower panel, red; difference between r and the trained-gain prediction r~_tg). (B and C) An example monkey (Monkey J) was able to quickly adjust its sensorimotor mapping, as shown by the fact that during the entire session the real error (C, black) was smaller than what would have been predicted from the trained gain (C, red). The lack of covariance between gain changes and error suggests that the animal's performance was not driven by changes in gain. The vertical shaded areas are time periods of gain = 1x. (D) Average real radial error (black) and error predicted if there were no sensorimotor adaptation (red) for all monkeys. (E) Examining the radial error on each trial as a function of trial number since gain change suggested that two monkeys (J and M) showed zero-shot learning, while another two (S and V) showed one-shot learning. The y-axis is error (in cm) per radial distance of targets (in cm); normalization is required given that error scales with distance and the distance of targets on the first, second, etc., trials after a gain change varied. Error bars are +/- 1 S.E.M.; red is the error predicted had animals used the gain they were trained with (1x), light blue is the average error on the first trial of a particular gain manipulation, dark blue is the second trial, and black is the rest (trials 3-20).
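
To make the zero- vs. one-shot comparison in panel (E) concrete, here is a minimal analysis sketch, assuming per-trial arrays of target distance, stopping distance, and the gain in effect (all variable names are hypothetical; this is not the authors' code): radial error is normalized by target distance and then grouped by the number of trials elapsed since the last gain change.

```python
import numpy as np

def error_by_trials_since_gain_change(target_dist, response_dist, gain):
    """Normalized radial error grouped by trial number since the last gain change.

    target_dist, response_dist, gain : per-trial arrays (target distance in cm,
    stopping distance in cm, joystick gain in effect). Names are hypothetical;
    this is an illustrative sketch, not the authors' analysis code.
    """
    target_dist = np.asarray(target_dist, dtype=float)
    response_dist = np.asarray(response_dist, dtype=float)
    gain = np.asarray(gain, dtype=float)

    # Radial error normalized by target distance, since error scales with distance.
    norm_err = np.abs(response_dist - target_dist) / target_dist

    # Trials elapsed since the most recent gain change
    # (0 = first trial at a new gain; the session's first trial is treated as 0).
    since_change = np.zeros(len(gain), dtype=int)
    for t in range(1, len(gain)):
        since_change[t] = 0 if gain[t] != gain[t - 1] else since_change[t - 1] + 1

    # Mean error on the first, second, and later (3rd-20th) trials after a change.
    # Zero-shot learning predicts the first-trial error already matches later trials;
    # one-shot learning predicts it drops to that level by the second trial.
    first = norm_err[since_change == 0].mean()
    second = norm_err[since_change == 1].mean()
    later = norm_err[(since_change >= 2) & (since_change <= 19)].mean()
    return first, second, later
```
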
Figure 4:
Moving Firefly. (A) Location of firefly targets at the beginning (green) and end (black) of trials. (B) An example trajectory. The firefly moves rightward for 300 ms and then disappears (green to red filled circles). In this time, the monkey moves forward without adjusting its lateral movement (green to red empty circles). Then, the firefly keeps moving, and while it is no longer visible, the monkey moves rightward (red to black empty circles) and stops within the reward boundary. (C) Lateral response (in cm) as a function of the lateral position of the target. Trials are re-coded such that if the monkey had navigated to the lateral position of the target at trial onset, its responses would lie along y = 0 (green). If the monkey had gone exactly to the end position of the target, its responses would lie along y = x (black, “end model”). If the monkey navigates to the closest edge of the reward boundary (which depends on the direction of motion of the firefly), its responses would lie along the blue curve. (D) All monkeys inferred that the firefly kept moving after disappearing and followed the reward boundary. Error bars are +/- 1 S.E.M. (weights are not normalized to 1 here and can take either positive or negative values). (E) Normalized weights (summing to 1, in order to compare the relative weighting of the different models) of the start (green), end (black), and reward boundary (blue) models, as determined by multiple regression within a moving window of 20 trials (the x-axis starts at trial 10, indicating the center of the moving window). Already in the first window examined, Monkeys J and V are following the moving firefly; after ~20 trials all animals are. The inset shows the entire session with weights smoothed with a 100-trial kernel. Multiple regression within a window (vs. on single trials) is necessary because when targets are near, they do not move much laterally, making it impossible to distinguish among the different models on individual trials.
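
The moving-window regression described for panel (E) can be sketched as follows, as a minimal illustration rather than the authors' actual pipeline. It assumes per-trial arrays of the monkey's lateral response and of the lateral positions predicted by the start, end, and reward-boundary models (all argument names are hypothetical):

```python
import numpy as np

def moving_window_model_weights(lateral_resp, start_pred, end_pred, boundary_pred,
                                window=20):
    """Moving-window regression of the lateral stopping position onto the start,
    end, and reward-boundary model predictions (one value per trial for each).

    Argument names are hypothetical; this is an illustrative sketch of the kind
    of analysis described for Figure 4E, not the authors' code.
    """
    X_all = np.column_stack([start_pred, end_pred, boundary_pred]).astype(float)
    y_all = np.asarray(lateral_resp, dtype=float)

    weights = []
    for t in range(window, len(y_all) + 1):
        X, y = X_all[t - window:t], y_all[t - window:t]
        # Ordinary least squares within the 20-trial window.
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Normalize so the three weights sum to 1, to compare relative weighting.
        weights.append(b / b.sum())
    return np.array(weights)  # one row per window; columns: start, end, boundary
```
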
Figure 5:
Binary Decision-Making via Transfer Learning in a Naturalistic Task. (A) Depiction of the task. Two targets were displayed transiently and simultaneously to the animals, which were free to choose which one to catch. (B) Example trajectories seen from the top. The targets were drawn from independent distributions spanning the gray field in front of the animals. Black disks represent the positions at which the targets appeared. The black line is the trajectory the monkey followed, starting at the bottom. The last field depicts the two variables the monkeys used to choose which target to catch: deviation from straight ahead (difference in absolute angles) and relative distance (difference in radial distances). (C) Proportion of trials on which the monkeys chose a specific target (target 1 vs. target 2) as a function of the difference between the absolute angles of the targets (left column) and the difference between the radial distances of the targets (right column) for 3 monkeys (rows). All animals demonstrated a clear preference for closer and more straight-ahead targets. (D) Running average of the reward rate associated with the optimal choice (green), the animal's choice (black), and a random choice (red) as a function of trial number since introduction of the task.
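
A minimal sketch of the reward-rate comparison in panel (D), assuming per-trial 0/1 reward outcomes under the animal's actual choices and under simulated optimal and random choices (the argument names and the 50-trial smoothing window are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def running_reward_rates(rewarded_optimal, rewarded_animal, rewarded_random,
                         window=50):
    """Running average of reward rate under the optimal, actual, and random
    choice policies. Inputs are per-trial 0/1 reward outcomes; the argument
    names and the 50-trial smoothing window are illustrative assumptions.
    """
    kernel = np.ones(window) / window
    smooth = lambda x: np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")
    return (smooth(rewarded_optimal),
            smooth(rewarded_animal),
            smooth(rewarded_random))
```
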
Figure 6:
Naturalistic Foraging within a Multi-Firefly Scenario. (A) Two hundred fireflies were present within a large virtual environment and flashed at random times (red = firefly on, black = firefly off; blue and the inset show the horizon of what was visible during the example frame; the yellow trajectory shows movement over the last second). (B) Monkey stopping locations referenced to the nearest firefly. (C) Observed (red) and null (black, shuffled) distributions of total rewards within the session. (D) Rewards per minute for the three different monkeys (B, Q, S) within the first session in which they were exposed to the environment with hundreds of fireflies.
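
The shuffle test behind panel (C) can be illustrated with a short sketch. One plausible construction (an assumption, since the exact shuffling procedure is not spelled out in the caption) is to count how many stops fell within the reward radius of their nearest firefly and compare that count against a null in which the pairing between stops and firefly positions is permuted; all names below are hypothetical:

```python
import numpy as np

def shuffled_reward_null(stop_xy, nearest_firefly_xy, reward_radius,
                         n_shuffles=1000, rng=None):
    """Observed reward count vs. a shuffled null.

    stop_xy            : (n_stops, 2) stopping positions
    nearest_firefly_xy : (n_stops, 2) position of the nearest firefly for each stop
    The shuffle permutes the pairing between stops and firefly positions.
    All names, and this particular shuffle, are assumptions for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    stops = np.asarray(stop_xy, dtype=float)
    targets = np.asarray(nearest_firefly_xy, dtype=float)

    def n_rewarded(t):
        # Count stops landing within the reward radius of their paired firefly.
        return int((np.linalg.norm(stops - t, axis=1) <= reward_radius).sum())

    observed = n_rewarded(targets)
    null = np.array([n_rewarded(targets[rng.permutation(len(targets))])
                     for _ in range(n_shuffles)])
    return observed, null
```
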
