PLoS Comput Biol. 2018 Oct 25;14(10):e1006518. doi: 10.1371/journal.pcbi.1006518. eCollection 2018 Oct.

Modeling sensory-motor decisions in natural behavior

Ruohan Zhang et al. PLoS Comput Biol. 2018.

Abstract

Although a standard reinforcement learning model can capture many aspects of reward-seeking behaviors, it may not be practical for modeling natural human behavior because of the richness of dynamic environments and limitations in cognitive resources. We propose a modular reinforcement learning model that addresses these factors. Based on this model, a modular inverse reinforcement learning algorithm is developed to estimate both the rewards and discount factors from human behavioral data, which allows predictions of human navigation behaviors in virtual reality with high accuracy across different subjects and tasks. Complex human navigation trajectories in novel environments can be reproduced by an artificial agent based on the modular model. This model provides a strategy for estimating the subjective value of actions and how they influence sensory-motor decisions in natural behavior.
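To make the modular formulation concrete, here is a minimal sketch of the action-selection idea: each module maintains its own Q-function (with its own reward and discount factor), and the agent acts on the sum of module values. The class names and function signatures below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (illustrative, not the authors' code): in modular RL each
# module scores actions with its own Q-function; the agent sums the scores.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Module:
    name: str
    q: Callable[[object, object], float]   # Q_i(state, action) for this module

def composite_q(modules: Sequence[Module], state, action) -> float:
    """Assumed composition rule: the global value is the sum of module Q-values."""
    return sum(m.q(state, action) for m in modules)

def greedy_action(modules: Sequence[Module], state, actions):
    """Choose the action that maximizes the summed module values."""
    return max(actions, key=lambda a: composite_q(modules, state, a))
```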

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. The virtual-reality human navigation experiment with motion tracking.
(A) A human subject wears a head-mounted display (HMD) and trackers for eyes, head, and body. (B) The virtual environment as seen through the HMD. The red cubes are obstacles and the blue spheres are targets. There is also a gray path on the ground leading to a goal (the green disk). At the green disk the subject is ‘transported’ to a new ‘level’ in a virtual elevator for another trial with a different arrangement of objects.
Fig 2
Fig 2. The concept of modular reinforcement learning illustrated using value surfaces.
(A) The value surface is flat without any reward signal. (B) A module object with positive reward has positive weight, and one with negative reward has negative weight; they bend the value surface toward negative and positive curvature, respectively. An agent that follows the steepest descent therefore minimizes energy or, equivalently, maximizes reward. (C) An object with larger weight bends the surface more. (D) An object with a greater discount factor γ has larger influence over distance. (E,F) Composing different objects with different rewards and γs results in complicated value surfaces that can model an agent’s value function over the entire state space.
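As a rough numerical illustration of this picture, the sketch below composes a value surface on a grid by assuming that each object's contribution decays geometrically with distance (r_i · γ_i^distance) and that module surfaces simply add. The grid size, decay form, and example rewards are assumptions made for intuition, not the paper's exact value function.

```python
# Illustrative value surface: per-module contribution r_i * gamma_i**distance,
# summed over modules (assumed form, for intuition only).
import numpy as np

def value_surface(objects, shape=(50, 50)):
    """objects: list of (x, y, reward, gamma). Returns a composed value grid."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    surface = np.zeros(shape)
    for ox, oy, reward, gamma in objects:
        dist = np.hypot(xs - ox, ys - oy)
        surface += reward * gamma ** dist   # farther objects contribute less
    return surface

# Example: one target (positive reward) and one obstacle (negative reward).
surface = value_surface([(10, 10, +1.0, 0.9), (35, 30, -1.0, 0.7)])
```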
Fig 3
Fig 3. Maximum likelihood modular inverse reinforcement learning.
(A) From an observed trajectory (a sequence of state-action pairs), the goal of modular IRL is to recover the underlying value surface. (B) Maximum likelihood IRL assumes that the probability of observing a particular action (red) in a given state is proportional to its Q-value relative to all other available actions, as in Eq (5).
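A hedged sketch of the quantity such an IRL procedure maximizes: the log-likelihood of the observed state-action pairs under an action distribution derived from the composed Q-values. The softmax form and inverse temperature beta below are assumptions (chosen to match the softmax action selection mentioned in Fig 4) rather than a reproduction of Eq (5); the module rewards and discount factors enter through q_composite and are the parameters being estimated.

```python
# Sketch of a trajectory log-likelihood under a softmax action distribution
# over composed module Q-values (assumed form; not the paper's Eq (5)).
import numpy as np

def trajectory_log_likelihood(trajectory, q_composite, actions, beta=1.0):
    """trajectory: sequence of (state, action) pairs observed from a human.
    q_composite(state, action): summed module Q-values under the candidate
    rewards and discount factors being evaluated by the IRL search."""
    total = 0.0
    for state, taken in trajectory:
        z = beta * np.array([q_composite(state, a) for a in actions])
        z -= z.max()                              # numerical stability
        log_probs = z - np.log(np.exp(z).sum())   # log-softmax over actions
        total += log_probs[actions.index(taken)]
    return total
```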
Fig 4
Fig 4. Bird’s-eye view of human trajectories and agent trajectory clouds across different subjects.
Black lines: human trajectories. Green lines: modular RL agent trajectory clouds generated using softmax action selection. The green is semi-transparent, so darker areas correspond to trajectories with higher likelihood. Yellow circles: end of the path. Blue circles: targets. Red squares: obstacles. Gray dots: path waypoints used by the model (subjects see a continuous path). Below each graph are the rewards and discount factors estimated from human data and used by the modular RL agent; they are shown in the order (Target, Obstacle, Path), and the module rewards that correspond to the task instructions are in bold. The obstacle module has negative reward, but its absolute value is shown for comparison with the other two modules. The three trials within each row are from different subjects in the same environment. (A,B,C) show trials from Task 1: follow the path. (D,E,F) show trials from Task 2: follow the path and avoid obstacles. (G,H,I) show trials from Task 3: follow the path and collect targets. (J,K,L) show trials from Task 4: follow the path, collect targets, and avoid obstacles.
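The trajectory clouds above come from repeatedly sampling actions with softmax action selection. The short sketch below shows one such rollout; the environment step function, action set, and temperature are placeholders assumed for illustration.

```python
# Illustrative rollout with softmax action selection; repeating it many times
# produces a trajectory cloud (step(), actions, and beta are assumptions).
import numpy as np

rng = np.random.default_rng(0)

def softmax_sample(qs, beta=2.0):
    z = beta * np.asarray(qs, dtype=float)
    z -= z.max()                          # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(qs), p=p))

def rollout(start_state, q_composite, actions, step, n_steps=100):
    """Sample one trajectory by drawing actions from the softmax policy."""
    state, path = start_state, [start_state]
    for _ in range(n_steps):
        qs = [q_composite(state, a) for a in actions]
        state = step(state, actions[softmax_sample(qs)])
        path.append(state)
    return path
```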
Fig 5
(A) Normalized average rewards across different task instructions. The error bar represents the standard error of the mean across subjects (N = 25). The obstacle module has negative reward, but its absolute value is shown for comparison with the other two modules. The estimated rewards agree with the task instructions. (B) Average discount factors across different task instructions. The error bar represents the standard error of the mean across subjects (N = 25).
Fig 6
Fig 6. Average number of targets collected/obstacles hit when different models perform the navigation task across all trials.
There are 12 targets and 12 obstacles in the virtual room. Error bars indicate the standard error of the mean (N = 100).
Fig 7
Fig 7. Modular reinforcement learning (left) vs. hierarchical reinforcement learning (right).
Modular RL assumes modules run concurrently and do not extend over multiple time steps. Hierarchical RL assumes that a single option may extend over multiple time steps.
