Statistical mechanics of reward-modulated learning in decision-making networks

Kentaro Katahira; Kazuo Okanoya; Masato Okada

doi:10.1162/NECO_a_00264

Neural Comput. 2012 May;24(5):1230-70. doi: 10.1162/NECO_a_00264. Epub 2012 Feb 1.

Authors

Kentaro Katahira¹, Kazuo Okanoya, Masato Okada

Affiliation

¹ Japan Science Technology Agency, ERATO, Okanoya Emotional Information Project, 351-0198 Saitama, Japan. katahira@mns.k.u-tokyo.ac.jp

PMID: 22295982
DOI: 10.1162/NECO_a_00264

Abstract

The neural substrates of decision making have been intensively studied using experimental and computational approaches. Alternative-choice tasks accompanying reinforcement have often been employed in investigations into decision making. Choice behavior has been empirically found in many experiments to follow Herrnstein's matching law. A number of theoretical studies have sought to explain the mechanisms responsible for matching behavior. Various learning rules have been proved in these studies to achieve matching behavior as a steady state of learning processes. The models in these studies have had only a few parameters. However, a large number of neurons and synapses are expected to participate in decision making in the brain. We investigated learning behavior in simple but large-scale decision-making networks. We considered the covariance learning rule, which has been demonstrated to achieve matching behavior as a steady state (Loewenstein & Seung, 2006). We analyzed model behavior in a thermodynamic limit where the number of plastic synapses went to infinity. By means of techniques of statistical mechanics, we can derive deterministic differential equations in this limit for the order parameters, which allow an exact calculation of the evolution of choice behavior. As a result, we found that matching behavior cannot be a steady state of learning when the fluctuations in input from individual sensory neurons are so large that they affect the net input to value-encoding neurons. This situation naturally arises when the synaptic strength is sufficiently strong and the excitatory input and the inhibitory input to the value-encoding neurons are balanced. The deviation from matching behavior is caused by increasing variance in the input potential due to the diffusion of synaptic efficacies. This effect causes an undermatching phenomenon, which has often been observed in behavioral experiments.
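
As a rough illustration of the statement that a covariance rule has matching behavior as a steady state, the sketch below follows the trial-averaged update of a two-weight toy model. It is not the large-scale network analyzed in the paper and it does not reproduce the undermatching result. Everything specific here is an assumption made for illustration: the baited reward schedule, the logistic choice rule, the baiting probabilities `r1` and `r2`, the learning rate `lr`, and the function names.

```python
import math


def baited_return(r, p):
    """Stationary reward probability per choice of a target that is baited
    with probability r on each trial (the bait stays until collected) and
    is chosen independently with probability p. Assumed schedule."""
    return r / (1.0 - (1.0 - p) * (1.0 - r))


def mean_covariance_dynamics(r1, r2, lr=0.5, steps=5000):
    """Iterate the trial-averaged covariance rule for two options.

    w is the difference w1 - w2 of two hypothetical synaptic weights and
    the choice probability is logistic in w. The average update of w is
    proportional to Cov(R, a1) = p * (1 - p) * (q1 - q2), where q1 and q2
    are the returns of the two options; constant factors are absorbed
    into lr.
    """
    w = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-w))
        q1 = baited_return(r1, p)
        q2 = baited_return(r2, 1.0 - p)
        w += lr * p * (1.0 - p) * (q1 - q2)

    # Compare the fraction of choices with the fraction of income.
    p = 1.0 / (1.0 + math.exp(-w))
    income1 = p * baited_return(r1, p)
    income2 = (1.0 - p) * baited_return(r2, 1.0 - p)
    return p, income1 / (income1 + income2)


choice_frac, income_frac = mean_covariance_dynamics(0.3, 0.1)
print(f"choice fraction: {choice_frac:.4f}")
print(f"income fraction: {income_frac:.4f}")
```

In this toy setting the update vanishes only when the two returns are equal, so the choice fraction and the income fraction coincide at the fixed point, which is the matching law. The abstract's point is that this equality can fail once fluctuations in the input from many individual synapses are taken into account, an effect this two-weight average cannot capture.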

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Brain / physiology
Choice Behavior / physiology
Computer Simulation
Decision Making / physiology*
Learning / physiology*
Models, Neurological
Models, Statistical*
Neuronal Plasticity / physiology
Neurons / physiology*
Reinforcement, Psychology
Reward
Synapses / physiology
Synaptic Transmission / physiology*