What biological mechanisms underlie the reward-predictive firing properties of midbrain dopamine (DA) neurons, and how do they relate to the complex constellation of empirical findings understood as Pavlovian and instrumental conditioning? We previously presented PVLV, a biologically inspired Pavlovian learning algorithm that accounts for DA activity in terms of two interrelated systems: a primary value (PV) system, which governs how DA cells respond to an unconditioned stimulus (US; reward), and a learned value (LV) system, which governs how DA cells respond to a conditioned stimulus (CS). Here, we provide a more extensive review of the biological mechanisms supporting phasic DA firing and of their relation to the wide range of Pavlovian conditioning phenomena, including the sensitivity of those phenomena to focal brain lesions. We further extend the model by incorporating a new novelty value (NV) component, which reflects the ability of novel stimuli to trigger phasic DA firing. These "novelty bonuses" encourage exploratory working memory updating and in turn speed learning in trace conditioning and other working-memory-dependent paradigms. The evolving PVLV model builds upon insights developed in many earlier computational models, especially reinforcement learning models based on the ideas of Sutton and Barto, biological models, and the psychological model developed by Savastano and Miller. The PVLV framework synthesizes these various approaches, overcoming important shortcomings of each by providing a coherent and specific mapping to much of the relevant empirical data at both the micro- and macro-levels; we also examine the relevance of these mechanisms for higher-order cognitive functions.
Copyright 2009 Elsevier Ltd. All rights reserved.