Comparative Study
Psychopharmacology (Berl), 236 (8), 2373-2388

Impacts of Inter-Trial Interval Duration on a Computational Model of Sign-Tracking vs. Goal-Tracking Behaviour

François Cinotti et al. Psychopharmacology (Berl).

Abstract

In the context of Pavlovian conditioning, two types of behaviour may emerge within the population (Flagel et al. Nature, 469(7328): 53-57, 2011). Animals may choose to engage either with the conditioned stimulus (CS), a behaviour known as sign-tracking (ST), which is sensitive to dopamine inhibition for its acquisition, or with the food cup in which the reward or unconditioned stimulus (US) will eventually be delivered, a behaviour known as goal-tracking (GT), which is dependent on dopamine for its expression only. Previous work by Lesaint et al. (PLoS Comput Biol, 10(2), 2014) offered a computational explanation for these phenomena and led to the prediction that varying the duration of the inter-trial interval (ITI) would change the relative ST-GT proportion in the population as well as phasic dopamine responses. A recent study verified this prediction, but also found a rich variance of ST and GT behaviours within the trial, which goes beyond the original computational model. In this paper, we provide a computational perspective on these novel results.

Keywords: Goal-tracking; Reinforcement learning; Sign-tracking.

Conflict of interest statement

The authors declare that there is no conflict of interest.

Figures

Fig. 1
a MDP representation of a single trial from the original experiment by Flagel et al. (2009) adapted from Lesaint et al. (2014). There are six possible actions leading deterministically from one state to the next: exploring the environment (goE), approaching the lever (goL), approaching the magazine (goM), waiting, engaging with the closest stimulus, and eating the reward. Each of these actions focuses on a specific feature indicated in brackets: the environment (E), the lever (L), the magazine (M), and the food (F). These are the features used by the FMF learning component. The red path corresponds to sign-tracking behaviour and the blue path to goal-tracking behaviour. b Corresponding timeline of lever and food appearances
Fig. 2
Schematic representation of the FMF-MB decision-making model adapted from Lesaint et al. (2014). The model combines a Model-Based learning system which learns the structure of the MDP and then calculates the relative advantage of each action in a given state, with a Feature-Model-Free system which attributes a value to different features of the environment which is generalized across states (e.g. the same value of the magazine is used in states 1 and 4). The advantage function and value function are weighted by ω, their relative importance determining the sign- vs goal-tracking tendency of the individual and then passed to the action selection mechanism modelled by a softmax function
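The integration step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear mix in which ω weights the FMF feature value and (1 − ω) weights the MB advantage, consistent with the caption of Fig. 3, where ω = 1 is FMF-only and ω = 0 is MB-only. The softmax temperature, the function names, and all numerical values are hypothetical.

```python
import math

def combined_values(advantages, feature_values, omega):
    """Mix MB advantages and FMF feature values for each action.

    omega = 1 uses only the FMF values, omega = 0 only the MB advantages
    (assumed linear weighting).
    """
    return {
        action: (1.0 - omega) * advantages[action] + omega * feature_values[action]
        for action in advantages
    }

def softmax(values, temperature=1.0):
    """Turn action values into choice probabilities (temperature is assumed)."""
    highest = max(values.values())  # subtract the max for numerical stability
    exps = {a: math.exp((v - highest) / temperature) for a, v in values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

# Hypothetical values for the state in which the lever has just appeared.
advantages = {"goL": 0.1, "goM": 0.2, "goE": 0.0}      # MB system (made up)
feature_values = {"goL": 0.8, "goM": 0.3, "goE": 0.0}  # FMF system (made up)

fmf_only = softmax(combined_values(advantages, feature_values, omega=1.0))
mb_only = softmax(combined_values(advantages, feature_values, omega=0.0))
mixed = softmax(combined_values(advantages, feature_values, omega=0.5))
```

With these made-up numbers, the FMF-only agent favours approaching the lever and the MB-only agent favours the magazine; intermediate ω values interpolate between the two.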
Fig. 3
Behaviour of FMF-only (ω = 1, a–d) and MB-only (ω = 0, e–h) models. For each graph, we have plotted the mean ± s.e.m. a Approach to lever of simulations of the FMF-only model for different ITI durations. b Approach to the food cup of simulations of the FMF-only model for different ITI durations. c Effect of down-revision of food cup value on FMF and MB values of the FMF-only model. d Average softmax probabilities of engaging with either the lever or the food cup during the CS period of the FMF-only model for different ITI durations. e Approach to lever of simulations of the MB-only model for different ITI durations. f Approach to the food cup of the MB-only model for different ITI durations. g Effect of down-revision of food cup value on FMF and MB values of the MB-only model. h Average softmax probabilities of engaging with either the lever or the food cup during the CS period of the MB-only model for different ITI durations
Fig. 4
Simulations of the behaviour of a population with random ω parameter values. a Distribution of the ω parameters sampled from a β distribution which were then used for the simulations. The same values of ω were used in both the short and long ITI conditions. Inset: probability density function of the original distribution, which is biased towards 1 in accordance with the reported prevalence of sign-trackers. b Approach to the lever for different ITI durations (mean ± s.e.m.). c Approach to the food cup for different ITI durations (mean ± s.e.m.). d Distribution of differences in softmax probability of approach to lever and magazine for the two ITI durations. There is a significant bias towards goal-tracking choices in the short ITI group and a significant bias towards sign-tracking choices in the long ITI group. e Distribution of differences in average simulated number of approaches to lever and magazine for the two ITI durations. As expected from the differences in softmax probabilities, there is a significantly higher number of goal-tracking than sign-tracking trials in the short ITI condition and vice-versa in the long ITI condition. f Top: Effect of down-revision of food cup value during ITI of different durations on average FMF-values and MB action advantages. Bottom: Average softmax probabilities of engaging with either the lever or the food cup during the CS period for different ITI durations
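Drawing a simulated population of ω values from a β distribution, as panel a describes, might look like the sketch below. The shape parameters are not given on this page, so `alpha=5.0` and `beta=2.0` are purely hypothetical placeholders, chosen only so that the density leans towards 1; the population size and seed are likewise assumptions.

```python
import random

def sample_omegas(n, alpha=5.0, beta=2.0, seed=0):
    """Draw n values of omega in [0, 1] from a Beta(alpha, beta) distribution.

    alpha > beta skews the distribution towards 1 (more sign-tracker-like
    individuals). Both shape parameters are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the population is reproducible
    return [rng.betavariate(alpha, beta) for _ in range(n)]

# One population, reused unchanged for both ITI conditions.
omegas = sample_omegas(1000)
mean_omega = sum(omegas) / len(omegas)
```

Reusing the same `omegas` list for the short and long ITI simulations mirrors the caption's statement that identical ω values were used in both conditions, so any behavioural difference comes from the ITI manipulation rather than from resampling.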
Fig. 5
Reward prediction errors of the model at CS and US presentation for short and long ITIs. a Reward prediction errors averaged across all sessions. b Reward prediction errors for the long ITI simulations averaged in early and late sessions. c Reward prediction errors for the short ITI simulations averaged in early and late sessions
Fig. 6
Simulations of the original model with added intermediate steps within the CS presentation period. a New task structure with added intermediate steps and possible transitions from the lever to the food cup and vice-versa in states 2 and 3. In addition, the possibility of exploring the environment was deleted for simplicity. b Probability of approach to the food cup during the first and last four seconds of the CS presentation period. Bar plot represents mean probability and grey lines individual probabilities. c Average feature values of the lever and magazine in the short and long ITI conditions across sessions. In the long ITI group, the value of the less favourable feature, which is the food cup, is stagnant, while in the short ITI the value of the lever keeps increasing, causing possible ambiguity which could explain unstable behaviour during the CS period
