Nat Commun. 2021 Nov 12;12(1):6567.
doi: 10.1038/s41467-021-26784-w.

Entropy-based metrics for predicting choice behavior based on local response to reward

Ethan Trepka et al. Nat Commun. 2021.

Abstract

For decades, behavioral scientists have used the matching law to quantify how animals distribute their choices between multiple options in response to reinforcement they receive. More recently, many reinforcement learning (RL) models have been developed to explain choice by integrating reward feedback over time. Despite reasonable success of RL models in capturing choice on a trial-by-trial basis, these models cannot capture variability in matching behavior. To address this, we developed metrics based on information theory and applied them to choice data from dynamic learning tasks in mice and monkeys. We found that a single entropy-based metric can explain 50% and 41% of variance in matching in mice and monkeys, respectively. We then used limitations of existing RL models in capturing entropy-based metrics to construct more accurate models of choice. Together, our entropy-based metrics provide a model-free tool to predict adaptive choice behavior and reveal underlying neural mechanisms.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Schematic of the experimental paradigms in mice and monkeys and basic behavioral results.
a, b Timeline of a single trial during experiments in mice (a) and monkeys (b). To initiate a trial, mice received an olfactory go cue (or a no-go cue in 5% of trials) (a), and monkeys fixated on a central point (b). Next, animals chose (via licks for mice and saccades for monkeys) between two options (left or right tubes for mice and circle or square for monkeys) and then received a reward (a drop of water for mice and of juice for monkeys) probabilistically based on their choice. c, d Average choice and reward using a sliding window with a length of 10 for a representative session in mice (c) and five superblocks of a representative session in monkeys (d). Mean values of 1 and −1 correspond to 100% selection of, or 100% reward on, the right and left options in mice (square and circle stimuli in monkeys), respectively, and a mean value of 0 corresponds to equal selection of, or equal reward on, the two choice options. Vertical gray dashed lines indicate trials where reward probabilities reversed. Vertical gray solid lines indicate divisions between superblocks in the monkey experiment. e, f Average relative choice and reward fractions around block switches, computed using a non-causal smoothing kernel with a length of three, separately for all blocks with a given reward schedule in mice (e) and monkeys (f). The better (or worse) option is defined as the option that was better (or worse) prior to the block switch. Trial zero is the first trial with the reversed reward probabilities. Average choice fractions for the better option (better side or stimulus) are lower than average reward fractions for that option throughout the block for both mice and monkeys, corresponding to undermatching behavior.
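The sliding-window averages in panels c and d can be illustrated with a short sketch. It assumes choices are coded +1 (right, or square) and −1 (left, or circle), that the window covers the 10 most recent trials, and that only full windows are reported. The caption does not state whether the window is trailing or centered, so the trailing form and the function name `sliding_mean` are illustrative assumptions.

```python
def sliding_mean(values, window=10):
    """Trailing moving average; returns one value per full window.

    Assumption: the window covers the `window` most recent trials.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the trial that just left the window.
            running -= values[i - window]
        if i >= window - 1:
            means.append(running / window)
    return means


# Hypothetical session: 10 right choices (+1) followed by 10 left choices (-1).
choices = [1] * 10 + [-1] * 10
smoothed = sliding_mean(choices, window=10)
```

With this coding, a smoothed value of 1 means the last 10 choices were all on the right, −1 means they were all on the left, and 0 means an even split.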
Fig. 2. Mice and monkeys exhibit highly variable but significant undermatching.
a Plot shows relative reward fraction versus relative choice fraction across all blocks separately for each reward schedule in mice. The black dashed line represents the identity line. For relative reward fractions >0.5, nearly all points remain below the identity line, indicating a relative choice fraction smaller than reward fraction for the better option (undermatching). Similarly, points above the identity line for relative reward fractions <0.5 indicate undermatching. b, c Histograms show deviation from matching for the 40/10 (b) and 40/5 reward schedules (c) in mice. The solid black line indicates 0, corresponding to perfect matching. The dashed black lines are the median deviation from matching. Asterisks indicate significance based on a two-sided Wilcoxon signed-rank test (40/10: p = 1.53 × 10^−213; 40/5: p = 8.75 × 10^−269). d–f Similar to (a–c), but for monkeys with 70/30 and 80/20 reward schedules. Asterisks indicate significance based on a two-sided Wilcoxon signed-rank test (70/30: p = 9.74 × 10^−161; 80/20: p = 3.06 × 10^−161). Because of random fluctuations in local reward probabilities, overmatching occurred in a minority of cases.
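The quantities plotted in this figure can be sketched as follows. The sketch assumes that deviation from matching for a block is the relative choice fraction minus the relative reward fraction, both computed for the better option, so that negative values indicate undermatching. The function name, the 0/1 coding, and the example data are hypothetical.

```python
def deviation_from_matching(choices, rewards, better=1):
    """Choice fraction minus reward fraction for the better option.

    choices: option label chosen on each trial
    rewards: 1 if the trial was rewarded, else 0
    Assumption: negative return values correspond to undermatching.
    """
    if len(choices) != len(rewards) or not choices:
        raise ValueError("choices and rewards must be non-empty and equal length")
    total_reward = sum(rewards)
    if total_reward == 0:
        raise ValueError("reward fraction is undefined without any reward")
    n_better = sum(1 for c in choices if c == better)
    reward_on_better = sum(r for c, r in zip(choices, rewards) if c == better)
    choice_fraction = n_better / len(choices)
    reward_fraction = reward_on_better / total_reward
    return choice_fraction - reward_fraction


# Hypothetical block of 10 trials; option 1 is the better option.
choices = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
rewards = [1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
dev = deviation_from_matching(choices, rewards)
```

In this made-up block the animal chose the better option on 7 of 10 trials but earned 5 of its 6 rewards there, so the deviation is negative (undermatching).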
Fig. 3. Relationship between entropy-based metrics and win-stay, lose-switch strategies.
a–c Plotted are ERDS and ERDS decompositions for rewarded and unrewarded trials (ERDS+ and ERDS−) as a function of p(win), p(lose), win-stay, and lose-switch. Darker colors correspond to larger values of metrics. For the plot in (a), p(win) is set to 0.5. Observed entropy-based metrics and constituent probabilities for each block for mice (orange dots) and monkeys (green dots) are superimposed on surfaces. d–f EODS and EODS decompositions for the better and worse options (EODS_B and EODS_W) as a function of the probabilities of choosing the better and worse options, p(better) and p(worse), conditional probability of stay on the better option, and conditional probability of switch from the worse option. For the plot in (d), p(better) is set to 0.5. For all plots, the units of entropy-based metrics are bits. g–i Same as in (a–c) but using heatmap. j–l Same as in (d–f) but using heatmap.
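The dependence of ERDS on p(win), win-stay, and lose-switch can be sketched under one assumption: that ERDS is the conditional entropy (in bits) of the stay/switch strategy given the previous reward outcome, with ERDS+ and ERDS− being the contributions from rewarded and unrewarded trials. The caption does not give the formula, so this form and the function names are assumptions.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-outcome distribution (p, 1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # a deterministic outcome carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def erds(p_win, win_stay, lose_switch):
    """Return (ERDS+, ERDS-, ERDS) under the conditional-entropy assumption.

    p_win:       probability that the previous trial was rewarded
    win_stay:    P(stay | previous trial rewarded)
    lose_switch: P(switch | previous trial unrewarded)
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must lie in [0, 1]")
    erds_plus = p_win * binary_entropy(win_stay)
    erds_minus = (1.0 - p_win) * binary_entropy(lose_switch)
    return erds_plus, erds_minus, erds_plus + erds_minus


# p(win) fixed at 0.5, as in panel a.
random_strategy = erds(0.5, 0.5, 0.5)
strict_wsls = erds(0.5, 1.0, 1.0)
```

Under this assumption, a strict win-stay/lose-switch strategy gives an ERDS of 0 bits, and responding independently of reward (both conditional probabilities at 0.5) gives the maximum of 1 bit.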
Fig. 4. Correlation between undermatching and proposed entropy-based metrics and underlying probabilities.
a Pearson correlations of proposed entropy-based metrics and existing behavioral metrics with deviation from matching in mice. Correlation coefficients are computed across all blocks, and metrics with nonsignificant correlations (two-sided, p > 0.0001 to account for multiple comparisons) are indicated with a hollow bar. The metric with the highest correlation with deviation from matching is indicated with a star (ERODS_W−; r = 0.71, p < 10^−300). b Similar to (a) but for monkeys (ERODS_W−; r = 0.64, p = 10^−231). Overall, entropy-based metrics show stronger correlation with deviation from matching than existing metrics.
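The per-metric correlation in this figure is a standard Pearson coefficient computed across blocks. A minimal sketch, assuming one metric value and one deviation-from-matching value per block (the example numbers are made up, and the significance test is omitted):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-block values: an entropy metric and deviation from matching.
metric_per_block = [0.10, 0.25, 0.40, 0.55, 0.70]
deviation_per_block = [-0.30, -0.22, -0.18, -0.09, -0.05]
r = pearson_r(metric_per_block, deviation_per_block)
```

The explained variance quoted in the abstract corresponds to the square of this coefficient (for example, r = 0.71 gives r² ≈ 0.50).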
Fig. 5. RL2 + CM + LM and RL2 + CM models better account for choice behavior, undermatching, and entropy-based metrics in mice and monkeys, respectively.
a, b Comparison of goodness-of-fit of a return-based (RL1) model, income-based (RL2) model, income-based models augmented with choice memory (CM) and/or loss-memory (LM) components, and a model based on learning on multiple timescales. Plotted is the Akaike Information Criterion (AIC) averaged over all sessions and Akaike weights computed with the average AIC for mice (a) and monkeys (b). c–f Empirical cumulative distribution functions of ERODS_W− (c, d) and deviation from matching (e, f) observed in animals and predicted from simulations of the RL2 and RL2 + CM + LM models. Shaded bars around CDFs indicate 95% confidence interval. Dashed vertical lines indicate the median of each distribution. Insets display the distribution of observed metrics versus metrics predicted using the RL2 model (left inset) and RL2 + CM + LM model and RL2 + CM model (right inset) for mice and monkeys, respectively. Displayed D-values and p values are the test statistic and p value from a two-sided Kolmogorov–Smirnov test comparing the distributions. The RL2 + CM + LM and RL2 + CM models better captured deviation from matching by over 20% and over 30% in mice and monkeys, respectively.
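The model comparison in panels a and b relies on AIC and Akaike weights. The sketch below uses the standard definitions (AIC = 2k − 2 ln L, and weights proportional to exp(−Δ/2), where Δ is the difference from the lowest AIC). The model labels follow the caption, but the AIC values are hypothetical placeholders, not results from the paper.

```python
import math


def aic(n_params, log_likelihood):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Relative support for each model given a list of AIC values."""
    if not aic_values:
        raise ValueError("need at least one AIC value")
    best = min(aic_values)
    # Subtracting the best AIC keeps the exponentials numerically stable.
    relative = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical average AIC values, for illustration only.
models = ["RL1", "RL2", "RL2 + CM", "RL2 + CM + LM"]
average_aic = [310.0, 305.0, 298.0, 296.0]
weights = dict(zip(models, akaike_weights(average_aic)))
```

The weights sum to 1, and the model with the lowest AIC receives the largest weight, which is how a single best-supported model can be read off such a comparison.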
