The strategies found by animals facing a new task are determined both by individual experience and by structural priors evolved to leverage the statistics of natural environments. Rats quickly learn to capitalize on the trial sequence correlations of two-alternative forced-choice (2AFC) tasks after correct trials but consistently deviate from optimal behavior after error trials. To understand this outcome-dependent gating, we first show that recurrent neural networks (RNNs) trained on the same 2AFC task outperform rats because they readily learn to use across-trial information after both correct and error trials. We hypothesize that, although RNNs can optimize their behavior in the 2AFC task without any a priori restrictions, rats' strategy is constrained by a structural prior adapted to a natural environment in which rewarded and non-rewarded actions provide largely asymmetric information. When RNNs are pre-trained on a more ecological task with more than two possible choices, they develop a strategy by which they gate off the across-trial evidence after errors, mimicking rats' behavior. Population analyses show that the pre-trained networks form an accurate representation of the sequence statistics independently of the outcome in the previous trial. After error trials, gating is implemented by a change in the network dynamics that temporarily decouples the categorization of the stimulus from the across-trial accumulated evidence. Our results suggest that the rats' suboptimal behavior reflects the influence of a structural prior that reacts to errors by isolating the network decision dynamics from the context, ultimately constraining performance in a 2AFC laboratory task.
Copyright © 2022 Elsevier Inc. All rights reserved.