Bayesian surprise attracts human attention

Laurent Itti et al. Vision Res. 2009 Jun;49(10):1295-306. doi: 10.1016/j.visres.2008.09.007. Epub 2008 Oct 19.

Abstract

We propose a formal Bayesian definition of surprise to capture subjective aspects of sensory information. Surprise measures how data affect an observer, in terms of the difference between posterior and prior beliefs about the world. Only data observations that substantially affect the observer's beliefs yield surprise, irrespective of how rare or informative in Shannon's sense those observations are. We test the framework by quantifying the extent to which humans may orient attention and gaze towards surprising events or items while watching television. To this end, we implement a simple computational model in which a low-level, sensory form of surprise is computed by simple simulated early visual neurons. Bayesian surprise is a strong attractor of human attention, with 72% of all gaze shifts directed towards locations more surprising than the average, a figure rising to 84% when the analysis is restricted to regions simultaneously selected by all observers. The proposed theory of surprise is applicable across different spatio-temporal scales, modalities, and levels of abstraction.
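At its core, the surprise measure proposed in the abstract is the Kullback-Leibler divergence between the observer's posterior and prior beliefs. A minimal sketch of that idea for a discrete model class (illustrative only, not the paper's implementation; the function name and toy numbers are my own):

```python
import math

def bayesian_surprise(prior, likelihood):
    """Surprise (in bits) = KL(posterior || prior) over a discrete model class.

    prior: prior probabilities P(M) for each candidate model M.
    likelihood: data likelihoods P(D | M) for the same models.
    """
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    # Bayes' theorem: posterior proportional to prior times likelihood
    posterior = [p * l / evidence for p, l in zip(prior, likelihood)]
    return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)

# Two equally plausible hypotheses: data that sharply favours one of them
# shifts beliefs and is surprising; uninformative data is not.
print(bayesian_surprise([0.5, 0.5], [0.9, 0.1]))  # ~0.53 bits
print(bayesian_surprise([0.5, 0.5], [0.5, 0.5]))  # 0.0 bits
```

Note how the second observation carries no surprise even though, in Shannon terms, it is just as "informative" an event as any other: it simply fails to move the observer's beliefs.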


Figures

Fig. 1
Simple description of how surprise may be computed at a high level of abstraction, for an observer who has beliefs about the possible television channels that she or he may be watching. See Section 3 for further details.
Fig. 2
Simple example of surprise computation for a series of coin tosses. Here the prior and posterior distributions of beliefs about how fair the coin may be are formalized as Beta distributions.
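The coin-toss setting of Fig. 2 can be sketched numerically. With a Beta(a, b) prior over the coin's bias, observing h heads and t tails gives a Beta(a + h, b + t) posterior (the standard conjugate update), and surprise is the KL divergence between the two. Below, the KL integral is estimated by a simple midpoint rule rather than the closed form; the function names are my own and this is a sketch, not the paper's code:

```python
import math

def log_beta_pdf(theta, a, b):
    """Log density of Beta(a, b) at theta, with log B(a, b) via lgamma."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_B

def beta_surprise(a, b, heads, tails, n=10000):
    """Surprise (KL in nats) of observing heads/tails under a Beta(a, b)
    prior over the coin's bias, by midpoint integration on (0, 1)."""
    a2, b2 = a + heads, b + tails            # conjugate Bayesian update
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) / n                # midpoint of the i-th subinterval
        lp = log_beta_pdf(theta, a2, b2)     # log posterior density
        total += math.exp(lp) * (lp - log_beta_pdf(theta, a, b))
    return total / n

# A fair-coin believer (Beta(10, 10) prior) is more surprised by a run of
# 5 heads than by a mixed run of 3 heads and 2 tails.
print(beta_surprise(10, 10, 5, 0))
print(beta_surprise(10, 10, 3, 2))
```

Observing nothing (`heads = tails = 0`) leaves the prior unchanged and yields zero surprise, as expected.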
Fig. 3
Hypothetical implementation of surprise computation in a single neuron. (a) Prior data observations, tuning preferences, and top-down influences contribute to shaping a set of “prior beliefs” a neuron may have over a class of internal models or hypotheses about the world. For instance, 𝓜 may be a set of Poisson processes parameterized by the rate λ, with {P(M)}M∈𝓜 = {P(λ)}λ∈ℝ+* the prior distribution of beliefs about which Poisson models describe well the world as sensed by the neuron. New data D updates the prior into the posterior using Bayes’ theorem. Surprise quantifies the difference between the posterior and prior distributions over the model class 𝓜. The remaining panels detail how surprise differs from conventional model fitting and from outlier-based novelty. (b) In standard iterative Bayesian model fitting, at every iteration N, incoming data DN is used to update the prior {P(M|D1, D2, …, DN−1)}M∈𝓜 into the posterior {P(M|D1, D2, …, DN)}M∈𝓜. Freezing this learning at a given iteration, one then picks the currently best model, usually using either a maximum-likelihood criterion or a maximum a posteriori one (yielding the MMAP shown). (c) This best model is used for a number of tasks at the current iteration, including outlier-based novelty detection: new data is considered novel at that instant if it has low likelihood under the best model (e.g., DNb is more novel than DNa). This focus on a single best model has obvious limitations, especially in situations where other models are nearly as good (e.g., M* in panel (b) is entirely ignored during standard novelty computation). One palliative solution is to consider mixture models, but this just amounts to shifting the problem into a different model class.
(d) Surprise directly addresses this problem by simultaneously considering all models and by measuring how data changes the observer’s distribution of beliefs from {P(M|D1, D2, …, DN−1)}M∈𝓜 to {P(M|D1, D2, …, DN)}M∈𝓜 over the entire model class 𝓜 (orange shaded area).
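For the Poisson model class of Fig. 3a, a convenient (standard, though not stated in the caption) choice of prior over the rate λ is a Gamma distribution, since observing a spike count then gives another Gamma as posterior. The sketch below computes the resulting surprise by numerically integrating the KL divergence over λ; the function names, parameter values, and integration bounds are my own illustrative assumptions:

```python
import math

def gamma_logpdf(lam, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at lam > 0."""
    return (shape * math.log(rate) + (shape - 1) * math.log(lam)
            - rate * lam - math.lgamma(shape))

def neuron_surprise(shape, rate, spike_count, dt=1.0, lam_max=100.0, n=20000):
    """Surprise (KL in nats) for a Gamma(shape, rate) prior over a Poisson
    firing rate lambda, after observing `spike_count` spikes in a window dt."""
    s2, r2 = shape + spike_count, rate + dt   # conjugate Gamma-Poisson update
    total, step = 0.0, lam_max / n
    for i in range(n):
        lam = (i + 0.5) * step                # midpoint rule over (0, lam_max)
        lp = gamma_logpdf(lam, s2, r2)        # log posterior density
        total += math.exp(lp) * (lp - gamma_logpdf(lam, shape, rate)) * step
    return total

# A neuron expecting ~10 Hz (Gamma(10, 1) prior) is more surprised by a
# 30-spike burst than by the expected 10 spikes in a 1 s window.
print(neuron_surprise(10, 1, 30))
print(neuron_surprise(10, 1, 10))
```

Even the "expected" observation carries a little surprise here: it sharpens the belief distribution around 10 Hz, which is itself a (small) change in beliefs.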
Fig. 4
(a) Sample eye movement traces from four observers (CZ, NM, RC, VN) watching one video clip (545 frames, 18.1 s) that showed cars passing by on a fairly static background. Squares denote saccade endpoints (42, 36, 48, and 16 saccades for CZ, NM, RC, and VN). (b) Our data show high inter-individual overlap of saccade targets, as shown here with the locations where one human saccade endpoint was near (within 5.6°) the instantaneous eye position of one (white squares, 47 saccades), two (cyan squares, 36 saccades), or all three (black squares, 13 saccades) other humans. (c) Given this high overlap, a metric where the master map was created from the three eye movement traces other than that being tested yielded an upper-bound KL score, computed by comparing the histograms of metric values at human (blue) and random (green) saccade targets. Indeed, this metric’s map was very sparse, as demonstrated by the high number of random saccades landing on locations with near-zero metric response. Yet humans preferentially saccaded towards the three active hotspots corresponding to the instantaneous eye positions of the three other humans, as demonstrated by the high number of human saccades landing on locations with near-unity metric responses.
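The inter-observer agreement test of Fig. 4b reduces to counting how many other observers' instantaneous gaze positions fall within a fixed angular radius of a saccade endpoint. A minimal sketch, assuming gaze coordinates are already expressed in degrees of visual angle on a flat screen (the function name is my own):

```python
import math

def agreement_count(saccade_xy, others_xy, radius_deg=5.6):
    """Number of other observers whose instantaneous gaze lies within
    `radius_deg` of a saccade endpoint (all coordinates in degrees)."""
    sx, sy = saccade_xy
    return sum(1 for (ox, oy) in others_xy
               if math.hypot(ox - sx, oy - sy) <= radius_deg)

# Endpoint at the origin; two of three other observers are within 5.6 deg.
print(agreement_count((0.0, 0.0), [(1.0, 1.0), (10.0, 10.0), (3.0, 4.0)]))  # 2
```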
Fig. 5
(a) Sample frames from our video clips, with corresponding human saccades and predictions from the entropy, surprise, and human-derived metrics. Entropy maps, like variance and DCT-based information maps, exhibited many locations with high responses, hence had low specificity and were poorly discriminative. In contrast, surprise and human-derived maps were much sparser and more specific. For three example frames (first column), saccades from one subject are shown (arrows) with corresponding apertures over which master map activity was sampled (circles). Associated master maps exemplify the varying degrees of sparseness and specificity of the metrics tested. (b) KL scores quantify the tendency of human saccades (narrow blue bars) to pick hotspots with high values in the master maps, compared to chance (wide green bars, which reflect the intrinsic distributions of hotspots for each metric). A KL score of zero would indicate that humans did not look at hotspots in a master map more often than expected by chance alone. For all metrics studied, KL scores were significantly above zero, and reflected significantly different performance levels, with a strict ranking of variance < orientation < entropy < motion < saliency < surprise < human-derived (also see Table 1). Among the eleven computational metrics tested in total, surprise performed best, in that surprising locations were relatively few yet reliably gazed at by humans.
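The KL scoring described in Fig. 5b compares the histogram of master-map values sampled at human saccade endpoints against the histogram sampled at random locations. A sketch of that scoring idea, assuming map values normalized to [0, 1]; the binning, smoothing, and function name are my own choices and the paper's exact procedure may differ:

```python
import math

def kl_score(human_vals, random_vals, bins=10):
    """KL divergence (bits) between histograms of master-map activity sampled
    at human vs. random saccade endpoints (values assumed in [0, 1])."""
    def hist(vals):
        counts = [0] * bins
        for v in vals:
            counts[min(int(v * bins), bins - 1)] += 1
        # Laplace smoothing keeps every bin nonzero
        total = len(vals) + bins
        return [(c + 1) / total for c in counts]
    p, q = hist(human_vals), hist(random_vals)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

# Humans consistently landing on high-activity map locations, while random
# samples are spread uniformly, yields a large positive score.
human = [0.95] * 100
rand = [i / 99 for i in range(100)]
print(kl_score(human, rand))
print(kl_score(rand, rand))  # identical distributions -> 0.0
```

A sparse yet well-targeted map (like the surprise map) pushes the human histogram away from the chance histogram, which is exactly what this score rewards.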
Fig. 6
KL scores when considering only saccades where at least one (all 10,192 saccades), two (7948 saccades), three (5565 saccades), or all four (2951 saccades) humans agreed on a general area of interest in the video clips (their gazes were within 5.6° of each other), for all eleven computational metrics. Scores of static metrics (bottom) improved substantially when progressively focusing onto only saccades with stronger inter-observer agreement (average slope 0.56 ± 0.37 percent KL score units per 1000 pruned saccades). Hence, when humans agreed on an important location, they also tended to be more reliably predicted by the computational metrics. Furthermore, all dynamic metrics (top) improved nearly 4.25 times more steeply (slope 2.37 ± 0.39), suggesting a stronger role of dynamic events in attracting human attention. Among those, surprising events were significantly the strongest (Bonferroni-corrected t-tests for equality of KL scores between surprise and other metrics, p < 10−100).
