Review. 2010 Mar;14(3):119-130. doi: 10.1016/j.tics.2010.01.003. Epub 2010 Feb 12.

Statistically Optimal Perception and Learning: From Behavior to Neural Representations

József Fiser et al. Trends Cogn Sci. Free PMC article.

Abstract

Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure, and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and re-evaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations, leading to a new, sampling-based framework of how the cortex represents information and uncertainty.

Figures

Figure I
Visual statistical learning. (a) An inventory of visual chunks, each defined as a set of two or more spatially adjacent shapes that always co-occur in scenes. (b) Sample artificial scenes composed of multiple chunks, used in the familiarization phase. Note that there are no obvious low-level segmentation cues giving away the identity of the underlying chunks. (c) During the test phase, subjects are shown pairs of segments that are either parts of chunks or random combinations (segments on the top). The three histograms show different statistical conditions. (Top) The two choices differ in the co-occurrence frequency of their elements; (middle) co-occurrence is equated, but the choices differ in predictability (the probability of one element given that the other is present); (bottom) both co-occurrence and predictability are equated between the two choices, but the completeness statistic (the percentage of an inventory chunk covered by the choice fragment) differs: one pair is a standalone chunk, the other is part of a larger chunk. Subjects were able to use the relevant cues in all of these conditions, as indicated by the subject preferences below each panel. These observations can be accounted for by optimal probabilistic learning, but not by simpler alternatives such as pairwise associative learning (see text).
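The three statistics contrasted in the caption can be computed directly from co-occurrence counts. A minimal sketch in Python, using an invented inventory of shape labels and scenes (none of these values come from the study):

```python
from itertools import combinations
from collections import Counter

# Invented familiarization scenes: each scene is a set of shape labels.
scenes = [
    {"A", "B", "C", "D"},
    {"A", "B", "E", "F"},
    {"C", "D", "E", "F"},
    {"A", "B", "C", "D"},
]

pair_counts = Counter()
single_counts = Counter()
for scene in scenes:
    single_counts.update(scene)
    pair_counts.update(frozenset(p) for p in combinations(sorted(scene), 2))

def cooccurrence(x, y):
    """How often x and y appear together, across all scenes."""
    return pair_counts[frozenset((x, y))] / len(scenes)

def predictability(x, y):
    """P(y | x): the probability of seeing y given that x is present."""
    return pair_counts[frozenset((x, y))] / single_counts[x]

print(cooccurrence("A", "B"))    # 0.75
print(predictability("A", "B"))  # 1.0
print(predictability("C", "E"))  # ≈ 0.33
```

Here A and B behave like a chunk (whenever A appears, B does too, so predictability is 1), whereas C and E co-occur only incidentally.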
Figure I
Two approaches to neural representations of uncertainty in the cortex. (a) Probabilistic population codes rely on a population of neurons that are tuned to the same environmental variables with different tuning curves (populations 1 and 2, colored curves). At any moment in time, the instantaneous firing rates of these neurons (populations 1 and 2, colored circles) determine a probability distribution over the represented variables (top right panel, contour lines), which is an approximation of the true distribution that needs to be represented (purple colormap). In this example, y1 and y2 are independent, but in principle there could be a single population with neurons tuned jointly to y1 and y2. However, such multivariate representations require exponentially more neurons (see text and Table I). (b) In a sampling-based representation, single neurons, rather than populations of neurons, correspond to each variable. The variability of the activity of neurons 1 and 2 through time represents uncertainty about the environmental variables. Correlations between the variables can be naturally represented by the co-variability of neural activities, thus allowing the representation of arbitrarily shaped distributions.
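The key point of panel (b), that co-variability over time can carry correlations between variables, can be illustrated with a toy sampling-based representation: treat each row of samples as the joint instantaneous activity of two "neurons" and recover the correlation from their activity traces. The mean and covariance below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([1.0, 2.0])           # assumed true means of y1 and y2
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])          # positive correlation between them

# "Neural activity over time": each row is one joint sample from the
# represented distribution, i.e. the instantaneous activity of the
# two neurons at one moment.
samples = rng.multivariate_normal(mean, cov, size=5000)

emp_mean = samples.mean(axis=0)
emp_corr = np.corrcoef(samples.T)[0, 1]
print(emp_mean)   # ≈ [1, 2]
print(emp_corr)   # ≈ 0.8: the co-variability carries the correlation
```

No population-level machinery is needed: one unit per variable suffices, and arbitrarily shaped joint distributions can be represented the same way by drawing samples from them.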
Figure I
Characteristics of cortical spontaneous activity. (a) The orientation map of the primary visual cortex of the anesthetized cat (left panel) correlates significantly with optical imaging patterns of both spontaneous (middle panel) and visually evoked (right panel) activity (adapted with permission from [66]). (b) Correlational analysis of BOLD signals during the resting state reveals networks of distant areas in the human cortex with coherent spontaneous fluctuations. There are large-scale positive intrinsic correlations between the seed region PCC (yellow) and MPF (orange), and negative correlations between PCC and IPS (blue) (adapted with permission from [98]). (c) Reliably repeating spike triplets can be detected by multielectrode recording in the spontaneous firing of the rat somatosensory cortex (adapted with permission from [91]). (d) Multielectrode recordings in the developing awake ferret visual cortex show a systematic pattern of emerging strong spatial correlations across several millimeters of the cortical surface, with very similar correlational patterns for spontaneous activity in darkness (solid line) and for visually driven conditions (dotted and dashed lines for random noise patterns and natural movies, respectively) (adapted with permission from [64]).
Figure 1
Representation of uncertainty and its benefits. (a) Sensory information is inherently ambiguous. Given a two-dimensional projection on a surface (e.g. a retina), it is impossible to determine which of the three different three-dimensional wire-frame objects above cast the image (adapted with permission from [96]). (b) Cue integration. Independent visual and haptic measurements (left) support the three possible interpretations of object identity to different degrees (middle). Integrating these sources of information according to their respective uncertainties provides an optimal probabilistic estimate of the correct object (right). (c) Decision-making. When the task is to choose a bag of the right size for storing an object, uncertain haptic information needs to be used probabilistically for an optimal choice (top left). In the example shown, the utility function expresses the degree to which a combination of object and bag size is preferable: for example, if the bag is too small, the object will not fit in; if it is too large, valuable bag space is wasted (bottom left, top right). In this case, rather than inferring the most probable object based on haptic cues and then choosing the bag optimal for that object (in the example, the small bag for the cube), the probability of each possible object needs to be weighted by its utility, and the combination with the highest expected utility (R) has to be selected (in the example, the large bag). Evidence shows that human performance in cue-combination and decision-making tasks is close to optimal [10,97].
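Both computations in this figure have simple closed forms: reliability-weighted averaging for cue integration, and expected-utility maximization for decision-making. A sketch with invented numbers (the sizes, variances, and utility table are illustrative, not taken from the figure):

```python
import numpy as np

# (b) Cue integration: Gaussian visual and haptic estimates combined
# in inverse proportion to their variances (the standard optimal rule).
mu_v, var_v = 3.0, 1.0                 # visual estimate of object size
mu_h, var_h = 5.0, 4.0                 # noisier haptic estimate
w_v = (1 / var_v) / (1 / var_v + 1 / var_h)
mu_comb = w_v * mu_v + (1 - w_v) * mu_h
var_comb = 1 / (1 / var_v + 1 / var_h)
print(mu_comb, var_comb)               # ≈ 3.4, 0.8: pulled toward the
                                       # more reliable (visual) cue

# (c) Decision-making: weight each object hypothesis by its utility
# under each action; choose the action with highest expected utility.
p_object = np.array([0.6, 0.4])        # P(small object), P(large object)
utility = np.array([[1.0, 0.5],        # small object: small bag fits well,
                                       #   large bag wastes space
                    [0.0, 1.0]])       # large object: only the large bag fits
expected_utility = p_object @ utility  # one value per bag (small, large)
best_bag = ("small", "large")[int(expected_utility.argmax())]
print(expected_utility, best_bag)      # ≈ [0.6, 0.7], "large"
```

Note the structure of the result: the small object is the more probable interpretation, yet the large bag has the higher expected utility, exactly the dissociation between the most probable object and the optimal action that the caption describes.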
Figure 2
The link between probabilistic inference and learning. (Top row) Developing internal models of chairs and tables. The plot shows the distribution of parameters (two-dimensional Gaussians, represented by ellipses) and object shapes for the two categories. (Middle row) Inferences about the currently viewed object based on the input and the internal model. (Bottom row) Actual sensory input. The red color code represents the probability of a particular object part being present (see color scale on top left). T1–T4: four successive illustrative iterations of the inference–learning cycle. (T1) The interpretation of a natural scene requires combining information from the sensory input (bottom) and the internal model (top). Based on the internal models of chairs and tables, the input is interpreted with high probability (p = 0.9) as a chair of typical size but with missing crossbars (middle). (T2) The internal model of the world is updated based on the cumulative experience of previous inferences (top). The chair in T1, being a typical example of a chair, requires minimal adjustments to the internal model. Experience with more unusual instances, such as the high chair in T2, provokes more substantial changes (T3, top). (T3) The representation of uncertainty allows the internal model to be updated while taking into account all possible interpretations of the input. In T3, the stimulus is ambiguous, as it could be interpreted as a stool or as a square table. The internal model needs to be updated by taking into account the relative probabilities of the two interpretations: that there exist tables with a more square shape, or that some chairs lack the upper part. Since both probabilities are relatively high, both internal models are modified substantially during learning (see the change of both ellipses). (T4) After learning, the same input as in T1 elicits different responses owing to the changes in the internal model: the input is now interpreted as a chair with significantly higher confidence, as experience has shown that chairs often lack the bottom crossbars.
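The T3 step can be caricatured in a few lines: an ambiguous input splits its learning signal between the two category models in proportion to their posterior probabilities, so both models move. The 1-D "shape" feature, the Gaussian category models, and the learning rate below are all illustrative assumptions, not the article's actual model:

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Internal models: one Gaussian per category over a 1-D "shape" feature.
models = {"chair": {"mu": 0.0, "var": 1.0},
          "table": {"mu": 4.0, "var": 1.0}}
x = 2.0                    # ambiguous stimulus, halfway between the two

# Inference: posterior over the two interpretations (equal priors assumed).
lik = {k: gaussian_pdf(x, m["mu"], m["var"]) for k, m in models.items()}
z = sum(lik.values())
post = {k: v / z for k, v in lik.items()}

# Learning: each category mean moves toward the stimulus, weighted by
# the posterior probability of that interpretation.
lr = 0.5
for k, m in models.items():
    m["mu"] += lr * post[k] * (x - m["mu"])

print(post)      # {'chair': 0.5, 'table': 0.5}: fully ambiguous input
print(models)    # both category means shifted toward x, as in T3
```

A point estimate of the interpretation (committing to "chair" or "table") would instead update only one model, which is exactly the information the representation of uncertainty preserves.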
Figure 3
Neural substrates of probabilistic inference and learning. (a) Functional mapping of learning and inference onto neural substrates in the cortex. (b) Probabilistic inference for natural images. (Top) A toy model of the early visual system (based on Ref. [43]). The internal model of the environment assumes that visual stimuli, x, are generated by the noisy linear superposition of two oriented features with activation levels y1 and y2. The task of the visual system is to infer the activation levels, y1 and y2, of these features from seeing only their superposition, x. (Bottom left) The prior distribution over the activations of these features, y1 and y2, captures prior knowledge about how much they are typically (co-)activated in previously experienced images. In this example, y1 and y2 are expected to be independent and sparse, which means that each feature appears rarely in visual scenes and independently of the other feature. (Bottom middle) The likelihood function represents the way the visual features are assumed to combine to form the visual input under our model of the environment. It is higher for feature combinations that are more likely to underlie the image we are seeing, according to the equation at the top. (Bottom right) The goal of the visual system is to infer the posterior distribution over y1 and y2. By Bayes' theorem, the posterior optimally combines the expectations from the prior with the evidence from the likelihood. The maximum a posteriori (MAP) estimate, used by some models [40,43,47] and denoted by a "+" in the figure, neglects uncertainty by using only the location of the maximum instead of the full distribution. (c) Simple demonstration of two probabilistic representational schemes. (Black curve) The probability distribution of the variable y to be represented. (Red curve) The distribution assumed by the parametric representation; only two parameters of the distribution, the mean μ and the spread σ, are represented. (Blue ×'s and bars) Samples, and the histogram they imply, in the sampling-based representation.
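The prior × likelihood → posterior pipeline of panel (b), and the information a MAP estimate discards, can be reproduced on a small grid. The weights, noise level, and Laplace (sparse) prior below are invented stand-ins for the figure's toy model:

```python
import numpy as np

w1, w2, noise_sd = 1.0, 0.6, 0.3      # assumed feature weights and noise
x_obs = 1.0                           # the observed (scalar) image

ys = np.linspace(-2.0, 3.0, 201)
y1, y2 = np.meshgrid(ys, ys, indexing="ij")

# Sparse, independent prior: product of Laplace densities (up to a constant).
log_prior = -np.abs(y1) - np.abs(y2)
# Gaussian likelihood of the observation for each candidate (y1, y2).
log_lik = -0.5 * ((x_obs - (w1 * y1 + w2 * y2)) / noise_sd) ** 2

# Bayes' theorem on the grid: posterior ∝ prior × likelihood.
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()

# The MAP estimate keeps only the single most probable point...
i, j = np.unravel_index(post.argmax(), post.shape)
map_y1, map_y2 = ys[i], ys[j]
# ...whereas the full posterior also carries uncertainty, e.g. a posterior
# mean pulled away from the MAP by the ridge of alternative explanations.
mean_y1 = (post.sum(axis=1) * ys).sum()
print(map_y1, map_y2)                 # near (0.9, 0.0) on this grid
print(mean_y1)
```

The sparse prior makes the MAP explain the image with a single feature, while the full posterior keeps probability mass on the whole trade-off curve between y1 and y2; this is the distinction between the "+" and the distribution in panel (b).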
Figure 4
Relating spontaneous activity in darkness to sampling from the prior, based on the encoding of brightness in the primary visual cortex. (a) A statistically more efficient toy model of the early visual system [47,99] (cf. Figure 3b). An additional feature variable, b, has a multiplicative effect on the other features, effectively corresponding to the overall luminance. Explaining away this information removes redundant correlations, thus improving statistical efficiency. (b,c) Probabilistic inference in such a model results in luminance-invariant behavior of the other features, as observed both neurally [100] and perceptually [101]: when the same image is presented at different global luminance levels (left), the difference is captured by the posterior distribution of the "brightness" variable, b (center), whereas the posteriors for the other features, such as y1 and y2, remain relatively unaffected (right). (d) In the limit of total darkness (left), the same luminance-invariant mechanism results in the posterior over y1 and y2 collapsing to the prior (right). In this case, the inferred brightness, b, is zero (center), and as b explains all of the image content, there is no constraint left on the other feature variables, y1 and y2 (the identity in (a) becomes 0 = 0·(y1·w1 + y2·w2), which is satisfied for every value of y1 and y2).
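The collapse in panel (d) follows directly from the generative model: with b = 0 the likelihood no longer depends on y1 and y2, so the posterior reduces to the prior. A numerical check under an invented toy model with a multiplicative brightness variable (all numbers are illustrative):

```python
import numpy as np

w1, w2, noise_sd = 1.0, 0.6, 0.3
ys = np.linspace(-2.0, 2.0, 101)
y1, y2 = np.meshgrid(ys, ys, indexing="ij")
log_prior = -np.abs(y1) - np.abs(y2)      # sparse prior over the features

def posterior(x_obs, b):
    """Grid posterior over (y1, y2) given the image x_obs and brightness b,
    under the generative model x = b * (y1*w1 + y2*w2) + noise."""
    log_lik = -0.5 * ((x_obs - b * (w1 * y1 + w2 * y2)) / noise_sd) ** 2
    lp = log_prior + log_lik
    p = np.exp(lp - lp.max())
    return p / p.sum()

prior = np.exp(log_prior) / np.exp(log_prior).sum()
post_dark = posterior(x_obs=0.0, b=0.0)   # total darkness, b inferred as 0
print(np.abs(post_dark - prior).max())    # ~0: the posterior IS the prior
```

Under the sampling-based view proposed in the text, activity in darkness would then correspond to samples drawn from exactly this prior distribution.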
