PLoS Comput Biol. 2016 Aug 4;12(8):e1005058.
doi: 10.1371/journal.pcbi.1005058. eCollection 2016 Aug.

The Representation of Prediction Error in Auditory Cortex

Free PMC article

Jonathan Rubin et al. PLoS Comput Biol. 2016.

Abstract

To survive, organisms must extract information from the past that is relevant for their future. How this process is expressed at the neural level remains unclear. We address this problem by developing a novel approach from first principles. We show here how to generate low-complexity representations of the past that produce optimal predictions of future events. We then illustrate this framework by studying the coding of 'oddball' sequences in auditory cortex. We find that for many neurons in primary auditory cortex, trial-by-trial fluctuations of neuronal responses correlate with the theoretical prediction error calculated from the short-term past of the stimulation sequence, under constraints on the complexity of the representation of this past sequence. In some neurons, the effect of prediction error accounted for more than 50% of response variability. Reliable predictions often depended on a representation of the sequence of the last ten or more stimuli, although the representation kept only few details of that sequence.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Information flow between the organism and the environment.
The environment consists of a stationary random process. Here the environment produces a Bernoulli sequence of two stimuli. The organism perceives the sequence, summarizing the recent past by a state of a reduced representation (m), then uses m to produce a prediction for the next stimulus in the sequence. Perception is characterized by P(m|past), mapping a sequence of past observations (N = 4 in this illustration) into states m of the reduced representation. In this example, the reduced representation consists of the number of red stimuli among the last N stimuli. Prediction is characterized by P(future|m), which assigns to each state m a set of subjective expectations for the next (future) stimulus. As the number of red stimuli in the past increases, the probability assigned to a red stimulus increases and that assigned to a blue stimulus decreases. The numbers shown here correspond to the posterior probabilities for the corresponding stimulus given a uniform prior on the probability of a red stimulus.
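The posterior probabilities mentioned in this caption can be reproduced with Laplace's rule of succession: with a uniform prior on the probability of a red stimulus and k red stimuli among the last N, the predicted probability of red is (k + 1)/(N + 2). The following is a minimal sketch, assuming this is the calculation behind the numbers in the figure; the function name `predict_red` is illustrative and does not come from the paper.

```python
from fractions import Fraction


def predict_red(k: int, n: int) -> Fraction:
    """Posterior predictive P(next = red | k reds among the last n stimuli).

    Assumes a uniform prior on the probability of a red stimulus
    (Laplace's rule of succession).
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return Fraction(k + 1, n + 2)


N = 4  # length of the remembered past, as in the Fig 1 illustration

# Map each state m (number of reds in the past) to (P(red), P(blue)).
table = {m: (predict_red(m, N), 1 - predict_red(m, N)) for m in range(N + 1)}

for m, (p_red, p_blue) in table.items():
    print(f"m = {m}: P(red) = {p_red}, P(blue) = {p_blue}")
```

Exact fractions are used so that the monotonic increase of P(red) with m is easy to read off.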
Fig 2. Reduced representations through the Information Bottleneck method.
(a) Illustration of different reduced representations of past sequences in the oddball paradigm (for N = 4 stimuli). Reduced representations are depicted by the conditional probability distribution P(m|past) that maps sequences of observations into states m. (i) The topmost representation maps each possible sequence of N = 4 stimuli to a unique state m, resulting in 16 states. (ii) The middle representation is a reduced version, in which sequences with the same number of occurrences of each tone are grouped together, resulting in 5 different states. This is the minimal sufficient statistic, and is also the representation illustrated in Fig 1. (iii) The bottom representation, with only 3 states, results from further constraining the complexity (mutual information between the reduced representation and the past) to 1 bit, and is the optimal representation (with highest predictive power) with that complexity. For this representation, the mapping from the past to the state of the representation is probabilistic. (b) The tradeoff between predictive power and complexity of reduced representations in the oddball paradigm for two durations of the past (N = 4 stimuli in gray; N = 10 stimuli in black). Each point along the solid curves shows the complexity (abscissa) and predictive power (ordinate) of one unique solution of the tradeoff, with maximal predictive power for the given complexity and, equivalently, minimal complexity for the given predictive power. The rightmost points correspond to the complexity and predictive power of the full representations that assign a unique state m to each and every possible sequence of N stimuli. Dashed lines connect these values with the points corresponding to the representations based on the minimal sufficient statistic (number of red stimuli). Using the minimal sufficient statistic produces representations that are less complex but provide the same predictive power as the full representations. Further constraints on the complexity result in representations that have lower predictive power. The complexity and predictive power of representation (iii) are shown explicitly on the N = 4 curve: the complexity of this representation is lower than that of the sufficient statistic, and in consequence its predictive power is lower as well. (c) Prediction errors along an oddball sequence, calculated using the transformations (i) and (ii) illustrated in panel a, which produce the same prediction errors. In this example, for each stimulus, the preceding four stimuli determine a unique state m of the reduced representation; prediction error is calculated from the predictive distribution using that state m. Bar heights represent the prediction error associated with each stimulus.
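The calculation described for panel c can be sketched in a few lines. This is an illustration under two assumptions that are not stated in the caption: that prediction error is measured as surprisal, -log2 of the probability the state m assigned to the stimulus that actually occurred, and that the predictive distribution is the uniform-prior posterior (k + 1)/(N + 2) from Fig 1. The function name and the example sequence are hypothetical.

```python
import math


def prediction_errors(seq, n=4):
    """Surprisal (in bits) of each stimulus given the preceding n stimuli.

    seq is a list of 0/1 values (1 = 'red'). The state m is the number of
    1s in the preceding window, i.e. the minimal sufficient statistic.
    Assumption: prediction error = -log2 P(observed stimulus | m).
    """
    errors = []
    for t in range(n, len(seq)):
        k = sum(seq[t - n:t])          # state m for the window before time t
        p_red = (k + 1) / (n + 2)      # uniform-prior posterior predictive
        p_obs = p_red if seq[t] == 1 else 1.0 - p_red
        errors.append(-math.log2(p_obs))
    return errors


# Hypothetical oddball sequence: mostly 0s with occasional 1s.
seq = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1]
errs = prediction_errors(seq)

for t, e in zip(range(4, len(seq)), errs):
    print(f"t = {t}, stimulus = {seq[t]}, prediction error = {e:.3f} bits")
```

Rare stimuli that follow a run of the common stimulus receive the largest prediction errors, which is the qualitative pattern the bar heights in panel c are meant to convey.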
Fig 3. Neural responses in the oddball paradigm.
(a) Two examples of oddball stimulus blocks, one with p = 10% and the other with p = 90%. Different colors (blue and red) represent the two stimuli (‘low’ and ‘high’ tones of the oddball sequences). (b) Single-neuron responses in cat primary auditory cortex (A1) to auditory oddball stimuli. Each point shows the average spike count of one neuron in response to the same stimulus (either low-tone or high-tone, represented by different colors) when presented in the p = 10% block versus the same physical stimulus in the p = 90% block. Note that most points fall above the diagonal, indicating a stronger response to rare stimuli.
Fig 4. Tradeoff between complexity and predictive power in the oddball paradigm.
(a) Tradeoff curves calculated for different durations of the past sequences, from N = 1 (blue) to N = 50 (red). The curves corresponding to N = 4 and N = 10 are the same as those shown in Fig 2. Each curve spans complexity values from 0 to that of the minimal sufficient statistic. While more complex representations exist, they cannot have higher predictive power. Each point along the curves represents one optimal solution (achieving maximal predictive power for its complexity constraint for the corresponding duration of the past). These curves separate achievable versus non-achievable combinations of complexity and predictive power (below versus above each curve, respectively). (b) The maximal predictive power that can be achieved as a function of the past duration N, for different constraints imposed on the complexity: 0.5 bit (light gray), 1 bit (dark gray), 2 bits (black). The dashed line corresponds to the predictive power of the minimal sufficient statistic, and is therefore the maximal predictive power at each past duration. Note the diminishing returns for increases in memory duration N (i.e., predictive power does not increase much beyond N = 10), as well as for increases in complexity (i.e., predictive power does not increase much beyond a complexity of 2 bits).
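The diminishing returns in memory duration can be illustrated for the minimal sufficient statistic alone. The sketch below assumes a Bernoulli environment whose probability of a red stimulus has a uniform prior, as in the Fig 1 illustration; the curves in the paper may be computed for a different stimulus ensemble, so the numbers are illustrative only. Under this assumption the count of red stimuli is uniform on 0..N, so the complexity is log2(N + 1) bits, and the predictive power is 1 bit minus the average binary entropy of the predictions. The function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def count_statistic(n: int):
    """Return (complexity, predictive_power) in bits for m = number of reds.

    Assumption: uniform prior on the probability of red, so each of the
    n + 1 possible counts has probability 1 / (n + 1).
    """
    p_m = 1.0 / (n + 1)
    # The mapping past -> m is deterministic, so I(m; past) = H(m).
    complexity = math.log2(n + 1)
    # H(future | m), using the posterior predictive (k + 1) / (n + 2).
    h_future_given_m = sum(
        p_m * binary_entropy((k + 1) / (n + 2)) for k in range(n + 1)
    )
    # By symmetry P(future = red) = 1/2, so H(future) = 1 bit.
    predictive_power = 1.0 - h_future_given_m
    return complexity, predictive_power


results = {n: count_statistic(n) for n in (1, 4, 10, 50)}
for n, (c, ip) in results.items():
    print(f"N = {n:2d}: complexity = {c:.3f} bits, predictive power = {ip:.4f} bits")
```

In this toy setting the predictive power is bounded above by 1 - 1/(2 ln 2) bits (about 0.279), the information that the underlying probability itself carries about the next stimulus, which is why extending the past yields progressively smaller gains.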
Fig 5. Neuronal representation of prediction error in the oddball paradigm.
(a) Spike counts of single trial responses of one neuron to one of the two frequencies with which it was tested, plotted as a function of the expected prediction error for the same trial. The prediction errors were computed using optimal reduced representations at three past durations (N = 5, N = 10 and N = 15) with a complexity of 1 bit. A small amount of jitter was added to the x coordinate for visualization purposes only. Colors correspond to the different experimental blocks (p = 10% in red; p = 50% in black; p = 90% in blue). Error bars indicate the mean and the 25th and 75th percentiles for each value of the prediction error. (b) Responses of three different neurons (plotted separately for the two tones, top and bottom panels) as a function of the expected prediction error for a past duration of N = 10 stimuli at a complexity of 2 bits. Responses of the top leftmost panel belong to the same neuron shown in panel a.
Fig 6. Population analysis.
(a) Histogram of the best goodness-of-fit scores r²_max over the entire population (n = 117 combinations of 68 neurons × 2 test frequencies that evoked significant responses). Significant scores (permutation test, p<0.05; see Methods) are indicated in red (n = 78/117, 67%). (b) Scatter plot of the largest fractions of explained variance achieved for the two frequencies tested with each neuron in the main analysis (n = 68). Neurons with significant r²_max in both tested frequencies are indicated in red, while neurons with significant r²_max in only one frequency are indicated in gray. (c) Two-dimensional, color-coded population analysis histograms of the complexity and predictive power underlying the ‘good representations’ (i.e., reduced representations that achieved at least 90% of r²_max). Analysis was performed over combinations of neuron and frequency with r²_max > 0.1 (n = 34/117). Color scale (from blue to red) represents the fraction of neurons for which that combination was ‘good’. An abundance of 100% (red color) means that this combination of parameters was in the ‘good’ set of parameters for each and every neuron in the analyzed population. (d) Same analysis as in panel c, with results shown as a function of past duration and complexity. The 50% contour in the plot is marked by the white line. (e) Same analysis as in panel c, with results shown as a function of past duration and predictive power. Panels d and e use the same color code as panel c.
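The caption refers to a permutation test on a goodness-of-fit score; the exact procedure is in the paper's Methods, which are not reproduced here. The sketch below shows a generic permutation test for the squared correlation between a single predictor (prediction error) and trial-by-trial spike counts. It is an assumption-laden illustration: the data are synthetic, the function names are hypothetical, and a test of a maximum over many representations would need to repeat that maximization inside every permutation.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Return (observed r^2, p-value) by shuffling y relative to x."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)            # break the trial-by-trial pairing
        if r_squared(x, y_perm) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic example: spike counts that grow with prediction error plus noise.
rng = random.Random(1)
pred_err = [rng.choice([0.26, 0.58, 1.0, 1.58, 2.58]) for _ in range(200)]
spikes = [2.0 * e + rng.gauss(0.0, 1.5) for e in pred_err]

r2, p = permutation_p_value(pred_err, spikes)
print(f"r^2 = {r2:.3f}, permutation p = {p:.4f}")
```

Shuffling the responses while keeping the predictor fixed preserves both marginal distributions, so the null distribution reflects only the loss of the trial-by-trial pairing.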
Fig 7. Controls and extensions of the main analysis.
(a) Comparison of the best goodness-of-fit scores achieved for each combination of neuron and test frequency that had a significant response in the ‘equiprobable’ block (p = 50%) and the goodness-of-fit score of the same neuron and frequency in the main analysis (using all responses with p = 10%, 50% and 90%). Significant scores in the equiprobable block are indicated in red (permutation test, p<0.05). (b) Fraction of cases with significant effects of prediction error on the neuronal responses for all the single-block analyses. (c) Fraction of cases with significant effects of prediction error on the neuronal responses in different time windows (marked below the histogram) and for different selections of blocks. (d) Comparison of the fraction of explained variance (r²_max) and the fraction of explainable variance. The red points correspond to correction by noise estimates using unbiased variances, while the green points correspond to the conservative correction by the (smaller) noise estimates using the biased variances. (e) Comparison of SSA indices (SI) and explained variance (r²_max). Cases with significant correlation (permutation test, p<0.05) are indicated by blue (low tones) or red (high tones).

Grants and funding

This work was supported by grants from the Israel Science Foundation (ISF), the US-Israel Binational Science Foundation (BSF), and the German-Israeli Foundation (GIF) to IN; by a F.I.R.S.T. grant and by the DARPA MSEE project support to NT; and by the Gatsby Charitable Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.