Review

Recent Advances in Exploring the Neural Underpinnings of Auditory Scene Perception

Joel S Snyder et al. Ann N Y Acad Sci. 1396(1): 39-55.

Abstract

Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds, together with conventional behavioral techniques, to elucidate how we perceptually segregate auditory objects or streams from one another. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth of the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Tracing the progress made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and of concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.

Keywords: auditory scene analysis; auditory stream segregation; change deafness; concurrent sound segregation; informational masking.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Examples of classic paradigms and newer techniques used in studies of auditory scene perception. (A) Left: schematic showing high and low notes repeating over time. Middle: a newer stimulus, the stochastic figure-ground (SFG) stimulus, employs a sequence of random inharmonic chords; if a subset of the tones repeats or changes slowly over time (shown in red), it pops out as a "figure." Right: example neural responses during active (blue) and passive (red) listening to SFG stimuli, reproduced from Ref. (B) Left: schematic of a harmonic complex with a mistuned component. Middle: schematic of double harmonic complex tones (HCT), reproduced from Ref. (Fig. 1A). Two HCT stimuli (solid blue and dashed red lines) are presented simultaneously; the neural frequency response function (i.e., tuning curve) is shown in black. Right: an example rate-place neural response to concurrent harmonic stimuli, reproduced from Ref. (Fig. 5a), from a recording site exhibiting phase-locked activity.
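
The SFG construction described in the caption is easy to state procedurally: draw a fresh random set of background tones for every chord, and add a fixed subset of "figure" frequencies to a contiguous run of chords. Below is a minimal Python sketch of that procedure; all parameter values (frequency range, chord duration, tone counts) are illustrative assumptions, not taken from the cited studies.

```python
import numpy as np

def make_sfg(n_chords=40, chord_dur=0.05, n_bg=10, n_fig=4,
             fig_start=10, fig_len=20, fs=16000, seed=0):
    """Stochastic figure-ground (SFG) stimulus: random inharmonic chords with
    a fixed subset of tones (the 'figure') repeating across a run of chords.
    All parameter values are illustrative, not from the cited studies."""
    rng = np.random.default_rng(seed)
    freqs = np.geomspace(200.0, 8000.0, 60)                   # candidate tone frequencies
    fig_freqs = rng.choice(freqs, size=n_fig, replace=False)  # repeating figure tones
    t = np.arange(int(chord_dur * fs)) / fs
    chords = []
    for i in range(n_chords):
        chord_freqs = list(rng.choice(freqs, size=n_bg, replace=False))  # fresh background
        if fig_start <= i < fig_start + fig_len:
            chord_freqs += list(fig_freqs)                    # figure repeats -> pops out
        chord = sum(np.sin(2 * np.pi * f * t) for f in chord_freqs)
        chords.append(chord / len(chord_freqs))               # rough level normalization
    return np.concatenate(chords)

stimulus = make_sfg()  # 2 s of audio at fs = 16 kHz
```
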
Figure 2
(A) Diagram of the stimulus reconstruction technique, adapted from Ref. The envelope of the speech from an attended speaker is decoded from the neural response. (B, C) Figures reproduced from Ref. (B) Correlation coefficients between the spectrogram of a single speaker and reconstructed spectrograms of the speaker mixture under different attentional conditions, in correct and error trials. (C) The time course of an attentional modulation index (AMI), calculated from the correlation between spectrograms reconstructed from mixtures and the original attended-speaker spectrogram. Positive AMI values indicate shifts toward the target, while negative values indicate shifts toward the masker.
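
The decoding step in panel A and the AMI in panel C can be sketched compactly. The sketch below assumes a ridge-regularized linear decoder from (time-lagged) neural features to a speech envelope, and an AMI defined as the difference between the reconstruction's correlation with the attended and masker envelopes; the cited study may define both steps differently.

```python
import numpy as np

def train_decoder(R, s, alpha=100.0):
    """Ridge-regularized linear map from neural features R (time x features,
    e.g., time-lagged electrode responses) to the attended-speech envelope s.
    alpha is an assumed regularization strength."""
    return np.linalg.solve(R.T @ R + alpha * np.eye(R.shape[1]), R.T @ s)

def attentional_modulation_index(recon, attended, masker):
    """AMI from correlations between the reconstruction and each speaker.
    Positive -> reconstruction closer to the attended target; negative ->
    closer to the masker. The difference-of-correlations form is an
    assumption, not necessarily the cited study's exact definition."""
    r_att = np.corrcoef(recon, attended)[0, 1]
    r_msk = np.corrcoef(recon, masker)[0, 1]
    return r_att - r_msk

# Usage sketch on a held-out mixture trial:
#   w = train_decoder(R_train, env_attended_train)
#   recon = R_test @ w
#   print(attentional_modulation_index(recon, env_attended, env_masker))
```
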
Figure 3
(A) Schematic of an informational masking stimulus. It consists of a cloud of masker tones at randomly chosen times and frequencies. A repeating target tone (shown in red) sometimes stands out from the background if no masker falls within a fixed frequency distance of the target (a spectral protection region around the target). (B, C) Figures reproduced from Ref. (B) A surface map of the BOLD response for trials in which the target was detected (TD) versus not detected (TN). (C) MEG source waveforms averaged across subjects and hemispheres for detected (solid lines) and undetected (dashed lines) target tones, highlighting a long-latency negativity (ARN) only for detected targets.
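
The spectral protection region amounts to rejection sampling: masker tones are drawn at random times and frequencies, but any draw within a fixed distance (here measured in octaves) of the target frequency is discarded. A minimal sketch, with all parameter values assumed for illustration:

```python
import numpy as np

def make_tone_cloud(target_freq=1000.0, protect_octaves=0.5, n_maskers=60,
                    dur=2.0, tone_dur=0.05, fs=16000, seed=0):
    """Informational masking stimulus: a tone cloud with a spectral
    protection region around a repeating target. All parameter values
    are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = int(dur * fs)
    out = np.zeros(n)
    t = np.arange(int(tone_dur * fs)) / fs
    lo, hi = np.log2(200.0), np.log2(8000.0)
    placed = 0
    while placed < n_maskers:
        f = 2.0 ** rng.uniform(lo, hi)                 # random masker frequency
        if abs(np.log2(f / target_freq)) < protect_octaves:
            continue                                   # inside protection region: reject
        onset = rng.integers(0, n - len(t))            # random masker onset
        out[onset:onset + len(t)] += np.sin(2 * np.pi * f * t)
        placed += 1
    # Repeating target tone; the 0.25 s repetition rate is chosen for illustration
    for onset in range(0, n - len(t), int(0.25 * fs)):
        out[onset:onset + len(t)] += np.sin(2 * np.pi * target_freq * t)
    return out / np.max(np.abs(out))
```
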
Figure 4
(A) Schematic of the recognizable sounds used in change deafness experiments. A set of sounds is played simultaneously and, after a brief delay, the same set is played either with no change or with one sound changed (shown here as a dog bark turning into a phone ringing). (B) Electrical brain responses (reprinted from Ref. with permission from Elsevier) showing P3a and P3b responses that are enhanced on trials with a detected change, compared with trials with no change or an undetected change (note that positive voltage is plotted downward). (C) Topographies of the difference between detected and undetected changes in Ref. for the P3a and P3b. (D) Schematic of the bandpass noise burst patterns used in change deafness experiments, with a change in the second-lowest frequency pattern. (E) Electrical brain responses (reprinted from Ref. with permission from Elsevier), showing several enhanced components for detected changes.
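
Trial construction for the paradigm in panel A is simple to express: replay the identical scene on no-change trials, or swap exactly one sound for an item outside the scene on change trials. A hypothetical sketch (the sound labels, pool, and change probability are all illustrative):

```python
import random

def make_trial(scene, pool, change_prob=0.5, rng=random):
    """One change-deafness trial: a first scene, a brief delay (handled by
    the presentation software), then a second scene that is identical
    (no-change) or has exactly one sound swapped (change). 'scene' is a
    list of sound labels; 'pool' holds replacement sounds. Illustrative only."""
    second = list(scene)
    changed = rng.random() < change_prob
    if changed:
        i = rng.randrange(len(second))                       # which sound changes
        second[i] = rng.choice([s for s in pool if s not in scene])
    return scene, second, changed

scene = ["dog", "piano", "hammer", "water"]
pool = ["phone", "bell", "car horn"]
first, second, changed = make_trial(scene, pool)
```
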
