Temporal Coherence and the Streaming of Complex Sounds

Shihab Shamma et al. Adv Exp Med Biol. 787, 535-543.

Abstract

Humans and other animals can attend to one of multiple sounds and follow it selectively over time. The neural underpinnings of this perceptual feat remain mysterious. Some studies have concluded that sounds are heard as separate streams when they activate well-separated populations of central auditory neurons, and that this process is largely pre-attentive. Here, we propose instead that stream formation depends primarily on temporal coherence between responses that encode various features of a sound source. Furthermore, we postulate that only when attention is directed toward a particular feature (e.g., pitch or location) do all other temporally coherent features of that source (e.g., timbre and location) become bound together as a stream that is segregated from the incoherent features of other sources. Experimental neurophysiological evidence in support of this hypothesis will be presented. The focus, however, will be on a computational realization of this idea and a discussion of the insights learned from simulations to disentangle complex sound sources such as speech and music. The model consists of a representational stage of early and cortical auditory processing that creates a multidimensional depiction of various sound attributes such as pitch, location, and spectral resolution. The following stage computes a coherence matrix that summarizes the pair-wise correlations between all channels making up the cortical representation. Finally, the perceived segregated streams are extracted by decomposing the coherence matrix into its uncorrelated components. Questions raised by the model are discussed, especially the role of attention in streaming and the search for further neural correlates of streaming percepts.
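The last two stages of the model — the coherence matrix and its decomposition into uncorrelated components — can be sketched numerically. The following toy example is an assumption-laden illustration, not the authors' implementation: the channel layout (three channels driven by one source envelope, two by another), the modulation rates, and the 0.5 correlation threshold are all invented for demonstration.

```python
import numpy as np

# Toy "cortical" channels: channels 0-2 share one source envelope,
# channels 3-4 share an independent one (all values illustrative).
rng = np.random.default_rng(0)
t = np.linspace(0, 2, 400, endpoint=False)
env1 = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))        # source 1: 4 Hz modulation
env2 = 0.5 * (1 + np.sin(2 * np.pi * 7 * t + 1.0))  # source 2: 7 Hz modulation
channels = np.stack([env1, 0.8 * env1, 0.6 * env1, env2, 0.7 * env2])
channels += 0.01 * rng.standard_normal(channels.shape)

# Coherence matrix: pair-wise correlations between all feature channels.
C = np.corrcoef(channels)

# Decompose into uncorrelated components: the leading eigenvector of C
# weights the mutually coherent channels of the dominant source.
eigvals, eigvecs = np.linalg.eigh(C)
leading = eigvecs[:, np.argmax(eigvals)]

# Attention-style readout: channels strongly correlated with an attended
# channel (here channel 0) form the foreground stream.
foreground = np.where(C[0] > 0.5)[0]
```

Under these assumptions, the coherence matrix is nearly block-diagonal, so the correlated block containing the attended channel separates cleanly from the other source's channels.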

Figures

Fig. 59.1
Temporal coherence model. The mixture (the sum of a male and a female sentence) is transformed into an auditory spectrogram. Various features are extracted from the spectrogram, including a multiscale analysis that yields a repeated representation of the spectrogram at various resolutions; pitch values and salience are represented as a pitch-gram; location signals are extracted from the interaural differences. All responses are then analyzed by temporal modulation band-pass filters tuned in the range from 2 to 16 Hz. A pair-wise correlation matrix of all channels is then computed. When attention is applied to a particular feature (e.g., the female pitch channels), all features correlated with this pitch track become bound with other correlated feature channels (indicated by the dashed straight lines running through the various representations) to segregate a foreground stream (the female voice in this example) from the remaining background streams.
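The modulation band-pass stage in this caption (filters tuned from 2 to 16 Hz) can be sketched with an FFT-domain mask. This is a simplified stand-in, not the model's actual filter bank: the sampling rate, test envelope, and ideal brick-wall filter are illustrative assumptions.

```python
import numpy as np

# Keep only 2-16 Hz envelope modulations of one channel (toy signal).
fs = 200.0
t = np.arange(0, 2, 1 / fs)
env = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)

spectrum = np.fft.rfft(env)
freqs = np.fft.rfftfreq(env.size, d=1 / fs)
band = (freqs >= 2.0) & (freqs <= 16.0)
filtered = np.fft.irfft(spectrum * band, n=env.size)
# 'filtered' retains the 4 Hz modulation; the DC offset and the 40 Hz
# component fall outside the 2-16 Hz band and are removed.
```

In the model, each feature channel's response would pass through a bank of such filters before the pair-wise correlations are computed.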
Fig. 59.2
Streaming of two-tone sequences. Alternating tone sequences are perceived as two streams when tones are far apart (large ΔF) and rates are relatively fast (small ΔT). Synchronous sequences are perceived as a single stream regardless of their frequency separation. The correlation matrices induced by these two sequences are different: pair-wise correlations between the two tones (A, B) are negative for the alternating sequence and positive for the synchronous tones. Neural implementation of this correlation computation can be accomplished by a layer of neurons that adapts rapidly to become mutually inhibited when responses are anti-correlated (alternating tones) and mutually excitatory when they are coherent (synchronous tones). When selective attention (yellow arrow) is directed to one tone (B in this example), the “row” of pair-wise correlations at B (along the yellow dashed line) can be used as a mask that indicates the channels that are correlated with the B stream. For the alternating sequence, tone A is negatively correlated with B, and hence, the mask is negative at A and eliminates this tone from the attended stream. In the synchronous case, the two tones are correlated, and hence, the mask groups both tones into the attended stream
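The sign structure described in this caption — negative A-B correlation for alternating tones, positive for synchronous tones, and the correlation "row" at B acting as a mask — can be checked with a minimal numeric example. The eight-bin onset envelopes below are invented for illustration.

```python
import numpy as np

# Toy onset envelopes of tones A and B across eight time bins.
alt_A = np.array([1., 0., 1., 0., 1., 0., 1., 0.])
alt_B = 1.0 - alt_A     # alternating sequence: B sounds when A is silent
sync_B = alt_A.copy()   # synchronous sequence: B sounds together with A

r_alt = np.corrcoef(alt_A, alt_B)[0, 1]    # negative pair-wise correlation
r_sync = np.corrcoef(alt_A, sync_B)[0, 1]  # positive pair-wise correlation

# Mask along B's "row" of the correlation matrix: the sign of each
# channel's correlation with the attended tone B.
mask_on_A_alt = np.sign(r_alt)    # -1: A is excluded from the attended stream
mask_on_A_sync = np.sign(r_sync)  # +1: A is grouped into the attended stream
```

The negative mask value for the alternating case removes tone A from the attended stream, reproducing the two-stream percept; the positive value in the synchronous case groups both tones into one stream.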
Fig. 59.3
Behavioral neurophysiology. (Top panels) Structure of experimental trials. Ferrets listened to ALT or SYNC tone sequences presented for 1–3 s, followed by a cloud of random tones (red) used to measure the STRF of the recorded neuron. (Middle panels) Responses change when animals begin to listen attentively and globally to all tone sequences, i.e., not selectively to one tone. The responses become enhanced for the SYNC sequences (red) and attenuated for the ALT sequences (blue). Response changes (left panel) start immediately after onset of the trial but reach a plateau after three to four tone bursts (~0.5 s). Period histograms of responses to the tones (red and blue bars in right panel) reveal that SYNC tone responses (red) become significantly enhanced, while those of ALT tones become suppressed (blue). (Bottom panels) STRFs measured at the end of tone sequences during the passive state show very little difference (left panel). During active attentive listening, STRFs become depressed after ALT compared to SYNC tone sequences (right panel).
