Speech comprehension in natural soundscapes rests on the ability of the auditory system to extract speech information from a complex acoustic signal with overlapping contributions from many sound sources. Here we reveal the canonical processing of speech in natural soundscapes on multiple scales by using data-driven modeling approaches to characterize sounds to analyze ultra high field fMRI recorded while participants listened to the audio soundtrack of a movie. We show that at the functional level the neuronal processing of speech in natural soundscapes can be surprisingly low dimensional in the human cortex, highlighting the functional efficiency of the auditory system for a seemingly complex task. Particularly, we find that a model comprising three functional dimensions of auditory processing in the temporal lobes is shared across participants' fMRI activity. We further demonstrate that the three functional dimensions are implemented in anatomically overlapping networks that process different aspects of speech in natural soundscapes. One is most sensitive to complex auditory features present in speech, another to complex auditory features and fast temporal modulations, that are not specific to speech, and one codes mainly sound level. These results were derived with few a-priori assumptions and provide a detailed and computationally reproducible account of the cortical activity in the temporal lobe elicited by the processing of speech in natural soundscapes.
Keywords: Auditory; Encoding models; Natural soundscapes; Speech processing; Ultra high field fMRI; Unsupervised learning.
Copyright © 2021. Published by Elsevier Inc.