Nat Neurosci. 2015 Jun;18(6):903-11. doi: 10.1038/nn.4021. Epub 2015 May 18.

The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts


Tobias Overath et al. Nat Neurosci. 2015 Jun.

Abstract

Speech contains temporal structure that the brain must analyze to enable linguistic processing. To investigate the neural basis of this analysis, we used sound quilts, stimuli constructed by shuffling segments of a natural sound, approximately preserving its properties on short timescales while disrupting them on longer scales. We generated quilts from foreign speech to eliminate language cues and manipulated the extent of natural acoustic structure by varying the segment length. Using functional magnetic resonance imaging, we identified bilateral regions of the superior temporal sulcus (STS) whose responses varied with segment length. This effect was absent in primary auditory cortex and did not occur for quilts made from other natural sounds or acoustically matched synthetic sounds, suggesting tuning to speech-specific spectrotemporal structure. When examined parametrically, the STS response increased with segment length up to ∼500 ms. Our results identify a locus of speech analysis in human auditory cortex that is distinct from lexical, semantic or syntactic processes.


Figures

Figure 1
Schematic of the quilting algorithm and example stimuli. (a) Quilting algorithm. A source signal is divided into equal-length segments (ranging from 30 to 960 ms). Segments are then reordered, subject only to the constraint that the segment-to-segment changes in the quilt's cochleogram best match those of the source signal. Segment-to-segment changes were calculated from the 30-ms sections at the borders of each pair of segments, indicated by the dashed lines. In the equation defining the segment-to-segment change, C_n^R(t, f) and C_n^L(t, f) denote the cochleogram value at time t and frequency f of the right and left borders of the nth segment, respectively. (b) Example cochleograms of quilts made from 30- and 960-ms segments, from each of four source signals: German speech, a modulation-matched control signal, a co-modulation–matched control signal and noise-vocoded German speech. Quilts of long and short segments were not markedly different in visual appearance, but sounded notably distinct in all cases.
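To make the reordering step concrete, the Python sketch below illustrates one possible greedy implementation of the quilting idea. The cochleogram array layout, the mean-absolute-difference cost on the 30-ms border sections, and the greedy matching strategy are assumptions for illustration, not the authors' published code.

# Hypothetical sketch of the quilting procedure described in Figure 1.
# Assumptions (not from the paper's code): the sound is represented as a
# cochleogram array of shape (n_freq, n_time), and segment-to-segment change
# is the mean absolute difference between 30-ms border sections.
import numpy as np

def quilt_order(coch, seg_len, border_len):
    """Greedily reorder segments so that border-to-border changes
    approximately track those of the original segment sequence."""
    n_seg = coch.shape[1] // seg_len
    segs = [coch[:, i * seg_len:(i + 1) * seg_len] for i in range(n_seg)]

    def change(a, b):
        # Mismatch between the right border of segment a and the left border of segment b.
        return np.mean(np.abs(a[:, -border_len:] - b[:, :border_len]))

    # Segment-to-segment changes in the original order.
    orig_changes = [change(segs[i], segs[i + 1]) for i in range(n_seg - 1)]

    order = [np.random.randint(n_seg)]          # start from a random segment
    remaining = set(range(n_seg)) - set(order)
    for target in orig_changes:
        cur = segs[order[-1]]
        # Pick the remaining segment whose border change best matches the original change.
        nxt = min(remaining, key=lambda j: abs(change(cur, segs[j]) - target))
        order.append(nxt)
        remaining.remove(nxt)
    return order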
Figure 2
Extent and location of ROIs. (a) Anatomical (HG and PT, red and blue, respectively) and functional (green) group ROIs displayed on coronal cross-sections of our participants’ average structural images (y = −38, −30, −22, −14, −6, 2, 10). The functional group ROI was derived from the functional localizer contrast [L960 > L30], P < 0.0001, uncorrected. (b) Renderings on flattened surfaces for the three group ROIs from a (top left) and for individual functional ROIs for five participants who were scanned four times (rendered on their flattened structural images), P < 0.05, family-wise error (FWE) corrected.
Figure 3
Responses to German speech quilts as a function of segment length, in four ROIs: HG (red), PT (blue), group fROI (green) and individual fROIs (black), shown separately for the two hemispheres. Data are averaged across 15 unique participants. Error bars denote ±1 s.e.m., asterisks denote significant pair-wise comparisons (after Bonferroni correction), P < 0.05. Responses were normalized in each ROI to the response of the independent functional localizer condition L960.
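As an illustration of the normalization and statistics described in this caption, the following Python sketch divides ROI responses by the independent L960 localizer response and runs Bonferroni-corrected paired comparisons between adjacent segment lengths. The array shapes and function name are hypothetical placeholders, not the authors' analysis code.

# Hypothetical sketch: normalize ROI responses to the L960 localizer condition,
# then compare adjacent segment-length conditions with Bonferroni correction.
import numpy as np
from scipy import stats

def normalized_pairwise_tests(roi_betas, localizer_l960, alpha=0.05):
    """roi_betas: (n_subjects, n_conditions); localizer_l960: (n_subjects,)."""
    norm = roi_betas / localizer_l960[:, None]   # normalize to L960 localizer response
    n_tests = norm.shape[1] - 1                  # adjacent condition pairs
    results = []
    for i in range(n_tests):
        t, p = stats.ttest_rel(norm[:, i], norm[:, i + 1])
        results.append((i, i + 1, t, p, p < alpha / n_tests))  # Bonferroni-corrected decision
    return results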
Figure 4
Responses to modulation control stimuli. (a) Power in a set of simulated modulation filters (normalized by the total power over all filters) for speech quilts and modulation control quilts. Note the differences across segment lengths and the similarity across quilt types. (b) Average responses (±s.e.m.) in HG (red), PT (blue) and the individual fROI (black) to speech quilts (solid) and control quilts (dashed) with segment durations of 30 and 960 ms. Data are averaged across the nine participants who were scanned with the modulation-control condition set. (c) Cross-channel correlations for speech, modulation control and co-modulation control quilts (measured from 960-ms quilts). (d) Average responses (±s.e.m.) in HG (red), PT (blue) and the individual fROI (black) to speech quilts (solid) and co-modulation control quilts (dashed) with segment durations of 30 and 960 ms. Data are averaged across the five participants who were scanned with the co-modulation–control condition set.
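The cross-channel correlation measure in panel c can be illustrated with a short Python sketch. The envelope representation and the Pearson-correlation summary used here are assumptions for illustration rather than the authors' exact analysis.

# Hedged sketch of a cross-channel correlation measure like that in Figure 4c,
# assuming the stimulus is summarized by cochleogram envelopes of shape (n_channels, n_time).
import numpy as np

def cross_channel_correlations(envelopes):
    """Pearson correlation between every pair of cochlear channel envelopes."""
    corr = np.corrcoef(envelopes)                 # (n_channels, n_channels) matrix
    iu = np.triu_indices_from(corr, k=1)          # unique off-diagonal channel pairs
    return corr, corr[iu].mean()                  # full matrix and an average summary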
Figure 5
Responses to environmental sound quilts. Average responses (±s.e.m.) in HG (red), PT (blue) and the individual fROI (black) to speech quilts (solid) and environmental sound quilts (dashed) with segment durations of 30 and 960 ms. Data are averaged across the five participants who were scanned with the environmental sound control condition set.
Figure 6
Responses to noise-vocoded speech quilts. Average responses (±s.e.m.) in HG (red), PT (blue) and the individual fROI (black) to speech quilts (solid) and noise-vocoded quilts (dashed) with segment durations of 30 and 960 ms. Data are averaged across the five participants who were scanned with the noise-vocoded control condition set.
Figure 7
Naturalness ratings and responses to compressed speech quilts. (a) Average ratings of the naturalness of quilted and unquilted speech stimuli for quilts made from uncompressed (blue) and compressed (red) speech (16 participants). Quilts of both types were intermixed in a single block. Rated naturalness (±s.e.m.) increased monotonically with segment length for both quilt types, but the compressed speech quilts were rated as less natural overall. Naturalness ratings did not plateau at 480 ms for either quilt type. ‘Orig’ denotes excerpts of unquilted speech, which were included in the behavioral experiment for completeness. (b) Comparison of responses to quilts made from uncompressed and compressed speech. Solid lines plot average responses (±s.e.m.) to quilts of different segment lengths generated from either uncompressed (blue) or compressed (red) speech. Dashed lines plot piecewise linear function fits to the BOLD response. Black lines denote the median and 95% confidence intervals of the elbow points of the fitted functions for compressed and uncompressed speech, derived by bootstrapping.
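A piecewise linear fit with a bootstrapped elbow estimate, as used in panel b, might look like the following Python sketch. The specific functional form (a rising line that levels off at the elbow), the log spacing of segment lengths, and the bootstrap settings are illustrative assumptions rather than the authors' exact procedure.

# Illustrative sketch (not the authors' code): fit a piecewise linear "elbow"
# function to the BOLD response versus log segment length and bootstrap a
# confidence interval on the elbow point.
import numpy as np
from scipy.optimize import curve_fit

def elbow_fn(x, x0, y0, slope):
    """Rising line up to the elbow x0, flat at y0 afterwards."""
    return np.where(x < x0, y0 + slope * (x - x0), y0)

def bootstrap_elbow(seg_lengths, responses, n_boot=1000, seed=0):
    """responses: (n_subjects, n_conditions). Returns median and 95% CI of the elbow (ms)."""
    rng = np.random.default_rng(seed)
    x = np.log(seg_lengths)
    elbows = []
    for _ in range(n_boot):
        # Resample subjects with replacement and refit the elbow function.
        sample = responses[rng.integers(0, len(responses), len(responses))]
        y = sample.mean(axis=0)
        popt, _ = curve_fit(elbow_fn, x, y, p0=[x.mean(), y.max(), 1.0], maxfev=5000)
        elbows.append(np.exp(popt[0]))            # convert log segment length back to ms
    return np.median(elbows), np.percentile(elbows, [2.5, 97.5])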
Figure 8
Functional ROIs revealed by a parcellation algorithm. (a) The five color-coded fROIs were rendered onto SPM’s smoothed surface template (top) and onto individual axial slices (bottom) of our participants’ average structural images. Mean [x, y, z] Montreal Neurological Institute (MNI) voxel coordinates for each parcel were: [−58, −3, −6] (parcel 1); [−61, −22, 0] (parcel 2); [56, −26, −1] (parcel 3); [−59, −37, 6] (parcel 4); [59, 1, −10] (parcel 5). (b) The average response (±s.e.m.) to the six segment-length conditions (normalized with respect to the L960 localizer condition in each parcel) is plotted for each of the five fROIs (n = 15). The response pattern was similar across fROIs.
