Front Neuroeng. 2014 May 27;7:14. doi: 10.3389/fneng.2014.00014. eCollection 2014.

Decoding spectrotemporal features of overt and covert speech from the human cortex

Stéphanie Martin et al.

Abstract

Auditory perception and auditory imagery have been shown to activate overlapping brain regions. We hypothesized that these phenomena also share a common underlying neural representation. To assess this, we used intracranial electrocorticography recordings from epilepsy patients performing an out-loud or a silent reading task. In these tasks, short stories scrolled across a video screen in two conditions: subjects read the same stories both aloud (overt) and silently (covert). In a control condition the subjects remained in a resting state. We first built a high gamma (70-150 Hz) neural decoding model to reconstruct spectrotemporal auditory features of self-generated overt speech. We then evaluated whether this same model could reconstruct auditory speech features in the covert speech condition. Two speech models were tested: a spectrogram and a modulation-based feature space. For the overt condition, reconstruction accuracy was evaluated as the correlation between original and predicted speech features, and was significant in each subject (p < 10−5; paired two-sample t-test). For the covert speech condition, dynamic time warping was first used to realign the covert speech reconstruction with the corresponding original speech from the overt condition. Reconstruction accuracy was then evaluated as the correlation between original and reconstructed speech features. Covert reconstruction accuracy was compared to the accuracy obtained from reconstructions in the baseline control condition, and was significantly better (p < 0.005; paired two-sample t-test). The superior temporal gyrus and the pre- and post-central gyri provided the most reconstruction information. The relationship between overt and covert speech reconstruction depended on anatomy. These results provide evidence that auditory representations of covert speech can be reconstructed from models that are built from an overt speech data set, supporting a partially shared neural substrate.

Keywords: covert speech; decoding model; electrocorticography; pattern recognition; speech production.
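The spectrogram-based feature space described above (32 acoustic frequency bands spanning 0.2-7 kHz, per Figure 5) could be computed along the following lines. The window length, overlap, and band-pooling scheme here are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np
from scipy.signal import spectrogram

def speech_spectrogram(audio, fs, n_bands=32, fmin=200.0, fmax=7000.0):
    """Log-magnitude spectrogram restricted to a speech band.

    The 32-band, 0.2-7 kHz range mirrors the feature space in the
    abstract; the 25 ms window / 15 ms overlap are assumptions.
    """
    f, t, sxx = spectrogram(audio, fs=fs,
                            nperseg=int(0.025 * fs),
                            noverlap=int(0.015 * fs))
    # Keep only frequencies in the speech band, then pool into n_bands
    mask = (f >= fmin) & (f <= fmax)
    f, sxx = f[mask], sxx[mask]
    edges = np.linspace(0, len(f), n_bands + 1).astype(int)
    bands = np.stack([sxx[a:b].mean(axis=0)
                      for a, b in zip(edges[:-1], edges[1:])])
    return t, np.log(bands + 1e-10)  # shape: (n_bands, n_frames)
```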


Figures

Figure 1
Electrode locations. Grid locations for each subject are overlaid on cortical surface reconstructions of each subject's MRI scan.
Figure 2
Decoding approach. (A) The overt speech condition was used to train and test the accuracy of a neural-based decoding model to reconstruct spectrotemporal features of speech. The reconstructed patterns were compared to the true original (spoken out loud) speech representation (spectrogram or modulation-based). (B) During covert speech, there is no behavioral output, which prevents building a decoding model directly from covert speech data. Instead, the decoding model trained from the overt speech condition is used to decode covert speech neural activity. The covert speech reconstructed patterns were compared to identical speech segments spoken aloud during the overt speech condition (using dynamic time warping realignment).
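The train-on-overt, apply-to-covert strategy in Figure 2 can be sketched as a time-lagged linear (ridge) regression from high-gamma power to speech features. The lag set, regularization strength, and closed-form solver below are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def lag_features(hg, lags):
    """Stack time-lagged copies of high-gamma power (channels x time).

    Uses circular shifts for brevity; a real pipeline would pad or
    trim the edges instead of wrapping.
    """
    n_ch, n_t = hg.shape
    X = np.zeros((n_t, n_ch * len(lags)))
    for i, lag in enumerate(lags):
        X[:, i * n_ch:(i + 1) * n_ch] = np.roll(hg, lag, axis=1).T
    return X

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + aI)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def reconstruction_accuracy(Y_true, Y_pred):
    """Mean Pearson r across feature dimensions (columns)."""
    rs = [np.corrcoef(Y_true[:, j], Y_pred[:, j])[0, 1]
          for j in range(Y_true.shape[1])]
    return float(np.mean(rs))
```

A model fit with `fit_ridge` on overt-condition data can then be applied unchanged to covert-condition high-gamma features, which is the key move the figure describes.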
Figure 3
Speech realignment. (A) Overt speech analysis—the overall reconstruction accuracy for the overt speech condition was quantified by directly computing the correlation coefficient (Pearson's r) between the reconstructed and original speech representations. (B) Covert speech analysis—the covert speech reconstruction is not necessarily aligned to the corresponding overt speech representation due to speaking rate differences and repetition irregularities. The reconstruction was thus realigned to the overt speech stimuli using dynamic time warping. The overall reconstruction accuracy was then quantified by computing the correlation coefficient (Pearson's r) between the covert speech reconstruction and the original speech representation. (C) Baseline control analysis—a resting state (baseline control) condition was used to assess the statistical significance of covert speech reconstruction accuracy. Resting state activity was used to generate a noise reconstruction, and dynamic time warping was applied to align the noise reconstruction to overt speech as in (B). Because dynamic time warping has substantial degrees of freedom, due to its ability to stretch and compress speech segments, the overall reconstruction accuracy for the baseline control condition is significantly higher than zero. However, direct statistical comparisons between the covert and baseline conditions are valid because equivalent analysis procedures are applied to both covert and resting state neural data.
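The dynamic time warping realignment described above can be sketched with the standard O(nm) dynamic program. The paper's exact step pattern and path constraints are not specified here, so this is a minimal textbook variant.

```python
import numpy as np

def dtw_align(a, b):
    """Dynamic time warping between two sequences (time x features).

    Returns the warping path as (i, j) index pairs and the total
    alignment cost, using Euclidean frame distance.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the optimal path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], D[n, m]
```

The path's ability to repeat or skip frames is exactly the freedom noted in (C), which is why the baseline control reconstruction must be warped the same way before comparison.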
Figure 4
Brain mapping and electrode localization. (A) Post-operative CT scans (1 mm slices) and (C) pre-operative structural MRI scans (1.5 mm slices, T1-weighted) were acquired for each subject. From these scans, grid position (B) and the cortical surface (D) were reconstructed providing a subject-specific anatomical model (E) (see section Coregistration for details).
Figure 5
Overt speech reconstruction accuracy for the spectrogram-based speech representation. (A) Overall reconstruction accuracy for each subject using the spectrogram-based speech representation. Error bars denote standard error of the mean (s.e.m.). Overall accuracy is reported as the mean over all features (32 acoustic frequencies ranging from 0.2 to 7 kHz). The overall spectrogram reconstruction accuracy for overt speech was greater than the baseline control reconstruction accuracy in all individuals (p < 10−5; Hotelling's t-test). Baseline control reconstruction accuracy was not significantly different from zero (p > 0.1; one-sample t-test; gray dashed line). (B) Reconstruction accuracy as a function of acoustic frequency averaged over all subjects (N = 7) using the spectrogram model. Shaded region denotes s.e.m. over subjects.
Figure 6
Overt speech reconstruction and identification. (A) Top panel: segment of the original sound spectrogram (subject's own voice), with the corresponding text above it. Bottom panel: the same segment reconstructed with the decoding model. (B) Identification rank. Speech segments (5 s) were extracted from the continuous spectrogram. For each extracted segment (N = 123) a similarity score (correlation coefficient) was computed between the target reconstruction and each original spectrogram of the candidate set. The similarity scores were sorted and identification rank was quantified as the percentile rank of the correct segment. 1.0 indicates the target reconstruction best matched the correct segment out of all candidate segments; 0.0 indicates the target was least similar to the correct segment among all candidates. (Dashed line indicates chance level = 0.5; median identification rank = 0.87; p < 10−5; randomization test.)
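The identification-rank procedure in (B) can be sketched as follows: correlate a reconstructed segment against every candidate spectrogram and report the correct segment's percentile rank. The correlation-on-flattened-segments similarity is an assumption consistent with the caption, not necessarily the paper's exact scoring.

```python
import numpy as np

def identification_rank(recon, candidates, target_idx):
    """Percentile rank of the correct segment among all candidates.

    Similarity is the Pearson correlation between the flattened
    reconstruction and each candidate segment. 1.0 means the correct
    segment was the best match; 0.0 the worst; 0.5 is chance.
    """
    sims = np.array([np.corrcoef(recon.ravel(), c.ravel())[0, 1]
                     for c in candidates])
    # Fraction of candidates the correct segment outranks
    worse = np.sum(sims < sims[target_idx])
    return worse / (len(candidates) - 1)
```

Averaging this rank over all N = 123 target segments gives a single identification score per subject, which is how chance (0.5) becomes the natural reference level.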
Figure 7
Overt speech reconstruction accuracy for the modulation-based speech representation. (A) Overall reconstruction accuracy for each subject using the modulation-based speech representation. Error bars denote s.e.m. Overall accuracy is reported as the mean over all features (5 spectral and 12 temporal modulations, ranging from 0.5 to 8 cyc/oct and from −32 to 32 Hz, respectively). The overall modulation reconstruction accuracy for overt speech was greater than the baseline control reconstruction accuracy in all individuals (p < 10−5; Hotelling's t-test). Baseline control reconstruction accuracy was not significantly different from zero (p > 0.1; one-sample t-test; gray dashed line). (B) Reconstruction accuracy as a function of rate and scale averaged over all subjects (N = 7).
Figure 8
Overt speech informative areas. Reconstruction accuracy correlation coefficients were computed separately for each individual electrode and for both the overt and baseline control conditions (see section Overt Speech: Informative Areas for details). The plotted correlation values are calculated by subtracting the baseline control correlation from the overt-condition correlation. The informative area map was thresholded at p < 0.05 (Bonferroni correction). (A) Spectrogram-based reconstruction accuracy. (B) Modulation-based reconstruction accuracy.
Figure 9
Overall reconstruction accuracy using dynamic time warping realignment. Overall reconstruction accuracy for each subject during the overt speech, covert speech, and baseline control conditions after dynamic time warping realignment. (A) Spectrogram-based representation. (B) Modulation-based representation.
Figure 10
Covert speech reconstruction. (A) Top panel: a segment of the overt (spoken out loud) spectrogram representation. Bottom panel: the same segment reconstructed from neural activity during the covert condition using the decoding model. (B) Identification rank. Speech segments (5 s) were extracted from the continuous spectrogram. For each target segment (N = 123) a similarity score (correlation coefficient) was computed between the target reconstruction and each original spectrogram in the candidate set. The similarity scores were sorted and identification rank was quantified as the percentile rank of the correct segment. 1.0 indicates the target reconstruction best matched the correct segment out of all candidate segments; 0.0 indicates the target was least similar to the correct segment among all candidates. (Dashed line indicates chance level = 0.5; median identification rank = 0.62; p < 0.005; randomization test.)
Figure 11
Overt and covert speech identification. Median identification rank for each subject during the overt speech, covert speech, and baseline control conditions (see section Evaluation for more details). At the group level, rank_overt = 0.91 and rank_covert = 0.55 are significantly higher than chance level (0.5; randomization test; gray dashed line), whereas rank_baseline = 0.48 is not significantly different from chance.
Figure 12
Covert speech informative areas. Reconstruction accuracy correlation coefficients were computed separately for each individual electrode and for both the covert and baseline control conditions (see sections Overt Speech: Informative Areas and Covert Speech: Informative Areas for details). The plotted correlation values are calculated by subtracting the baseline control correlation from the covert-condition correlation. The informative area map was thresholded at p < 0.05 (Bonferroni correction). (A) Spectrogram-based reconstruction accuracy. (B) Modulation-based reconstruction accuracy.
Figure 13
Region of interest analysis of significant electrodes. Significant electrodes (overt, covert, or both; p < 0.05; Bonferroni correction) in the STG and the pre- and post-central gyri across subjects, co-registered with the Talairach brain template (Lancaster et al., 2000), for the spectrogram-based (A) and the modulation-based (B) reconstruction.
