Word pair classification during imagined speech using direct brain recordings

Stephanie Martin et al. Sci Rep. 2016 May 11;6:25803. doi: 10.1038/srep25803.

Abstract

People who cannot communicate due to neurological disorders would benefit from an internal speech decoder. Here, we demonstrate the ability to classify individual words during imagined speech from electrocorticographic (ECoG) signals. In a word-imagery task, we used high gamma (70–150 Hz) time features with a support vector machine (SVM) model to classify individual words from a pair of words. To account for temporal irregularities during speech production, we introduced a non-linear time alignment into the SVM kernel. Classification accuracy reached 88% in a two-class classification framework (50% chance level), and average classification accuracy across fifteen word pairs was significant across five subjects (mean = 58%; p < 0.05). We also compared classification accuracy between imagined speech, overt speech and listening. As predicted, higher classification accuracy was obtained in the listening and overt speech conditions (mean = 89% and 86%, respectively; p < 0.0001), where speech stimuli were directly presented. The results provide evidence for a neural representation of imagined words in the temporal lobe, frontal lobe and sensorimotor cortex, consistent with previous findings in speech perception and production. These data represent a proof-of-concept study for basic decoding of speech imagery, and delineate a number of key challenges to using speech-imagery neural representations for clinical applications.
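The core of the approach is a two-class SVM operating on a precomputed trial-by-trial similarity (kernel) matrix. The sketch below illustrates that setup on synthetic data; the trial counts, feature dimensions, and the Euclidean-distance-based kernel are illustrative stand-ins, not the paper's actual DTW-based kernel or data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for single-trial high gamma features:
# 20 trials per word, 50 time samples each (all values hypothetical).
word_a = rng.normal(0.0, 1.0, (20, 50)) + np.sin(np.linspace(0, 3, 50))
word_b = rng.normal(0.0, 1.0, (20, 50)) + np.cos(np.linspace(0, 3, 50))
X = np.vstack([word_a, word_b])
y = np.array([0] * 20 + [1] * 20)

# Turn pairwise distances into a similarity (kernel) matrix with an
# RBF-style transform; the paper used DTW distances instead of the
# plain Euclidean distances used here.
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
K = np.exp(-(dists ** 2) / (2 * dists.mean() ** 2))

# Two-class SVM on the precomputed kernel (50% chance level).
clf = SVC(kernel="precomputed").fit(K, y)
train_acc = clf.score(K, y)
print(train_acc)
```

In a real analysis the kernel for held-out trials would be built from distances between test and training trials, and accuracy reported on that held-out set rather than on the training data shown here.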

Figures

Figure 1
Figure 1. Experimental paradigm.
Subjects were presented with an auditory stimulus corresponding to one of six individual words (average length = 800 ± 20 ms). Then, a cue appeared on the screen [describe what the cue is and where it appeared on the screen], and subjects had to imagine hearing the word they had just listened to. Finally, a second cue appeared, and subjects had to say the word out loud. Shaded areas represent the intervals extracted for classification. For both the listening and overt speech conditions, we extracted epochs from 100 ms before speech onset to 100 ms after speech offset. For the imagined speech condition, since there was no speech output, we extracted fixed-length 1.5 s epochs starting at cue onset.
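The epoching rule in the caption (onset/offset ± 100 ms for listening and overt speech, a fixed 1.5 s cue-locked window for imagined speech) can be sketched as follows; the function name, interface, and sampling rate are hypothetical, not from the paper:

```python
import numpy as np

def extract_epoch(signal, fs, onset_s, offset_s=None, pad_s=0.1, fixed_s=None):
    """Cut one trial out of a continuous recording.

    Listening / overt speech: onset - pad .. offset + pad.
    Imagined speech (no audio to align to): fixed-length window from cue onset.
    Illustrative sketch only; names and defaults are assumptions.
    """
    if fixed_s is not None:            # imagined speech: fixed window from cue
        start, stop = onset_s, onset_s + fixed_s
    else:                              # listening / overt: pad around speech
        start, stop = onset_s - pad_s, offset_s + pad_s
    i0, i1 = int(round(start * fs)), int(round(stop * fs))
    return signal[max(i0, 0):i1]

fs = 1000                               # Hz (hypothetical sampling rate)
rec = np.arange(10 * fs, dtype=float)   # 10 s dummy recording

# 800 ms word plus 100 ms padding on each side -> 1.0 s epoch.
overt = extract_epoch(rec, fs, onset_s=2.0, offset_s=2.8)
# Cue-locked fixed 1.5 s epoch for imagined speech.
imagined = extract_epoch(rec, fs, onset_s=5.0, fixed_s=1.5)
print(len(overt), len(imagined))
```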
Figure 2
Figure 2. High gamma time course.
(a) High gamma neural activity averaged across trials and z-scored with respect to the pre-auditory-stimulus baseline (500 ms interval). The top-most plot displays the task design, an example of the averaged time course for a representative electrode, and the averaged audio envelope (red line). (b) For the given electrodes and conditions (listening, imagined and overt speech), examples of individual trials (black) and their corresponding audio recordings (red) for three different words (‘battlefield’, ‘swimming’ and ‘telephone’).
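A common way to obtain such a high gamma time course is to band-pass the signal in the 70–150 Hz range, take the Hilbert-envelope amplitude, and z-score it against a pre-stimulus baseline. The sketch below follows that standard recipe on a synthetic channel; the filter design, sampling rate, and signal are assumptions, not the paper's exact processing chain:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_zscore(x, fs, baseline_samples):
    """Band-pass 70-150 Hz, take the analytic amplitude (Hilbert
    envelope), then z-score against a pre-stimulus baseline window.
    Filter order and design are illustrative choices."""
    b, a = butter(4, [70 / (fs / 2), 150 / (fs / 2)], btype="band")
    env = np.abs(hilbert(filtfilt(b, a, x)))
    base = env[:baseline_samples]          # e.g. a 500 ms pre-stimulus interval
    return (env - base.mean()) / base.std()

fs = 1000
t = np.arange(0, 2.0, 1 / fs)
rng = np.random.default_rng(1)
# Synthetic channel: a 100 Hz burst after t = 1 s on top of broadband noise.
x = rng.normal(0, 1, t.size) + 3 * np.sin(2 * np.pi * 100 * t) * (t > 1.0)

z = high_gamma_zscore(x, fs, baseline_samples=500)
print(z[:500].mean(), z[1200:1800].mean())
```

The burst region shows strongly elevated z-scored high gamma relative to the baseline, which by construction has zero mean.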
Figure 3
Figure 3. Neural time course alignment.
(a) For each electrode separately, we extracted the high gamma time features. (b) We used dynamic time warping (DTW) to realign the time series of each pair of trials, and (c) computed the DTW distance between the pairwise realigned trials. (d) This gave rise to one similarity matrix per electrode (channel-specific kernel) that reflects how similar trial pairs are after realignment. From the similarity matrix in (d), we computed the discriminative power index (see Materials and methods for details). (e) The final kernel was computed as the weighted average of the individual kernels over all electrodes, based on their discriminative power index.
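The DTW step in (b)–(c) compensates for trial-to-trial timing differences before distances are computed. A textbook DTW distance can be sketched as below; the signals are synthetic, and the final kernel transform in the comment is one common choice, not necessarily the paper's exact one:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance
    between two 1-D time series (absolute-difference local cost)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, and match moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two trials of the "same word" shifted in time still match closely
# after warping, while a different waveform does not (all synthetic).
t = np.linspace(0, 1, 100)
trial1 = np.sin(2 * np.pi * 2 * t)
trial2 = np.sin(2 * np.pi * 2 * (t - 0.05))      # time-shifted copy
trial3 = np.sign(np.sin(2 * np.pi * 2 * t))      # different shape

print(dtw_distance(trial1, trial2), dtw_distance(trial1, trial3))
# A channel-specific kernel entry can then be built from the distance,
# e.g. K_ij = exp(-d_ij / sigma) (illustrative transform).
```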
Figure 4
Figure 4. Classification accuracy.
(a) Pairwise classification accuracy in the testing set for the listening (left panel), overt speech (middle panel) and imagined speech condition (right panel) for a subject with good temporal coverage (S4). (b) Average classification accuracy across all pairs of words for each subject and condition (listening, overt and imagined speech). Error bars denote SEM.
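The summary statistic in (b), a mean over word pairs with SEM error bars, is computed as below; the fifteen per-pair accuracies are hypothetical numbers chosen only to illustrate the calculation, not values from the paper:

```python
import numpy as np

# Per-pair accuracies for one subject/condition (hypothetical values).
pair_acc = np.array([0.88, 0.61, 0.55, 0.72, 0.49, 0.66, 0.58, 0.63,
                     0.52, 0.70, 0.57, 0.60, 0.54, 0.65, 0.59])

mean_acc = pair_acc.mean()
# Standard error of the mean: sample SD over sqrt(number of pairs).
sem = pair_acc.std(ddof=1) / np.sqrt(pair_acc.size)
print(mean_acc, sem)
```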
Figure 5
Figure 5. Discriminative information.
(a) Discriminative power measured as the area under the ROC curve (thresholded at p < 0.05; uncorrected; see Materials and methods for details), plotted on each individual’s brain. Each map is scaled to the maximum absolute value of the discriminative power index (indicated by the number above each cortical map). (b) Average classification accuracy across all pairs of words for each subject using only temporal electrodes for the listening (top panel), overt speech (middle panel) and imagined speech (bottom panel) conditions. Error bars denote SEM.
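An ROC-based discriminative power measure of the kind mapped in (a) can be sketched as follows; the per-trial feature, trial counts, and the centering on 0.5 to obtain a signed index are illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Hypothetical per-trial feature (e.g. mean high gamma) for two words
# at one electrode; a discriminative electrode separates the two
# distributions, giving an AUC away from the 0.5 chance level.
word_a = rng.normal(0.0, 1.0, 30)
word_b = rng.normal(1.0, 1.0, 30)
labels = np.r_[np.zeros(30), np.ones(30)]
scores = np.r_[word_a, word_b]

auc = roc_auc_score(labels, scores)
# Center on 0.5 so the sign indicates which word drives the feature
# (illustrative signed discriminative power index).
dpower = auc - 0.5
print(auc, dpower)
```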
