Decoding articulatory and phonetic components of naturalistic continuous speech from the distributed language network

J Neural Eng. 2023 Aug 14;20(4). doi: 10.1088/1741-2552/ace9fb.

Abstract

Objective. Speech production relies on a widely distributed brain network. However, research and development of speech brain-computer interfaces (speech-BCIs) has typically focused on decoding speech only from superficial subregions readily accessible by subdural grid arrays, typically placed over the sensorimotor cortex. Alternatively, the technique of stereo-electroencephalography (sEEG) enables access to distributed brain regions using multiple depth electrodes with lower surgical risks, especially in patients with brain injuries resulting in aphasia and other speech disorders. Approach. To investigate the decoding potential of widespread electrode coverage in multiple cortical sites, we used a naturalistic continuous speech production task. We obtained neural recordings using sEEG from eight participants while they read sentences aloud. We trained linear classifiers to decode distinct speech components (articulatory components and phonemes) solely from broadband gamma activity and evaluated decoding performance using nested five-fold cross-validation. Main Results. We achieved an average classification accuracy of 18.7% across 9 places of articulation (e.g. bilabials, palatals), 26.5% across 5 manner of articulation (MOA) labels (e.g. affricates, fricatives), and 4.81% across 38 phonemes. The highest classification accuracies achieved with a single large dataset were 26.3% for place of articulation, 35.7% for MOA, and 9.88% for phonemes. Electrodes that contributed high decoding power were distributed across multiple sulcal and gyral sites in both dominant and non-dominant hemispheres, including ventral sensorimotor, inferior frontal, superior temporal, and fusiform cortices.
Rather than finding a distinct cortical locus for each speech component, we observed neural correlates of both articulatory and phonetic components in multiple hubs of a widespread language production network. Significance. These results reveal distributed cortical representations whose activity enables decoding of speech components during continuous speech with this minimally invasive recording method, elucidating language neurobiology and identifying neural targets for future speech-BCIs.
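The evaluation scheme described above (linear classifiers on broadband gamma features, scored with nested five-fold cross-validation) can be sketched as follows. This is an illustrative sketch, not the authors' code: the feature dimensions, the specific linear classifier (logistic regression), and the hyperparameter grid are assumptions, and synthetic data stands in for the gamma-band recordings and speech-component labels.

```python
# Sketch of nested five-fold cross-validation of a linear classifier
# on synthetic "broadband gamma" features. All dimensions, the choice
# of logistic regression, and the C grid are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_trials, n_features, n_classes = 200, 50, 5   # e.g. 5 MOA labels
X = rng.normal(size=(n_trials, n_features))    # stand-in for gamma power per electrode
y = rng.integers(0, n_classes, size=n_trials)  # stand-in speech-component labels

# Inner loop: tune the regularization strength C on training folds only.
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0]},
    cv=inner,
)

# Outer loop: estimate generalization accuracy on held-out folds,
# so hyperparameter selection never sees the test data.
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(clf, X, y, cv=outer)
print(f"mean accuracy: {scores.mean():.3f}")
```

With random labels the mean accuracy hovers near chance (1/5 here), which is the baseline against which the reported accuracies (e.g. 26.5% across 5 MOA labels vs. 20% chance) should be read.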

Keywords: brain–computer interfaces; intracranial; speech decoding; stereo-electroencephalography.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Brain-Computer Interfaces*
  • Electroencephalography / methods
  • Humans
  • Language
  • Phonetics
  • Sensorimotor Cortex*
  • Speech