Towards reconstructing intelligible speech from the human auditory cortex
- PMID: 30696881
- PMCID: PMC6351601
- DOI: 10.1038/s41598-018-37359-z
Abstract
Auditory stimulus reconstruction is a technique that finds the best approximation of the acoustic stimulus from the population of evoked neural activity. Reconstructing speech from the human auditory cortex creates the possibility of a speech neuroprosthetic that establishes direct communication with the brain, and it has been shown to be possible in both overt and covert conditions. However, the low quality of the reconstructed speech has severely limited the utility of this method for brain-computer interface (BCI) applications. To advance the state of the art in speech neuroprosthesis, we combined recent advances in deep learning with the latest innovations in speech synthesis technologies to reconstruct closed-set intelligible speech from the human auditory cortex. We investigated how reconstruction accuracy depends on the regression method, linear or nonlinear (deep neural network), and on the acoustic representation used as the target of reconstruction, including the auditory spectrogram and speech synthesis parameters. In addition, we compared the reconstruction accuracy from low and high neural frequency ranges. Our results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving intelligibility by 65% over the baseline method, which used linear regression to reconstruct the auditory spectrogram. These results demonstrate the efficacy of deep learning and speech synthesis algorithms for designing the next generation of speech BCI systems, which can not only restore communication for paralyzed patients but also have the potential to transform human-computer interaction technologies.
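As an illustration of the baseline mentioned in the abstract (linear regression from neural activity to a spectrogram), the following is a minimal sketch on synthetic data. It is not the authors' pipeline: the abstract does not state the neural features, the number of time lags, or the regularization used, so the ridge penalty, the lag count, the simulated 2-frame response delay, and all function names here are assumptions chosen only for demonstration.

```python
import numpy as np


def lagged_features(neural, n_lags):
    """Concatenate each frame with the n_lags - 1 frames that follow it.

    neural: array of shape (n_times, n_channels).
    Returns an array of shape (n_times, n_channels * n_lags); frames past
    the end of the recording are zero-padded.
    """
    n_times, n_channels = neural.shape
    padded = np.vstack([neural, np.zeros((n_lags - 1, n_channels))])
    return np.hstack([padded[lag:lag + n_times] for lag in range(n_lags)])


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def mean_correlation(Y_true, Y_pred):
    """Pearson correlation per frequency bin, averaged over bins."""
    rs = [np.corrcoef(Y_true[:, k], Y_pred[:, k])[0, 1]
          for k in range(Y_true.shape[1])]
    return float(np.mean(rs))


# Synthetic stand-in data (assumption: no real recordings are used here).
rng = np.random.default_rng(0)
n_times, n_bins, n_channels, n_lags = 2000, 8, 16, 5
spectrogram = rng.standard_normal((n_times, n_bins))

# Simulated neural response: a noisy linear mixture of the stimulus,
# delayed by 2 frames to mimic a response latency.
mixing = rng.standard_normal((n_bins, n_channels))
delayed = np.vstack([np.zeros((2, n_bins)), spectrogram[:-2]])
neural = delayed @ mixing + 0.5 * rng.standard_normal((n_times, n_channels))

# Fit on the first 1500 frames, reconstruct the held-out 500 frames.
X = lagged_features(neural, n_lags)
split = 1500
W = fit_ridge(X[:split], spectrogram[:split], alpha=1.0)
reconstruction = X[split:] @ W
score = mean_correlation(spectrogram[split:], reconstruction)
print(f"mean held-out correlation: {score:.3f}")
```

The lagged design matrix lets the model use neural activity that follows each stimulus frame, which is what allows a linear map to undo the simulated delay. A nonlinear variant would replace `fit_ridge` with a neural network trained on the same lagged inputs, and a synthesizer-based variant would replace the spectrogram target with vocoder parameters.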
Conflict of interest statement
The authors declare no competing interests.
