TorchDIVA: An extensible computational model of speech production built on an open-source machine learning library

Sean P Kinahan; Julie M Liss; Visar Berisha

doi:10.1371/journal.pone.0281306

PLoS One. 2023 Feb 17;18(2):e0281306. doi: 10.1371/journal.pone.0281306. eCollection 2023.

Authors

Sean P Kinahan¹,², Julie M Liss¹, Visar Berisha¹,²

Affiliations

¹ College of Health Solutions, Arizona State University, Tempe, Arizona, United States of America.
² School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, Arizona, United States of America.

Abstract

The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development in speech technology research is done in Python. This means there is a wealth of machine learning tools which are freely available in the Python ecosystem that cannot be easily integrated with DIVA. We present TorchDIVA, a full rebuild of DIVA in Python using PyTorch tensors. DIVA source code was directly translated from Matlab to Python, and built-in Simulink signal blocks were implemented from scratch. After implementation, the accuracy of each module was evaluated via systematic block-by-block validation. The TorchDIVA model is shown to produce outputs that closely match those of the original DIVA model, with a negligible difference between the two. We additionally present an example of the extensibility of TorchDIVA as a research platform. Speech quality enhancement in TorchDIVA is achieved through an integration with an existing PyTorch generative vocoder called DiffWave. A modified DiffWave mel-spectrum upsampler was trained on human speech waveforms and conditioned on the TorchDIVA speech production. The results indicate improved speech quality metrics in the DiffWave-enhanced output as compared to the baseline. This enhancement would have been difficult or impossible to accomplish in the original Matlab implementation. This proof-of-concept demonstrates the value TorchDIVA can bring to the research community. Researchers can download the new implementation at: https://github.com/skinahan/DIVA_PyTorch.
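The block-by-block validation described in the abstract can be illustrated with a minimal sketch. It is not taken from the TorchDIVA repository: the `UnitDelay` class, the `max_abs_error` helper, the test signal, and the tolerance are all hypothetical, chosen only to show how a Simulink-style signal block might be rebuilt on PyTorch tensors and then compared against a reference trace.

```python
import torch


class UnitDelay:
    """Hypothetical stand-in for a Simulink-style unit delay block (z^-1)."""

    def __init__(self, initial: torch.Tensor):
        self.state = initial.clone()

    def step(self, x: torch.Tensor) -> torch.Tensor:
        # Emit the previously stored sample, then store the current one.
        y = self.state
        self.state = x.clone()
        return y


def max_abs_error(candidate: torch.Tensor, reference: torch.Tensor) -> float:
    """Largest element-wise deviation between two block outputs."""
    return (candidate - reference).abs().max().item()


# Drive the block with a short test signal, one sample per time step.
signal = torch.tensor([1.0, 2.0, 3.0, 4.0])
block = UnitDelay(initial=torch.zeros(()))
output = torch.stack([block.step(sample) for sample in signal])

# Reference trace. In a real validation this would be exported from the
# original model; here it is written by hand for illustration only.
reference = torch.tensor([0.0, 1.0, 2.0, 3.0])

# Assumed tolerance; the abstract reports only a "negligible difference".
error = max_abs_error(output, reference)
block_matches = error < 1e-6
```

Checking each block in isolation this way localizes any discrepancy to a single component before the blocks are composed into the full model.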

Copyright: © 2023 Kinahan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Computer Simulation
Ecosystem*
Humans
Machine Learning
Software
Speech*

Grants and funding

R01 DC006859/DC/NIDCD NIH HHS/United States