Quality prediction of synthesized speech based on tensor structured EEG signals

PLoS One. 2018 Jun 14;13(6):e0193521. doi: 10.1371/journal.pone.0193521. eCollection 2018.

Abstract

This study investigates quality prediction methods for synthesized speech using EEG. Training a predictive model on EEG is challenging because of the small number of training trials, the low signal-to-noise ratio, and the high correlation among independent variables. When a predictive model is trained with a machine learning algorithm, the features extracted from multi-channel EEG signals are usually flattened into a vector, discarding their structure even though EEG signals are highly structured. This study predicts the subjective rating scores of synthesized speech samples, including overall impression, valence, and arousal, by constructing tensor-structured features instead of vectorized ones to exploit that structure. We extracted various features and arranged them as a tensor that preserves their structure. Both vectorized and tensorial features were used to predict the rating scores, and the experimental results showed that prediction with tensorial features achieved better predictive performance. Among the features, those in the alpha and beta bands were particularly effective for prediction, which agrees with previous neurophysiological studies.
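The abstract does not give implementation details, so the following is only a minimal sketch of the tensor-versus-vector distinction it describes, using synthetic data. The shapes (trials × channels × frequency bands), the rank-1 bilinear model, and the alternating-least-squares fit are illustrative assumptions, not the paper's actual feature extraction or learning algorithm.

```python
import numpy as np

# Hypothetical shapes: N trials, C EEG channels, B frequency bands.
# X holds one band-power matrix (channels x bands) per trial; y holds rating scores.
rng = np.random.default_rng(0)
N, C, B = 60, 32, 5
X = rng.standard_normal((N, C, B))          # tensor-structured features
y = rng.standard_normal(N)                  # subjective rating scores (synthetic)

# Vectorized baseline: flatten each trial's feature matrix and fit least squares.
# The channel/band structure is discarded and C*B parameters must be estimated.
X_vec = X.reshape(N, C * B)
w_vec, *_ = np.linalg.lstsq(X_vec, y, rcond=None)

# Structure-aware alternative: rank-1 bilinear model y_i ~ u^T X_i v,
# fitted by alternating least squares; only C + B parameters are estimated.
u = np.ones(C)
v = np.ones(B)
for _ in range(50):
    # Fix v: each trial reduces to a length-C regressor X_i @ v.
    Zu = X @ v                               # shape (N, C)
    u, *_ = np.linalg.lstsq(Zu, y, rcond=None)
    # Fix u: each trial reduces to a length-B regressor X_i^T @ u.
    Zv = np.einsum('ncb,c->nb', X, u)        # shape (N, B)
    v, *_ = np.linalg.lstsq(Zv, y, rcond=None)

y_hat = np.einsum('ncb,c,b->n', X, u, v)     # predictions from the bilinear model
```

With few trials and strongly correlated EEG features, the reduced parameter count of such structure-aware models is one plausible reason tensorial features can outperform vectorized ones, which is the general motivation the abstract points to.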

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Electroencephalography*
  • Female
  • Humans
  • Male
  • Models, Neurological*
  • Predictive Value of Tests
  • Speech Acoustics*
  • Speech Perception / physiology*

Grants and funding

Part of this work was supported by JSPS KAKENHI (Grant Numbers JP17H06101 to SN, JP17K00237 to SS, and JP16K16172 to HT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. No additional external funding was received for this study.