Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model

J Acoust Soc Am. 2009 Nov;126(5):2635-48. doi: 10.1121/1.3224721.


This study compares the phoneme recognition performance of a microscopic model of speech recognition in speech-shaped noise with the performance of normal-hearing listeners. "Microscopic" is used here in two senses. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, the signal waveforms to be recognized are processed by mimicking elementary stages of human auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing stage and a simple dynamic-time-warp speech recognizer. The model is evaluated with nonsense speech presented in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is achieved with distance measures that focus mainly on small perceptual distances and neglect outliers.
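
The recognizer stage described above can be illustrated with a minimal dynamic-time-warping sketch. It is not the authors' implementation: the function names, the feature layout (one list of channel values per time frame), and the "saturating" distance are assumptions chosen only to show how a template-based recognizer can use a distance measure that keeps small differences nearly unchanged and limits the influence of outliers.

```python
import math


def frame_distance(a, b, kind="euclidean"):
    """Distance between two feature frames (lists of channel values)."""
    diffs = [abs(p - q) for p, q in zip(a, b)]
    if kind == "euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if kind == "saturating":
        # Hypothetical outlier-neglecting measure: each channel contributes
        # d / (1 + d), which is close to d for small d and bounded by 1.
        return sum(d / (1.0 + d) for d in diffs)
    raise ValueError(f"unknown distance kind: {kind}")


def dtw_distance(x, y, kind="euclidean"):
    """Accumulated cost of the best monotonic alignment of sequences x and y."""
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1], kind)
            # Allowed steps: advance in x, advance in y, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(test, templates, kind="euclidean"):
    """Closed-set decision: return the label of the nearest template."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label], kind))


# Toy one-channel "features"; a real front end would supply auditory-model output.
templates = {
    "a": [[0.0], [1.0], [2.0]],
    "b": [[5.0], [5.0], [5.0]],
}
test_item = [[0.0], [0.0], [1.0], [2.0]]  # time-stretched version of template "a"
best = recognize(test_item, templates)
```

In this toy case the test item is a time-stretched copy of template "a", so the warping path aligns it at zero cost and the closed-set decision selects "a" under either distance.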

Publication types

  • Comparative Study

MeSH terms

  • Adult
  • Auditory Cortex
  • Female
  • Hearing
  • Humans
  • Male
  • Models, Neurological*
  • Phonetics*
  • Predictive Value of Tests
  • Psychoacoustics*
  • Speech Intelligibility
  • Speech Perception*
  • Speech Reception Threshold Test
  • Speech Recognition Software*
  • Young Adult