This study was motivated by the prospective role played by brain rhythms in speech perception. The intelligibility (in terms of word error rate) of natural-sounding, synthetically generated sentences was measured using a paradigm that alters speech-energy rhythm over a range of frequencies. The material comprised 96 semantically unpredictable sentences, each approximately 2 s long (6-8 words per sentence), generated by a high-quality text-to-speech (TTS) synthesis engine. The TTS waveform was time-compressed by a factor of 3, creating a signal with a syllable rhythm three times faster than the original and with poor intelligibility (<50% words correct). A waveform with an artificial rhythm was produced by automatically segmenting the time-compressed waveform into consecutive 40-ms fragments, each followed by a silent interval. The parameters varied were the length of the silent interval (0-160 ms) and whether the lengths of silence were equal ('periodic') or unequal ('aperiodic'). The performance curve (word error rate as a function of mean duration of silence) was U-shaped. The lowest word error rate (i.e., highest intelligibility) occurred when the silence was 80 ms long and inserted periodically. This was also the condition in which word error rate increased when the silence was inserted aperiodically. These data are consistent with a model (TEMPO) in which low-frequency brain rhythms affect the ability to decode the speech signal. In TEMPO, optimum intelligibility is achieved when the syllable rhythm is within the range of the high theta-frequency brain rhythms (6-12 Hz), comparable to the rate at which segments and syllables are articulated in conversational speech.
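The fragmentation-and-silence manipulation described in the abstract can be sketched in a few lines of signal processing. The following is an illustrative reconstruction, not the authors' stimulus-generation code: the function name `insert_silences`, the use of a uniform distribution for aperiodic gap lengths, and the sample rate in the example are all assumptions; only the 40-ms fragment length and the 0-160 ms range of silent intervals come from the text.

```python
import numpy as np

def insert_silences(signal, fs, frag_ms=40.0, gap_ms=80.0,
                    periodic=True, rng=None):
    """Chop `signal` (1-D array, sample rate `fs` Hz) into consecutive
    fragments of `frag_ms` milliseconds and append a silent interval
    after each one.

    periodic=True : every gap is exactly `gap_ms` long.
    periodic=False: gap lengths are drawn uniformly from [0, 2*gap_ms]
                    (an assumed distribution) so their mean is `gap_ms`.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    frag_len = int(round(fs * frag_ms / 1000.0))  # 40 ms -> 640 samples at 16 kHz
    pieces = []
    for start in range(0, len(signal), frag_len):
        pieces.append(signal[start:start + frag_len])
        if periodic:
            gap = int(round(fs * gap_ms / 1000.0))
        else:
            gap = int(round(fs * rng.uniform(0.0, 2.0 * gap_ms) / 1000.0))
        pieces.append(np.zeros(gap))
    return np.concatenate(pieces)
```

Note the arithmetic in the best-scoring condition: a 40-ms fragment plus an 80-ms periodic gap spans 120 ms, three times the fragment length, so the manipulation stretches the 3x time-compressed signal back to roughly its original duration and restores a syllable-sized energy rhythm. (The time compression itself would be done with a pitch-preserving algorithm such as WSOLA, which is not shown here.)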
(c) 2009 S. Karger AG, Basel.