Talker normalization is mediated by structured indexical information

Atten Percept Psychophys. 2020 Jul;82(5):2237-2243. doi: 10.3758/s13414-020-01971-x.

Abstract

Speech perception is challenged by indexical variability. A litany of studies on talker normalization have demonstrated that hearing multiple talkers incurs processing costs (e.g., lower accuracy, increased response time) compared to hearing a single talker. However, when reframing these studies in terms of stimulus structure, it is evident that past tests of multiple-talker (i.e., low structure) and single-talker (i.e., high structure) conditions are not representative of the graded nature of indexical variation in the environment. Here we tested the hypothesis that processing costs incurred by multiple-talker conditions would abate given increased stimulus structure. We tested this hypothesis by manipulating the degree to which talkers' voices differed acoustically (Experiment 1) and also the frequency with which talkers' voices changed (Experiment 2) in multiple-talker conditions. Listeners performed a speeded classification task for words containing vowels that varied in acoustic-phonemic ambiguity. In Experiment 1, response times progressively decreased as acoustic variability among talkers' voices decreased. In Experiment 2, blocking talkers within mixed-talker conditions led to more similar response times among single-talker and multiple-talker conditions. Neither result interacted with acoustic-phonemic ambiguity of the target vowels. Thus, the results showed that indexical structure mediated the processing costs incurred by hearing different talkers. This is consistent with the Efficient Coding Hypothesis, which proposes that sensory and perceptual processing are facilitated by stimulus structure. Defining the roles and limits of stimulus structure on speech perception is an important direction for future research.

Keywords: Categorization; Perceptual categorization and identification; Speech perception.

MeSH terms

  • Acoustics
  • Hearing
  • Humans
  • Reaction Time
  • Speech Perception*
  • Voice*