Humans can recognize spoken words with unmatched speed and accuracy. Hearing the initial portion of a word such as "formu…" is sufficient for the brain to identify "formula" from the thousands of other words that partially match. Two alternative computational accounts propose that partially matching words (1) inhibit each other until a single word is selected ("formula" inhibits "formal" by lexical competition) or (2) are used to predict upcoming speech sounds more accurately (segment prediction error is minimal after sequences like "formu…"). To distinguish these theories we taught participants novel words (e.g., "formubo") that sound like existing words ("formula") on two successive days. Computational simulations show that knowing "formubo" increases lexical competition when hearing "formu…", but reduces segment prediction error. Conversely, when the sounds in "formula" and "formubo" diverge, the reverse is observed. The time course of magnetoencephalographic brain responses in the superior temporal gyrus (STG) is uniquely consistent with a segment prediction account. We propose a predictive coding model of spoken word recognition in which STG neurons represent the difference between predicted and heard speech sounds. This prediction error signal explains the efficiency of human word recognition and simulates neural responses in auditory regions.
Copyright © 2012 Elsevier Ltd. All rights reserved.