Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions

Seeing Perceiving. 2011;24(6):513-39. doi: 10.1163/187847611X595864. Epub 2011 Sep 29.

Abstract

Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has since burgeoned. The proposed accounts included the integration of discrete phonetic features, of vectors describing the values of independent acoustical and optical parameters, of the filter function of the vocal tract, and of the articulatory dynamics of the vocal tract. The latter two accounts assume that the representations underlying audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from the visual and auditory modalities. Recent converging evidence from several disciplines indicates that the general framework of Summerfield's feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model in which auditory and visual brain circuits provide facilitatory information when their inputs are correctly timed, and in which auditory and visual speech representations do not necessarily undergo translation into a common code during processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and should use dynamic modeling tools to further characterize the timing and information-processing mechanisms involved in audiovisual speech integration.
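The timing-dependent facilitation at the core of the proposed model can be illustrated with a toy simulation. The sketch below is not the authors' model; it is a minimal hypothetical implementation assuming a simple temporal binding window, in which a visual speech cue speeds responses to a temporally proximate auditory input. The function names, the `window_ms` parameter, and the Gaussian weighting of asynchrony are all illustrative choices, not values or mechanisms taken from the article.

    import math

    def facilitation_gain(audio_onset_ms, visual_onset_ms,
                          window_ms=200.0, max_gain=0.3):
        """Toy timing-dependent facilitation: a visual input boosts
        auditory processing only when the two onsets fall within a
        temporal binding window.

        The gain decays as a Gaussian of audiovisual asynchrony; all
        parameter values are illustrative, not fitted to data. Real
        binding windows are typically asymmetric (tolerating visual-
        leading asynchronies better), which this sketch ignores.
        """
        asynchrony = audio_onset_ms - visual_onset_ms
        return max_gain * math.exp(-(asynchrony / window_ms) ** 2)

    def predicted_rt(baseline_rt_ms, audio_onset_ms, visual_onset_ms):
        """Predicted response time: well-timed visual speech shortens
        the auditory-only baseline; asynchronous visual input leaves
        it essentially unchanged."""
        gain = facilitation_gain(audio_onset_ms, visual_onset_ms)
        return baseline_rt_ms * (1.0 - gain)

    if __name__ == "__main__":
        baseline = 500.0  # hypothetical auditory-only RT in ms
        for v_onset in (-300.0, -50.0, 0.0, 50.0, 300.0):
            rt = predicted_rt(baseline, audio_onset_ms=0.0,
                              visual_onset_ms=v_onset)
            print(f"visual onset {v_onset:+6.1f} ms -> predicted RT {rt:6.1f} ms")

Under these assumptions, facilitation peaks at synchrony and falls off smoothly with increasing asynchrony, consistent with the abstract's claim that the multisensory benefit depends on the relative timing of the auditory and visual inputs rather than on translating both into a common representational code.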

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Behavior
  • Humans
  • Models, Neurological*
  • Phonetics
  • Speech / physiology*
  • Speech Perception / physiology*
  • Visual Perception / physiology*
  • Vocal Cords / physiology