The processing of audio-visual speech: empirical and neural bases

Philos Trans R Soc Lond B Biol Sci. 2008 Mar 12;363(1493):1001-10. doi: 10.1098/rstb.2007.2155.


In this selective review, I outline a number of ways in which seeing the talker affects auditory perception of speech, including, but not confined to, the McGurk effect. To date, studies suggest that all linguistic levels are susceptible to visual influence, and that two main modes of processing can be described: a complementary mode, whereby vision provides information more efficiently than hearing for some under-specified parts of the speech stream, and a correlated mode, whereby vision partially duplicates information about dynamic articulatory patterning.

Cortical correlates of seen speech suggest that, at the neurological as well as the perceptual level, auditory processing of speech is affected by vision, so that 'auditory speech regions' are activated by seen speech. The processing of natural speech, whether heard, seen, or both heard and seen, activates the perisylvian language regions (left>right). Activation very probably occurs in a specific order: first superior temporal, then inferior parietal, and finally inferior frontal regions (left>right) are engaged. There is some differentiation of the visual input stream to the core perisylvian language system: complementary seen speech information appears to make special use of the ventral visual processing stream, while correlated visual speech may rely relatively more on the dorsal processing stream, which is sensitive to visual movement.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Auditory Cortex / physiology
  • Auditory Pathways / physiology*
  • Brain Mapping
  • Humans
  • Linguistics
  • Speech Perception / physiology*
  • Visual Perception / physiology*