Crossmodal Phase Reset and Evoked Responses Provide Complementary Mechanisms for the Influence of Visual Speech in Auditory Cortex

J Neurosci. 2020 Oct 28;40(44):8530-8542. doi: 10.1523/JNEUROSCI.0555-20.2020. Epub 2020 Oct 6.


Abstract

Natural conversation is multisensory: when we can see the speaker's face, visual speech cues improve our comprehension. The neuronal mechanisms underlying this phenomenon remain unclear. The two main alternatives are visually mediated phase modulation of neuronal oscillations (excitability fluctuations) in auditory neurons and visual input-evoked responses in auditory neurons. Investigating this question using naturalistic audiovisual speech with intracranial recordings in humans of both sexes, we find evidence for both mechanisms. Remarkably, auditory cortical neurons track the temporal dynamics of purely visual speech using the phase of their slow oscillations and phase-related modulations in broadband high-frequency activity. Consistent with known perceptual enhancement effects, the visual phase reset amplifies the cortical representation of concomitant auditory speech. In contrast to this, and in line with earlier reports, visual input reduces the amplitude of evoked responses to concomitant auditory input. We interpret the combination of improved phase tracking and reduced response amplitude as evidence for more efficient and reliable stimulus processing in the presence of congruent auditory and visual speech inputs.

Significance Statement

Watching the speaker can facilitate our understanding of what is being said. The mechanisms responsible for this influence of visual cues on the processing of speech remain incompletely understood. We studied these mechanisms by recording the electrical activity of the human brain through electrodes implanted surgically inside the brain. We found that visual inputs can operate by directly activating auditory cortical areas, and also indirectly by modulating the strength of cortical responses to auditory input. Our results help to understand the mechanisms by which the brain merges auditory and visual speech into a unitary perception.

Keywords: audiovisual speech; broadband high-frequency activity; crossmodal stimuli; intracranial electroencephalography; neuronal oscillations; phase–amplitude coupling.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Auditory Cortex / physiology*
  • Drug Resistant Epilepsy / surgery
  • Electrocorticography
  • Evoked Potentials / physiology*
  • Evoked Potentials, Auditory / physiology
  • Evoked Potentials, Visual / physiology
  • Female
  • Humans
  • Middle Aged
  • Neurons / physiology
  • Nonverbal Communication / physiology*
  • Nonverbal Communication / psychology
  • Photic Stimulation
  • Young Adult