Can visual capture of sound separate auditory streams?

Exp Brain Res. 2022 Mar;240(3):813-824. doi: 10.1007/s00221-021-06281-8. Epub 2022 Jan 20.


In noisy contexts, sound discrimination improves when the auditory sources are separated in space. This phenomenon, named Spatial Release from Masking (SRM), arises from the interaction between the auditory information reaching the ear and spatial attention resources. To examine the relative contribution of these two factors, we exploited an audio-visual illusion in a hearing-in-noise task to create conditions in which the initial stimulation to the ears is held constant, while the perceived separation between speech and masker is changed illusorily (visual capture of sound). In two experiments, we asked participants to identify a string of five digits pronounced by a female voice, embedded in either energetic (Experiment 1) or informational (Experiment 2) noise, before reporting the perceived location of the heard digits. Critically, the distance between target digits and masking noise was manipulated both physically (from 22.5 to 75.0 degrees) and illusorily, by pairing target sounds with visual stimuli either at same (audio-visual congruent) or different positions (15 degrees offset, leftward or rightward: audio-visual incongruent). The proportion of correctly reported digits increased with the physical separation between the target and masker, as expected from SRM. However, despite effective visual capture of sounds, performance was not modulated by illusory changes of target sound position. Our results are compatible with a limited role of central factors in the SRM phenomenon, at least in our experimental setting. Moreover, they add to the controversial literature on the limited effects of audio-visual capture in auditory stream separation.

Keywords: Hearing in noise; Sound localization; Spatial release from masking; Visual capture of sound.

MeSH terms

  • Acoustic Stimulation
  • Female
  • Hearing
  • Humans
  • Noise
  • Perceptual Masking*
  • Speech
  • Speech Perception*