Front Psychol. 2019 Nov 5;10:2511.
doi: 10.3389/fpsyg.2019.02511. eCollection 2019.

Characteristic Sounds Facilitate Object Search in Real-Life Scenes

Daria Kvasova et al. Front Psychol. 2019.

Abstract

Real-world events provide not only temporally and spatially correlated information across the senses but also semantic correspondences about object identity. Prior research has shown that object sounds can enhance detection, identification, and search performance for semantically consistent visual targets. However, these effects have typically been demonstrated with simple, stereotyped displays that lack ecological validity. To address identity-based cross-modal relationships in real-world scenarios, we designed a visual search task using complex, dynamic scenes. Participants searched for objects in video clips recorded from real-life scenes. Auditory cues, embedded in the background sounds, could be target-consistent, distracter-consistent, neutral, or absent. We found that, in these naturalistic scenes, characteristic sounds improve visual search for task-relevant objects but fail to increase the salience of irrelevant distracters. Our findings generalize previous results on object-based cross-modal interactions with simple stimuli and shed light on how semantically congruent audio-visual relationships play out in real-life contexts.

Keywords: attention; multisensory; natural scenes; real life; semantics; visual search.


Figures

FIGURE 1
The left picture is an example of the stimuli used as a typical search array in a search experiment: items are randomly chosen and randomly distributed in space, with no meaningful connection between them. In the naturalistic picture on the right, some of the objects are the same as on the left, but they are now placed in a context with a spatial envelope, proportionality, and a variety of meaningful and functional connections between objects.
FIGURE 2
(A) Sequence of events in the experiment. The trial started with the presentation of the target word for 2000 ms. The target word was followed by the auditory cue and the video. The auditory cue was presented 100 ms before video onset (SOA = 100 ms) and lasted 600 ms, while the video lasted 2000 ms. There was no time limit on the participant's response; 200 ms after the response, a new target word was presented. (B) Example of conditions. In this example stimulus, the possible targets are a mobile phone and a car. If the target is the mobile phone, then in the target-consistent condition the sound matches the target, in the distracter-consistent condition the sound matches the distracter (a car), in the neutral condition the sound matches no object in the scene (e.g., a dog barking), and in the no-sound condition there is only background noise and no auditory cue. The image is a frame from a video clip filmed by the research group.
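
As an illustration only, the trial timeline described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the timing constants come from the caption, and the print/sleep/input calls are placeholders for real stimulus presentation and response collection in an experiment-control library.

    import time

    # Timing parameters taken from the Figure 2 caption (in seconds).
    TARGET_WORD_S = 2.0   # target word display
    CUE_SOA_S = 0.1       # auditory cue leads video onset by 100 ms
    CUE_DUR_S = 0.6       # auditory cue duration
    VIDEO_DUR_S = 2.0     # video clip duration
    ITI_S = 0.2           # delay after the response before the next target word

    def run_trial(target_word: str, condition: str) -> str:
        """Walk through one trial; prints and sleeps stand in for real stimuli."""
        print(f"Target word: {target_word!r} ({condition} condition)")
        time.sleep(TARGET_WORD_S)
        print("Auditory cue onset")
        time.sleep(CUE_SOA_S)                                # cue plays alone for 100 ms
        print("Video onset")
        time.sleep(CUE_DUR_S - CUE_SOA_S)                    # cue ends 500 ms after video onset
        print("Auditory cue offset")
        time.sleep(VIDEO_DUR_S - (CUE_DUR_S - CUE_SOA_S))    # remaining 1500 ms of video
        response = input("Target present or absent? [p/a]: ")  # unspeeded response, no time limit
        time.sleep(ITI_S)
        return response

    if __name__ == "__main__":
        run_trial("mobile phone", "target-consistent")
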
FIGURE 3
(A) Visual search reaction times to the target and error rates, plotted for the target-consistent, distracter-consistent, neutral, and no-sound conditions. Error bars indicate the standard error. Asterisks indicate significant differences between conditions (∗p < 0.05, ∗∗p < 0.01). (B) Visual search accuracy for the target and error rates, plotted for the target-consistent, distracter-consistent, neutral, and no-sound conditions. Error bars indicate the standard error. (C) False alarm rates, plotted for the conditions in which the sound was consistent with the cue, inconsistent with the cue, and absent. (D) Miss rates, plotted for the target-consistent, distracter-consistent, neutral, and no-sound conditions. Error bars indicate the standard error.
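
To make the measures plotted in Figure 3 concrete, here is a minimal, hypothetical sketch of how per-condition error rates, mean reaction times on correct target-present trials, misses, and false alarms could be computed from trial-level records. The records below are invented for illustration and are not the study's data.

    from collections import defaultdict

    # Hypothetical trial records (values invented for illustration).
    trials = [
        {"condition": "target-consistent", "target_present": True, "said_present": True, "rt_ms": 810},
        {"condition": "distracter-consistent", "target_present": True, "said_present": False, "rt_ms": 990},
        {"condition": "neutral", "target_present": False, "said_present": True, "rt_ms": 1020},
        {"condition": "no sound", "target_present": True, "said_present": True, "rt_ms": 950},
    ]

    stats = defaultdict(lambda: {"n": 0, "errors": 0, "misses": 0,
                                 "false_alarms": 0, "hits": 0, "rt_sum": 0.0})
    for t in trials:
        s = stats[t["condition"]]
        s["n"] += 1
        if t["said_present"] != t["target_present"]:
            s["errors"] += 1
            if t["target_present"]:
                s["misses"] += 1          # target present, reported absent
            else:
                s["false_alarms"] += 1    # target absent, reported present
        elif t["target_present"]:
            s["hits"] += 1                # correct target-present trial
            s["rt_sum"] += t["rt_ms"]     # reaction times averaged over these trials

    for cond, s in stats.items():
        mean_rt = s["rt_sum"] / s["hits"] if s["hits"] else float("nan")
        print(f"{cond}: error rate {s['errors'] / s['n']:.2f}, mean RT {mean_rt:.0f} ms, "
              f"misses {s['misses']}, false alarms {s['false_alarms']}")
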


