J Acoust Soc Am. 2017 Apr;141(4):2882.
doi: 10.1121/1.4981118.

How many images are in an auditory scene?

Free PMC article

Xuan Zhong et al. J Acoust Soc Am. 2017 Apr.

Abstract

If an auditory scene consists of many spatially separated sound sources, how many of them can the auditory system process? Experiment I determined how many speech sources could be localized simultaneously in the azimuth plane. Different words were played from multiple loudspeakers, and listeners reported the total number of sound sources and their individual locations. Experiment II determined the accuracy of localizing one speech source in a mixture of multiple speech sources: an extra sound source was added to an existing set of sound sources, and the task was to localize that extra source. In experiment III the setup and task were the same as in experiment I, except that the sounds were tones. The results showed that the maximum number of sound sources listeners could perceive was limited to approximately four for spatially separated speech signals and three for tonal signals. Localization errors increased as the total number of sound sources increased. When four or more speech sources were already present, the accuracy in localizing an additional source was near chance.


Figures

FIG. 1.
Test setup for the localization tasks. Twelve loudspeakers, numbered 1–12, were arranged in a circular array with 30-degree spacing on the horizontal plane at the level of the listeners' ears. Because the array was circular, the shorter angular path from the reported location to the actual location was used in the error calculation. For instance, the angle between loudspeakers #12 and #1 was calculated as 30° (not 330°), since these two loudspeakers occupy adjacent positions.
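The shorter-path rule in the Fig. 1 caption can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `angular_error_deg` and its parameters are assumptions, and only the 12-loudspeaker, 30-degree layout comes from the caption.

```python
def angular_error_deg(reported: int, actual: int, n_speakers: int = 12) -> float:
    """Shortest angular distance (degrees) between two loudspeaker numbers
    on a circular array with equal spacing (hypothetical helper)."""
    spacing = 360.0 / n_speakers                 # 30 degrees for 12 loudspeakers
    steps = abs(reported - actual) % n_speakers  # positions apart, going one way round
    shortest = min(steps, n_speakers - steps)    # take the shorter way round the circle
    return shortest * spacing


# Example from the caption: loudspeakers #12 and #1 are adjacent.
print(angular_error_deg(12, 1))  # 30.0
```

With this rule the largest possible error is 180°, which occurs when the reported and actual loudspeakers are directly opposite each other.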
FIG. 2.
(Left) Individual results of all eight listeners in experiment I (speech) showing the relationship between the reported and actual total number of sound sources. (Right) Mean across the eight listeners; vertical lines show plus/minus one standard deviation. The dotted diagonal line represents correct (ideal) responses.
FIG. 3.
(Left) Individual results (eight listeners) in experiment I (speech) showing the relationship between the proportion of correct location responses (Hits) and the actual number of sound sources. (Right) Mean and plus/minus one standard deviation across the eight listeners.
FIG. 4.
Experiment II (added speech source): mean and plus/minus one standard deviation (over six listeners) of the rms localization error as a function of the actual number of sound sources (chance: 104.8 deg).
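As a rough check on the chance level quoted in the Fig. 4 caption, the sketch below computes the rms error expected if a listener guessed uniformly among the 12 loudspeakers, with error measured along the shorter angular path. The uniform-guessing model and the function name `chance_rms_error_deg` are assumptions; the page does not state how the authors derived their chance value. Under this assumption the result is about 104.6°, close to but not identical to the 104.8° in the caption.

```python
import math


def chance_rms_error_deg(n_speakers: int = 12) -> float:
    """rms localization error (degrees) for uniform random guessing on a
    circular, equally spaced loudspeaker array (assumed chance model)."""
    spacing = 360.0 / n_speakers
    squared_errors = []
    for steps in range(n_speakers):                # guess lands 0..n-1 positions away
        shortest = min(steps, n_speakers - steps)  # shorter way round the circle
        squared_errors.append((shortest * spacing) ** 2)
    return math.sqrt(sum(squared_errors) / n_speakers)


print(round(chance_rms_error_deg(), 1))  # 104.6
```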
FIG. 5.
Same format and calculations as in Fig. 2, but for the tonal data (experiment III). The dotted diagonal line represents correct responses.
FIG. 6.
Same format and calculations as in Fig. 3, but for the tonal data (experiment III).
FIG. 7.
(Left) Mean and plus/minus one standard deviation of the reported number of speech sounds as a function of the actual number (six listeners, speech stimuli) for experiment IV. Solid line and circles: spatially separated sound sources; dashed line and triangles: all words co-located at the same loudspeaker. The dotted diagonal line represents perfect performance. (Right) Same relationships as in the left panel, but for the tonal stimuli.

