Atten Percept Psychophys. 2019 May;81(4):1167-1177.
doi: 10.3758/s13414-019-01684-w.

Effects of talker continuity and speech rate on auditory working memory

Sung-Joo Lim et al. Atten Percept Psychophys. 2019 May.

Abstract

Speech processing is slower and less accurate when listeners encounter speech from multiple talkers compared to one continuous talker. However, interference from multiple talkers has been investigated only using immediate speech recognition or long-term memory recognition tasks. These tasks reveal opposite effects of speech processing time on speech recognition: while fast processing of multi-talker speech impedes immediate recognition, it also results in more abstract and less talker-specific long-term memories for speech. Here, we investigated whether and how processing multi-talker speech disrupts working memory maintenance, an intermediate stage between perceptual recognition and long-term memory. In a digit sequence recall task, listeners encoded seven-digit sequences and recalled them after a 5-s delay. Sequences were spoken by either a single talker or multiple talkers at one of three presentation rates (0-, 200-, and 500-ms inter-digit intervals). Listeners' recall was slower and less accurate for sequences spoken by multiple talkers than by a single talker. Especially for the fastest presentation rate, listeners were less efficient when recalling sequences spoken by multiple talkers. Our results reveal that talker-specificity effects for speech working memory are most prominent when listeners must rapidly encode speech. These results suggest that, like immediate speech recognition, working memory for speech is susceptible to interference from variability across talkers. While many studies ascribe effects of talker variability to the need to calibrate perception to talker-specific acoustics, these results are also consistent with the idea that a sudden change of talkers disrupts attentional focus, interfering with efficient working-memory processing.
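As a rough illustration of how recall accuracy in a digit sequence task like this might be scored, the sketch below assumes strict serial-position scoring, where a digit counts as correct only if it is recalled in its original position. The abstract does not state the scoring rule, so this rule, the function name `proportion_correct`, and the example sequences are all hypothetical.

```python
from typing import Sequence


def proportion_correct(presented: Sequence[int], recalled: Sequence[int]) -> float:
    """Proportion of digits recalled in their original serial position.

    ASSUMPTION: strict position-by-position scoring; the source abstract
    does not specify how recall was scored.
    """
    if len(presented) != len(recalled):
        raise ValueError("presented and recalled sequences must have equal length")
    if not presented:
        raise ValueError("sequences must not be empty")
    # Count positions where the recalled digit matches the presented digit.
    hits = sum(p == r for p, r in zip(presented, recalled))
    return hits / len(presented)


# Hypothetical seven-digit trial in which two adjacent digits are transposed.
presented = [3, 8, 1, 9, 4, 6, 2]
recalled = [3, 8, 1, 4, 9, 6, 2]
score = proportion_correct(presented, recalled)  # 5 of 7 positions match
```

Averaging such per-trial scores within each talker condition and presentation rate would give one value per cell of the 2 × 3 design.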

Keywords: Auditory streaming; Auditory working memory; Recall efficiency; Speech perception; Talker adaptation.

Figures

Figure 1.
Acoustic variability of the spoken digit stimuli. (A) The vowel space (first and second formants; F1 × F2; note log scale) of each stimulus. Each point indicates the mean F1 and F2 of the sonorant portion of each stimulus. The digit identity of a stimulus is marked as a number (e.g., 9 = “nine”). Orange and blue colors indicate the four female and four male talkers, respectively. For reference, the acoustic measurements of the current stimuli are situated against the canonical acoustics of the four English point vowels (circled vowels: /i/, /u/, /æ/, and /ɑ/) reported by Hillenbrand et al. (1995). (B) The distribution of vocal pitch (fundamental frequency; F0) of each talker’s digit recordings. F0 was sampled every 15 ms of the sonorant portion of each stimulus. The median, interquartile range, and extrema of the F0 of each talker are displayed.
Figure 2.
Illustration of the digit sequence recall task. Participants heard digit sequences spoken either by a single talker or with each digit in the sequence spoken by a different talker (i.e., seven talkers). Each digit sequence was presented at a rate of 0-, 200-, or 500-ms inter-digit stimulus delay. After a 5-s retention period, participants recalled the digit sequence, in order, using a mouse to select digits from a visual display.
Figure 3.
Average proportions of correctly recalled digits in each of the 2 (talkers) × 3 (stimulus rate; ISI) conditions. (A) The mean proportion of correctly recalled digits as a function of digit position. Error bars indicate ±1 standard error of the mean (SEM) across participants. (B) Mean proportion of correctly recalled digits by talker condition. The thin colored lines connect each individual participant’s performance in the single and multi-talker conditions. Group mean performance is indicated by the black circles connected by a bold line. (C) Dot density plots of individual participants’ differences (multi- vs. single-talker) in mean proportion correct in each ISI condition. The solid line indicates the mean difference across participants.
Figure 4.
Response times of the onset of digit sequence recall by talker condition across stimulus presentation rates (ISIs). (A) Thin colored lines connect each individual participant’s performance in the single and multi-talker conditions. Group means across participants are indicated by the black circles connected by bold lines. (B) Dot density plots of individuals’ differences (multi–single-talker) in mean response time across ISIs. The solid line indicates the mean difference (multi–single-talker) across participants.
Figure 5.
Efficiency of digit sequence recall by talker condition across stimulus presentation rates (ISIs). (A) Mean log-transformed efficiency score of the digit sequence recall. Data points connected by thin colored lines indicate individual participants’ performance in the single vs. multi-talker conditions in each ISI condition. Group means across participants are indicated by the black circles connected by bold lines. (B) Dot density plots of the individuals’ differences (multi–single-talker) in average recall efficiency in each ISI condition. The solid line indicates the mean difference (multi–single-talker) across participants.
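The Figure 5 caption refers to a log-transformed efficiency score without giving its formula. A minimal sketch of one plausible form follows, assuming efficiency is the proportion of correctly recalled digits divided by response time in seconds, followed by a natural log. That definition, the function name `log_efficiency`, and the numeric values are assumptions for illustration, not the authors' stated method or data.

```python
import math


def log_efficiency(prop_correct: float, rt_seconds: float) -> float:
    """Natural log of (accuracy / response time); higher means more efficient.

    ASSUMPTION: efficiency = proportion correct / response time in seconds.
    The source only says the score was log-transformed.
    """
    if not 0 < prop_correct <= 1:
        raise ValueError("prop_correct must be in the interval (0, 1]")
    if rt_seconds <= 0:
        raise ValueError("rt_seconds must be positive")
    return math.log(prop_correct / rt_seconds)


# Hypothetical per-condition means for one participant (not data from the study).
single_talker = log_efficiency(prop_correct=0.90, rt_seconds=1.5)
multi_talker = log_efficiency(prop_correct=0.85, rt_seconds=1.8)

# Multi minus single, matching the direction of the difference plots in panel B.
difference = multi_talker - single_talker
```

Under this definition, a negative difference means recall was less efficient in the multi-talker condition, since lower accuracy and longer response times both reduce the score.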


