Cognition. 2019 Nov;192:103982.
doi: 10.1016/j.cognition.2019.05.019. Epub 2019 Jun 21.

Time and information in perceptual adaptation to speech

Ja Young Choi et al. Cognition. 2019 Nov.

Abstract

Perceptual adaptation to a talker enables listeners to efficiently resolve the many-to-many mapping between variable speech acoustics and abstract linguistic representations. However, models of speech perception have not delved into the variety or the quantity of information necessary for successful adaptation, nor how adaptation unfolds over time. In three experiments using speeded classification of spoken words, we explored how the quantity (duration), quality (phonetic detail), and temporal continuity of talker-specific context contribute to facilitating perceptual adaptation to speech. In single- and mixed-talker conditions, listeners identified phonetically-confusable target words in isolation or preceded by carrier phrases of varying lengths and phonetic content, spoken by the same talker as the target word. Word identification was always slower in mixed-talker conditions than single-talker ones. However, interference from talker variability decreased as the duration of preceding speech increased but was not affected by the amount of preceding talker-specific phonetic information. Furthermore, efficiency gains from adaptation depended on temporal continuity between preceding speech and the target word. These results suggest that perceptual adaptation to speech may be understood via models of auditory streaming, where perceptual continuity of an auditory object (e.g., a talker) facilitates allocation of attentional resources, resulting in more efficient perceptual processing.

Keywords: Adaptation; Categorization; Phonetic variability; Speech perception; Talker normalization.

Figures

Figure 1. Stimuli for Experiments 1–3.
(A,B,C) Spectrograms of example stimuli produced by Speaker 2 used in Experiments 1–3 in each condition. (D) Lines indicate the F1-F2 trajectory of all carriers produced by each talker. Black points indicate the F1-F2 position of the /o/ and the /u/ vowels in the target words “boat” and “boot” spoken by each talker. Recordings of all experimental stimuli are available online: https://open.bu.edu/handle/2144/16460
Figure 2. Task design for all experiments.
Participants performed a speeded word identification task while listening to speech produced by either (A) a single talker or (B) mixed talkers. The short-carrier condition for Experiment 1 is shown.
Figure 3. Results for Experiment 1.
Effects of talker variability and context across talkers on response times. (A) Connected points show the change in response times for individual participants between the single- and mixed-talker conditions across three levels of context. Box plots in each panel show the distribution (median, interquartile range, extrema) for each variability-by-context condition. (B) The interference effect of indexical variability is shown for each level of context. The distribution of differences in response time between the mixed- and single-talker conditions is shown, scaled within participant to their response time in the single-talker condition: ((mixed – single) / single) × 100. Significant interference was observed for every level of context. The long-carrier condition showed a significantly smaller interference effect than either the no-carrier or the short-carrier condition.
Figure 4. Hypothesized patterns of results for Experiment 2.
Potential patterns for the interference effect of talker variability across the three experimental conditions, as predicted by the two different hypotheses about contextual effects on talker adaptation. (A) If the amount of talker-specific phonetic details in a carrier contributes more to talker adaptation than the duration of the carrier, the interference effect will be lower in the high-information carrier condition than in the low-information carrier condition. (B) If the duration of a carrier contributes more to talker adaptation than the richness of its phonetic details, the interference effect will not differ between the low- and the high-information carriers, as their durations are matched.
Figure 5. Results for Experiment 2.
Effects of talker variability and context on response times. (A) Connected points show the response times in the single- and mixed-talker conditions across three levels of context for individual participants. Box plots in each panel show the distribution (median, interquartile range, extrema) for each variability-by-context condition. (B) The interference effect of indexical variability is shown for each level of context. The distribution of differences in response time between the mixed- and single-talker conditions is shown, scaled within participant to their response time in the single-talker condition: ((mixed – single) / single) × 100. Significant interference was observed for every level of context. Both the low-information and the high-information carrier conditions showed a significantly smaller interference effect than the no-carrier condition. There was no significant difference in the interference effect between the low-information and high-information carrier conditions. The pattern of results is consistent with what is expected when the duration of the carrier is a more important factor than the amount of talker-specific phonetic details (Fig. 4B).
Figure 6. Hypothesized patterns of results for Experiment 3.
Potential patterns for the interference effect of talker variability across the four experimental conditions, as predicted by the two different hypotheses of the contribution of temporal continuity of context. (A) The predicted pattern from an episodic account of speech perception. Due to having the greatest time available to reactivate talker-specific memories, the long-carrier and short-carrier-with-delay conditions should have the smallest (and equal) interference effects. The short-carrier-without-delay has less time to access memories, and so should have a larger interference effect than either of the other carriers. (B) The predicted pattern from an attention/streaming model of speech perception. In contrast to the episodic account, this model predicts a greater interference effect in the short-carrier-with-delay condition than either the short-carrier-without-delay condition or the long-carrier condition. In these latter two conditions, the temporal proximity between the adapting speech and the target word should facilitate the emergence of a talker-specific auditory object and improve processing efficiency.
Figure 7. Results for Experiment 3.
Effects of talker variability and context across talkers on response times. (A) Connected points show the change in response times for individual participants between the single- and mixed-talker conditions across four levels of context. Box plots in each panel show the distribution (median, interquartile range, extrema) for each variability-by-context condition. (B) The interference effect of indexical variability is shown for each level of context. The distribution of differences in response time between the mixed- and single-talker conditions is shown, scaled within participant to their response time in the single-talker condition: ((mixed – single) / single) × 100. Significant interference was observed for every level of context. The duration of the carrier phrase and its temporal proximity (continuity) to the target speech both contributed to reducing the processing cost on speech perception associated with mixed talkers. This pattern of results is consistent with what the streaming/attention model predicts (Fig. 6B).
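
The interference effect reported in the figure captions is a within-participant percentage: ((mixed – single) / single) × 100. The following is a minimal Python sketch of that calculation only, not the authors' analysis code. The function name, the participant IDs, and the response times are all hypothetical values invented for illustration and are not data from the study.

```python
from statistics import median


def interference_percent(single_rt_ms: float, mixed_rt_ms: float) -> float:
    """Scale the mixed-minus-single response-time difference to the
    single-talker response time: ((mixed - single) / single) * 100."""
    if single_rt_ms <= 0:
        raise ValueError("single-talker response time must be positive")
    return (mixed_rt_ms - single_rt_ms) / single_rt_ms * 100.0


# Hypothetical mean response times in milliseconds for one context condition:
# participant ID -> (single-talker RT, mixed-talker RT). Not real data.
example_rts = {
    "P01": (700.0, 770.0),
    "P02": (800.0, 840.0),
    "P03": (650.0, 715.0),
}

# One interference score per participant, as in panel (B) of the results figures.
effects = {
    pid: interference_percent(single, mixed)
    for pid, (single, mixed) in example_rts.items()
}

for pid, effect in effects.items():
    print(f"{pid}: {effect:.1f}% slower with mixed talkers")

print(f"median interference: {median(effects.values()):.1f}%")
```

Scaling within participant means that a listener who is slow overall does not dominate the comparison: a positive score indicates slower responses under mixed talkers, and a score near zero indicates little cost from talker variability.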
