Crowdsourced Perceptual Ratings of Voice Quality in People With Parkinson's Disease Before and After Intensive Voice and Articulation Therapies: Secondary Outcome of a Randomized Controlled Trial

J Speech Lang Hear Res. 2023 May 9;66(5):1541-1562. doi: 10.1044/2023_JSLHR-22-00694. Epub 2023 Apr 14.

Abstract

Purpose: Limited research has examined the suitability of crowdsourced ratings to measure treatment effects in speakers with Parkinson's disease (PD), particularly for constructs such as voice quality. This study obtained measures of reliability and validity for crowdsourced listeners' ratings of voice quality in speech samples from a published study. We also investigated whether aggregated listener ratings would replicate the original study's findings of treatment effects based on the Acoustic Voice Quality Index (AVQI) measure.

Method: This study reports a secondary outcome measure of a randomized controlled trial with speakers with dysarthria associated with PD, including two active comparators (Lee Silverman Voice Treatment [LSVT LOUD] and LSVT ARTIC), an inactive comparator (untreated PD), and a healthy control group. Speech samples from three time points (pretreatment, posttreatment, and 6-month follow-up) were presented in random order for rating as "typical" or "atypical" with respect to voice quality. Untrained listeners were recruited through the Amazon Mechanical Turk crowdsourcing platform until each sample had at least 25 ratings.

Results: Intrarater reliability for tokens presented repeatedly was substantial (Cohen's κ = .65-.70), and interrater agreement significantly exceeded chance level. There was a significant correlation of moderate magnitude between the AVQI and the proportion of listeners classifying a given sample as "typical." Consistent with the original study, we found a significant interaction between group and time point, with the LSVT LOUD group alone showing significantly higher perceptually rated voice quality at posttreatment and follow-up relative to the pretreatment time point.
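The two summary statistics reported above can be illustrated with a short sketch. This is not the study's analysis code; the ratings below are made-up example data, and the function simply implements the standard Cohen's kappa formula for two paired sequences of binary "typical"/"atypical" labels, alongside the aggregated proportion of listeners rating a sample "typical".

```python
from collections import Counter

def cohens_kappa(first, second):
    """Cohen's kappa for two paired sequences of categorical labels."""
    assert len(first) == len(second)
    n = len(first)
    # Observed agreement: fraction of paired ratings that match.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Expected chance agreement from each rating pass's marginal frequencies.
    f1, f2 = Counter(first), Counter(second)
    labels = set(first) | set(second)
    expected = sum((f1[l] / n) * (f2[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical intrarater data: one listener's first and repeated ratings
# of the same speech tokens.
first_pass  = ["typical", "atypical", "typical", "typical", "atypical", "atypical"]
second_pass = ["typical", "atypical", "typical", "atypical", "atypical", "atypical"]
kappa = cohens_kappa(first_pass, second_pass)

# Aggregated rating for one sample: proportion of listeners saying "typical"
# (the study collected at least 25 ratings per sample).
sample_ratings = ["typical"] * 18 + ["atypical"] * 7
prop_typical = sample_ratings.count("typical") / len(sample_ratings)
```

With these toy data, `kappa` is about 0.67, in the "substantial" agreement band conventionally applied to the .61-.80 range, and `prop_typical` is 0.72.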

Conclusions: These results suggest that crowdsourcing can be a valid means to evaluate clinical speech samples, even for less familiar constructs such as voice quality. The findings also replicate the results of the study by Moya-Galé et al. (2022) and support their functional relevance by demonstrating that the effects of treatment measured acoustically in that study are perceptually apparent to everyday listeners.

Publication types

  • Randomized Controlled Trial
  • Research Support, N.I.H., Extramural

MeSH terms

  • Crowdsourcing*
  • Humans
  • Parkinson Disease*
  • Reproducibility of Results
  • Speech Acoustics
  • Treatment Outcome
  • Voice Quality
  • Voice Training