Crowdsourced Perceptual Ratings of Voice Quality in People With Parkinson's Disease Before and After Intensive Voice and Articulation Therapies: Secondary Outcome of a Randomized Controlled Trial

J Speech Lang Hear Res. 2023 May 9;66(5):1541-1562. doi: 10.1044/2023_JSLHR-22-00694. Epub 2023 Apr 14.

Abstract

Purpose: Limited research has examined the suitability of crowdsourced ratings to measure treatment effects in speakers with Parkinson's disease (PD), particularly for constructs such as voice quality. This study obtained measures of reliability and validity for crowdsourced listeners' ratings of voice quality in speech samples from a published study. We also investigated whether aggregated listener ratings would replicate the original study's findings of treatment effects based on the Acoustic Voice Quality Index (AVQI) measure.

Method: This study reports a secondary outcome measure of a randomized controlled trial with speakers with dysarthria associated with PD, including two active comparators (Lee Silverman Voice Treatment [LSVT LOUD] and LSVT ARTIC), an inactive comparator (untreated PD), and a healthy control group. Speech samples from three time points (pretreatment, posttreatment, and 6-month follow-up) were presented in random order for rating as "typical" or "atypical" with respect to voice quality. Untrained listeners were recruited through the Amazon Mechanical Turk crowdsourcing platform until each sample had at least 25 ratings.

Results: Intrarater reliability for tokens presented repeatedly was substantial (Cohen's κ = .65-.70), and interrater agreement significantly exceeded chance level. There was a significant correlation of moderate magnitude between the AVQI and the proportion of listeners classifying a given sample as "typical." Consistent with the original study, we found a significant interaction between group and time point, with the LSVT LOUD group alone showing significantly higher perceptually rated voice quality at posttreatment and follow-up relative to the pretreatment time point.
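The two summary statistics reported above can be illustrated with a short sketch. This is not the study's analysis code; the ratings below are made-up example data, and the function simply implements the standard Cohen's kappa formula for two paired sequences of binary "typical"/"atypical" labels, alongside the aggregated proportion of listeners rating a sample "typical".

```python
from collections import Counter

def cohens_kappa(first, second):
    """Cohen's kappa for two paired sequences of categorical labels."""
    assert len(first) == len(second)
    n = len(first)
    # Observed agreement: fraction of paired ratings that match.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Expected chance agreement from each rating pass's marginal frequencies.
    f1, f2 = Counter(first), Counter(second)
    labels = set(first) | set(second)
    expected = sum((f1[l] / n) * (f2[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical intrarater data: one listener's first and repeated ratings
# of the same speech tokens.
first_pass  = ["typical", "atypical", "typical", "typical", "atypical", "atypical"]
second_pass = ["typical", "atypical", "typical", "atypical", "atypical", "atypical"]
kappa = cohens_kappa(first_pass, second_pass)

# Aggregated rating for one sample: proportion of listeners saying "typical"
# (the study collected at least 25 ratings per sample).
sample_ratings = ["typical"] * 18 + ["atypical"] * 7
prop_typical = sample_ratings.count("typical") / len(sample_ratings)
```

With these toy data, `kappa` is about 0.67, in the "substantial" agreement band conventionally applied to the .61-.80 range, and `prop_typical` is 0.72.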

Conclusions: These results suggest that crowdsourcing can be a valid means to evaluate clinical speech samples, even for less familiar constructs such as voice quality. The findings also replicate the results of the study by Moya-Galé et al. (2022) and support their functional relevance by demonstrating that the effects of treatment measured acoustically in that study are perceptually apparent to everyday listeners.

Publication types

  • Randomized Controlled Trial
  • Research Support, N.I.H., Extramural

MeSH terms

  • Crowdsourcing*
  • Humans
  • Parkinson Disease*
  • Reproducibility of Results
  • Speech Acoustics
  • Treatment Outcome
  • Voice Quality
  • Voice Training