Diagnostics (Basel). 2023 Sep 6;13(18):2870.
doi: 10.3390/diagnostics13182870.

Harnessing Machine Learning in Vocal Arts Medicine: A Random Forest Application for "Fach" Classification in Opera

Zehui Wang et al. Diagnostics (Basel). 2023.

Abstract

Vocal arts medicine provides care and prevention strategies for professional voice disorders in performing artists. Correct "Fach" determination, which depends on whether a singer has a lyric or a dramatic voice structure, is of crucial importance for opera singers, as chronic overuse often leads to vocal fold damage. To help singers avoid phonomicrosurgery or a premature end to their careers, we aim to offer improved, objective Fach counseling based on digital sound analysis and machine learning. For this purpose, a large database of 2004 sound samples from professional opera singers was compiled. Building on this dataset, we employed a classic ensemble learning method, the Random Forest algorithm, to construct an efficient Fach classifier. The model was trained on features embedded within the sound samples and then used to classify voices as either lyric or dramatic. For most of the examined voice types, the resulting system decides with an accuracy of about 80% whether a sound sample has a lyric or dramatic character. To advance diagnostic tools and health in vocal arts medicine and singing voice pedagogy, further machine learning methods will be applied to identify the best and most efficient classification approach.
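To make the classification idea concrete, the following is a minimal sketch of a random forest, not the authors' implementation. It is reduced to depth-1 trees (decision stumps) so that it fits in a few lines of plain Python: each tree is trained on a bootstrap sample, sees a random subset of features, picks the split with the lowest Gini impurity, and the forest predicts by majority vote. The two numeric features and all data are synthetic placeholders standing in for acoustic features; the function names and parameters are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Return (feature, threshold, left_label, right_label) for the
    single split with the lowest weighted Gini impurity."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        m = majority(y)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Bagging plus random feature subsets, one stump per tree."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]            # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Synthetic, well-separated toy data (NOT from the study's database).
rng = random.Random(42)
X, y = [], []
for _ in range(40):
    X.append([rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)])
    y.append("lyric")
    X.append([rng.uniform(2.0, 3.0), rng.uniform(2.0, 3.0)])
    y.append("dramatic")

forest = fit_forest(X, y)
accuracy = sum(predict(forest, row) == lab for row, lab in zip(X, y)) / len(y)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here is a training accuracy on artificially separable data and says nothing about the roughly 80% reported in the abstract; real sound-sample features overlap far more, and a practical model would use deeper trees and a held-out test set.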

Keywords: digital sound analysis; dramatic voice structure; lyric voice structure; machine learning; opera singer; random forest; vocal arts medicine; voice classification; voice disorder prevention; voice timbre parameter.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Chart of the overall proposed workflow for fach classification in opera voices.
Figure 2. Labeling examples for selected sound samples in every investigated voice type (S = soprano, M = mezzo-soprano, T = tenor, B = baritone, L = bass). Voice structure: 1 = dramatic, 2 = lyric; gender: m = male, w = female; musical part: Isol = Isolde (from Tristan and Isolde by R. Wagner), Dora = Dorabella (from Così fan tutte by W.A. Mozart), Lohe = Lohengrin (from Lohengrin by R. Wagner), Mala = Malatesta (from Don Pasquale by G. Donizetti), Hage = Hagen (from Twilight of the Gods by R. Wagner); vowel sound symbols according to the International Phonetic Alphabet: I = /i:/, U = /u:/, O = /o:/, A = /ɑ:/, E = /e:/; musical pitch according to scientific pitch notation (tuning of concert pitch A4 = 442 Hz): G4 = 394 Hz, G5 = 788 Hz, G♯4 = 417 Hz, C4 = 263 Hz, E♭3 = 156 Hz; index pitch/vowel: 1… n series number of the analyzed pitch/vowel in case of multiple occurrences.
Figure 3. Histograms of timbre parameters PHE (top) and SC (bottom), sorted by voice type (soprano to bass) as well as voice structure (lyric = blue, dramatic = green). Within some voice types, the lyric and dramatic distributions are visually distinct (e.g., PHE baritone, SC soprano). Many histograms followed a normal distribution, with the most notable exception being lyric basses, which also had the smallest sample size (n = 60).
Figure 4. Correlation heatmap of selected acoustic features in voice classification, showing the relationships between candidates for the input features in an RF model.
Figure 5. Confusion matrices of all examined voice types for the test dataset.
Figure 6. Importance of different features for the classification of voice structure in all investigated voice types, based on impurity.
Figure 7. SHAP summary plots of input features for sopranos and mezzo-sopranos (upper row), as well as tenors, baritones and basses (lower row). Features were sorted in descending order of the impact on the model output. In all plots, only the results for the dramatic voice structure are shown.
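The pitch frequencies listed in the Figure 2 caption can be reproduced from the stated concert pitch of A4 = 442 Hz. The sketch below assumes twelve-tone equal temperament, which the caption does not state explicitly but which matches its rounded values; the helper `note_to_hz` is a hypothetical name, and sharps and flats are written as ASCII `#` and `b`.

```python
# Semitone offsets of the natural notes within an octave (C = 0).
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(note, a4_hz=442.0):
    """Frequency of a note in scientific pitch notation, e.g. 'G#4' or 'Eb3'.

    Assumes equal temperament and a single-digit octave number.
    """
    letter, accidental, octave = note[0], note[1:-1], int(note[-1])
    semitone = PITCH_CLASS[letter] + accidental.count("#") - accidental.count("b")
    midi = 12 * (octave + 1) + semitone      # A4 corresponds to MIDI note 69
    return a4_hz * 2 ** ((midi - 69) / 12)   # each semitone is a factor of 2^(1/12)

for note in ["G4", "G5", "G#4", "C4", "Eb3"]:
    print(note, round(note_to_hz(note)), "Hz")
```

Rounded to whole hertz, these five notes give the values quoted in the caption.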

Grants and funding

This research received no external funding.