Using Machine Learning to Evaluate Attending Feedback on Resident Performance

Anesth Analg. 2021 Feb 1;132(2):545-555. doi: 10.1213/ANE.0000000000005265.

Abstract

Background: High-quality and high-utility feedback allows for the development of improvement plans for trainees. The current manual assessment of the quality of this feedback is time-consuming and subjective. We propose the use of machine learning to rapidly distinguish the quality of attending feedback on resident performance.

Methods: Using a preexisting databank of 1925 manually reviewed feedback comments from 4 anesthesiology residency programs, we trained machine learning models to predict whether comments contained 6 predefined feedback traits (actionable, behavior focused, detailed, negative feedback, professionalism/communication, and specific) and to predict the utility score of the comment on a scale of 1-5. Comments with ≥4 feedback traits were classified as high-quality, and comments with utility scores of ≥4 were classified as high-utility; otherwise, comments were considered low-quality or low-utility, respectively. We used RapidMiner Studio (RapidMiner, Inc, Boston, MA), a data science platform, to train, validate, and score the performance of models.
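The classification rule described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code; the trait names follow the abstract, while the function names and dictionary representation are assumptions.

```python
# Illustrative sketch of the paper's classification rule: a comment is
# "high-quality" if it exhibits >= 4 of the 6 predefined feedback traits,
# and "high-utility" if its utility score (scale 1-5) is >= 4.
TRAITS = [
    "actionable",
    "behavior_focused",
    "detailed",
    "negative_feedback",
    "professionalism_communication",
    "specific",
]

def quality_category(trait_flags: dict) -> str:
    """trait_flags maps each trait name to True/False (model predictions)."""
    n_present = sum(bool(trait_flags.get(t, False)) for t in TRAITS)
    return "high-quality" if n_present >= 4 else "low-quality"

def utility_category(utility_score: int) -> str:
    """utility_score is the predicted 1-5 utility rating."""
    return "high-utility" if utility_score >= 4 else "low-utility"

# Example: a comment predicted to contain 4 of the 6 traits
flags = {
    "actionable": True,
    "behavior_focused": True,
    "detailed": True,
    "negative_feedback": False,
    "professionalism_communication": True,
    "specific": False,
}
print(quality_category(flags))  # 4 traits present -> "high-quality"
print(utility_category(3))      # score below 4 -> "low-utility"
```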

Results: Models for predicting the presence of feedback traits had accuracies of 74.4%-82.2%. Predictions of utility category were 82.1% accurate, with 89.2% sensitivity and 89.8% class precision for low-utility predictions. Predictions of quality category were 78.5% accurate, with 86.1% sensitivity and 85.0% class precision for low-quality predictions. A research assistant with no prior experience in machine learning spent 15 to 20 hours becoming familiar with the software, creating models, and reviewing the performance of the resulting predictions. The program read data, applied models, and generated predictions within minutes. In contrast, a recent manual feedback-scoring effort by an author took 15 hours, over the course of 2 weeks, to collate and score 200 comments.
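The reported metrics are the standard binary-classification measures, with the low-utility (or low-quality) category treated as the positive class. A minimal sketch of how they are computed from a confusion matrix (the counts below are made up for illustration, not taken from the study):

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int):
    """Accuracy, sensitivity (recall), and class precision for the
    positive class, computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)   # fraction of true positives recovered
    precision = tp / (tp + fp)     # fraction of positive calls that are correct
    return accuracy, sensitivity, precision

# Hypothetical counts, treating "low-utility" as the positive class
acc, sens, prec = binary_metrics(tp=90, fp=10, fn=11, tn=89)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} precision={prec:.3f}")
```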

Conclusions: Harnessing the potential of machine learning allows for rapid assessment of attending feedback on resident performance. Using predictive models to rapidly screen for low-quality and low-utility feedback can aid programs in improving feedback provision, both globally and by individual faculty.

Publication types

  • Multicenter Study

MeSH terms

  • Anesthesiologists / education*
  • Anesthesiology / education*
  • Clinical Competence*
  • Data Mining*
  • Databases, Factual
  • Education, Medical, Graduate*
  • Employee Performance Appraisal
  • Formative Feedback*
  • Humans
  • Internship and Residency*
  • Machine Learning*
  • Medical Staff, Hospital*
  • Task Performance and Analysis
  • United States