Determining reliability of clinical assessment scores in real time

Teach Learn Med. 2009 Jul;21(3):188-94. doi: 10.1080/10401330903014137.


Background: Assessment score reliability is usually estimated from a single analysis. However, reliability is an essential component of validity, and assessment validation and revision form a never-ending cycle. For assessments that continue over extended time frames, real-time reliability computations may alert users to possible changes in the learning environment that are revealed by variations in reliability over time.

Purpose: To develop software that calculates the reliability of clinical assessments in real time.

Methods: Over 2,400 assessment forms were analyzed. We developed software that calculates reliability in real time. Software accuracy was verified by comparing its output with that of a standard method. Factor analysis was used to determine scale dimensionality.
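
As an illustration of how a reliability coefficient might be updated as each assessment form arrives, the following minimal Python sketch keeps running sums and recomputes Cronbach's alpha on demand. The abstract does not describe the software's internals, so the class name `RunningCronbachAlpha`, its methods, and the running-sum approach are assumptions made for illustration, not the authors' implementation.

```python
class RunningCronbachAlpha:
    """Hypothetical helper: Cronbach's alpha updated one form at a time."""

    def __init__(self, n_items):
        if n_items < 2:
            raise ValueError("Cronbach's alpha needs at least 2 items")
        self.k = n_items
        self.n = 0                          # forms seen so far
        self.item_sum = [0.0] * n_items     # per-item sum of scores
        self.item_sq = [0.0] * n_items      # per-item sum of squared scores
        self.total_sum = 0.0                # sum of form totals
        self.total_sq = 0.0                 # sum of squared form totals

    def add_form(self, scores):
        """Fold one completed form (one score per item) into the sums."""
        if len(scores) != self.k:
            raise ValueError("expected %d item scores" % self.k)
        self.n += 1
        for i, s in enumerate(scores):
            self.item_sum[i] += s
            self.item_sq[i] += s * s
        total = sum(scores)
        self.total_sum += total
        self.total_sq += total * total

    def _sample_var(self, sx, sxx):
        # Unbiased sample variance from a sum and a sum of squares.
        return (sxx - sx * sx / self.n) / (self.n - 1)

    def alpha(self):
        """Return the current alpha, or None if it is not yet defined."""
        if self.n < 2:
            return None
        total_var = self._sample_var(self.total_sum, self.total_sq)
        if total_var <= 0:
            return None
        item_var = sum(
            self._sample_var(sx, sxx)
            for sx, sxx in zip(self.item_sum, self.item_sq)
        )
        return self.k / (self.k - 1) * (1 - item_var / total_var)
```

Calling `add_form` for every submitted form and `alpha` whenever a current estimate is wanted avoids re-reading earlier forms. With large scores or very many forms, a numerically stabler update (for example Welford's algorithm) may be preferable to raw sums of squares.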

Results: Correlation between our software and a standard method was excellent (ICC for kappas = 0.97; Cronbach's alphas differed by < 0.03). Cronbach's alpha ranged from 0.94 to 0.97 and weighted kappa ranged from 0.08 to 0.40. Factor analysis confirmed 3 teaching domains.
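
For readers unfamiliar with weighted kappa, the sketch below shows one common way to compute it for two raters scoring the same items on an ordinal scale. The abstract does not state which weighting scheme was used, so the linear default, the function name `weighted_kappa`, and its parameters are assumptions for illustration only.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters (illustrative sketch).

    categories: ordered list of all possible ratings, lowest to highest.
    weights: "linear" or "quadratic" disagreement weights (assumed options).
    Returns None when kappa is undefined (no data or no expected disagreement).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same items")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0:
        return None

    index = {c: i for i, c in enumerate(categories)}

    # Observed contingency table of rating pairs.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_dis = 0.0   # weighted observed disagreement
    expected_dis = 0.0   # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            d = abs(i - j) / (k - 1)
            w = d if weights == "linear" else d * d
            observed_dis += w * observed[i][j] / n
            expected_dis += w * row_totals[i] * col_totals[j] / (n * n)

    if expected_dis == 0:
        return None
    return 1 - observed_dis / expected_dis
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Unlike Cronbach's alpha, which reflects internal consistency across items, kappa reflects agreement between raters, so the two statistics can differ widely on the same forms.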

Conclusions: We describe an accurate method for calculating reliability in real time. The benefit of real-time computation is that it provides a mechanism for detecting possible changes in the learning environment (related to curriculum, teachers, and students), as indicated by changes in reliability over time. This technique will enable investigators to monitor and detect changes in the reliability of assessment scores and, with future study, isolate aspects of the learning environment that affect reliability.

MeSH terms

  • Adult
  • Clinical Competence*
  • Education, Medical, Graduate / standards*
  • Educational Measurement / standards*
  • Factor Analysis, Statistical
  • Female
  • Humans
  • Internal Medicine / education
  • Internal Medicine / standards*
  • Internship and Residency / standards*
  • Male
  • Reproducibility of Results
  • Software*