Benchmarking physician performance: reliability of individual and composite measures

Am J Manag Care. 2008 Dec;14(12):833-8.


Objective: To examine the reliability of quality measures to assess physician performance, which are increasingly used as the basis for quality improvement efforts, contracting decisions, and financial incentives, despite concerns about the methodological challenges.

Study design: Evaluation of health plan administrative claims and enrollment data.

Methods: The study used administrative data from 9 health plans representing more than 11 million patients. The number of quality events (patients eligible for a quality measure), mean performance, and reliability estimates were calculated for 27 quality measures. Composite scores for preventive, chronic, acute, and overall care were calculated as the weighted mean of the standardized scores. Reliability was estimated as the physician-to-physician variance divided by the sum of the physician-to-physician variance and the measurement variance; a reliability of 0.70 was considered adequate.
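
The reliability ratio described in the Methods can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names, the binomial form of the measurement variance, p(1 - p)/n, and all numeric inputs are assumptions introduced here to show how reliability rises with the number of quality events. Only the ratio itself and the 0.70 threshold come from the abstract.

```python
ADEQUATE_RELIABILITY = 0.70  # threshold stated in the abstract


def reliability(between_var, measurement_var):
    """Physician-to-physician variance divided by the sum of
    physician-to-physician variance and measurement variance."""
    total = between_var + measurement_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


def binomial_measurement_var(pass_rate, n_events):
    """ASSUMPTION (not from the abstract): sampling variance of a
    pass/fail measure for one physician, p(1 - p) / n."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return pass_rate * (1.0 - pass_rate) / n_events


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    between_var = 0.01  # assumed physician-to-physician variance
    pass_rate = 0.5     # assumed mean performance on the measure
    for n_events in (10, 50, 100):
        r = reliability(between_var,
                        binomial_measurement_var(pass_rate, n_events))
        print(n_events, round(r, 3), r >= ADEQUATE_RELIABILITY)
```

Under these assumed inputs, a physician needs more quality events to reach the 0.70 threshold when physicians differ little from one another, which is consistent with the pattern reported in the Results.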

Results: Ten quality measures had reliability estimates above 0.70 at a minimum of 50 quality events. For other quality measures, reliability was low even when physicians had 50 quality events. On single quality measures, the largest proportions of physicians who could be reliably evaluated were 8% for colorectal cancer screening and 2% for nephropathy screening among patients with diabetes mellitus. More physicians could be reliably evaluated using composite scores (<17% for preventive care, >7% for chronic care, and 15%-20% for an overall composite).

Conclusions: In typical health plan administrative data, most physicians do not have adequate numbers of quality events to support reliable quality measurement. The reliability of quality measures should be taken into account when quality information is used for public reporting and accountability. Efforts to improve data available for physician profiling are also needed.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms
  • Benchmarking / methods*
  • Drug Utilization Review
  • Health Care Surveys
  • Humans
  • Information Dissemination
  • Managed Care Programs / standards*
  • Medical Audit / methods*
  • Physicians / classification
  • Physicians / standards*
  • Primary Prevention / standards
  • Quality Indicators, Health Care / classification*
  • Reproducibility of Results
  • Social Responsibility
  • Total Quality Management / methods*
  • United States