Reliability of the peer-review process for adverse event rating

PLoS One. 2012;7(7):e41239. doi: 10.1371/journal.pone.0041239. Epub 2012 Jul 26.


Background: Adverse events are poor patient outcomes caused by medical care. Their identification requires the peer-review of poor outcomes, which may be unreliable. Combining physician ratings might improve the accuracy of adverse event classification.

Objective: To evaluate the variation in peer-reviewer ratings of adverse outcomes; determine the impact of this variation on estimates of reviewer accuracy; and determine the number of reviewers who judge an adverse event occurred that is required to ensure that the true probability of an adverse event exceeded 50%, 75% or 95%.

Methods: Thirty physicians rated 319 case reports giving details of poor patient outcomes following hospital discharge. They rated whether medical management caused the outcome using a six-point ordinal scale. We conducted latent class analyses to estimate the prevalence of adverse events as well as the sensitivity and specificity of each reviewer. We used this model and Bayesian calculations to determine the probability that an adverse event truly occurred to each patient as function of their number of positive ratings.

Results: The overall median score on the 6-point ordinal scale was 3 (IQR 2,4) but the individual rater median score ranged from a minimum of 1 (in four reviewers) to a maximum median score of 5. The overall percentage of cases rated as an adverse event was 39.7% (3798/9570). The median kappa for all pair-wise combinations of the 30 reviewers was 0.26 (IQR 0.16, 0.42; Min = -0.07, Max = 0.62). Reviewer sensitivity and specificity for adverse event classification ranged from 0.06 to 0.93 and 0.50 to 0.98, respectively. The estimated prevalence of adverse events using a latent class model with a common sensitivity and specificity for all reviewers (0.64 and 0.83 respectively) was 47.6%. For patients to have a 95% chance of truly having an adverse event, at least 3 of 3 reviewers are required to deem the outcome an adverse event.

Conclusion: Adverse event classification is unreliable. To be certain that a case truly represents an adverse event, there needs to be agreement among multiple reviewers.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Outcome Assessment, Health Care / methods*
  • Peer Review / methods*
  • Physicians
  • Probability