Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review

J Clin Epidemiol. 1988;41(9):923-37. doi: 10.1016/0895-4356(88)90110-2.


We review methods for the analysis of categorical clinical and epidemiological data in which the observations are subject to misclassification. Under certain conditions, it is possible to estimate parameters such as sensitivity, specificity, relative risk, or predictive value, even though no definitive classification (gold standard) is available. The parameter estimates are obtained by modelling the data using maximum likelihood, with or without constraints. Because the models recognize that the true classification of an individual is unknown, they are sometimes referred to as "latent class" models. The latent class approach provides a unified framework for various methods found in a dispersed literature, characterising each by the number of populations or subgroups in the data and the number of observations made on each individual; the statistical degrees of freedom are implied by the sampling design. Data sets with fewer than three replicate observations per individual necessarily require constraints for parameter estimation to be possible. Data sets with three or more replicates lead directly to estimates of the misclassification rates, subject to some simple assumptions. Some more complex problems are also discussed, including data where the response variable has more than two levels, sequential and irregular designs, and the effects of assumption violations.
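
The counting argument and the three-replicate case can be illustrated with a minimal sketch. It is not taken from the review itself: it assumes binary tests that are conditionally independent given the true status (one common reading of "simple assumptions"), error rates shared across populations, and an EM iteration as one possible way to obtain the maximum likelihood estimates. The names parameter_budget, expected_counts and fit_latent_class are hypothetical.

```python
import itertools
import numpy as np

def parameter_budget(n_tests, n_populations=1):
    """Return (degrees of freedom, free parameters) for binary tests.

    Assumes error rates are shared across populations (itself a constraint).
    """
    df = n_populations * (2 ** n_tests - 1)   # free cell probabilities
    params = 2 * n_tests + n_populations      # sens + spec per test, one prevalence each
    return df, params

def expected_counts(prev, sens, spec, total=10000):
    """Expected frequency of every result pattern under the assumed model."""
    sens, spec = np.asarray(sens), np.asarray(spec)
    counts = {}
    for pattern in itertools.product([0, 1], repeat=len(sens)):
        x = np.array(pattern)
        p_pos = prev * np.prod(sens ** x * (1 - sens) ** (1 - x))
        p_neg = (1 - prev) * np.prod((1 - spec) ** x * spec ** (1 - x))
        counts[pattern] = total * (p_pos + p_neg)
    return counts

def fit_latent_class(counts, n_iter=2000):
    """EM for a two-class latent class model, one population, no gold standard.

    counts maps a tuple of 0/1 test results to its observed frequency.
    Starting values (assumed) place sens + spec > 1 to fix the class labels.
    """
    patterns = np.array(list(counts.keys()), dtype=float)   # shape (P, R)
    n = np.array(list(counts.values()), dtype=float)        # shape (P,)
    n_tests = patterns.shape[1]
    prev, sens, spec = 0.5, np.full(n_tests, 0.8), np.full(n_tests, 0.8)
    for _ in range(n_iter):
        # E-step: posterior probability that each pattern comes from a true positive
        like_pos = prev * np.prod(sens ** patterns * (1 - sens) ** (1 - patterns), axis=1)
        like_neg = (1 - prev) * np.prod((1 - spec) ** patterns * spec ** (1 - patterns), axis=1)
        post = like_pos / (like_pos + like_neg)
        # M-step: weighted proportions using the posterior class memberships
        w_pos, w_neg = n * post, n * (1 - post)
        prev = w_pos.sum() / n.sum()
        sens = (w_pos[:, None] * patterns).sum(axis=0) / w_pos.sum()
        spec = (w_neg[:, None] * (1 - patterns)).sum(axis=0) / w_neg.sum()
    return prev, sens, spec

# Illustrative values only, not data from the review.
demo_counts = expected_counts(0.3, [0.9, 0.85, 0.8], [0.95, 0.9, 0.85])
demo_prev, demo_sens, demo_spec = fit_latent_class(demo_counts)
```

Under these assumptions, parameter_budget(2) gives 3 degrees of freedom against 5 parameters, so two replicates in a single population cannot be fitted without constraints, while parameter_budget(3) gives 7 against 7. Having at least as many degrees of freedom as parameters is necessary for estimation but does not by itself guarantee identifiability.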

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Classification
  • Data Collection
  • Data Interpretation, Statistical
  • Epidemiologic Methods*
  • Sensitivity and Specificity
  • Statistics as Topic*