Rating scales and Rasch measurement

Expert Rev Pharmacoecon Outcomes Res. 2011 Oct;11(5):571-85. doi: 10.1586/erp.11.59.


Assessments with ratings in ordered categories have become ubiquitous in the health, biological and social sciences. Ratings are used when a measuring instrument of the kind found in the natural sciences is not available to assess some property in terms of degree, for example, greater or smaller, better or worse, or stronger or weaker. The handling of ratings has ranged from the very elementary to the highly sophisticated. In the elementary form, assumed in classical test theory, the ratings are scored with successive integers and treated as measurements; in the sophisticated form, used in modern test theory, the ratings are characterized by probabilistic response models with parameters for persons and for the rating categories. Within modern test theory, two paradigms have emerged that are similar in many details but incompatible on crucial points. For the purposes of this article, these are termed the statistical modeling and experimental measurement paradigms. Rather than reviewing in detail a compendium of available methods and models for analyzing ratings, the article focuses on the points on which the two paradigms are incompatible, and on what these imply for the choice of model and for inferences. It shows that the differences also imply different roles for substantive researchers and psychometricians in designing instruments with rating scales. An example is provided to illustrate these differences.
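
As an illustration of what a probabilistic response model with person and category parameters can look like, the following is a minimal sketch of category probabilities under the commonly cited form of the Rasch rating scale model. The abstract does not state a formula, so the parameterization is an assumption: `beta` is a person location, `delta` an item location, and `thresholds` the category thresholds, all on the same logit scale. The function names and example values are hypothetical and are not taken from the article.

```python
import math


def category_probabilities(beta, delta, thresholds):
    """Probabilities of scores 0..m for one person on one item.

    Assumed form: P(X = x) is proportional to
    exp(sum over k = 1..x of (beta - delta - thresholds[k - 1])),
    with the empty sum for x = 0 taken as zero.
    """
    exponents = [0.0]  # score 0 has an empty sum
    running = 0.0
    for tau in thresholds:
        running += beta - delta - tau
        exponents.append(running)
    # Subtract the largest exponent before exponentiating, for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probabilities):
    """Model-expected rating, the counterpart of the raw integer score."""
    return sum(score * p for score, p in enumerate(probabilities))


if __name__ == "__main__":
    # Hypothetical values: a four-category rating (scores 0 to 3).
    probs = category_probabilities(beta=0.5, delta=0.0, thresholds=[-1.0, 0.0, 1.0])
    print([round(p, 3) for p in probs])
    print(round(expected_score(probs), 3))
```

In the elementary approach the observed integer score is itself treated as the measurement; in this sketch the integer score is instead a random outcome whose distribution depends on the person and category parameters.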

Publication types

  • Review

MeSH terms

  • Humans
  • Models, Statistical*
  • Muscle Tonus
  • Probability
  • Psychometrics*
  • Research Design
  • Weights and Measures