Measuring the quality of health care has become a major concern for funders and providers of health services in recent decades. One of the ways in which quality of care is currently assessed is by taking routinely collected data and analysing them quantitatively. The use of routine data has many advantages, but there are also some important pitfalls. Collating numerical data in this way means that comparisons can be made, whether over time, with benchmarks, or with other healthcare providers (at individual or institutional levels of aggregation). Inevitably, such comparisons reveal variations. The natural inclination is then to assume that such variations imply rankings: that the measures reflect quality and that variations in the measures reflect variations in quality. This paper identifies reasons why these assumptions need to be applied with care, and illustrates the pitfalls with examples from recent empirical work. It is intended to guide not only those who wish to interpret comparative quality data, but also those who wish to develop systems for such analyses themselves.