Subjective rating scales are widely used in almost every aspect of ergonomics research and practice for the assessment of workload, fatigue, usability, annoyance and comfort, as well as lesser-known qualities such as urgency and presence, but are they truly scientific? This paper raises some of the key issues as a basis for debate. First, it is argued that all empirical observations, including those conventionally labelled 'objective', are unavoidably subjective. Shared meaning between observers, or intersubjectivity, is the key criterion of scientific probity. Practical steps that can be taken to increase intersubjective agreement are discussed, and the well-known sources of error and bias in human judgement are reviewed. The role of conscious experience as a mechanism for appraising the environment and guiding behaviour has important implications for the interpretation of subjective reports. The view that psychometric measures do not conform to the requirements of truly 'scientific' measurement is then considered. Human judgement of subjective attributes is essentially ordinal and, unlike physical measurement, can be matched to interval scales only with difficulty; nevertheless, ordinal measures can be used successfully both to develop and to test substantive theories by means of multivariate statistical techniques. Constructs such as fatigue are best understood as latent, or inferred, variables defined by a set of manifest, directly observed, indicator variables. Viewing both construct validity and predictive validity from this perspective helps to clarify several problems: the dissociation between measures of different aspects of a given construct, the question of whether physical (e.g. physiological) measures should be preferred to subjective ones, and whether a single measure is desirable for constructs that are essentially multidimensional, having both subjective and physical components.
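The latent-variable view sketched above can be made concrete with a small, purely illustrative example. Assuming three invented ordinal indicator items (the item names, the six participants and all ratings are hypothetical, not taken from the paper), a crude latent 'fatigue' score can be formed from the rank-transformed indicators, and each item's agreement with that score assessed with Spearman's rank correlation, a statistic appropriate to ordinal data:

```python
# Illustrative sketch only: a latent "fatigue" construct inferred from
# ordinal manifest indicators. Item names and ratings are invented.

def ranks(xs):
    """Average ranks, 1-based; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented 7-point ratings from six hypothetical participants.
items = {
    "sleepiness":   [2, 4, 5, 3, 6, 7],
    "effort":       [1, 4, 6, 3, 5, 7],
    "irritability": [2, 3, 5, 4, 6, 6],
}

# A crude latent score: the participant-wise sum of item ranks.
item_ranks = {name: ranks(v) for name, v in items.items()}
n = 6
latent = [sum(r[i] for r in item_ranks.values()) for i in range(n)]

for name, scores in items.items():
    print(f"{name}: rho with latent score = {spearman(scores, latent):.2f}")
```

Note that only the ordering of the ratings enters the calculation: the same rho values would result from any monotone relabelling of the 7-point scale, which is exactly the sense in which such judgements are ordinal rather than interval.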
Finally, the fitness of subjective ratings for different purposes within the broad field of ergonomics research is discussed. For testing competing hypotheses about the mechanisms underlying human performance, precise quantitative predictions are rarely needed, and the same is frequently true of the comparative evaluation of competing designs. In setting design standards, however, something approaching the level of measurement needed for precise quantitative prediction is required, and this is difficult to achieve in practice. Although it may be possible to establish standards within restricted contexts, general standards for broadly conceived constructs such as workload are impractical, owing to the requirement for representative sampling of tasks, work environments and personnel.