Variation in the accuracy of examiner judgements is a source of measurement error in performance-based tests. In previous studies using doctors as subjects, examiner training yielded marginal or no improvement in the accuracy of examiner judgements. This study reports an experiment on scoring accuracy in which the provision of training and the background of examiners were systematically varied. Experienced teaching staff, medical students and lay subjects were randomly assigned to either a training or a no-training group. Using detailed behavioural checklists, they then scored videotaped performances on two clinical cases, and the accuracy of their judgements was appraised. Results indicated that the need for and effectiveness of training varied across groups: training was least needed and least effective for the teaching staff, more needed and more effective for the medical students, and most needed and most effective for the lay group. The accuracy of the lay group after training approached that of untrained teaching staff, and trained medical students were as accurate as trained teaching staff. For teaching staff and medical students, training also influenced the nature of the errors made, reducing the number of errors of commission. It was concluded that the effectiveness of training varies as a function of medical experience and that trained lay persons can be utilized as examiners in performance-based tests.