Background: Establishing credible, defensible and acceptable passing scores for written tests is a challenge for health professions educators. Angoff procedures are often used to establish pass/fail decisions for written and performance tests. In an Angoff procedure, judges' expertise and professional skills are assumed to influence their ratings of the items during standard-setting. The purpose of this study was, first, to investigate the impact of judges' item-related knowledge on their judgement of item difficulty and, second, to determine the stability of differences between judges.
Method: Thirteen judges were presented with two sets of 60 items on different occasions. They were asked not only to judge the difficulty of the items but also to answer them, without the benefit of the answer key. For each of the 120 items, an Angoff estimate and an item score were obtained. The relationship between the Angoff estimate and the item score was examined by applying a regression analysis to the 60 (Angoff estimate, item score) pairs for each judge on each occasion.
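The per-judge, per-occasion regression described in the Method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: it assumes a simple ordinary least squares fit with the item score (1 if the judge answered correctly, 0 otherwise) as the predictor and the Angoff estimate as the outcome. The function name `fit_line` and the data values are hypothetical and invented for illustration only.

```python
from statistics import mean


def fit_line(scores, estimates):
    """Ordinary least squares fit of Angoff estimates on item scores.

    Returns (intercept, slope). With a 0/1 item score, the intercept is the
    mean estimate for items the judge answered incorrectly, and the slope is
    the difference in mean estimate between correct and incorrect items.
    """
    if len(scores) != len(estimates):
        raise ValueError("scores and estimates must have the same length")
    x_bar = mean(scores)
    y_bar = mean(estimates)
    sxx = sum((x - x_bar) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("all item scores are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(scores, estimates))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data for one judge on one occasion (six items, not from the study).
scores = [1, 1, 0, 1, 0, 1]                        # 1 = judge answered the item correctly
estimates = [0.70, 0.60, 0.40, 0.80, 0.50, 0.70]   # judge's Angoff estimates

intercept, slope = fit_line(scores, estimates)
```

Under these assumptions, the intercept can be read as an indicator of the judge's overall stringency, and the slope as the shift in the Angoff estimate associated with the judge knowing the answer. For the invented data above, the intercept is about 0.45 and the slope about 0.25.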
Results and conclusions: This study shows that in standard-setting a judge's estimate for an individual item reflects not only the difficulty of the item but also the inherent stringency of the judge and his/her subject-related knowledge. Considerable variation between judges in their stringency was found, and Angoff estimates were significantly affected by a judge knowing or not knowing the answer to the item. These findings stress the importance of careful selection of Angoff judges when making pass/fail decisions in health professions education. They imply that the judges selected should be capable not only of conceptualising the 'minimally competent student' but also of answering all the items.