Introduction: An earlier study showed that an Angoff procedure with ≥10 recently graduated students as judges can be used to estimate the passing score of a progress test. Because the acceptability and feasibility of this approach are questionable, we conducted an Angoff procedure with test item writers as judges. This paper reports on the reliability and credibility of that procedure and compares the standards set by the two different panels.
Methods: Fourteen item writers judged 146 test items that recently graduated students had assessed in a previous study. Generalizability was investigated as a function of the number of items and judges. Credibility was evaluated by comparing the pass/fail rates associated with the Angoff standard, a relative standard and a fixed standard. The Angoff standards obtained by the item writers and the graduates were then compared.
Results: The variance component reflecting consistent differences among item writers across items was 1.5%; for the graduated students it was 0.4%. Achieving an acceptable error of the estimated standard would have required 39 judges. The item-level Angoff estimates of the two panels and the item P-values were highly correlated. Failure rates of 57%, 55% and 7% were associated with the item writers' standard, the fixed standard and the graduates' standard, respectively.
Conclusion: The graduates' and the item writers' standards differed substantially, as did the associated failure rates. A panel of 39 item writers is not feasible, and the item writers' passing score appears to be less credible. The credibility of the graduates' standard needs further evaluation. The acceptability and feasibility of a mixed panel of students and item writers may be worth investigating.