A central principle of programmatic assessment is that the final decision is not a surprise to the learner. To achieve this, assessments must demonstrate both predictive and consequential validity; to date, however, research has focussed only on the former. The present study attempts to address this gap by examining the predictive and consequential validity of flagging systems used by Australian General Practice regional training organisations (RTOs) in relation to Fellowship examinations. Unstructured interviews with Senior Medical Educators were used to understand the flagging system of each RTO. Informed by these interviews, meta-analyses of routinely collected flagging data were used to examine how well flagging at various points in training predicted exam performance. Additionally, flagging system features identified from the interviews were used to inform exploratory subgroup analyses and meta-regressions to further assess the predictive and consequential validity of these systems. Registrars flagged near the end of their training were two to four times more likely to fail Fellowship exams than their non-flagged counterparts. Regarding flagging system features, graded (i.e. ordinal) flagging systems were associated with higher accuracy, whilst involving the assigned medical educator in remediation and initiating a formal diagnostic procedure following a flag improved registrars' chances of passing exams. These results demonstrate both the predictive and the consequential validity of flagging systems. We argue that flagging is most effective when initiated early in training in conjunction with mechanisms to maximise diagnostic accuracy and the quality of remediation programs.
Keywords: Flagging; General Practice training; Meta-analysis; Postgraduate medical education; Remediation.