Background: A common feature of diagnostic research is that results for a diagnostic gold standard are available primarily for patients who are positive for the test under investigation. Data from such studies are subject to what has been termed "verification bias". We evaluated statistical methods for verification bias correction when there are few false negatives.
Methods: A simulation study was conducted of a screening study subject to verification bias. We compared estimates of the area-under-the-curve (AUC) corrected for verification bias varying both the rate and mechanism of verification.
Results: In a single simulated data set, varying false negatives from 0 to 4 led to verification bias corrected AUCs ranging from 0.550 to 0.852. Excess variation associated with low numbers of false negatives was confirmed in simulation studies and by analyses of published studies that incorporated verification bias correction. The 2.5th - 97.5th centile range constituted as much as 60% of the possible range of AUCs for some simulations.
Conclusion: Screening programs are designed such that there are few false negatives. Standard statistical methods for verification bias correction are inadequate in this circumstance.