Objective: I provide researchers with tables of sample size for multiobserver receiver operating characteristic (ROC) studies that compare the diagnostic accuracies of two imaging techniques.
Materials and methods: I computed the number of patients and observers needed as a function of five parameters: the measure of diagnostic accuracy (area under the ROC curve, sensitivity at a false-positive rate ≤ 0.10, or specificity at a false-negative rate ≤ 0.10), conjectured level of accuracy, suspected difference in accuracy between the two imaging techniques, observer variability, and ratio of patients without to patients with the condition.
Results: The numbers of patients and observers required vary dramatically with these five parameters, increasing with more refined measures of accuracy, with lower accuracy levels, with smaller suspected differences, with greater observer variability, and with less balanced designs. The number of patients required for a study can be reduced by increasing the number of observers, and vice versa. When the intra- and interobserver variability is large, a study design with just four observers is usually inadequate.
Conclusion: Many factors must be considered when determining the appropriate sample sizes for multiobserver ROC studies. My tables serve only as initial ballpark estimates. Investigators should compute sample size using parameters that reflect their clinical application.
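Illustration: The trends reported above can be explored with a much simpler calculation than the one behind the tables. The sketch below is not the multiobserver method used in this study. It assumes a single observer, uses the Hanley and McNeil (1982) approximation for the variance of the area under the ROC curve, treats the two imaging techniques as independent, and uses a two-sided normal test. The function names and default values are hypothetical, and observer variability is ignored entirely, so the output is no substitute for the tables or for a calculation tailored to the clinical application.

```python
from math import ceil
from statistics import NormalDist


def hanley_mcneil_variance(auc, n_pos, n_neg):
    """Approximate variance of an estimated AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    numerator = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    )
    return numerator / (n_pos * n_neg)


def patients_needed(auc1, auc2, ratio=1.0, alpha=0.05, power=0.80,
                    max_pos=100_000):
    """Smallest (n_pos, n_neg) detecting |auc1 - auc2| for ONE observer.

    ratio is patients without the condition per patient with it.
    Assumption: the two techniques are treated as independent, which
    ignores the correlation present in a paired design.
    """
    delta = abs(auc1 - auc2)
    if delta == 0:
        raise ValueError("the two conjectured AUCs must differ")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    z_total_sq = (z_alpha + z_beta) ** 2

    # The variance shrinks as n grows, so the first n that passes is the minimum.
    for n_pos in range(2, max_pos + 1):
        n_neg = max(2, ceil(ratio * n_pos))
        var_diff = (hanley_mcneil_variance(auc1, n_pos, n_neg)
                    + hanley_mcneil_variance(auc2, n_pos, n_neg))
        if z_total_sq * var_diff <= delta ** 2:
            return n_pos, n_neg
    raise ValueError("no sample size up to max_pos reaches the requested power")


if __name__ == "__main__":
    # Conjectured AUCs and ratios are illustrative inputs, not study results.
    for a1, a2, r in [(0.70, 0.80, 1), (0.85, 0.95, 1),
                      (0.70, 0.75, 1), (0.70, 0.80, 3)]:
        n_pos, n_neg = patients_needed(a1, a2, ratio=r)
        print(f"AUC {a1} vs {a2}, ratio {r}:1 -> "
              f"{n_pos} with and {n_neg} without the condition")
```

Under these assumptions the sketch reproduces the direction of three of the reported trends: the required total grows as the conjectured accuracy falls, as the suspected difference narrows, and as the design becomes less balanced. It cannot show the trade-off between patients and observers, which requires a multiobserver variance model.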