Diagnostic tests must be evaluated in a clinically relevant population. However, test performance often varies across population subgroups. Spectrum bias, a term commonly used to describe this heterogeneity, is typically thought to occur when diagnostic test performance varies across patient subgroups and a study of that test's performance does not adequately represent all subgroups. Yet subgroup variation is not a bias if appropriate analyses are conducted. Failure to recognize and address heterogeneity will lead to estimates of test performance that are not generalizable to the relevant clinical populations. Heterogeneity can be addressed with relatively simple stratification procedures, limited primarily by the sample size and the precision of the estimates. This paper proposes the use of the term spectrum effect, rather than spectrum bias, and outlines strategies for using stratified sensitivity and specificity estimates, likelihood ratios, and receiver-operating characteristic curves. Investigators of diagnostic tests should consider the potential for spectrum effect seriously and should address heterogeneity in their analyses. Furthermore, clinicians should consider study samples carefully to determine whether results are generalizable to their specific patient population.
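As an illustration of the stratification approach described above, the following minimal sketch computes stratum-specific sensitivity, specificity, and likelihood ratios, each with an approximate 95% confidence interval for the proportions. The subgroup names and 2x2 counts are hypothetical and chosen only for demonstration; the function names (`wilson_interval`, `stratum_summary`) and the choice of the Wilson score interval are assumptions of this sketch, not methods specified by the paper.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (centre - half, centre + half)


def stratum_summary(tp, fn, fp, tn):
    """Sensitivity, specificity, and likelihood ratios from one 2x2 table."""
    sens = tp / (tp + fn)          # true-positive rate among diseased
    spec = tn / (tn + fp)          # true-negative rate among non-diseased
    fpr = fp / (fp + tn)           # false-positive rate (1 - specificity)
    fnr = fn / (tp + fn)           # false-negative rate (1 - sensitivity)
    return {
        "sensitivity": sens,
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": spec,
        "specificity_ci": wilson_interval(tn, tn + fp),
        "lr_positive": sens / fpr if fpr > 0 else float("inf"),
        "lr_negative": fnr / spec if spec > 0 else float("inf"),
    }


# Hypothetical counts (tp, fn, fp, tn) for two patient subgroups.
strata = {
    "subgroup_A": (30, 20, 5, 95),
    "subgroup_B": (45, 5, 20, 80),
}

# Pooled table: sum each cell across strata.
pooled = tuple(sum(cells) for cells in zip(*strata.values()))

results = {name: stratum_summary(*cells) for name, cells in strata.items()}
results["pooled"] = stratum_summary(*pooled)

for name, r in results.items():
    print(
        f"{name}: sens={r['sensitivity']:.3f}, spec={r['specificity']:.3f}, "
        f"LR+={r['lr_positive']:.2f}, LR-={r['lr_negative']:.2f}"
    )
```

In these hypothetical data the pooled estimates fall between the two subgroups and describe neither well, which is the situation in which stratified reporting is informative. The width of each stratum's interval also shows how sample size limits the precision of stratified estimates.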