Computer-aided diagnostic (CAD) schemes have been developed to assist radiologists in detecting various lesions in medical images. In addition to the development, an equally important problem is the reliable evaluation of the performance levels of various CAD schemes. It is good to see that more and more investigators are employing more reliable evaluation methods such as leave-one-out and cross validation, instead of less reliable methods such as resubstitution, for assessing their CAD schemes. However, the common applications of leave-one-out and cross-validation evaluation methods do not necessarily imply that the estimated performance levels are accurate and precise. Pitfalls often occur in the use of leave-one-out and cross-validation evaluation methods, and they lead to unreliable estimation of performance levels. In this study, we first identified a number of typical pitfalls for the evaluation of CAD schemes, and conducted a Monte Carlo simulation experiment for each of the pitfalls to demonstrate quantitatively the extent of bias and/or variance caused by the pitfall. Our experimental results indicate that considerable bias and variance may exist in the estimated performance levels of CAD schemes if one employs various flawed leave-one-out and cross-validation evaluation methods. In addition, for promoting and utilizing a high standard for reliable evaluation of CAD schemes, we attempt to make recommendations, whenever possible, for overcoming these pitfalls. We believe that, with the recommended evaluation methods, we can considerably reduce the bias and variance in the estimated performance levels of CAD schemes.