Gene expression signatures identify important genes that predict a specified outcome. In several notable diseases, such as leukemia and breast cancer, the results have been encouraging. In these datasets, many techniques work well when discriminating particular outcomes. However, the same methods, applied to other datasets, fail to achieve similar levels of success. Given the small sample sizes common to these studies and the high dimensionality of the data, several key issues arise when attempting to construct reliable, reproducible gene signatures. The classifiers may not be sufficient to discriminate the classes, or the data themselves may not be sufficient to produce effective separation. In this paper, three simple measures of classification complexity are considered to explore a limit on the predictive accuracy that can be achieved in a dataset. Two independent gene expression datasets (lung and colorectal cancer) are considered, with three different outcomes on each dataset. Four different classifiers, each using the t-test for feature selection, were tested on these datasets as a representative panel of classifiers. Our results indicate that Fisher's discriminant ratio provides a good measure of the complexity of the classification problem, with a high correlation between complexity and best classification accuracy across all problems (R² = 0.78). Specifically, predicting gender is a low-complexity problem, as indicated both by the complexity measure and by the classification results. More clinically oriented endpoints are more complex and have lower classification accuracies. Classification complexity can therefore be used to estimate the maximum attainable accuracy for a problem, reducing the need to evaluate many different classifiers.
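As a minimal illustration of the kind of complexity measure discussed above, the sketch below computes a per-gene Fisher's discriminant ratio, (μ₁ − μ₂)² / (σ₁² + σ₂²), for a two-class expression matrix; the maximum over genes is one common dataset-level summary. The function name and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Per-feature Fisher's discriminant ratio for a two-class problem:
    f_i = (mu1_i - mu2_i)^2 / (var1_i + var2_i), computed for each gene i.
    X: (samples x genes) expression matrix; y: binary class labels."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    assert len(classes) == 2, "two-class problems only"
    a, b = X[y == classes[0]], X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    # Guard against zero variance (constant gene in both classes).
    return num / np.where(den == 0, np.finfo(float).eps, den)

# Toy example: 6 samples x 3 genes; only gene 0 separates the classes.
X = np.array([[5.0, 1.0, 2.0],
              [5.2, 0.9, 2.1],
              [5.1, 1.1, 1.9],
              [1.0, 1.0, 2.0],
              [0.9, 1.1, 2.1],
              [1.1, 0.9, 1.9]])
y = np.array([0, 0, 0, 1, 1, 1])
f = fisher_discriminant_ratio(X, y)
print(int(f.argmax()))  # → 0 (the discriminating gene)
```

A high maximum ratio suggests a low-complexity problem (at least one gene cleanly separates the classes, as with gender prediction), while uniformly low ratios suggest a harder, more clinically complex endpoint.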