We have recently introduced a rank-based test statistic, RankProducts (RP), as a new non-parametric method for detecting differentially expressed genes in microarray experiments. It has been shown to generate surprisingly good results with biological datasets. The basis for this performance and the limits of the method are, however, little understood. Here we explore the performance of such rank-based approaches under a variety of conditions using simulated microarray data, and compare it with classical Wilcoxon rank sums and t-statistics, which form the basis of most alternative differential gene expression detection techniques. We show that for realistic simulated microarray datasets, RP is more powerful and accurate for sorting genes by differential expression than t-statistics or Wilcoxon rank sums - in particular for replicate numbers below 10, which are most commonly used in biological experiments. Its relative performance is particularly strong when the data are contaminated by non-normal random noise or when the samples are very inhomogenous, e.g. because they come from different time points or contain a mixture of affected and unaffected cells. However, RP assumes equal measurement variance for all genes and tends to give overly optimistic p-values when this assumption is violated. It is therefore essential that proper variance stabilizing normalization is performed on the data before calculating the RP values. Where this is impossible, another rank-based variant of RP (average ranks) provides a useful alternative with very similar overall performance. The Perl scripts implementing the simulation and evaluation are available upon request. Implementations of the RP method are available for download from the authors website (http://www.brc.dcs.gla.ac.uk/glama).