Background: MicroRNA (miR) arrays differ from gene expression arrays in having far fewer probes and shorter probe sequences, which leave less flexibility in probe design. Robust data processing and analysis methods tailored to these characteristics of miR arrays are therefore needed. Assumptions underlying the normalization methods commonly used for gene expression microarrays, which contain tens of thousands of probes or more, may not hold for miR microarrays; for example, many of these methods assume that most probes are not differentially expressed between samples. Findings from previous studies have sometimes been inconclusive or contradictory, so further work is needed to determine optimal normalization methods for miR microarrays.
Methods: We evaluated many different normalization methods on data generated with a custom-made two-channel miR microarray, using two data sets that contain technical replicates from several cell lines. For each method, we examined its effect on both the within-miR error variance (between replicate arrays) and the between-miR variance, to determine which methods minimized differences between replicate samples while preserving differences between biologically distinct miRs.
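The abstract does not give the exact estimators used for the two variance components. As a minimal sketch, assuming a balanced design in which each miR is measured on the same number of replicate arrays of one sample, the log-ratios can be split one-way ANOVA style: the within-miR error variance as the average of each miR's sample variance across replicates, and the between-miR variance as the variance of the per-miR means. The function name `variance_components` and the toy data are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def variance_components(m: np.ndarray) -> tuple[float, float]:
    """Split normalized log-ratios into within-miR and between-miR variance.

    Hypothetical helper. `m` has shape (n_mirs, n_replicates): each row is
    one miR, each column one replicate array of the same sample
    (balanced design assumed).
    """
    m = np.asarray(m, dtype=float)
    if m.ndim != 2 or m.shape[1] < 2:
        raise ValueError("need a 2-D array with at least two replicates")
    # Within-miR error variance: mean over miRs of the sample variance
    # (ddof=1) across replicate arrays. Smaller is better.
    within = float(np.mean(np.var(m, axis=1, ddof=1)))
    # Between-miR variance: sample variance of the per-miR means.
    # A good normalization should not shrink this relative to `within`.
    between = float(np.var(np.mean(m, axis=1), ddof=1))
    return within, between


# Toy log-ratios for three miRs on three replicate arrays (made up).
m = np.array([[1.0, 1.2, 0.8],
              [-0.5, -0.4, -0.6],
              [2.0, 2.1, 1.9]])
within, between = variance_components(m)
print(within, between)  # approximately 0.02 and 1.583
```

Under these assumptions, normalization methods could be compared on the same raw data by how far each lowers `within` while keeping `between` large; the ratio of the two is one convenient single-number summary, though not necessarily the criterion used in the study.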
Results: Lowess normalization generally performed worse than the other methods, whereas quantile normalization based on an invariant set performed best in many cases, unless the invariant set was restricted to be very small. Global median and global mean normalization performed reasonably well in both data sets and are computationally simple.
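To illustrate why the global methods are computationally simple, the sketch below shows global median or mean centering of a single two-channel array and standard quantile normalization across arrays. It assumes log2 channel intensities as input, and the function names are hypothetical. The invariant-set variant evaluated in the study, which restricts normalization to a subset of probes assumed not to change between samples, is not reproduced here because the abstract does not specify how that set was chosen.

```python
import numpy as np


def global_center(log_red, log_green, stat="median"):
    """Global median/mean normalization of one two-channel array (hypothetical helper).

    Computes log-ratios M = R - G from log2 intensities and shifts them so
    their median (or mean) is zero, assuming the typical probe is not
    differentially expressed between the two channels.
    """
    if stat not in ("median", "mean"):
        raise ValueError("stat must be 'median' or 'mean'")
    m = np.asarray(log_red, dtype=float) - np.asarray(log_green, dtype=float)
    center = np.median(m) if stat == "median" else np.mean(m)
    return m - center


def quantile_normalize(x):
    """Standard quantile normalization of the columns of x (probes x arrays).

    Every column is given the same distribution: the mean of the sorted
    columns, placed back in each column's original rank order
    (ties broken by order of appearance).
    """
    x = np.asarray(x, dtype=float)
    # Rank of each value within its own column (0 = smallest).
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0)
    # Reference distribution: row-wise mean of the column-sorted data.
    reference = np.mean(np.sort(x, axis=0), axis=1)
    return reference[ranks]


# Made-up log2 intensities for four probes on one array.
print(global_center([10, 11, 13, 9], [9.5, 11, 11, 9.5], "median"))
print(global_center([10, 11, 13, 9], [9.5, 11, 11, 9.5], "mean"))
# Made-up single-channel values for three probes on two arrays.
print(quantile_normalize([[5, 4], [2, 1], [3, 6]]))
```

The global methods need only one summary statistic per array, which is why they are cheap and easy to reproduce; quantile normalization imposes a much stronger assumption, namely that all arrays share the same intensity distribution.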
Conclusions: Researchers should carefully consider which assumptions underlying the different normalization methods are most reasonable for their experimental setting, and may wish to apply more than one normalization approach to assess how sensitive their results are to the choice of method.