Motivation: In this article, we show that distinguishing human precursor microRNA (pre-miRNA) hairpins from both genome pseudo hairpins and other non-coding RNAs (ncRNAs) is a common and essential requirement for both comparative and non-comparative computational recognition of human miRNA genes. However, existing computational methods do not address this problem completely or successfully. Here we present an effective classifier system, named microPred, built for this classification problem with appropriate machine learning techniques. Our approach includes the introduction of more representative datasets, the extraction of new biologically relevant features, feature selection, handling of the class imbalance problem in the datasets, and extensive evaluation of classifier performance via systematic cross-validation.
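Because true pre-miRNAs are heavily outnumbered by pseudo hairpins and other ncRNAs, a naive cross-validation split can leave some folds with very few positives. The following is a minimal, generic sketch of stratified k-fold splitting, one common way to keep class proportions constant across folds. It is not claimed to be the exact cross-validation procedure used for microPred; the function name `stratified_k_fold` and the label encoding (1 = pre-miRNA, 0 = negative example) are hypothetical illustrations.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class ratios.

    Hypothetical helper: labels are assumed to be 1 (pre-miRNA) or 0 (negative).
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Shuffle each class separately, then deal its indices round-robin,
    # so every fold receives (almost) the same share of each class.
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)

    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


if __name__ == "__main__":
    # Imbalanced toy labels: 10 positives, 90 negatives.
    labels = [1] * 10 + [0] * 90
    for train, test in stratified_k_fold(labels, 5):
        print(len(train), len(test), sum(labels[i] for i in test))
```

With 10 positives and 90 negatives split into 5 folds, every test fold holds 2 positives and 18 negatives, so sensitivity is never estimated from a fold that lacks positive examples.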
Results: Our microPred classifier achieved higher and, in particular, much more reliable classification results than existing pre-miRNA classification methods, in terms of both sensitivity (90.02%) and specificity (97.28%). When validated on 6095 non-human animal pre-miRNAs and 139 virus pre-miRNAs from miRBase, microPred achieved recognition rates of 92.71% (5651/6095) and 94.24% (131/139), respectively.