Significant improvement of miRNA target prediction accuracy in large datasets using meta-strategy based on comprehensive voting and artificial neural networks

BMC Genomics. 2019 Feb 27;20(1):158. doi: 10.1186/s12864-019-5528-1.

Abstract

Background: Identifying the mRNA targets of miRNAs is critical for studying gene expression regulation at the whole-genome level. Multiple computational tools have been developed to predict miRNA:mRNA interactions. However, many of these tools were developed on different small datasets, each representing a limited sample space, so their prediction accuracy has not been systematically validated at a larger scale. This makes it difficult to compare the tools and to determine their applicability. In addition, the accuracy of these tools, especially on large datasets, needs to be improved for broader applications.

Results: In this project, a large dataset containing more than 46,600 miRNA:mRNA interactions was assembled and split into eleven subsets based on the availability of prediction scores from four individual predictors: miRanda, miRDB, PITA, and TargetScan. In each subset, the results of the four individual predictors were integrated using decision-tree based artificial neural networks to make the meta-prediction. The decision tree sorts the predictive results of the four individual predictors, and the artificial neural networks make the meta-prediction from the outputs of the individual predictors. The decision tree incorporates dual thresholds and two-step significance voting, and information gain was analysed to select the threshold values. Under multi-fold cross-validation and in independent datasets, the new strategy performed significantly better in most of the eleven datasets than the individual predictors and other meta-predictors, such as ComiR. In independent datasets, the overall improvement in prediction accuracy is at least 9 percentage points over the other predictors, and the relative improvement in F1 and MCC scores is at least 40%.
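
The use of information gain to select a threshold can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single predictor score per interaction, binary labels (1 for a true target, 0 otherwise), and a single cut-off, whereas the abstract describes a dual-threshold scheme whose details are not given here. The function names are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(scores, labels, threshold):
    """Entropy reduction from splitting the samples at `threshold`."""
    below = [y for s, y in zip(scores, labels) if s < threshold]
    above = [y for s, y in zip(scores, labels) if s >= threshold]
    n = len(labels)
    weighted = (len(below) / n) * entropy(below) + (len(above) / n) * entropy(above)
    return entropy(labels) - weighted


def best_threshold(scores, labels):
    """Pick the observed score that maximises information gain (illustrative only)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: information_gain(scores, labels, t))


# Toy data, invented for illustration; not taken from the study.
toy_scores = [0.1, 0.2, 0.8, 0.9]
toy_labels = [0, 0, 1, 1]
chosen = best_threshold(toy_scores, toy_labels)
```

On this toy data the split at 0.8 separates the two classes completely, so it has the highest gain. A dual-threshold variant would presumably select two such cut-offs per predictor, but how the paper does so cannot be determined from the abstract.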

Conclusions: The combination of dual-threshold, two-step significance-voting, and analysis of information gain is very effective in optimizing the outcome of the decision tree, and integration with artificial neural networks is critical for further improving the performance of the meta-predictor. A new pipeline based on this integration has been developed for miRNA target prediction. A strategy that uses the outputs of individual predictors to reorganize a large-scale miRNA:mRNA interaction dataset has also been validated and used to evaluate the prediction accuracy of predictors. The predictor is available at https://github.com/xueLab/mirTarDANN.
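
For readers unfamiliar with the F1 and MCC scores used to evaluate the predictors, the following sketch computes both from a binary confusion matrix using their standard definitions. The function name and the example counts are illustrative assumptions and do not come from the study.

```python
import math


def f1_and_mcc(tp, fp, tn, fn):
    """Return (F1, MCC) for a binary confusion matrix.

    Undefined ratios (zero denominators) are reported as 0.0, a common
    convention that is assumed here rather than taken from the paper.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Invented counts for illustration only.
f1, mcc = f1_and_mcc(tp=40, fp=10, tn=40, fn=10)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it informative when positive and negative interactions are imbalanced.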

Keywords: Artificial neural network; Dual-threshold; Meta-strategy; Two-step significance voting; miRNA target.

Publication types

  • Evaluation Study

MeSH terms

  • Datasets as Topic
  • Gene Expression Regulation
  • MicroRNAs / metabolism*
  • Neural Networks, Computer*
  • RNA, Messenger / metabolism

Substances

  • MicroRNAs
  • RNA, Messenger