Connecting high-dimensional mRNA and miRNA expression data for binary medical classification problems

Comput Methods Programs Biomed. 2013 Sep;111(3):592-601. doi: 10.1016/j.cmpb.2013.05.013. Epub 2013 Jul 10.


In modern molecular biology, high-throughput experiments allow the simultaneous study of expression levels of thousands of biopolymers such as mRNAs, miRNAs or proteins. A typical goal of such experiments is to find molecular signatures that can distinguish between different types of tissue or that can predict a therapy outcome. While research typically focuses on just one type of molecular feature of a gene, e.g. mRNA expression levels, there is increasing interest in the study of several types of features in parallel, i.e. within the same biological samples. In this manuscript, we aim to elucidate the peculiarities of combining mRNA and miRNA expression levels in binary medical classification problems by proposing and comparing different methodologies. The resulting combined classifiers are evaluated in a simulation study. They are based on linear discriminant analysis, linear support vector machines, and a non-linear classifier. In addition, we compare the performance of the different approaches on real expression data sets. In the simulations as well as in the real data sets, in most, though not all, cases the combinations yield equal or higher accuracy than the individual classifiers based on only one type of features.
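The abstract does not spell out the combination rules, so the following is only a minimal sketch of one simple strategy, not the authors' method. It assumes diagonal linear discriminant analysis (DLDA, a common high-dimensional simplification of LDA, chosen here as an assumption) fitted separately on mRNA and miRNA features from the same samples, with the two classifiers combined by adding their discriminant scores. The simulated data, dimensions (`P_MRNA`, `P_MIRNA`), effect size and all function names are hypothetical.

```python
import math
import random
from statistics import mean


def simulate(n_per_class, p, n_informative, shift, rng):
    """Hypothetical toy Gaussian data: class 1 shifts the first n_informative features."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def fit_dlda(X, y):
    """Diagonal LDA: per-feature class means and pooled within-class variances."""
    p = len(X[0])
    g0 = [x for x, c in zip(X, y) if c == 0]
    g1 = [x for x, c in zip(X, y) if c == 1]
    mu0 = [mean(r[j] for r in g0) for j in range(p)]
    mu1 = [mean(r[j] for r in g1) for j in range(p)]
    dof = len(X) - 2
    var = [
        (sum((r[j] - mu0[j]) ** 2 for r in g0) + sum((r[j] - mu1[j]) ** 2 for r in g1)) / dof
        for j in range(p)
    ]
    return mu0, mu1, var


def dlda_score(model, x):
    """Linear discriminant score; > 0 favours class 1 (equal class priors assumed)."""
    mu0, mu1, var = model
    return sum((xj - (a + b) / 2) * (b - a) / v for xj, a, b, v in zip(x, mu0, mu1, var))


def accuracy(scores, y):
    """Fraction of samples whose predicted class (score > 0 -> class 1) matches y."""
    return mean(float((s > 0) == (c == 1)) for s, c in zip(scores, y))


rng = random.Random(2013)
P_MRNA, P_MIRNA = 200, 50  # hypothetical sizes of the two feature blocks

# Same samples, two feature types: simulate() emits labels in the same order for both.
Xm_tr, y_tr = simulate(20, P_MRNA, 5, 0.8, rng)
Xr_tr, _ = simulate(20, P_MIRNA, 5, 0.8, rng)
Xm_te, y_te = simulate(100, P_MRNA, 5, 0.8, rng)
Xr_te, _ = simulate(100, P_MIRNA, 5, 0.8, rng)

# Individual classifiers, one per feature type.
m_mrna = fit_dlda(Xm_tr, y_tr)
m_mirna = fit_dlda(Xr_tr, y_tr)
s_mrna = [dlda_score(m_mrna, x) for x in Xm_te]
s_mirna = [dlda_score(m_mirna, x) for x in Xr_te]

# Combined classifier: add the two discriminant scores.
s_comb = [a + b for a, b in zip(s_mrna, s_mirna)]

# Reference: a single DLDA fitted on the concatenated mRNA + miRNA features.
m_cat = fit_dlda([xm + xr for xm, xr in zip(Xm_tr, Xr_tr)], y_tr)
s_cat = [dlda_score(m_cat, xm + xr) for xm, xr in zip(Xm_te, Xr_te)]

for name, s in [("mRNA", s_mrna), ("miRNA", s_mirna), ("combined", s_comb)]:
    print(f"{name:>8}: test accuracy {accuracy(s, y_te):.3f}")
print("score sum equals concatenated DLDA:",
      all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(s_comb, s_cat)))
```

Because DLDA treats every feature as contributing an independent additive term, summing the block scores gives the same classifier as one DLDA on the concatenated features, which the final check confirms numerically. Other combination rules, such as weighted scores, majority voting or a non-linear combiner, would generally not reduce to feature concatenation in this way.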

Keywords: Classifier combination; Discriminant analysis; High-dimensional data; MicroRNA; Non-linear classification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Discriminant Analysis
  • Gene Expression Profiling
  • Humans
  • MicroRNAs / genetics*
  • RNA, Messenger / genetics*
  • Support Vector Machine

Substances

  • MicroRNAs
  • RNA, Messenger