ABEILLE: a novel method for ABerrant Expression Identification empLoying machine LEarning from RNA-sequencing data

Bioinformatics. 2022 Oct 14;38(20):4754-4761. doi: 10.1093/bioinformatics/btac603.

Abstract

Motivation: Advances in omics technologies are paving the way for the diagnosis of rare diseases by offering a complementary assay to identify the causal gene. The use of transcriptomic data to identify aberrant gene expression (AGE) has been shown to reveal potential pathogenic events. However, popular approaches for AGE identification rely on statistical tests, which require the choice of an arbitrary significance cut-off and the availability of several replicates, the latter not always possible in clinical contexts.

Results: We therefore developed ABerrant Expression Identification empLoying machine LEarning from sequencing data (ABEILLE), a variational autoencoder (VAE)-based method for identifying AGEs from RNA-seq data without the need for replicates or a control group. ABEILLE combines a VAE, which can model data without specific assumptions about their distribution, with a decision tree that classifies genes as AGE or non-AGE. An anomaly score is assigned to each gene to stratify AGEs by the severity of the aberration. We tested ABEILLE on a semi-synthetic dataset and an experimental dataset, demonstrating the importance of a flexible VAE configuration for identifying potential pathogenic candidates.
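The abstract describes a two-stage design: a VAE models the expected expression of each gene, then a decision tree labels genes as AGE or non-AGE, and an anomaly score ranks the aberrations by severity. The following is a minimal sketch of the second stage only, under loudly stated assumptions. The VAE itself is not implemented: the `reconstructed` values are hard-coded placeholders standing in for a trained VAE's output. The features (signed log2 ratio and absolute difference between observed and reconstructed counts), the log2-ratio anomaly score, the small Gini-based tree learner, and every gene name and count are hypothetical choices for illustration. They are not ABEILLE's actual features, scoring, or API; the real implementation lives in the ABEILLE repository.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, float]


def gene_features(observed: Sequence[float], reconstructed: Sequence[float]) -> List[Row]:
    """Hypothetical per-gene features: signed log2 ratio (pseudocount 1) and
    absolute difference between observed counts and a placeholder VAE reconstruction."""
    return [(math.log2((o + 1.0) / (r + 1.0)), abs(o - r))
            for o, r in zip(observed, reconstructed)]


def anomaly_score(obs: float, rec: float) -> float:
    """Illustrative severity score (an assumption): absolute log2 ratio."""
    return abs(math.log2((obs + 1.0) / (rec + 1.0)))


@dataclass
class Node:
    label: int                      # majority class, ties go to 1 (AGE); used at leaves
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None   # rows with feature value <= threshold
    right: Optional["Node"] = None  # rows with feature value > threshold


def gini(labels: Sequence[int]) -> float:
    """Gini impurity for binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_tree(X: Sequence[Row], y: Sequence[int], max_depth: int = 2) -> Node:
    """Greedy CART-style tree that minimises weighted Gini impurity."""
    node = Node(label=int(2 * sum(y) >= len(y)))
    if max_depth == 0 or gini(y) == 0.0:
        return node
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):  # midpoints between distinct values
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, f, t)
    if best is None:  # all rows identical: keep this node as a leaf
        return node
    _, node.feature, node.threshold = best
    li = [i for i, row in enumerate(X) if row[node.feature] <= node.threshold]
    ri = [i for i, row in enumerate(X) if row[node.feature] > node.threshold]
    node.left = fit_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1)
    node.right = fit_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1)
    return node


def predict(node: Node, row: Row) -> int:
    """Walk from the root to a leaf and return its label."""
    while node.left is not None and node.right is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


# Toy "semi-synthetic" training set: hypothetical counts, placeholder
# reconstructions, and labels marking injected outliers (1 = AGE).
train_obs = [100, 120, 95, 400, 10, 110, 500, 5]
train_rec = [105, 115, 100, 100, 100, 100, 120, 90]
train_lab = [0, 0, 0, 1, 1, 0, 1, 1]
tree = fit_tree(gene_features(train_obs, train_rec), train_lab)

# Classify a new sample, then rank the called AGEs by anomaly score.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
obs = [98, 350, 8, 130]
rec = [100, 110, 95, 120]
calls = [predict(tree, row) for row in gene_features(obs, rec)]
ranked = sorted(
    ((g, anomaly_score(o, r)) for g, o, r, c in zip(genes, obs, rec, calls) if c == 1),
    key=lambda gs: gs[1], reverse=True)
print(calls)                   # [0, 1, 1, 0]
print([g for g, _ in ranked])  # ['GENE_C', 'GENE_B']
```

Keeping the classifier separate from the score mirrors the split the abstract describes: the tree makes the binary AGE/non-AGE call, while the continuous score orders the calls (here the strongly under-expressed GENE_C outranks GENE_B). On real data, absolute count differences are scale-dependent, so the feature choice above is purely illustrative.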

Availability and implementation: ABEILLE source code is freely available at: https://github.com/UCA-MSI/ABEILLE.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Exome Sequencing
  • Machine Learning*
  • RNA* / genetics
  • Sequence Analysis, RNA / methods
  • Software

Substances

  • RNA