RNAIndel: discovering somatic coding indels from tumor RNA-Seq data

Bioinformatics. 2020 Mar 1;36(5):1382-1390. doi: 10.1093/bioinformatics/btz753.

Abstract

Motivation: Reliable identification of expressed somatic insertions/deletions (indels) remains an unmet need. Artifacts generated in PCR-based RNA-Seq library preparation and the lack of normal RNA-Seq data present analytical challenges for discovering somatic indels in the tumor transcriptome.

Results: We present RNAIndel, a tool for predicting somatic, germline and artifact indels from tumor RNA-Seq data. RNAIndel leverages features derived from indel sequence context and biological effect in a machine-learning framework. Except for tumor samples with microsatellite instability, RNAIndel robustly predicts 88-100% of somatic indels in five diverse test datasets of pediatric and adult cancers, even recovering subclonal driver indels (variant allele fraction, VAF, range 0.01-0.15) missed by targeted deep sequencing. It outperforms the current best practice for RNA-Seq variant calling, which had 57% sensitivity and 14 times more false positives.
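
The two quantities quoted in the Results, VAF and sensitivity, can be illustrated with a minimal Python sketch. It uses the standard definitions (VAF as the share of reads supporting the indel, sensitivity as true positives over all true somatic indels). The function names and the read counts are hypothetical and are not taken from RNAIndel itself; only the 0.01-0.15 subclonal range comes from the abstract.

```python
def vaf(alt_reads: int, total_reads: int) -> float:
    """Variant allele fraction: share of reads at the locus that support the indel."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def is_subclonal(value: float, low: float = 0.01, high: float = 0.15) -> bool:
    """True if the VAF lies in the subclonal range quoted in the abstract (assumed inclusive)."""
    return low <= value <= high


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true somatic indels that a caller recovers."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no true somatic indels to evaluate")
    return true_positives / total


# Hypothetical read counts: 3 of 100 reads carry the indel.
example_vaf = vaf(3, 100)
example_is_subclonal = is_subclonal(example_vaf)
```

Under these definitions, a caller that recovers 57 of 100 known somatic indels has a sensitivity of 0.57, the figure reported for the best-practice pipeline.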

Availability and implementation: RNAIndel is freely available at https://github.com/stjude/RNAIndel.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Child
  • Exome Sequencing
  • High-Throughput Nucleotide Sequencing
  • Humans
  • INDEL Mutation
  • Neoplasms / genetics*
  • RNA-Seq*
  • Software