Interoperable RNA-Seq analysis in the cloud

Biochim Biophys Acta Gene Regul Mech. 2020 Jun;1863(6):194521. doi: 10.1016/j.bbagrm.2020.194521. Epub 2020 Mar 7.

Abstract

RNA-Sequencing (RNA-Seq) is currently the leading technology for genome-wide transcript quantification. Raw reads can be mapped to transcript- and gene-level counts with a variety of aligners. Here we report an in-depth comparison of transcript quantification methods. Our specific goal is cost-efficient RNA-Seq analysis that can be deployed in a cloud infrastructure composed of interacting microservices. The individual modules cover file transfer into the cloud and APIs that handle the cloud alignment jobs. We then demonstrate how newly generated RNA-Seq data can be placed in the context of thousands of previously published datasets in near real time. With in-depth benchmarks, we identify gene count quantification methods suitable for a cost-effective, accurate, cloud-based RNA-Seq analysis service. Pseudo-alignment algorithms such as kallisto and Salmon combine high-quality read estimation with cost-efficient runtime performance. HISAT2 is the fastest of the classical aligners and delivers good alignment quality. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Animals
  • Benchmarking
  • Cloud Computing*
  • Humans
  • Mice
  • Real-Time Polymerase Chain Reaction
  • Sequence Alignment
  • Sequence Analysis, RNA*