Pincho: A Modular Approach to High Quality De Novo Transcriptomics

Genes (Basel). 2021 Jun 22;12(7):953. doi: 10.3390/genes12070953.

Abstract

Transcriptomic reconstructions without reference (i.e., de novo) are common for data samples derived from non-model biological systems. These assemblies involve massive parallel short read sequence reconstructions from experiments, but they usually employ ad-hoc bioinformatic workflows that exhibit limited standardization and customization. The increasing number of transcriptome assembly software continues to provide little room for standardization which is exacerbated by the lack of studies on modularity that compare the effects of assembler synergy. We developed a customizable management workflow for de novo transcriptomics that includes modular units for short read cleaning, assembly, validation, annotation, and expression analysis by connecting twenty-five individual bioinformatic tools. With our software tool, we were able to compare the assessment scores based on 129 distinct single-, bi- and tri-assembler combinations with diverse k-mer size selections. Our results demonstrate a drastic increase in the quality of transcriptome assemblies with bi- and tri- assembler combinations. We aim for our software to improve de novo transcriptome reconstructions for the ever-growing landscape of RNA-seq data derived from non-model systems. We offer guidance to ensure the most complete transcriptomic reconstructions via the inclusion of modular multi-assembly software controlled from a single master console.

Keywords: NGS; RNA-sequencing; automation; software management; transcriptome assembly.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology
  • Gene Expression Profiling / methods*
  • RNA-Seq / methods
  • Sequence Analysis, RNA / methods*
  • Software*
  • Transcriptome / genetics*