Background: High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall.
Results: Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). It is written in Java and PHP and connects to an underlying MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. An analysis can be started from any stage of the process, and almost any tool or code can be plugged in. To prevent over-fitting of pipeline parameters, ToTem ensures their reproducibility by using cross-validation techniques that penalize the final precision, recall and F-measure. The results are presented as interactive graphs and tables, allowing an optimal pipeline to be selected based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data.
Conclusions: ToTem is a tool for automated pipeline optimization which is freely available as a web application at https://totem.software.
Keywords: Benchmarking; Next generation sequencing; Parameter optimization; Variant calling.
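The abstract states that cross-validation is used to penalize the final precision, recall and F-measure, but it does not say how the penalty is computed. The following is a minimal sketch of one possible scheme, not ToTem's actual implementation: the F-measure is computed per cross-validation fold and a setting's score is assumed to be the mean across folds minus the standard deviation, so that settings which perform inconsistently are ranked lower. The function names, the penalty formula, the setting names and the per-fold counts are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from variant call counts.

    tp, fp, fn are the numbers of true positive, false positive and
    false negative calls. Undefined ratios are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def penalized_f(fold_counts):
    """Score one pipeline setting across cross-validation folds.

    ASSUMPTION: the penalty is "mean F-measure minus its population
    standard deviation across folds". The source does not specify the
    formula; this is only one plausible choice.
    """
    scores = [precision_recall_f(*counts)[2] for counts in fold_counts]
    return mean(scores) - pstdev(scores)


# Hypothetical (tp, fp, fn) counts per fold for two pipeline settings.
results = {
    "setting_A": [(90, 10, 10), (90, 10, 10), (90, 10, 10)],  # stable
    "setting_B": [(99, 1, 1), (60, 40, 40), (99, 1, 1)],      # unstable
}

# Pick the setting with the highest penalized F-measure.
best = max(results, key=lambda name: penalized_f(results[name]))
print(best)
```

In this toy input, the second setting reaches a higher F-measure on two of the three folds but collapses on one, so the penalized score favors the first, more reproducible setting. A different penalty (for example, taking the minimum across folds) would fit the same description equally well.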
Conflict of interest statement
Ethics approval and consent to participate
The whole study and written informed consent obtained from all patients analysed for variant discovery in the
For GIAB data, ethics approval is not required as the human data were publicly available on the GIAB website.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.