Comprehensive evaluation and characterisation of short read general-purpose structural variant calling software

Daniel L Cameron; Leon Di Stefano; Anthony T Papenfuss

doi:10.1038/s41467-019-11146-4

Nat Commun. 2019 Jul 19;10(1):3240. doi: 10.1038/s41467-019-11146-4.

Authors

Daniel L Cameron¹ ², Leon Di Stefano¹, Anthony T Papenfuss³ ⁴ ⁵ ⁶ ⁷

Affiliations

¹ Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Pde, Parkville, VIC, 3052, Australia.
² Department of Medical Biology, University of Melbourne, Parkville, VIC, 3010, Australia.
³ Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Pde, Parkville, VIC, 3052, Australia. papenfuss@wehi.edu.au.
⁴ Department of Medical Biology, University of Melbourne, Parkville, VIC, 3010, Australia. papenfuss@wehi.edu.au.
⁵ Peter MacCallum Cancer Centre, Victorian Comprehensive Cancer Centre, Melbourne, VIC, 3000, Australia. papenfuss@wehi.edu.au.
⁶ Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, 3010, Australia. papenfuss@wehi.edu.au.
⁷ School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia. papenfuss@wehi.edu.au.

Abstract

In recent years, many software packages for identifying structural variants (SVs) using whole-genome sequencing data have been released. When published, a new method is commonly compared with those already available, but such comparisons tend to be selective and incomplete. The lack of comprehensive benchmarking of methods presents challenges for users in selecting methods and for developers in understanding algorithm behaviours and limitations. Here we report the comprehensive evaluation of 10 SV callers, selected following a rigorous process and spanning the breadth of detection approaches, using high-quality reference cell lines, as well as simulations. Due to the nature of available truth sets, our focus is on general-purpose rather than somatic callers. We characterise the impact on performance of event size and type, sequencing characteristics, and genomic context, and analyse the efficacy of ensemble calling and calibration of variant quality scores. Finally, we provide recommendations for both users and methods developers.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cell Line
Computational Biology / methods*
Diploidy
Genome, Human / genetics*
Genomic Structural Variation / genetics*
Genomics / methods
High-Throughput Nucleotide Sequencing / methods*
Humans
Reproducibility of Results
Software*
Whole Genome Sequencing / methods*