Comprehensive evaluation and characterisation of short read general-purpose structural variant calling software

Nat Commun. 2019 Jul 19;10(1):3240. doi: 10.1038/s41467-019-11146-4.

Abstract

In recent years, many software packages for identifying structural variants (SVs) using whole-genome sequencing data have been released. When published, a new method is commonly compared with those already available, but these comparisons tend to be selective and incomplete. The lack of comprehensive benchmarking of methods presents challenges for users in selecting methods and for developers in understanding algorithm behaviours and limitations. Here we report the comprehensive evaluation of 10 SV callers, selected following a rigorous process and spanning the breadth of detection approaches, using high-quality reference cell lines, as well as simulations. Due to the nature of available truth sets, our focus is on general-purpose rather than somatic callers. We characterise the impact on performance of event size and type, sequencing characteristics, and genomic context, and analyse the efficacy of ensemble calling and calibration of variant quality scores. Finally, we provide recommendations for both users and methods developers.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Line
  • Computational Biology / methods*
  • Diploidy
  • Genome, Human / genetics*
  • Genomic Structural Variation / genetics*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Reproducibility of Results
  • Software*
  • Whole Genome Sequencing / methods*