Toolkit for automated and rapid discovery of structural variants

Methods. 2017 Oct 1:129:3-7. doi: 10.1016/j.ymeth.2017.05.030. Epub 2017 Jun 2.

Abstract

Structural variations (SVs) are broadly defined as genomic alterations that affect >50 bp of DNA, and they have been shown to have a significant effect on evolution and disease. The advent of high throughput sequencing (HTS) technologies and the ability to perform whole genome sequencing (WGS) make it feasible to study these variants in depth. However, discovery of all forms of SV using WGS has proven challenging: the short reads produced by the predominant HTS platforms (<200 bp for current technologies), together with the large amounts of repeats in most genomes, make it very difficult to unambiguously map and accurately characterize such variants. Furthermore, existing tools for SV discovery are primarily developed for only a few SV types, whose sequence signatures (i.e. read pairs, read depth, split reads) may conflict with those of other, untargeted SV classes. Here we introduce a new framework, Tardis, which combines multiple read signatures into a single package to characterize most SV types simultaneously while preventing such conflicts. Tardis also has a modular structure that makes it easy to extend for the discovery of additional forms of SV.
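
To make the read-pair signature mentioned above concrete, the following is a minimal sketch of how a single paired-end alignment is commonly interpreted. It is not the Tardis implementation. It assumes a forward/reverse (Illumina-style) library, a known mean and standard deviation of the fragment size, and a cutoff of three standard deviations. The names `ReadPair` and `classify_read_pair` and the label strings are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """One paired-end alignment (hypothetical container for this sketch)."""
    chrom1: str
    pos1: int      # leftmost mapped coordinate of read 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # leftmost mapped coordinate of read 2
    strand2: str
    read_len: int = 100


def classify_read_pair(pair, mean_insert, sd_insert, k=3.0):
    """Return the SV type suggested by one read pair, or "concordant".

    Assumes a forward/reverse library: a concordant pair has the left
    read on "+" and the right read on "-", with a span close to the
    expected fragment size.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"

    # Order the two reads by mapping position.
    ends = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    (left_pos, left_strand), (right_pos, right_strand) = ends

    if left_strand == right_strand:
        return "inversion"            # both reads on the same strand
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"   # everted (outward-facing) pair

    # Left "+", right "-": orientation is normal, so check the span.
    span = right_pos + pair.read_len - left_pos
    if span > mean_insert + k * sd_insert:
        return "deletion"             # reads map farther apart than expected
    if span < mean_insert - k * sd_insert:
        return "insertion"            # reads map closer together than expected
    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr1", 1000, "+", "chr1", 5000, "-")
    print(classify_read_pair(example, mean_insert=500, sd_insert=50))
```

A single discordant pair is weak evidence on its own. In practice such calls are clustered and cross-checked against the other signatures named in the abstract (read depth and split reads), which is where conflicting interpretations between SV types can arise.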

Keywords: Combinatorial algorithms; High throughput sequencing; Structural variation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genome, Human
  • Genomic Structural Variation / genetics*
  • Genomics*
  • High-Throughput Nucleotide Sequencing / methods*
  • High-Throughput Nucleotide Sequencing / trends
  • Humans
  • Sequence Analysis, DNA
  • Software*
  • Whole Genome Sequencing