Bioinformatics, 28 (18), i333-i339

DELLY: Structural Variant Discovery by Integrated Paired-End and Split-Read Analysis

Tobias Rausch et al. Bioinformatics.

Abstract

Motivation: The discovery of genomic structural variants (SVs) at high sensitivity and specificity is an essential requirement for characterizing naturally occurring variation and for understanding pathological somatic rearrangements in personal genome sequencing data. Of particular interest are integrated methods that accurately identify simple and complex rearrangements in heterogeneous sequencing datasets at single-nucleotide resolution, as an optimal basis for investigating the formation mechanisms and functional consequences of SVs.

Results: We have developed an SV discovery method, called DELLY, that integrates short-insert paired-ends, long-range mate-pairs and split-read alignments to accurately delineate genomic rearrangements at single-nucleotide resolution. DELLY is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations. DELLY thus makes it possible to ascertain the full spectrum of genomic rearrangements, including complex events. On simulated data, DELLY compares favorably to other SV prediction methods across a wide range of sequencing parameters. On real data, DELLY reliably uncovers SVs from the 1000 Genomes Project and cancer genomes, and validation experiments on randomly selected deletion loci show high specificity.

Availability: DELLY is available at www.korbel.embl.de/software.html

Contact: tobias.rausch@embl.de.

Figures

Fig. 1.
DELLY design: short-range and long-range paired-end libraries are analyzed for discordantly mapped read pairs. Paired-end predicted structural variants are then refined using split-reads and reported at single-nucleotide breakpoint resolution
Fig. 2.
Paired-end clustering and split-read detection for a deletion (A), inversion (B), tandem duplication (C) and translocation (D)
Fig. 3.
Graph-based paired-end clustering: a graph G of structural rearrangements with two connected components C1 and C2 and two maximal cliques (pi,pj,pk) and (pm,pn). The non-clique edges are in gray. For simplicity, edge weights have been omitted
Fig. 4.
The build-up of the split-read alignment reference depends on the type of paired-end call. For tandem duplications, inversions and translocations, we modify the reference in such a way that a standard ‘deletion-type’ split-read alignment can be carried out
Fig. 5.
Using an index of the SV reference, DELLY records the number of seven-mer hits per diagonal for each read. In the example shown, Read1 induces three hits on diagonal 10 and 12 hits on diagonal 75. Read2 induces seven hits on diagonal 7 and 16 hits on diagonal 72. Read3 induces 11 hits on diagonal 3 and 12 hits on diagonal 68. For all three reads, the offset between the two most supported diagonals is 65 bp, suggesting an SV length of 65 bp. The consensus sequence of the three reads is shown at the top
Fig. 6.
(A) PCR results for 44 randomly selected split-read deletion calls in 5 samples. (B) A polymorphic deletion site (chr5:60001704-60003666) that is homozygous alternative in NA10847 and NA11992, heterozygous in NA07347 and NA11831, and homozygous reference in NA12003
Fig. 7.
Computational requirements of DELLY for a human resequenced short-read dataset across different coverage levels for deletion discovery
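
The diagonal counting described in the Fig. 5 caption can be illustrated with a minimal Python sketch. It is not DELLY's implementation: the function names (`build_kmer_index`, `diagonal_hits`, `estimated_sv_length`), the toy reference and the toy read are all assumptions made for illustration. Only the seven-mer size and the idea of taking the offset between the two most supported diagonals come from the caption. A diagonal is taken here to be the reference position of a k-mer hit minus its position in the read.

```python
import random
from collections import Counter, defaultdict

K = 7  # seven-mers, as stated in the Fig. 5 caption


def build_kmer_index(reference, k=K):
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def diagonal_hits(read, index, k=K):
    """Count k-mer hits per diagonal (reference position minus read position)."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        for j in index.get(read[i:i + k], ()):
            hits[j - i] += 1
    return hits


def estimated_sv_length(read, index, k=K):
    """Offset between the two best-supported diagonals, or None if fewer than two."""
    top = diagonal_hits(read, index, k).most_common(2)
    if len(top) < 2:
        return None
    return abs(top[0][0] - top[1][0])


# Toy data (hypothetical, not from the paper): a random 200 bp reference and a
# split read that joins reference[50:80] directly to reference[145:175],
# i.e. a read spanning a 65 bp deletion.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(200))
read = reference[50:80] + reference[145:175]

index = build_kmer_index(reference)
sv_length = estimated_sv_length(read, index)
print(sv_length)
```

By construction, the left half of the toy read falls on diagonal 50 and the right half on diagonal 115, so the offset between the two best-supported diagonals equals the 65 bases the read skips. A real caller would also have to handle sequencing errors, repeats and reads with only one well-supported diagonal, none of which this sketch attempts.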
