Sprites: detection of deletions from sequencing data by re-aligning split reads

Bioinformatics. 2016 Jun 15;32(12):1788-96. doi: 10.1093/bioinformatics/btw053. Epub 2016 Feb 1.

Abstract

Motivation: Advances in next-generation sequencing technologies and the availability of short-read data enable the detection of structural variations (SVs). Deletions, an important type of SV, have been suggested to be associated with genetic diseases. There are three types of deletions: blunt deletions, deletions with microhomologies and deletions with microinsertions. The last two types are very common in the human genome, but they are difficult to detect. More generally, finding deletions from sequencing data remains challenging. It is therefore highly desirable to develop sensitive and accurate methods for detecting deletions from sequencing data, especially deletions with microhomologies and deletions with microinsertions.

Results: We present a novel method called Sprites (SPlit Read re-alIgnment To dEtect Structural variants), which finds deletions from sequencing data. Rather than aligning only the clipped part of a soft-clipped read, Sprites aligns the whole read to a target sequence, a segment of the reference determined by spanning reads, in order to find the longest prefix or suffix of the read that has a match in the target sequence. This alignment is designed to handle deletions with microhomologies and deletions with microinsertions. Using both simulated and real data, we show that Sprites achieves a higher F-score in detecting deletions than other current methods.
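The benefit of aligning the whole soft-clipped read, rather than only its clipped part, can be illustrated with a toy sketch. This is not the Sprites implementation: it assumes exact substring matching in place of a scored alignment, and the function names (`longest_matching_prefix`, `longest_matching_suffix`) and the toy sequences are hypothetical. With a 2-bp microhomology (`CA`) at the deletion, the longest read suffix found in the target turns out to be 2 bases longer than the part an aligner would soft-clip, which exposes the microhomology.

```python
def longest_matching_prefix(read: str, target: str, min_len: int = 1) -> tuple[int, int]:
    """Return (length, position in target) of the longest prefix of `read` that
    occurs exactly in `target`, or (0, -1) if no prefix of at least `min_len` does.
    Exact matching is a simplifying assumption; real reads carry sequencing errors."""
    for length in range(len(read), min_len - 1, -1):
        pos = target.find(read[:length])
        if pos != -1:
            return length, pos
    return 0, -1


def longest_matching_suffix(read: str, target: str, min_len: int = 1) -> tuple[int, int]:
    """Same as longest_matching_prefix, but for suffixes of `read`."""
    for length in range(len(read), min_len - 1, -1):
        pos = target.find(read[len(read) - length:])
        if pos != -1:
            return length, pos
    return 0, -1


# Hypothetical toy locus: reference = left + "CA" + 8 Ts + "CA" + right.
# Deleting the first "CA" plus the Ts leaves left + "CA" + right in the sample.
left, mh, deleted, right = "GATTACAGG", "CA", "TTTTTTTT", "GGCCTAGG"
read = left + mh + right                       # read spanning the deletion breakpoint
clipped = right                                # part an aligner might soft-clip after matching left + mh
right_target = deleted + mh + right + "AAAA"   # reference segment on the right side

length, pos = longest_matching_suffix(read, right_target)
print(length, pos)                # 10 8
print(length - len(clipped))      # 2, the microhomology length
```

Matching only the 8-base clipped part would place the breakpoint at a single position, whereas the whole-read suffix match of 10 bases shows that the breakpoint can lie anywhere within a 2-bp window. A practical tool would additionally need to tolerate mismatches and gaps, which this exact-matching sketch does not.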

Availability and implementation: Sprites is open source software and freely available at https://github.com/zhangzhen/sprites

Contact: jxwang@mail.csu.edu.cn

Supplementary data: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Genome, Human
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Sequence Deletion
  • Software