Jointly aligning a group of DNA reads improves accuracy of identifying large deletions

Nucleic Acids Res. 2018 Feb 16;46(3):e18. doi: 10.1093/nar/gkx1175.

Abstract

Performing sequence alignment to identify structural variants, such as large deletions, from genome sequencing data is a fundamental task, but current methods are far from perfect. The current practice is to independently align each DNA read to a reference genome. We show that the propensity of genomic rearrangements to accumulate in repeat-rich regions imposes severe ambiguities in these alignments, and consequently on the variant calls. With current read lengths, this affects more than one third of known large deletions in the C. Venter genome. We present a method to jointly align reads to a genome, whereby alignment ambiguity of one read can be disambiguated by other reads. We show this leads to a significant improvement in the accuracy of identifying large deletions (≥20 bases), while imposing minimal computational overhead and maintaining an overall running time that is on par with current tools. A software implementation is available as an open-source Python program called JRA at https://bitbucket.org/jointreadalignment/jra-src.
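The idea that one read's ambiguity can be resolved by other reads can be illustrated with a toy voting scheme. The sketch below is not the JRA algorithm, which the abstract does not describe in detail. It assumes each read has already been given a set of equally plausible candidate deletions, written as hypothetical (start, length) pairs, and it picks the candidate shared by the most reads. The function name `pick_jointly` and the input layout are illustrative assumptions.

```python
from collections import Counter


def pick_jointly(candidates_per_read):
    """Toy joint disambiguation (illustrative only, not the JRA method).

    candidates_per_read maps a read id to the set of candidate deletions,
    each a (start, length) tuple, that the read supports equally well
    when aligned on its own.
    Returns (consensus, chosen): the candidates with the highest read
    support, and the candidate assigned to each read (None if the read
    supports none of the consensus candidates).
    """
    # Count how many reads are compatible with each candidate deletion.
    votes = Counter()
    for candidates in candidates_per_read.values():
        votes.update(set(candidates))

    if not votes:
        return [], {read: None for read in candidates_per_read}

    # Keep every candidate that ties for the highest support.
    best_support = max(votes.values())
    consensus = sorted(c for c, n in votes.items() if n == best_support)

    # Assign each read the first consensus candidate it is compatible with.
    chosen = {}
    for read, candidates in candidates_per_read.items():
        supported = [c for c in consensus if c in candidates]
        chosen[read] = supported[0] if supported else None
    return consensus, chosen


# Hypothetical input: r1 and r3 are ambiguous alone, r2 is not.
reads = {
    "r1": {(100, 25), (140, 25)},
    "r2": {(140, 25)},
    "r3": {(140, 25), (180, 25)},
}
consensus, chosen = pick_jointly(reads)
print(consensus)
print(chosen)
```

In this made-up input, reads r1 and r3 each admit two placements when considered independently, and only the pooled evidence singles out one deletion. A real joint aligner would work from alignment scores rather than simple vote counts.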

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence*
  • Cell Line
  • DNA / genetics*
  • Datasets as Topic
  • Genome, Human*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Internet
  • Male
  • Middle Aged
  • Ploidies
  • Primary Cell Culture
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Sequence Deletion*
  • Software

Substances

  • DNA