Polymorphic edge detection (PED): two efficient methods of polymorphism detection from next-generation sequencing data

BMC Bioinformatics. 2019 Jun 28;20(1):362. doi: 10.1186/s12859-019-2955-6.

Abstract

Background: Accurate detection of polymorphisms from next-generation sequencing data is an important element of current genetic analysis. However, no existing detection pipeline is yet completely reliable.

Results: We present two new polymorphism detection methods based on Polymorphic Edge Detection (PED). When two homologous sequences are matched, the first mismatched base to appear is either a SNP or the edge of a structural variation. The first method is based on k-mers from short reads. It detects polymorphic edges as k-mers for which there is no match between target and control, which makes it possible to detect SNPs by directly comparing the short reads of two datasets (target and control) without a reference genome sequence. The second method uses bidirectional alignment to detect polymorphic edges, covering not only SNPs but also insertions, deletions, inversions and translocations. Using these two methods, we constructed a high-quality comparison map between rice cultivars that closely matched the theoretical value of introgression, and we detected specific large deletions across cultivars.
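The k-mer idea can be illustrated with a minimal Python sketch. It assumes a deliberately simplified rule that is not taken from the PED software: a target k-mer is reported as a candidate polymorphic edge when it is absent from the control, but its first k-1 bases occur in the control followed by a different final base, so that final base is the first mismatch after a shared prefix. The function names, the toy value k=5, the min_count threshold and the example reads are all hypothetical; reverse-complement strands and sequencing-error models are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a list of reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def polymorphic_edges(target_reads, control_reads, k=5, min_count=2):
    """Hypothetical sketch: report (prefix, target_base, control_bases) tuples.

    A target k-mer seen at least min_count times is reported when it never
    occurs in the control, yet its (k-1)-base prefix does occur in the
    control (at least min_count times) followed by a different final base.
    """
    target = count_kmers(target_reads, k)
    control = count_kmers(control_reads, k)

    # Index well-supported control k-mers by prefix -> set of final bases.
    control_next = {}
    for kmer, n in control.items():
        if n >= min_count:
            control_next.setdefault(kmer[:-1], set()).add(kmer[-1])

    edges = []
    for kmer, n in sorted(target.items()):
        # Skip weakly supported k-mers and k-mers shared with the control.
        if n < min_count or kmer in control:
            continue
        prefix, base = kmer[:-1], kmer[-1]
        if prefix in control_next:
            edges.append((prefix, base, sorted(control_next[prefix])))
    return edges


if __name__ == "__main__":
    control = ["ACGTACGGA"] * 2  # toy control reads
    target = ["ACGTTCGGA"] * 2   # toy target reads with an A->T substitution
    print(polymorphic_edges(target, control, k=5, min_count=2))
    # prints [('ACGT', 'T', ['A'])]
```

With these toy reads, the single substitution is reported once: the shared prefix ACGT is followed by T in the target and by A in the control. Scanning the reverse-complement reads the same way would reach the same site from the opposite side.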

Conclusions: With Polymorphic Edge Detection (PED), the k-mer method detects SNPs by directly comparing the short reads of two datasets without a genome alignment step, and the bidirectional alignment method detects SNPs and structural variations even from single-end short reads. PED is an efficient tool for obtaining accurate data on both SNPs and structural variations.
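The bidirectional idea can likewise be sketched in Python, under strong simplifying assumptions that do not reproduce PED's actual alignment procedure: a read and a reference segment are taken to be already anchored at both ends, the left edge is the first mismatch found scanning forward, and the right edge is the first mismatch found scanning backward from the other end. The helper names and sequences are hypothetical, and only SNPs and simple insertions or deletions are distinguished; inversions and translocations would need additional logic.

```python
from typing import Optional, Tuple


def first_mismatch(a: str, b: str) -> Optional[int]:
    """Return the first index where a and b differ, or None if no mismatch
    occurs within the length of the shorter string."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


def bidirectional_edges(read: str, ref: str) -> Optional[Tuple[str, int, int, int]]:
    """Hypothetical sketch: classify a read against a reference segment that
    it is assumed to match at both ends.

    Returns (kind, left_edge, right_edge, length_difference), with edges in
    reference coordinates, or None if no internal edge is found.
    """
    # Left polymorphic edge: first mismatch scanning from the left end.
    left = first_mismatch(read, ref)
    # Right polymorphic edge: first mismatch scanning from the right end.
    back = first_mismatch(read[::-1], ref[::-1])
    if left is None or back is None:
        return None
    right = len(ref) - 1 - back  # convert the backward offset to a ref index
    diff = len(ref) - len(read)  # > 0 means bases are missing from the read
    if diff == 0:
        kind = "SNP" if left == right else "complex"
    elif diff > 0:
        kind = "deletion"
    else:
        kind = "insertion"
    return kind, left, right, diff


if __name__ == "__main__":
    ref = "ACGTACGGATT"
    print(bidirectional_edges("ACGTTCGGATT", ref))  # ('SNP', 4, 4, 0)
    print(bidirectional_edges("ACGTGGATT", ref))    # ('deletion', 4, 5, 2)
```

For the deletion, the two edges bracket the missing reference bases (ref[4:6] == "AC"). For an insertion, the right edge falls just before the left edge, marking the insertion point. In repetitive sequence the two edges can overlap, so the exact breakpoint becomes ambiguous.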

Availability: The PED software is available at https://github.com/akiomiyao/ped.

Keywords: Indel; Mutation; NGS; Polymorphism; SV.

MeSH terms

  • Computational Biology / methods*
  • High-Throughput Nucleotide Sequencing*
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA
  • Software