Long-read mapping to repetitive reference sequences using Winnowmap2

Nat Methods. 2022 Jun;19(6):705-710. doi: 10.1038/s41592-022-01457-8. Epub 2022 Apr 1.

Abstract

Approximately 5-10% of the human genome remains inaccessible due to the presence of repetitive sequences such as segmental duplications and tandem repeat arrays. We show that existing long-read mappers often yield incorrect alignments and variant calls within long, near-identical repeats, as they remain vulnerable to allelic bias. In the presence of a nonreference allele within a repeat, a read sampled from that region could be mapped to an incorrect repeat copy. To address this limitation, we developed a new long-read mapping method, Winnowmap2, by using minimal confidently alignable substrings. Winnowmap2 computes each read mapping through a collection of confident subalignments. This approach is more tolerant of structural variation and more sensitive to paralog-specific variants within repeats. Our experiments highlight that Winnowmap2 successfully addresses the issue of allelic bias, enabling more accurate downstream variant calls in repetitive sequences.
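The abstract frames mapping in terms of substrings that align confidently to one repeat copy. The sketch below is a deliberately simplified toy, not Winnowmap2's algorithm. It makes four assumptions:

- alignment is ungapped and scored by mismatch count;
- the reference is small enough to scan every offset;
- the whole read is placed at a known candidate offset `pos`;
- the names `hamming`, `is_confident` and `toy_mcas` are hypothetical.

Here a substring counts as confidently alignable if it has strictly fewer mismatches at its placement than at every other reference offset. It counts as minimal if no confidently alignable substring lies strictly inside it.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatches between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_confident(read: str, ref: str, pos: int, i: int, j: int) -> bool:
    """Toy test: read[i:j], placed at ref offset pos + i, has strictly fewer
    mismatches there than at every other offset of the reference."""
    sub = read[i:j]
    w = j - i
    origin = pos + i
    d0 = hamming(sub, ref[origin:origin + w])
    for q in range(len(ref) - w + 1):
        if q != origin and hamming(sub, ref[q:q + w]) <= d0:
            return False
    return True


def toy_mcas(read: str, ref: str, pos: int) -> List[Tuple[int, int]]:
    """Toy minimal confidently alignable substrings of `read`, assuming an
    ungapped placement of the whole read at ref[pos:pos + len(read)]."""
    if pos < 0 or pos + len(read) > len(ref):
        raise ValueError("read placement falls outside the reference")
    shortest: Dict[int, int] = {}  # start i -> end of shortest confident read[i:j]
    for i in range(len(read)):
        for j in range(i + 1, len(read) + 1):
            if is_confident(read, ref, pos, i, j):
                shortest[i] = j
                break
    # A window is minimal if no confident window starting later ends within it;
    # checking the shortest window from each later start is sufficient.
    return [
        (i, j)
        for i, j in shortest.items()
        if not any(i < i2 and j2 <= j for i2, j2 in shortest.items())
    ]


copy_a = "GATTACACTC"
copy_b = "GATTATACTC"  # differs from copy_a only at index 5 (a paralog-specific variant)
ref = copy_a + copy_b  # two near-identical repeat copies
read = copy_a          # read sampled from the first copy

print(toy_mcas(read, ref, 0))   # candidate placement on the first copy: [(2, 6), (5, 7)]
print(toy_mcas(read, ref, 10))  # candidate placement on the second copy: []
```

In this example, every confident substring spans the paralog-specific variant, and confident support appears only at the read's true copy. Placing the read on the other copy yields none. Real long reads carry sequencing errors and indels, which this mismatch-only, exhaustive scan ignores.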

Publication types

  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Genome, Human*
  • Humans
  • Repetitive Sequences, Nucleic Acid* / genetics
  • Segmental Duplications, Genomic
  • Sequence Analysis, DNA
  • Tandem Repeat Sequences