OLTA: Optimizing bait seLection for TArgeted sequencing

Bioinformatics. 2025 Mar 29;41(4):btaf146. doi: 10.1093/bioinformatics/btaf146.

Abstract

Motivation: Targeted enrichment via capture probes, also known as baits, is a promising complementary procedure for next-generation sequencing methods. This technique uses short biotinylated oligonucleotide probes that hybridize with complementary genetic material in a sample. Following hybridization, the target fragments can be easily isolated and processed with minimal contamination from irrelevant material. Designing an efficient set of baits for a set of target sequences, however, is an NP-hard problem.

Results: We develop a novel heuristic algorithm that leverages the similarities between the Minimum Bait Cover and Closest String problems to reduce the number of baits needed to cover a given set of target sequences. Our results on real and synthetic datasets demonstrate that our algorithm, OLTA, produces the fewest baits for nearly all experimental settings and datasets. On average, it produces 6% and 11% fewer baits than the next-best state-of-the-art methods on two major real datasets, AIV and MEGARES, respectively. Its bait set also has the highest utilization and the lowest redundancy.
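To make the Minimum Bait Cover problem concrete, the sketch below shows a naive greedy baseline in Python. It is not the OLTA heuristic, whose details are not given in this abstract. It assumes that a bait covers a window of a target when their Hamming distance is at most a mismatch tolerance, and that candidate baits are drawn only from substrings of the targets. The function names (`hamming`, `covered_positions`, `greedy_bait_cover`) and parameters (`bait_len`, `max_mismatch`) are hypothetical and chosen for illustration.

```python
from typing import List, Set, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def covered_positions(
    bait: str, targets: List[str], max_mismatch: int
) -> Set[Tuple[int, int]]:
    """Return the (target index, position) pairs covered by `bait`.

    Assumption: a bait covers a window of a target when the Hamming
    distance between the bait and that window is at most `max_mismatch`.
    """
    length = len(bait)
    covered: Set[Tuple[int, int]] = set()
    for t, seq in enumerate(targets):
        for start in range(len(seq) - length + 1):
            if hamming(bait, seq[start:start + length]) <= max_mismatch:
                covered.update((t, p) for p in range(start, start + length))
    return covered


def greedy_bait_cover(
    targets: List[str], bait_len: int, max_mismatch: int
) -> List[str]:
    """Naive greedy baseline (not OLTA): repeatedly pick the candidate
    bait that covers the most still-uncovered target positions."""
    # Candidates are restricted to substrings of the targets (an assumption);
    # sorting makes tie-breaking deterministic.
    candidates = sorted(
        {seq[i:i + bait_len]
         for seq in targets
         for i in range(len(seq) - bait_len + 1)}
    )
    coverage = {c: covered_positions(c, targets, max_mismatch) for c in candidates}
    uncovered = {(t, p) for t, seq in enumerate(targets) for p in range(len(seq))}
    baits: List[str] = []
    while uncovered:
        best = max(candidates, key=lambda c: len(coverage[c] & uncovered), default=None)
        gain = coverage[best] & uncovered if best is not None else set()
        if not gain:
            break  # e.g. a target shorter than bait_len cannot be covered
        baits.append(best)
        uncovered -= gain
    return baits


if __name__ == "__main__":
    toy_targets = ["ACGTACGT", "ACGTTCGT"]
    print(greedy_bait_cover(toy_targets, bait_len=4, max_mismatch=1))
    print(greedy_bait_cover(toy_targets, bait_len=4, max_mismatch=0))
```

On these two toy targets, allowing one mismatch lets a single bait cover every position, whereas requiring exact matches needs two baits. This illustrates why mismatch tolerance is what allows the bait count to shrink.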

Availability and implementation: Our algorithm is available at github.com/FuelTheBurn/OLTA-Optimizing-bait-seLection-for-TArgeted-sequencing. Test data and other software are archived at doi.org/10.5281/zenodo.15086636.

MeSH terms

  • Algorithms*
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Oligonucleotide Probes / chemistry
  • Oligonucleotide Probes / genetics
  • Sequence Analysis, DNA* / methods
  • Software*

Substances

  • Oligonucleotide Probes