Look4TRs: a de novo tool for detecting simple tandem repeats using self-supervised hidden Markov models

Bioinformatics. 2020 Jan 15;36(2):380-387. doi: 10.1093/bioinformatics/btz551.

Abstract

Motivation: Simple tandem repeats, microsatellites in particular, have regulatory functions, links to several diseases and applications in biotechnology. There is an immediate need for an accurate tool for detecting microsatellites in newly sequenced genomes. Currently available tools are either sensitive or specific, but not both, and some require manual parameter tuning.

Results: We propose Look4TRs, the first application of self-supervised hidden Markov models to discovering microsatellites. Look4TRs adapts itself to the input genomes, balancing high sensitivity and low false positive rate, and calibrates its own parameters automatically. We evaluated Look4TRs on 26 eukaryotic genomes. Based on the F-measure, which combines sensitivity and false positive rate, Look4TRs outperformed TRF and MISA, the most widely used tools, by 78 and 84%, respectively. Look4TRs outperformed the second- and third-best tools, MsDetector and Tantan, by 17 and 34%, respectively. On eight bacterial genomes, Look4TRs outperformed the second- and third-best tools by 27 and 137%, respectively.

Availability and implementation: https://github.com/TulsaBioinformaticsToolsmith/Look4TRs.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Eukaryota
  • Genome, Bacterial
  • Genomics*
  • Microsatellite Repeats
  • Software