Identifying centromeric satellites with dna-brnn

Bioinformatics. 2019 Nov 1;35(21):4408-4410. doi: 10.1093/bioinformatics/btz264.

Abstract

Summary: Human alpha satellite and satellite 2/3 make up several percent of the human genome. However, identifying these sequences with traditional algorithms is computationally intensive. Here we develop dna-brnn, a recurrent neural network that learns the sequences of these two classes of centromeric repeats. It achieves high similarity to RepeatMasker and runs considerably faster. Dna-brnn explores a novel application of deep learning and may accelerate the study of the evolution of the two repeat classes.

Availability and implementation: https://github.com/lh3/dna-nn.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Centromere*
  • DNA, Satellite
  • Genome, Human
  • Humans
  • Neural Networks, Computer

Substances

  • DNA, Satellite