Improved Annotation of Protein-Coding Genes Boundaries in Metazoan Mitochondrial Genomes

Nucleic Acids Res. 2019 Nov 18;47(20):10543-10552. doi: 10.1093/nar/gkz833.


With the rapid increase in the number of sequenced metazoan mitochondrial genomes, detailed manual annotation is becoming increasingly infeasible. While it is easy to identify the approximate location of protein-coding genes within mitogenomes, the peculiar processing of mitochondrial transcripts makes the determination of precise gene boundaries a surprisingly difficult problem. We have analyzed the properties of annotated start and stop codon positions in detail and use the inferred patterns to devise a new method for predicting gene boundaries in de novo annotations. Our method benefits from empirically observed prevalences of start/stop codons and gene lengths, and considers the dependence of these features on variations of genetic codes. Although not perfect, our new approach yields a drastic improvement in the accuracy of gene boundaries and upgrades the mitochondrial genome annotation server MITOS to an even more sophisticated tool for fully automatic annotation of metazoan mitochondrial genomes.
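The general idea of combining codon prevalence with a gene-length expectation can be illustrated with a minimal sketch. This is not the MITOS implementation, and the abstract does not specify the scoring function. The weights in `START_WEIGHTS`, the Gaussian-style length penalty, and the names `score_start` and `best_start` are all hypothetical and chosen only for illustration; the keys 2 and 5 are meant to stand for the vertebrate and invertebrate mitochondrial translation tables.

```python
import math

# Hypothetical start-codon weights per genetic code table.
# Placeholder values for illustration only, NOT frequencies from the paper.
START_WEIGHTS = {
    2: {"ATG": 0.70, "ATA": 0.15, "ATT": 0.08, "ATC": 0.04, "GTG": 0.03},
    5: {"ATG": 0.50, "ATT": 0.20, "ATA": 0.15, "TTG": 0.07, "ATC": 0.05, "GTG": 0.03},
}


def score_start(codon, offset, code_table, length_sd=9.0):
    """Score a candidate start codon (assumed scoring scheme).

    Combines the log of the codon's assumed prevalence with a penalty that
    grows as the candidate moves away from the approximate start, i.e. as
    the implied gene length deviates from the expected one.
    """
    weights = START_WEIGHTS[code_table]
    if codon not in weights:
        return float("-inf")  # not a start codon under this code table
    return math.log(weights[codon]) - 0.5 * (offset / length_sd) ** 2


def best_start(seq, approx_start, code_table, window=30):
    """Return the in-frame position with the highest score, or None."""
    best_pos, best_score = None, float("-inf")
    # Step by 3 so every candidate stays in the reading frame of approx_start.
    for offset in range(-window, window + 1, 3):
        pos = approx_start + offset
        if pos < 0 or pos + 3 > len(seq):
            continue
        score = score_start(seq[pos:pos + 3].upper(), offset, code_table)
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos


# Usage: the approximate start (index 6) is not a start codon, so the
# search moves one codon upstream to the in-frame ATG at index 3.
example = "CCC" + "ATG" + "AAA" * 5
refined = best_start(example, approx_start=6, code_table=2)
```

Keying the weights by code table mirrors the abstract's point that codon usage depends on the genetic code; a stop-codon search could follow the same pattern with its own assumed weights.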

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Genetic Code
  • Genome, Mitochondrial
  • Mitochondrial Proteins / genetics*
  • Mitochondrial Proteins / metabolism
  • Molecular Sequence Annotation / methods*
  • Molecular Sequence Annotation / standards
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism


Substances

  • Mitochondrial Proteins
  • RNA, Messenger