Comparative Genome Annotation

Methods Mol Biol. 2018;1704:189-212. doi: 10.1007/978-1-4939-7463-4_6.

Abstract

Newly sequenced genomes are being added to the tree of life at an unprecedented pace. Increasingly, such new genomes are phylogenetically close to previously sequenced and annotated genomes. In other cases, whole clades of closely related species or strains ought to be annotated simultaneously. Subsequent studies often focus on the differences between the closely related species or strains, even though shared gene structures prevail. Here we review methods for comparative structural genome annotation. The reviewed methods include classical approaches, such as the alignment of protein sequences or protein profiles against the genome, and comparative gene prediction methods that exploit a genome alignment to annotate a target genome. Newer approaches, such as the simultaneous annotation of multiple genomes, are also reviewed. We discuss how the methods depend on the phylogenetic placement of genomes, give advice on the choice of methods, and examine the consistency between gene structure annotations in an example. Further, we provide practical advice on genome annotation in general.

Keywords: Annotation consistency; Annotation mapping; Clade annotation; Gene prediction; Multi-genome alignment.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Chromosome Mapping
  • Computational Biology*
  • Databases, Genetic
  • Genome*
  • Humans
  • Molecular Sequence Annotation*
  • Phylogeny
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Software