The state of play in higher eukaryote gene annotation

Nat Rev Genet. 2016 Dec;17(12):758-772. doi: 10.1038/nrg.2016.119. Epub 2016 Oct 24.

Abstract

A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe (or 'annotate') genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists, from clinicians to evolutionary biologists, need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.

Publication types

  • Review

MeSH terms

  • Animals
  • Eukaryota / genetics*
  • Genomics / methods*
  • Humans
  • Molecular Sequence Annotation / methods*
  • Sequence Analysis, DNA / methods*