A quick guide for student-driven community genome annotation

PLoS Comput Biol. 2019 Apr 3;15(4):e1006682. doi: 10.1371/journal.pcbi.1006682. eCollection 2019 Apr.


High-quality gene models are necessary to expand the molecular and genetic tools available for a target organism, but they are available for only a handful of model organisms that have undergone extensive curation and experimental validation over many years. Most gene models in biological databases today were identified in draft genome assemblies by automated annotation pipelines. These pipelines are frequently based on orthologs from distantly related model organisms, and the resulting models usually contain minor or major errors. Manual curation is time-consuming and often requires substantial expertise, but it is instrumental in improving gene model structure and identification. Manual annotation may seem a daunting and cost-prohibitive task for small research communities, but involving undergraduates in community genome annotation consortia can benefit both education and genomic resources. We outline a workflow for efficient manual annotation driven by a team of primarily undergraduate annotators. This model can be scaled to large teams and includes quality control through incremental evaluation. Moreover, it gives students an opportunity to deepen their understanding of genome biology and to participate in scientific research in collaboration with peers and senior researchers at multiple institutions.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / education*
  • Databases, Genetic / statistics & numerical data
  • Genomics / education*
  • Genomics / statistics & numerical data
  • Guidelines as Topic
  • Humans
  • Models, Genetic*
  • Molecular Sequence Annotation / statistics & numerical data*
  • Students