A quick guide for student-driven community genome annotation

Prashant S Hosmani; Teresa Shippy; Sherry Miller; Joshua B Benoit; Monica Munoz-Torres; Mirella Flores-Gonzalez; Lukas A Mueller; Helen Wiersma-Koch; Tom D'Elia; Susan J Brown; Surya Saha

doi:10.1371/journal.pcbi.1006682

PLoS Comput Biol. 2019 Apr 3;15(4):e1006682. doi: 10.1371/journal.pcbi.1006682. eCollection 2019 Apr.

Affiliations

¹ Boyce Thompson Institute, Ithaca, New York.
² Division of Biology, Kansas State University, Manhattan, Kansas.
³ Department of Biological Sciences, University of Cincinnati, Cincinnati, Ohio.
⁴ Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology, Berkeley, California.
⁵ Center for Genome Research and Biocomputing, Oregon State University, Corvallis, Oregon.
⁶ Indian River State College, Fort Pierce, Florida.

Abstract

High-quality gene models are necessary to expand the molecular and genetic tools available for a target organism, but they are available for only a handful of model organisms that have undergone extensive curation and experimental validation over many years. Most gene models in biological databases today were identified in draft genome assemblies by automated annotation pipelines that frequently rely on orthologs from distantly related model organisms, and these models usually contain minor or major errors. Manual curation is time-consuming and often requires substantial expertise, but it is instrumental in improving gene model structure and identification. Manual annotation may seem a daunting and cost-prohibitive task for small research communities, but involving undergraduates in community genome annotation consortiums benefits both education and genomic resources. We outline a workflow for efficient manual annotation driven by a team of primarily undergraduate annotators. This model can be scaled to large teams and includes quality control through incremental evaluation. Moreover, it gives students an opportunity to deepen their understanding of genome biology and to participate in scientific research in collaboration with peers and senior researchers at multiple institutions.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Computational Biology / education*
Databases, Genetic / statistics & numerical data
Genomics / education*
Genomics / statistics & numerical data
Guidelines as Topic
Humans
Models, Genetic*
Molecular Sequence Annotation / statistics & numerical data*
Students

Grants and funding

P20 GM103418/GM/NIGMS NIH HHS/United States