COG database update: focus on microbial diversity, model organisms, and widespread pathogens

Nucleic Acids Res. 2021 Jan 8;49(D1):D274-D281. doi: 10.1093/nar/gkaa1018.

Abstract

The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and went through several rounds of updates, most recently, in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically, with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) the recently deprecated NCBI's gene index (gi) numbers for the encoded proteins are replaced with stable RefSeq or GenBank/ENA/DDBJ coding sequence (CDS) accession numbers; (ii) COG annotations are updated for >200 newly characterized protein families with corresponding references and PDB links, where available; (iii) lists of COGs grouped by pathways and functional systems are added; (iv) 266 new COGs for proteins involved in CRISPR-Cas immunity, sporulation in Firmicutes and photosynthesis in cyanobacteria are included; and (v) the database is made available as a web page, in addition to FTP. The current release includes 4877 COGs. Future plans include further expansion of the COG collection by adding archaeal COGs (arCOGs), splitting the COGs containing multiple paralogs, and continued refinement of COG annotations.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Archaea / genetics*
  • Archaea / metabolism
  • Archaeal Proteins / classification
  • Archaeal Proteins / genetics
  • Archaeal Proteins / metabolism
  • Bacteria / genetics*
  • Bacteria / immunology
  • Bacteria / metabolism
  • Bacterial Proteins / classification
  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism
  • CRISPR-Cas Systems
  • Databases, Genetic*
  • Gene Ontology
  • Genome, Archaeal*
  • Genome, Bacterial*
  • Humans
  • Molecular Sequence Annotation
  • Spores, Bacterial / genetics
  • Spores, Bacterial / growth & development

Substances

  • Archaeal Proteins
  • Bacterial Proteins