, 42 (Database issue), D206-14

The SEED and the Rapid Annotation of Microbial Genomes Using Subsystems Technology (RAST)

Ross Overbeek et al. Nucleic Acids Res.

Abstract

In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.

Figures

Figure 1.
The ‘Compare Regions’ tool in the SEED, with the Staphylococcal SCCmec element shown as an example. Re-arrangements within the Staphylococcal SCCmec element lead to constitutive expression of the resistance determinant MecA due to (partial) deletion of the repressor MecI and/or the sensor-transducer MecR. Homologous genes are presented as arrows with matching colors and numbers. Genes not conserved within the displayed region are gray. The graphic is centered on the focus gene (red, #1), the methicillin resistance determinant MecA. Other labeled genes are: green, #8, methicillin resistance regulatory sensor-transducer MecR1; blue, #18, methicillin resistance repressor MecI; green, #2, transposase for IS431.
Figure 2.
Circle plot showing the comparison of eight Brucella genomes relative to a user-defined reference genome. The zoomed regions highlight insertions/deletions (colored versus white) and changes in conservation relative to the reference genome (going from blue representing the highest protein sequence similarity to red representing the lowest).
Figure 3.
Number of users (open squares) and number of jobs (closed circles) in the RAST system. As of September 2013, there were over 100 000 jobs processed by RAST and >12 000 active users of the system.
Figure 4.
Genomes processed by RAST displayed over a taxonomic tree. In all, 12 289 RAST annotated public genomes for PATRIC available on the PubSEED were compared at the order level using the NCBI taxonomy (25). Black bars show the number of sequenced representatives per order. White bars show those orders with no sequenced representatives. The tree was created using the Interactive Tree of Life (http://itol.embl.de/) and is unrooted.

References

    1. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb J-F, Dougherty BA, Merrick JM, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995;269:496–512.
    2. Koonin EV, Galperin MY. Sequence-Evolution-Function: Computational Approaches in Comparative Genomics. Boston, MA: Kluwer Academic; 2003. Genome annotation and analysis.
    3. Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, et al. The complete genome sequence of Escherichia coli K-12. Science. 1997;277:1453–1462.
    4. Bult CJ, White O, Olsen GJ, Zhou L, Fleischmann RD, Sutton GG, Blake JA, FitzGerald LM, Clayton RA, Gocayne JD, et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science. 1996;273:1058–1073.
    5. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005;33:5691–5702.
