Nucleic Acids Res. 36 (10), 3420-35

High-throughput Functional Annotation and Data Mining With the Blast2GO Suite

Stefan Götz et al. Nucleic Acids Res.

Abstract

Functional genomics technologies have been widely adopted in the biological research of both model and non-model species. Efficient functional annotation of DNA or protein sequences is a major requirement for the successful application of these approaches, because functional information on gene products is often the key to interpreting experimental results. There is therefore an increasing need for bioinformatics resources that can cope with large amounts of sequence data, produce valuable annotation results and are easily accessible to laboratories undertaking functional genomics projects. We present the Blast2GO suite as an integrated, biologist-oriented solution for the high-throughput, automatic functional annotation of DNA or protein sequences based on the Gene Ontology (GO) vocabulary. The most notable Blast2GO features are: (i) the combination of various annotation strategies and tools that control the type and intensity of annotation, (ii) numerous graphical features, such as interactive GO-graph visualization for gene-set function profiling and descriptive charts, (iii) general sequence management features and (iv) high-throughput capabilities. We used the Blast2GO framework to carry out a detailed analysis of annotation behaviour through homology transfer and its impact on functional genomics research. Our aim is to offer biologists useful information to take into account when functionally characterizing their sequence data.
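
GO-graph function profiling, as in item (ii), commonly relies on the true-path rule: a sequence annotated with a GO term is implicitly annotated with all of that term's ancestors, so per-node counts are taken after propagating annotations up the graph. The sketch below assumes a toy is_a-only graph with hypothetical term IDs (`GO:A` to `GO:D`) and hypothetical function names; it illustrates the counting idea and is not Blast2GO's implementation.

```python
from collections import defaultdict

# Hypothetical toy GO fragment: term -> set of direct parents (is_a edges only).
PARENTS = {
    "GO:A": set(),      # root
    "GO:B": {"GO:A"},
    "GO:C": {"GO:B"},
    "GO:D": {"GO:B"},
}


def ancestors(term, parents):
    """Return `term` together with every ancestor reachable via parent edges."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def function_profile(annotations, parents):
    """Count distinct sequences per GO node after true-path propagation."""
    counts = defaultdict(int)
    for terms in annotations.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term, parents)
        for node in covered:  # each sequence is counted at most once per node
            counts[node] += 1
    return dict(counts)


annotations = {"seq1": {"GO:C"}, "seq2": {"GO:D"}, "seq3": {"GO:C", "GO:D"}}
print(sorted(function_profile(annotations, PARENTS).items()))
# [('GO:A', 3), ('GO:B', 3), ('GO:C', 2), ('GO:D', 2)]
```

Using a set per sequence means that `seq3`, annotated with two sibling terms, still contributes only once to their shared parent `GO:B`.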

Figures

Figure 1.
Schematic representation of the Blast2GO application. GO annotations are generated through a three-step process: BLAST, mapping and annotation. InterPro terms are obtained from InterProScan at EBI, converted to GO terms and merged with the existing GO annotations. GO annotation can be modulated using Annex, GOSlim web services and manual editing. Enzyme Code and KEGG pathway map annotations are retrieved through mappings from GO. Visual tools include a sequence colour code, KEGG pathways and GO graphs with GO term highlighting and filtering options. Additional annotation data-mining tools include statistical charts and gene set enrichment analysis functions.
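
The three-step BLAST, mapping and annotation process can be illustrated with a simplified homology-transfer sketch. Everything in it is hypothetical: the hit accessions, the accession-to-GO mapping table, the placeholder term IDs, the evidence-code weights and the threshold. Scoring each candidate term by its best similarity × evidence weight is a simplification chosen for illustration and should not be read as Blast2GO's exact annotation rule.

```python
# Hypothetical evidence-code weights (higher = more trusted evidence).
EC_WEIGHTS = {"IDA": 1.0, "ISS": 0.8, "IEA": 0.7}


def annotate(hits, go_map, ec_weights, threshold):
    """Simplified homology transfer: BLAST hits -> mapped GO terms -> filtered annotation.

    hits:   list of (subject_accession, percent_similarity) from a BLAST search
    go_map: subject_accession -> list of (go_term, evidence_code)
    """
    best = {}
    for accession, similarity in hits:                       # step 1: BLAST results
        for go_term, evidence in go_map.get(accession, ()):  # step 2: mapping
            # Unknown evidence codes get weight 0 and can never pass the threshold.
            score = similarity * ec_weights.get(evidence, 0.0)
            best[go_term] = max(best.get(go_term, 0.0), score)
    # step 3: annotation, keeping terms whose best score reaches the threshold
    return {go_term for go_term, score in best.items() if score >= threshold}


# Hypothetical hits and mapping table; "GO:T1"/"GO:T2" are placeholder IDs.
hits = [("P1", 90.0), ("P2", 60.0)]
go_map = {
    "P1": [("GO:T1", "IEA")],
    "P2": [("GO:T2", "IDA"), ("GO:T1", "ISS")],
}
print(sorted(annotate(hits, go_map, EC_WEIGHTS, threshold=55)))
# ['GO:T1', 'GO:T2']
```

Raising `threshold` makes the sketch more restrictive: with the same example data, a threshold of 61 keeps only `GO:T1`.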
Figure 2.
Percentage of annotated sequences as a function of sequence length (bp). For all datasets, sequence length and annotability are positively correlated. The sudden drop of the gma, tha and ccl curves is due to the absence of sequences at long lengths in those datasets.
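
Per-length annotation percentages of this kind can be computed by binning sequences by length. A minimal sketch, assuming each sequence is summarized as a hypothetical `(length_bp, is_annotated)` pair and lengths are grouped into fixed-width bins; the function name and bin width are illustrative choices, not taken from the paper.

```python
def annotability_by_length(sequences, bin_size):
    """Percentage of annotated sequences per fixed-width length bin.

    sequences: iterable of (length_bp, is_annotated) pairs.
    Returns {bin_start_bp: percent_annotated}; empty bins are omitted.
    """
    totals, annotated = {}, {}
    for length_bp, is_annotated in sequences:
        start = (length_bp // bin_size) * bin_size
        totals[start] = totals.get(start, 0) + 1
        annotated[start] = annotated.get(start, 0) + int(is_annotated)
    return {start: 100.0 * annotated[start] / totals[start] for start in sorted(totals)}


# Hypothetical data: (length in bp, annotated?)
data = [(150, False), (180, True), (420, True), (480, True), (950, True)]
print(annotability_by_length(data, bin_size=250))
# {0: 50.0, 250: 100.0, 750: 100.0}
```

Bins with no sequences (here 500 to 749 bp) are left out rather than reported as 0%, so a plot built from this output would not show an artefactual drop where a dataset simply has no long sequences.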
Figure 3.
Changes in annotation results after applying the InterProScan and Annex functions. The annotation increment was computed as the difference in annotation percentages with and without these augmenting functions. While Annex generally increases the number of GO terms, InterPro increases the number of annotated sequences, especially with restrictive annotation configurations.
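
The annotation increment is a plain difference of percentages, Δ = P_with − P_without, where P = 100 × annotated / total. A sketch with hypothetical counts (1,000 sequences, 400 annotated before and 460 after an augmentation step such as InterProScan); the function names are illustrative.

```python
def percent_annotated(n_annotated, n_total):
    """Share of sequences with at least one GO term, in percent."""
    return 100.0 * n_annotated / n_total


def annotation_increment(n_without, n_with, n_total):
    """Difference in annotation percentage with vs. without an augmenting step."""
    return percent_annotated(n_with, n_total) - percent_annotated(n_without, n_total)


# Hypothetical counts: 400 of 1,000 sequences annotated before, 460 after.
print(annotation_increment(n_without=400, n_with=460, n_total=1000))
# 6.0
```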
Figure 4.
Summary statistics of the manual curation study. Manual evaluation was applied to the GO annotation results of eight basic annotation styles on the ccl, pfl and min datasets. The annotations of 100 sequences per dataset were reviewed and classified as: approved at the default style, approved but more or less informative than the default, rejected, generally approved with minor possible errors, or missed (no GO terms recovered). Percentages of each class are given relative to the total number of sequences.
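
The class percentages are counts divided by the number of reviewed sequences. A minimal sketch, assuming each review outcome is stored as a hypothetical string label; the label names and counts below are illustrative, not the study's results.

```python
from collections import Counter


def class_percentages(labels):
    """Percentage of reviewed sequences falling into each curation class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical review outcomes for 100 sequences.
labels = ["approved"] * 70 + ["minor errors"] * 5 + ["rejected"] * 10 + ["missed"] * 15
result = class_percentages(labels)
print(result)
# {'approved': 70.0, 'minor errors': 5.0, 'rejected': 10.0, 'missed': 15.0}
```

Because every reviewed sequence receives exactly one label, the percentages always sum to 100.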
Figure 5.
Results of the annotation performance evaluation. The number of GO terms per sequence (A), the average GO term level (B) and the percentage of successfully annotated sequences over the full dataset (C) are shown for seven datasets annotated with 32 different annotation styles (see Methods section for details).
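
The three panel metrics can be sketched as follows. Two conventions here are assumptions, not taken from the paper: a term's level is taken as 1 plus the shortest number of is_a steps to the root, and GO terms per sequence (A) is averaged over annotated sequences only, whereas the percentage (C) uses the full dataset as the caption states. The graph, term IDs, annotations and function names are hypothetical.

```python
from collections import deque

# Hypothetical toy graph: term -> set of direct parents; "GO:A" is the root.
PARENTS = {"GO:B": {"GO:A"}, "GO:C": {"GO:B"}, "GO:D": {"GO:B"}}


def term_levels(parents, root):
    """Assumed convention: level = 1 + shortest number of is_a steps to the root."""
    children = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)
    levels = {root: 1}
    queue = deque([root])
    while queue:  # breadth-first search yields shortest distances from the root
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in levels:
                levels[child] = levels[node] + 1
                queue.append(child)
    return levels


def performance(annotations, levels):
    """Return (GO terms per annotated sequence, mean GO level, % annotated)."""
    annotated = [terms for terms in annotations.values() if terms]
    all_terms = [term for terms in annotated for term in terms]
    go_per_seq = len(all_terms) / len(annotated) if annotated else 0.0
    mean_level = sum(levels[t] for t in all_terms) / len(all_terms) if all_terms else 0.0
    pct = 100.0 * len(annotated) / len(annotations) if annotations else 0.0
    return go_per_seq, mean_level, pct


levels = term_levels(PARENTS, root="GO:A")
annotations = {"s1": {"GO:C"}, "s2": {"GO:B", "GO:D"}, "s3": set(), "s4": set()}
print(performance(annotations, levels))
```

Here two of the four sequences carry three terms in total, so the sketch reports 1.5 terms per annotated sequence, a mean level of 8/3 and 50% annotated.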