Nucleic Acids Res. 2016 Jan 4;44(D1):D286-93.
doi: 10.1093/nar/gkv1248. Epub 2015 Nov 17.

eggNOG 4.5: A Hierarchical Orthology Framework With Improved Functional Annotations for Eukaryotic, Prokaryotic and Viral Sequences

Free PMC article

Jaime Huerta-Cepas et al. Nucleic Acids Res.


eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing overall annotation coverage from 67% to 72%. The functional annotations of OGs have been expanded to also provide Gene Ontology terms, KEGG pathways and SMART/Pfam domains for each group. Moreover, eggNOG now provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. We have also incorporated a framework for quickly mapping novel sequences to OGs based on precomputed HMM profiles. Finally, eggNOG version 4.5 incorporates a novel data set spanning 2605 viral OGs, covering 5228 proteins from 352 viral proteomes. All data are accessible for bulk downloading, as a web service, and through a completely redesigned web interface. The new access points provide faster searches and a number of new browsing and visualization capabilities, serving the needs of both experts and less experienced users. eggNOG v4.5 is available at
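The abstract does not spell out how hierarchical consistency is defined or enforced. As a rough illustration only, the sketch below assumes it means that every OG at a narrower taxonomic level is fully contained in some OG at the broader level. The function name, OG identifiers and protein names are all hypothetical, and this is not the eggNOG algorithm itself.

```python
from typing import Dict, Set

# Hypothetical representation: OG id -> set of member protein ids at one level.
OGLevel = Dict[str, Set[str]]


def is_hierarchically_consistent(parent_ogs: OGLevel, child_ogs: OGLevel) -> bool:
    """Return True if every child-level OG is a subset of some parent-level OG.

    This encodes an assumed reading of "hierarchically consistent":
    a narrower group never straddles two broader groups.
    """
    for members in child_ogs.values():
        if not any(members <= parent for parent in parent_ogs.values()):
            return False
    return True


# Toy data, invented for illustration.
broad = {"og_broad_1": {"a", "b", "c", "d"}, "og_broad_2": {"e", "f"}}
narrow_ok = {"og_narrow_1": {"a", "b"}, "og_narrow_2": {"e", "f"}}
narrow_bad = {"og_narrow_3": {"a", "e"}}  # straddles two broad OGs

print(is_hierarchically_consistent(broad, narrow_ok))
print(is_hierarchically_consistent(broad, narrow_bad))
```

Under this assumption, an annotation attached to a broad OG applies unambiguously to each narrower OG nested inside it, which is one way to read the "better propagation of functional terms" mentioned above.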


Figure 1. Schematic representation of the eggNOG pipeline: Boxes labelled in green indicate new data and/or methods added in this version. Blue labels represent updated methodology and/or data with respect to previous versions. Grey boxes indicate unchanged steps in version 4.5.
Figure 2. (A) Hierarchically consistent structure of OGs including genes from the SEC24 protein family, from the root taxonomic level (Last Universal Common Ancestor, LUCA) to the rodent-specific level. Each OG is represented by a box labelled with the name of the lineage it belongs to, and whose size is proportional to the number of proteins grouped. Boxes filled with a blue gradient represent the nested hierarchy of OGs specifically containing the mouse SEC24A protein. Grey boxes indicate collapsed branches in the OG hierarchy. Note that another Bilateria-specific OG exists but has been collapsed for readability. The most lineage-specific OG containing the mouse SEC24A protein is at the rodent taxonomic level and is coloured in pink. Fine-grained orthology for SEC24 genes, based on the phylogenetic analysis of the 18VD7 rodent-specific group, is shown in the bottom part, with tree branches indicating a lineage-specific duplication. (B) Viral taxonomic tree. Black branches indicate levels for which OGs were calculated, whereas white branches indicate levels for which no OG was calculated. Numbers indicate, respectively, the number of OGs at this level, the number of proteins contained in all OGs at this level and the number of proteomes represented by the proteins within all OGs at this level.
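The nested hierarchy in panel (A) can be pictured as a walk from the broadest taxonomic level to the narrowest, collecting the OG that contains a given protein at each level. The following sketch assumes a simple list of levels ordered from broad to narrow. The OG identifiers and memberships are invented placeholders, not the real eggNOG content for SEC24.

```python
from typing import Dict, List, Set, Tuple

# (level name, {OG id: member proteins}); levels are ordered broad -> narrow.
Level = Tuple[str, Dict[str, Set[str]]]


def nested_chain(protein: str, levels: List[Level]) -> List[Tuple[str, str]]:
    """List the (level, OG id) pairs that contain `protein`, broad to narrow.

    The last entry is the most lineage-specific OG found for the protein.
    """
    chain = []
    for level_name, ogs in levels:
        for og_id, members in ogs.items():
            if protein in members:
                chain.append((level_name, og_id))
                break  # assume at most one OG per level holds the protein
    return chain


# Hypothetical toy hierarchy; only the level names come from the caption.
levels = [
    ("LUCA", {"og_luca_1": {"mouse_SEC24A", "mouse_other", "fly_gene"}}),
    ("Bilateria", {"og_bil_1": {"mouse_SEC24A", "fly_gene"},
                   "og_bil_2": {"mouse_other"}}),
    ("rodents", {"og_rod_1": {"mouse_SEC24A"}}),
]

chain = nested_chain("mouse_SEC24A", levels)
print(chain)      # full nested chain, broad to narrow
print(chain[-1])  # most lineage-specific OG
```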
Figure 3. Website screenshots showing fish and primate orthologs for the myosin protein MYO7AA. (A) The guided search dialog used to retrieve the orthologs. (B) Partial tree representation of the associated phylogenetic tree. Blue nodes in the tree represent speciation events. Red nodes indicate duplication events (in-paralogs). Pfam domains are shown in-line for all the orthologous sequences. Note that the tree visualization is adapted to the query, highlighting the seed and target species and graying out the rest. (C) Taxonomic profile representation showing the distribution of orthologs in the tree of life. (D) Functional profile based on Gene Ontology terms associated with the OG. (E) Filtered content of the OG (protein names and sequences), restricted to the query and target species. (F) Pairwise orthology predictions adapted to the query protein and the target species. In-paralogy and co-orthology relationships are resolved according to the speciation and duplication events inferred from the phylogenetic tree.
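Panel (F) relies on a standard idea: two genes are orthologs if their last common ancestor in the gene tree is a speciation node, and paralogs if it is a duplication node. The sketch below illustrates that rule on a tree whose internal nodes are already labelled. The nested-tuple encoding, the "S"/"D" labels and the gene names are assumptions made for this example; how eggNOG infers the events themselves is not covered here.

```python
from itertools import product
from typing import List, Tuple, Union

# A leaf is a gene name; an internal node is (event, left, right),
# where event is "S" (speciation) or "D" (duplication).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
Pair = Tuple[str, str]


def leaves(tree: Tree) -> List[str]:
    """Return all gene names under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)


def classify_pairs(tree: Tree) -> Tuple[List[Pair], List[Pair]]:
    """Split all gene pairs into (orthologs, paralogs) by their common ancestor."""
    orthologs: List[Pair] = []
    paralogs: List[Pair] = []
    if isinstance(tree, str):
        return orthologs, paralogs
    event, left, right = tree
    # Pairs that span the two subtrees have this node as last common ancestor.
    spanning = list(product(leaves(left), leaves(right)))
    (orthologs if event == "S" else paralogs).extend(spanning)
    for child in (left, right):
        child_orth, child_para = classify_pairs(child)
        orthologs += child_orth
        paralogs += child_para
    return orthologs, paralogs


# Hypothetical tree: one fish gene, and a duplication within the human lineage.
tree = ("S", "fish|g1", ("D", "human|g2a", "human|g2b"))
orthologs, paralogs = classify_pairs(tree)
print(orthologs)
print(paralogs)
```

In this toy tree the fish gene is co-orthologous to both human copies, while the two human copies are in-paralogs of each other, which mirrors the distinction drawn in panel (F).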



