Nucleic Acids Res. 2011 Jan;39(Database issue):D1095-102.
doi: 10.1093/nar/gkq811. Epub 2010 Sep 22.

GreenPhylDB v2.0: Comparative and Functional Genomics in Plants

Free PMC article

Mathieu Rouard et al. Nucleic Acids Res.


GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the kingdom Plantae, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically to elucidate orthologous and paralogous relationships. The database offers various lists of gene families, including plant-, phylum- and species-specific gene families. For each gene cluster or gene family, it provides easy access to gene composition, protein domains, publications, external links and orthologous gene predictions. Web interfaces have been further developed to improve navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform and is accessible online. It enables comparative genomics in a broad taxonomic context to enhance the understanding of evolutionary processes and can thus speed up gene discovery.


Figure 1.
Flowchart of the GreenPhylDB analyses. The input file is a multi-fasta file containing complete plant proteomes. In a first step, an automatic clustering aggregates all proteins into previously defined families. Sequences are classified as orphans if they cannot be grouped into a cluster. Sequences composing the clusters are analyzed in order to overlay clusters with cross-references (e.g. UniProtKB, PubMed, InterPro, MEME motifs, KEGG pathways data). Based on this information, clusters are manually curated in order to identify gene families. Finally, gene family sequences are analyzed via a phylogeny-based pipeline to infer ortholog relationships. The procedure can be iterated for each newly released genome using a lighter procedure. This ensures a cumulative and safe growth of the database. The data are stored in the database and can be easily accessed using dedicated visualization tools including a gene tree viewer, a gene family browser and ortholog extraction tools.
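The first steps of the flowchart (reading a multi-fasta proteome, then separating clustered sequences from orphans) can be illustrated with a minimal Python sketch. It does not reproduce the GreenPhylDB clustering itself: the cluster assignment is assumed to be already available as a mapping, and the function names, the toy sequences, the identifier `unplaced_seq_1` and the cluster label `cluster_A` are all hypothetical.

```python
from io import StringIO


def read_multifasta(handle):
    """Yield (sequence_id, sequence) pairs from a multi-FASTA text stream."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def split_clustered_and_orphans(sequence_ids, cluster_of):
    """Group sequence IDs by cluster; IDs without a cluster are orphans.

    `cluster_of` is a hypothetical mapping from sequence ID to cluster ID,
    standing in for the output of whatever clustering step was run.
    """
    clusters, orphans = {}, []
    for sid in sequence_ids:
        cid = cluster_of.get(sid)
        if cid is None:
            orphans.append(sid)
        else:
            clusters.setdefault(cid, []).append(sid)
    return clusters, orphans


# Toy input: the sequences are invented; only the first two IDs come from the text.
TOY_FASTA = """>Os10g35050.1 toy rice record
MAKL
VTR
>At1g17810.1 toy Arabidopsis record
MSKLV
>unplaced_seq_1
MQQ
"""
TOY_CLUSTER_OF = {"Os10g35050.1": "cluster_A", "At1g17810.1": "cluster_A"}

records = dict(read_multifasta(StringIO(TOY_FASTA)))
clusters, orphans = split_clustered_and_orphans(records, TOY_CLUSTER_OF)
```

Keeping the orphan list separate from the cluster dictionary mirrors the flowchart, where unclustered sequences leave the pipeline before cross-referencing and curation.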
Figure 2.
Global overview of the family entry page for the Pollen Allergen/Expansin Superfamily (fid = 20923). (a) At a glance, users can see that the family is curated (green light) and is plant specific. (b) Annotated gene families at the different levels are underlined. Gene families are colored in blue when the phylogenetic analyses are being performed. Here, three gene families are annotated at level 2 (names pop up on mouse-over) and two of them were analyzed (gene tree and orthologs are available). (c) This superfamily contains genes from 15 out of the 16 species. There is no representative in Cyanidioschyzon merolae, a red alga, and a large expansion is predicted starting in the embryophytes. (d) The Expansin/Lol pI InterPro family entry is specific to the Pollen Allergen/Expansin Superfamily. Several other representative domains are listed and graphically represented in a consensus schema. (e) Multiple alignment and gene tree Java applets (Jalview and Archeopteryx) including orthology scores can be launched. (f) Gene positions on several genomes are available using GViewer. A zoomed view of chromosome 10 of Zea mays is visible.
Figure 3.
This example illustrates a putative study of a rice gene (Os10g35050.1) and its predicted orthologs in other species of GreenPhylDB. One orthologous gene is found in sorghum (Sb01g018430.1) and one in brachypodium (Super_8.1280_1). The query sequence also has two co-orthologs in Arabidopsis (At1g17810.1 in red, At1g73190.1 in blue) that are cross-linked to Genevestigator expression data tools (v3). Os10g35050.1 is over-expressed at the dough stage while At1g17810.1 and At1g73190.1 are expressed in the silique. This may indicate a role in seed development. Moreover, these genes are all over-expressed under drought conditions or in the presence of abscisic acid (ABA). This is consistent with the fact that tonoplast-type aquaporins (TIPs) facilitate osmotic water transport across membranes, and it suggests a role in response to drought stress.
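The distinction drawn in this example between a single ortholog per species and several co-orthologs can be sketched as a simple grouping step in Python. The sketch only organizes predictions that already exist; it is not the phylogeny-based inference used by GreenPhylDB. The function name `label_orthologs` and the input layout are assumptions, while the gene identifiers are those quoted in the caption.

```python
from collections import defaultdict


def label_orthologs(predictions):
    """Group predicted orthologs by species and label each group.

    `predictions` is an iterable of (species, gene_id) pairs for one query
    gene. A species with one predicted gene is labelled "ortholog"; a species
    with several is labelled "co-orthologs".
    """
    by_species = defaultdict(list)
    for species, gene_id in predictions:
        by_species[species].append(gene_id)
    return {
        species: ("ortholog" if len(genes) == 1 else "co-orthologs", sorted(genes))
        for species, genes in by_species.items()
    }


# Predicted orthologs of the rice gene Os10g35050.1, as listed in the caption.
FIGURE_3_PREDICTIONS = [
    ("sorghum", "Sb01g018430.1"),
    ("brachypodium", "Super_8.1280_1"),
    ("Arabidopsis", "At1g17810.1"),
    ("Arabidopsis", "At1g73190.1"),
]

labels = label_orthologs(FIGURE_3_PREDICTIONS)
```

With this input, sorghum and brachypodium each carry a single ortholog, and the two Arabidopsis genes are grouped as co-orthologs, matching the description above.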



