, 7 (10), e1002217

Thousands of Rab GTPases for the Cell Biologist


Yoan Diekmann et al. PLoS Comput Biol.


Rab proteins are small GTPases that act as essential regulators of vesicular trafficking. 44 subfamilies are known in humans, performing specific sets of functions at distinct subcellular localisations and tissues. Rab function is conserved even amongst distant orthologs. Hence, the annotation of Rabs yields functional predictions about the cell biology of trafficking. So far, annotating Rabs has been a laborious manual task that is not feasible for the current and future genomic output of deep sequencing technologies. We developed, validated and benchmarked the Rabifier, an automated bioinformatic pipeline for the identification and classification of Rabs, which achieves up to 90% classification accuracy. We cataloged roughly 8,000 Rabs from 247 genomes covering the entire eukaryotic tree. The full Rab database and a web tool implementing the pipeline are publicly available online. For the first time, we describe and analyse the evolution of Rabs in a dataset covering the whole eukaryotic phylogeny. We found a highly dynamic family undergoing frequent taxon-specific expansions and losses. We dated the origin of human subfamilies using phylogenetic profiling, which enlarged the Rab repertoire of the Last Eukaryotic Common Ancestor with Rab14, 32 and RabL4. Furthermore, a detailed analysis of the Rab family of the choanoflagellate Monosiga brevicollis pinpointed the changes that accompanied the emergence of Metazoan multicellularity, mainly an important expansion and specialisation of the secretory pathway. Lastly, we experimentally establish tissue specificity in expression of mouse Rabs and show that neo-functionalisation best explains the emergence of new human Rab subfamilies. With the Rabifier and RabDB, we provide tools that easily allow non-bioinformaticians to integrate thousands of Rabs in their analyses.
RabDB is designed to enable the cell biology community to keep pace with the increasing number of fully-sequenced genomes and change the scale at which we perform comparative analysis in cell biology.

Conflict of interest statement

The authors have declared that no competing interests exist.


Figure 1. Flowchart of the Rabifier.
(A) Identification- and (B) classification-procedure implemented by the Rabifier, see Results and Discussion for details on the two phases. Panel (C) shows descriptive statistics from the application of the Rabifier to 247 genomes in the Superfamily database, and details about Monosiga brevicollis. Abbreviations: best BLAST hit (BH), Rab family motif (RabF), reverse Ψ-BLAST (RPS-BLAST), subfamily (sf.), Rab not classified to any subfamily within our internal reference set (RabX).
Figure 2. Validation and benchmarking of the Rabifier.
(A) summarises the validation in normal mode, i.e. without taking the subfamily score produced by the Rabifier into account, against the Rab families of Trypanosoma brucei, Entamoeba histolytica and Monosiga brevicollis, the last of which we annotated in (E). Three quantities needed to judge the performance of the Rabifier are shown for Rabs belonging to human and other subfamilies separately: sequences erroneously classified as not being a Rab by the Rabifier (red), sequences correctly identified as Rabs but wrongly classified at subfamily level (light green), and those which were entirely correct (dark green). (B) displays the distribution of confidence scores associated with each subfamily call, respecting the same colour code as above. The blue line indicates the threshold which we propose by default, and below which a subfamily classification may be rejected and treated as an undefined RabX. That choice is based on the ROC-curve analysis shown in (C), which plots the true positive rate against the false positive rate for each possible confidence threshold and provides a combined measure of the accuracy of a classifier (Area under the curve, AUC [39]). The effect of choosing a 0.4 confidence threshold (blue circle) on the classification accuracy, i.e. running the Rabifier in high confidence mode, is shown in the inlay. (D) plots the improvement, in terms of the three quantities discussed above, that the Rabifier achieves compared to an alternative strategy (see Results and Discussion for details on its implementation). (E) Phylogenetic tree of the human and M. brevicollis Rab family on which the manual classification of the latter Rab family was based (bootstrap support above 70% shown). Colours indicate the results of the corresponding automated annotation for that specific sequence. Abbreviations: subfamily (sf.), annotation (annot.).
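The threshold analysis described for panels (B) and (C) can be illustrated with a minimal Python sketch. It is not the Rabifier's code: the function names, the scores and the correctness labels below are made up for illustration, and only the 0.4 default threshold and the RabX fallback are taken from the caption.

```python
from typing import List, Tuple

def roc_points(scores: List[float], is_correct: List[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(is_correct)
    negatives = len(is_correct) - positives
    points = [(0.0, 0.0)]
    # Sweep thresholds from high to low; a call is accepted if score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, ok in zip(scores, is_correct) if s >= threshold and ok)
        fp = sum(1 for s, ok in zip(scores, is_correct) if s >= threshold and not ok)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def high_confidence_call(subfamily: str, score: float, threshold: float = 0.4) -> str:
    """Reject a subfamily call below the threshold and report it as RabX."""
    return subfamily if score >= threshold else "RabX"

# Hypothetical confidence scores and whether each subfamily call was correct.
scores = [0.9, 0.8, 0.7, 0.35, 0.3, 0.1]
is_correct = [True, True, False, True, False, False]

points = roc_points(scores, is_correct)
area = auc(points)
```

Raising the threshold trades rejected calls (more RabX) for fewer wrong subfamily assignments, which is the trade-off the ROC curve makes visible.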
Figure 3. Resources we make available.
(A) Snapshots of the database which provides public access to the results of the Rabifier applied to the Superfamily database and the online version of the Rabifier. (B) Statistics of the current content of the database in terms of number of genomes (left), absolute number of Rabs either belonging to a subfamily also present in humans or not (middle), and the relative fraction of the two types of Rabs for a given branch (right). The cladogram (i.e. the branch lengths are arbitrary, see [114]) of the eukaryotic taxa is derived from .
Figure 4. Rab subfamilies in our dataset.
Number of different Rab subfamilies found in our dataset. Human sf. are shown in blue, and other known sf. in orange. The last four categories are hypothetical subfamilies we propose in the context of this paper (see Materials and Methods for details on the procedure): subfamilies whose members span more than one taxon (red), those spanning more than one genome (green), subfamilies with several members yet only present in one organism (brown) and finally singletons (grey) which are not similar to any other known Rab. All members and subfamilies can be browsed on our website. Abbreviations: hypothetical (hypo.), subfamily (sf.).
Figure 5. Rab subfamily expansions relative to Metazoa in a dataset of 247 genomes.
For each of the eukaryotic taxa (as derived from [115]), (A) displays the relative size compared to Metazoa of each human Rab subfamily on average per genome. The dashed line represents the average in Metazoan genomes, i.e. any circle lying on that line represents a human subfamily that has the same number of members on average per genome as the Metazoan average. Similarly, any circle to the left represents a subfamily that is smaller than in Metazoa, and any circle to the right represents a subfamily that is expanded compared to the Metazoan average. Note that the axes are on a logarithmic scale. In addition to the numbers indicating the human Rab subfamily, a colour code to distinguish subfamilies is shown below, where similar colours indicate proximity in the phylogenetic tree of human Rabs. The same plot for all other Rabs is shown in (B), again on a logarithmic scale. All sequences used are accessible online. Abbreviations: subfamily (sf.).
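The quantity plotted in panel (A) can be read as a log ratio of per-genome averages. The sketch below is one plausible way to compute such a value; the function name, the use of log10 and the handling of empty subfamilies are assumptions, since the caption does not specify the exact normalisation.

```python
import math
from statistics import mean
from typing import List, Optional

def relative_size(counts_taxon: List[int], counts_metazoa: List[int]) -> Optional[float]:
    """log10 of (mean members per genome in a taxon / mean members per genome in Metazoa).

    0 means the subfamily is as large as the Metazoan average, negative values
    mean it is smaller, positive values mean it is expanded. Returns None when
    either average is zero, because the log ratio is then undefined.
    """
    taxon_avg = mean(counts_taxon)
    metazoa_avg = mean(counts_metazoa)
    if taxon_avg == 0 or metazoa_avg == 0:
        return None
    return math.log10(taxon_avg / metazoa_avg)

# Hypothetical member counts of one subfamily, one entry per genome.
same = relative_size([2, 2], [2, 2])      # on the dashed line
smaller = relative_size([1, 1], [2, 2])   # left of the dashed line
expanded = relative_size([4, 4], [2, 2])  # right of the dashed line
```

On a logarithmic axis a halving and a doubling sit at equal distances on either side of the dashed line, which is why expansions and reductions are directly comparable in the plot.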
Figure 6. Phylogenetic profiles of human Rab subfamilies in selected organisms.
A black dot indicates the presence of the corresponding subfamily in the respective species. Rab subfamilies are ordered according to the top phylogenetic tree generated as explained in Materials and Methods. Branches with bootstrap support above 58 are coloured in red. The tree on the left represents the species' branching order and is derived from – together with the naming of the partially nested monophyletic groups on the right.
Figure 7. Summary of evolutionary age and duplication origin of human subfamilies.
Each level represents a nested evolutionary stage from the LECA to humans (derived from [115], [119]) with one circle per human subfamily. Those subfamilies for which we could establish a clear origin, that is, the subfamily from which they were derived by duplication, are placed to the right of the dotted line, with the subfamily of origin attached at the bottom right.
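As a rough illustration of how presence/absence profiles can be turned into the nested ages summarised here, the following Python sketch assigns each subfamily to the most inclusive stage in which it is observed. This is a naive rule written for illustration, not the authors' procedure (which is described in Materials and Methods); the stage list is a simplification, and the subfamily names and presence sets are hypothetical.

```python
from typing import Dict, Optional, Set

# Nested stages ordered from oldest (most inclusive) to youngest.
STAGES = ["Eukaryota", "Opisthokonta", "Metazoa", "Vertebrata"]

# For each species, the youngest stage in STAGES that it shares with human.
SHARED_STAGE: Dict[str, str] = {
    "Arabidopsis thaliana": "Eukaryota",
    "Saccharomyces cerevisiae": "Opisthokonta",
    "Monosiga brevicollis": "Opisthokonta",
    "Drosophila melanogaster": "Metazoa",
    "Mus musculus": "Vertebrata",
}

def date_subfamily(present_in: Set[str]) -> Optional[str]:
    """Return the oldest stage implied by the species in which a subfamily is present."""
    ranks = [STAGES.index(SHARED_STAGE[s]) for s in present_in if s in SHARED_STAGE]
    return STAGES[min(ranks)] if ranks else None

# Hypothetical profiles: subfamily name -> species with at least one member.
profiles = {
    "sfA": {"Arabidopsis thaliana", "Mus musculus"},
    "sfB": {"Drosophila melanogaster", "Mus musculus"},
}
ages = {name: date_subfamily(species) for name, species in profiles.items()}
```

Under this rule a single presence outside a clade pushes the inferred origin back, so a misannotated sequence or a horizontal transfer would make a subfamily look older than it is.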
Figure 8. Increasing tissue specificity in expression of derived Rabs in mice.
Summary of PCR experiments establishing expression (black squares) or lack thereof (white squares) of mouse Rabs in six tissues and five mouse cell lines. Stars at the bottom indicate subfamilies which we found already present in LECA, and which predate the evolution of multicellularity (see Figure 7). Branches coloured in blue in the phylogenetic tree of mouse Rabs on the left are those for which we test the hypothesis that derived subfamilies are expressed in the same tissues as, or in a subset of the tissues of, the Rab they were derived from (see Figure 7 for a summary of which Rabs have a clear origin). Abbreviations: subfamily (sf.), primary Hepatocytes (Prim. Hepatoc.), multicellularity (multic.), last eukaryotic common ancestor (LECA).
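The hypothesis tested on the blue branches amounts to a subset check on expression patterns. A minimal sketch, assuming expression is recorded as a set of tissue names per Rab (the tissue names and patterns below are placeholders, not the experimental results):

```python
from typing import Iterable

def expression_is_nested(derived_tissues: Iterable[str], parent_tissues: Iterable[str]) -> bool:
    """True if the derived Rab is expressed in the same tissues as its parent or in a subset."""
    return set(derived_tissues) <= set(parent_tissues)

# Placeholder expression patterns for a parent Rab and two derived Rabs.
parent = {"brain", "liver", "kidney"}
derived_restricted = {"brain"}
derived_novel = {"brain", "spleen"}

restricted_ok = expression_is_nested(derived_restricted, parent)
novel_ok = expression_is_nested(derived_novel, parent)
```

A derived Rab that passes the check has an expression pattern at least as restricted as that of its parent, consistent with the increasing tissue specificity named in the figure title.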


