Background: Phylogenomic pipelines generate a large collection of phylogenetic trees that require manual inspection to answer questions about gene or genome evolution. A notable application of phylogenomics is to photosynthetic organelle (plastid) endosymbiosis. In the case of primary endosymbiosis, a heterotrophic protist engulfed a cyanobacterium, giving rise to the first photosynthetic eukaryote. Plastid establishment precipitated extensive gene transfer from the endosymbiont to the nuclear genome of the 'host'. Estimating the magnitude of this endosymbiotic gene transfer (EGT) and determining the functions of the prokaryotic genes remain controversial issues. We used phylogenomics to study EGT in the model green alga Chlamydomonas reinhardtii. To facilitate this procedure, we developed PhyloSort to rapidly search large collection of trees for monophyletic relationships. Here we present PhyloSort and its application to estimating EGT in Chlamydomonas.
Results: PhyloSort is an open-source tool to sort phylogenetic trees by searching for user specified subtrees that contain a monophyletic group of interest defined by operational taxonomic units in a phylogenomic context. Using PhyloSort, we identified 897 Chlamydomonas genes of putative cyanobacterial origin, of which 531 had bootstrap support values >/= 50% for the grouping of the algal and cyanobacterial homologs.
Conclusion: PhyloSort can be applied to quantify the number of genes that support different evolutionary hypotheses such as a taxonomic classification or endosymbiotic or horizontal gene transfer events. In our application, we demonstrate that cyanobacteria account for 3.5-6% of the protein-coding genes in the nuclear genome of Chlamydomonas.