SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics
- PMID: 17288575
- PMCID: PMC1796611
- DOI: 10.1186/1471-2148-7-S1-S2
Abstract
Background: Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolving evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (paralogous or xenologous genes), lack of some genes for a given species, or partial sequences. The use of undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and the use of partial sequences can lead to a lack of resolution. A tool that selects sequences, species, and genes, while dealing with these issues, is needed in a phylogenomics context.
Results: Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras from partial sequences or selects, among multiple sequences, the orthologous and/or slowest-evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the level of missing data allowed by the user and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for sequence selection, a semi-automatic mode is available to accommodate the user's expertise.
Conclusion: SCaFoS can handle datasets of hundreds of species and genes, at both the amino acid and nucleotide levels. It has a graphical interface and can be integrated into an automatic workflow. Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimeric sequences to reduce missing data, and selects genes according to their level of missing data. Finally, applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially if the dataset includes fast-evolving species.
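The steps described in the Results (fusing partial sequences into a chimera, filtering genes by missing data, and concatenating the retained genes into a super-matrix) can be illustrated with a minimal Python sketch. This is not SCaFoS's implementation: the function names, the "first non-missing character wins" fusion rule, the choice of "-" and "?" as missing-data symbols, and the definition of missing data as the fraction of groups lacking a sequence are all assumptions made for illustration only.

```python
from typing import Dict, List

# Assumption: these characters mark missing data in an aligned sequence.
MISSING = set("-?")


def fuse_chimera(partials: List[str]) -> str:
    """Merge aligned partial sequences from one group into a single chimera.

    Hypothetical rule: at each alignment column, keep the first
    non-missing character in list order; emit '-' if all are missing.
    """
    if not partials:
        raise ValueError("need at least one sequence")
    length = len(partials[0])
    if any(len(seq) != length for seq in partials):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        next((char for char in column if char not in MISSING), "-")
        for column in zip(*partials)
    )


def missing_fraction(alignment: Dict[str, str], groups: List[str]) -> float:
    """Fraction of groups that have no sequence in this gene alignment."""
    absent = sum(1 for group in groups if group not in alignment)
    return absent / len(groups)


def build_supermatrix(
    genes: Dict[str, Dict[str, str]],
    groups: List[str],
    max_missing: float,
) -> Dict[str, str]:
    """Concatenate genes whose missing fraction is at most max_missing.

    Groups lacking a retained gene are padded with '?' over its width.
    Genes are processed in sorted name order; empty alignments are skipped.
    """
    matrix = {group: "" for group in groups}
    for name in sorted(genes):
        alignment = genes[name]
        if not alignment:
            continue
        if missing_fraction(alignment, groups) > max_missing:
            continue
        width = len(next(iter(alignment.values())))
        for group in groups:
            matrix[group] += alignment.get(group, "?" * width)
    return matrix


if __name__ == "__main__":
    # Two partial sequences from the same (hypothetical) group.
    print(fuse_chimera(["MK--", "--LV"]))
    toy_genes = {"g1": {"A": "MK", "B": "ML"}, "g2": {"A": "QQQ"}}
    print(build_supermatrix(toy_genes, ["A", "B"], max_missing=0.5))
```

In this sketch, lowering `max_missing` drops more genes and yields a denser but shorter matrix, which mirrors the trade-off between missing data and dataset size that the abstract describes.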
