Noisy: identification of problematic columns in multiple sequence alignments
- PMID: 18577231
- PMCID: PMC2464588
- DOI: 10.1186/1748-7188-3-7
Abstract
Motivation: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are correlated along the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. Furthermore, alignment errors accumulate in highly variable regions, sometimes producing misleading signals in phylogenetic reconstruction.
Results: We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters.
Software: The computer program noisy implements this approach. It can be employed to improve phylogenetic reconstruction, with a considerable success rate, whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) the data set contains sufficiently many taxa (at least, say, 12 to 15). The software can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/.
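The idea stated under Results, assessing how character states are distributed along a cyclic ordering of the taxa, can be illustrated with a minimal Python sketch. This is not the noisy program or its actual scoring scheme: the function names, the shuffle-based score, the number of shuffles, and the cutoff of 0.8 are all assumptions made for illustration, and the cyclic ordering is taken as given rather than computed. The sketch counts state changes between neighbouring taxa along the ordering and compares that count with random orderings; a column whose states cluster along the ordering scores high, while a column that looks randomized scores low and is dropped.

```python
import random


def cyclic_changes(column, order):
    """Count state changes between neighbouring taxa along a cyclic ordering."""
    n = len(order)
    return sum(column[order[i]] != column[order[(i + 1) % n]] for i in range(n))


def column_score(column, order, n_shuffles=200, seed=0):
    """Fraction of random orderings showing at least as many changes as `order`.

    Hypothetical score for illustration only: values near 1 mean the states
    cluster along the ordering, lower values mean the column looks randomized.
    """
    rng = random.Random(seed)
    observed = cyclic_changes(column, order)
    shuffled = list(order)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if cyclic_changes(column, shuffled) >= observed:
            hits += 1
    return hits / n_shuffles


def filter_columns(alignment, order, cutoff=0.8):
    """Keep only columns whose score reaches `cutoff` (an assumed threshold).

    `alignment` is a list of equal-length strings, one row per taxon.
    """
    kept = [col for col in zip(*alignment) if column_score(col, order) >= cutoff]
    # Transpose the kept columns back into one string per taxon.
    return ["".join(row) for row in zip(*kept)] if kept else [""] * len(alignment)


# Toy alignment: column 0 is clustered along the ordering, column 1 alternates.
alignment = ["AA", "AC", "AA", "AC", "CA", "CC", "CA", "CC"]
order = list(range(len(alignment)))
filtered = filter_columns(alignment, order)
print(filtered)
```

In this toy input the first column changes state only twice around the cycle, which is the minimum possible for a two-state column, so it is kept; the second column changes at every step and is removed.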
