Multiple alignment by aligning alignments
- PMID: 17646343
- DOI: 10.1093/bioinformatics/btm226
Abstract
Motivation: Multiple sequence alignment is a fundamental task in bioinformatics. Current tools typically form an initial alignment by merging subalignments, and then polish this alignment by repeated splitting and merging of subalignments to obtain an improved final alignment. In general this form-and-polish strategy consists of several stages, and a profusion of methods have been tried at every stage. We carefully investigate: (1) how to utilize a new algorithm for aligning alignments that optimally solves the common subproblem of merging subalignments, and (2) what is the best choice of method for each stage to obtain the highest quality alignment.
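The subproblem named above, merging two subalignments into one, can be illustrated with a small dynamic program. The sketch below is not Opal's algorithm. It assumes a simplified cost model: sum-of-pairs cost, a hypothetical unit mismatch cost, and a linear (not affine) gap cost in which a gap facing a gap is free. Under that model, inserting all-gap columns into either input leaves its internal pair costs unchanged, so only the cross pairs between the two subalignments need to be optimized, and an ordinary column-by-column DP gives an optimal merge. The names `pair_cost` and `merge_alignments` are illustrative only.

```python
from typing import List, Tuple

GAP = "-"


def pair_cost(x: str, y: str, mismatch: int = 1, gap: int = 2) -> int:
    """Cost of pairing two characters (hypothetical linear-gap model)."""
    if x == GAP and y == GAP:
        return 0  # gap facing gap is free
    if x == GAP or y == GAP:
        return gap
    return 0 if x == y else mismatch


def merge_alignments(a: List[str], b: List[str],
                     mismatch: int = 1, gap: int = 2) -> Tuple[int, List[str]]:
    """Merge alignments a and b, minimizing the sum-of-pairs cost of cross pairs.

    Returns (cross-pair cost, merged rows: rows of a followed by rows of b).
    """
    n, m = len(a[0]), len(b[0])
    col_a = [[row[i] for row in a] for i in range(n)]
    col_b = [[row[j] for row in b] for j in range(m)]
    gap_a, gap_b = [GAP] * len(a), [GAP] * len(b)

    def col_cost(ca: List[str], cb: List[str]) -> int:
        # Sum over every (row of a, row of b) pair in this merged column.
        return sum(pair_cost(x, y, mismatch, gap) for x in ca for y in cb)

    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # pair column i of a with column j of b
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + col_cost(col_a[i - 1], col_b[j - 1]))
            if i > 0:  # column of a against an all-gap column of b
                D[i][j] = min(D[i][j], D[i - 1][j] + col_cost(col_a[i - 1], gap_b))
            if j > 0:  # all-gap column of a against column of b
                D[i][j] = min(D[i][j], D[i][j - 1] + col_cost(gap_a, col_b[j - 1]))

    # Trace back an optimal path to rebuild the merged columns.
    i, j, cols = n, m, []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + col_cost(col_a[i - 1], col_b[j - 1]):
            cols.append(col_a[i - 1] + col_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + col_cost(col_a[i - 1], gap_b):
            cols.append(col_a[i - 1] + gap_b)
            i -= 1
        else:
            cols.append(gap_a + col_b[j - 1])
            j -= 1
    cols.reverse()
    rows = ["".join(col[k] for col in cols) for k in range(len(a) + len(b))]
    return D[n][m], rows
```

For example, `merge_alignments(["AC", "AC"], ["C"])` returns `(4, ["AC", "AC", "-C"])`. The exactness of this simple DP depends on the linear gap model; with affine gap costs, inserting gap columns can change gap-opening counts inside a subalignment, so an optimal merge requires a more elaborate algorithm.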
Results: We study six stages in the form-and-polish strategy for multiple alignment: parameter choice, distance estimation, merge-tree construction, sequence-pair weighting, alignment merging, and polishing. For each stage, we consider novel approaches as well as standard ones. Interestingly, the greatest gains in alignment quality come from (i) estimating distances by a new approach using normalized alignment costs, and (ii) polishing by a new approach using 3-cuts. Experiments with a parameter-value oracle suggest large gains in quality may be possible through an input-dependent choice of alignment parameters, and we present a promising approach for building such an oracle. Combining the best approaches to each stage yields a new tool we call Opal that on benchmark alignments matches the quality of the top tools, without employing alignment consistency or hydrophobic gap penalties.
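The merge-tree construction stage turns pairwise distances into a binary tree whose internal nodes specify which subalignments are merged, and in what order. The abstract does not say which tree method Opal uses; the sketch below swaps in UPGMA, one standard choice, and assumes the distance matrix is already given (for instance, by a distance estimator such as the normalized-cost approach mentioned above, whose exact formula is not stated here). The function name `upgma` and the nested-tuple tree representation are illustrative assumptions.

```python
from typing import Dict, List, Tuple, Union

# A merge tree is a leaf name or a pair of subtrees (hypothetical representation).
Tree = Union[str, Tuple["Tree", "Tree"]]


def upgma(names: List[str], dist: List[List[float]]) -> Tree:
    """Build a merge tree by UPGMA: repeatedly join the closest pair of clusters."""
    # cluster id -> (subtree, number of leaves)
    clusters: Dict[int, Tuple[Tree, int]] = {i: (names[i], 1) for i in range(len(names))}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of current clusters
        (ti, si), (tj, sj) = clusters.pop(i), clusters.pop(j)
        new = next_id
        next_id += 1
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # size-weighted average distance to the new cluster
            d[(k, new)] = (si * dik + sj * djk) / (si + sj)
        del d[(i, j)]
        clusters[new] = ((ti, tj), si + sj)
    return next(iter(clusters.values()))[0]
```

For four sequences where A and B are closest and C and D are next closest, the tree is `(("A", "B"), ("C", "D"))`: the alignments of A with B and of C with D are formed first, and the final step aligns those two alignments.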
Availability: Opal, a multiple alignment tool that implements the best methods in our study, is freely available at http://opal.cs.arizona.edu.
