Kalign 3: multiple sequence alignment of large data sets
- PMID: 31665271
- PMCID: PMC7703769
- DOI: 10.1093/bioinformatics/btz795
Abstract
Motivation: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences are exceeding Kalign's original design specifications. Here we present a completely re-written and updated version to meet current and future alignment challenges.
Results: Kalign now uses a SIMD-accelerated version of Gene Myers' bit-parallel algorithm to estimate pairwise distances, and it adopts a sequence embedding strategy together with the bisecting K-means algorithm to rapidly construct guide trees for thousands of sequences. The new version maintains high alignment accuracy on both protein and nucleotide alignments and scales better than other MSA tools.
Availability: The source code of Kalign and the code to reproduce the results are available at https://github.com/timolassmann/kalign.
© The Author(s) 2019. Published by Oxford University Press.
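The distance-estimation and guide-tree steps named under Results can be illustrated with a minimal Python sketch. It is not Kalign's implementation. It uses arbitrary-precision Python integers where Kalign uses SIMD registers. The function names (`myers_edit_distance`, `embed`, `bisect`), the choice of seed sequences, the deterministic centre initialisation and the nested-tuple tree representation are all assumptions made for illustration.

```python
from typing import List, Union

Tree = Union[int, tuple]  # a leaf is a sequence index, a node is a (left, right) pair


def myers_edit_distance(a: str, b: str) -> int:
    """Global edit distance via the bit-parallel Myers recurrence (one word = one column)."""
    m = len(a)
    if m == 0:
        return len(b)
    peq = {}  # per-character bitmask of positions in `a`
    for i, ch in enumerate(a):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    mask, high = (1 << m) - 1, 1 << (m - 1)
    pv, mv, score = mask, 0, m  # vertical +1 / -1 deltas of the current DP column
    for ch in b:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask  # horizontal +1 deltas
        mh = pv & xh                   # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = ((ph << 1) | 1) & mask    # top row grows by 1 per column (global distance)
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score


def embed(seqs: List[str], seed_ids: List[int]) -> List[List[float]]:
    """Assumed embedding: each sequence becomes its distances to a few seed sequences."""
    return [[float(myers_edit_distance(seqs[s], q)) for s in seed_ids] for q in seqs]


def _sqdist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def _mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def bisect(ids: List[int], vecs: List[List[float]], iters: int = 10) -> Tree:
    """Recursively split `ids` with 2-means; returns a nested-tuple guide tree."""
    if len(ids) == 1:
        return ids[0]
    # Deterministic start: first point and the point farthest from it.
    c0 = vecs[ids[0]]
    c1 = vecs[max(ids, key=lambda i: _sqdist(vecs[i], c0))]
    left, right = [], []
    for _ in range(iters):
        left = [i for i in ids if _sqdist(vecs[i], c0) <= _sqdist(vecs[i], c1)]
        in_left = set(left)
        right = [i for i in ids if i not in in_left]
        if not left or not right:
            break
        c0 = _mean([vecs[i] for i in left])
        c1 = _mean([vecs[i] for i in right])
    if not left or not right:  # degenerate case (e.g. identical vectors): halve the list
        mid = len(ids) // 2
        left, right = ids[:mid], ids[mid:]
    return (bisect(left, vecs, iters), bisect(right, vecs, iters))


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
    vecs = embed(seqs, seed_ids=[0, 2])
    print(bisect(list(range(len(seqs))), vecs))
```

The point of the embedding step in this sketch is that only distances to the seed sequences are computed, rather than all pairwise distances, which is what lets the guide-tree construction scale to many sequences.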
Similar articles
- Kalign--an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics. 2005 Dec 12;6:298. doi: 10.1186/1471-2105-6-298. PMID: 16343337. Free PMC article.
- Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Res. 2009 Feb;37(3):858-65. doi: 10.1093/nar/gkn1006. Epub 2008 Dec 22. PMID: 19103665. Free PMC article.
- Kalign, Kalignvu and Mumsa: web servers for multiple sequence alignment. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W596-9. doi: 10.1093/nar/gkl191. PMID: 16845078. Free PMC article.
- Evaluating the accuracy and efficiency of multiple sequence alignment methods. Evol Bioinform Online. 2014 Dec 7;10:205-17. doi: 10.4137/EBO.S19199. eCollection 2014. PMID: 25574120. Free PMC article.
- Grammar-based distance in progressive multiple sequence alignment. BMC Bioinformatics. 2008 Jul 10;9:306. doi: 10.1186/1471-2105-9-306. PMID: 18616828. Free PMC article.
Cited by
- Full resolution HLA and KIR genes annotation for human genome assemblies. bioRxiv [Preprint]. 2024 Jan 23:2024.01.20.576452. doi: 10.1101/2024.01.20.576452. Update in: Genome Res. 2024 Nov 20;34(11):1931-1941. doi: 10.1101/gr.278985.124. PMID: 38328160. Free PMC article.
- Targeted long-read sequencing facilitates phased diploid assembly and genotyping of the human T cell receptor alpha, delta, and beta loci. Cell Genom. 2022 Nov 30;2(12):100228. doi: 10.1016/j.xgen.2022.100228. eCollection 2022 Dec 14. PMID: 36778049. Free PMC article.
- Application of Computational Techniques in Antibody Fc-Fused Molecule Design for Therapeutics. Mol Biotechnol. 2024 Apr;66(4):568-581. doi: 10.1007/s12033-023-00885-x. Epub 2023 Sep 24. PMID: 37742298. Review.
- Developing Bioprospecting Strategies for Bioplastics Through the Large-Scale Mining of Microbial Genomes. Front Microbiol. 2021 Jul 12;12:697309. doi: 10.3389/fmicb.2021.697309. eCollection 2021. PMID: 34322108. Free PMC article.
- Niche-Aware Metagenomic Screening for Enzyme Methioninase Illuminates Its Contribution to Metabolic Syntrophy. Microb Ecol. 2024 Nov 15;87(1):141. doi: 10.1007/s00248-024-02458-0. PMID: 39546027. Free PMC article.
