Mol Biol Evol. 2013 Apr;30(4):772-80.
doi: 10.1093/molbev/mst010. Epub 2013 Jan 16.

MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability

Free PMC article

Kazutaka Katoh et al. Mol Biol Evol.

Abstract

We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

Figures

Fig. 1.
Assumptions on the phylogenetic relationship in different options of MAFFT. (A) mafft-profile, (B) --addprofile, (C) misuse of mafft-profile, and (D) --add or --addprofile.
Fig. 2.
ITS alignments by different options of MAFFT, displayed on Jalview (Waterhouse et al. 2009). (A, B) Incorrect alignments by the FFT-NS-2 and L-INS-i algorithms, respectively. (C) An incorrect alignment by mafft-profile. The full-length sequences were aligned with the L-INS-i algorithm and then each new sequence was separately added to the full-length alignment, using mafft-profile. (D) Reasonable alignment by a two-step strategy. The --6merpair --addfragments option was used at the second step. (E) Reordered version of D; sequences are ordered such that similar sequences are placed closely. All calculations were performed using 16 cores on a Linux PC with 2.67 GHz Intel Xeon E7-8837/256 GB RAM.
Fig. 3.
(A) A part of the output of the --treeout option showing the phylogenetic positions of new sequences (new#) in the tree of the existing alignment (backbone#), estimated before the alignment calculation. This file also shows a Newick format tree of the existing alignment (not shown in this figure). For each new sequence, the nearest sequence in the existing alignment (nearest sequence), approximate distance to the nearest sequence (approximate distance), and the members of the sister group (sister group) are shown. (B) Graphical representation of (A).
Fig. 4.
(A) Superposition of 3v33, 2qip, and 1taq structures visualized by PyMOL (Schrödinger LLC 2010). (B) MAFFT-L-INS-i sequence alignment displayed on Jalview (Waterhouse et al. 2009). Misaligned Ds are highlighted in red. (C) Structure-informed MSA with correctly aligned Ds. Alpha helices and beta sheets are shown in blue and yellow, respectively, in (A–C).
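
The two-step strategy described in the Fig. 2 caption (align the full-length sequences with L-INS-i, then add the remaining sequences with --6merpair --addfragments) might be scripted roughly as follows. This is a minimal sketch, not the authors' exact command lines, which this page does not give. It assumes that L-INS-i corresponds to `--localpair --maxiterate 1000` and that `--thread` and `--reorder` behave as described in the MAFFT documentation. The file names and the helper names `build_two_step_commands` and `run_to_file` are hypothetical.

```python
import subprocess


def build_two_step_commands(full_length, fragments, backbone, threads=16, reorder=False):
    """Build argv lists for a two-step MAFFT run (all file names are hypothetical)."""
    # Step 1: align the full-length sequences with L-INS-i
    # (assumed here to be --localpair --maxiterate 1000).
    align_backbone = [
        "mafft", "--thread", str(threads),
        "--localpair", "--maxiterate", "1000",
        str(full_length),
    ]
    # Step 2: add the fragmentary sequences to the backbone alignment.
    add_fragments = ["mafft", "--thread", str(threads), "--6merpair"]
    if reorder:
        # Optional: place similar sequences next to each other in the output.
        add_fragments.append("--reorder")
    add_fragments += ["--addfragments", str(fragments), str(backbone)]
    return align_backbone, add_fragments


def run_to_file(argv, out_path):
    """Run one command; MAFFT writes the alignment to stdout, so capture it in a file."""
    with open(out_path, "w") as handle:
        subprocess.run(argv, stdout=handle, check=True)


# Print the two commands without running them.
step1, step2 = build_two_step_commands(
    "full_length.fasta", "fragments.fasta", "backbone.aln"
)
print(" ".join(step1), "> backbone.aln")
print(" ".join(step2), "> combined.aln")
```

To actually run the steps, call `run_to_file(step1, "backbone.aln")` followed by `run_to_file(step2, "combined.aln")` on a machine where `mafft` is installed. Building the argument lists separately from running them lets the commands be inspected first.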

References

    1. Altschul SF. Generalized affine gap costs for protein sequence alignment. Proteins. 1998;32:88–96.
    2. Barton GJ, Sternberg MJ. A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J Mol Biol. 1987;198:327–337.
    3. Berger MP, Munson PJ. A novel randomized iterative strategy for aligning multiple protein sequences. Comput Appl Biosci. 1991;7:479–484.
    4. Berger SA, Stamatakis A. Aligning short reads to reference alignments and trees. Bioinformatics. 2011;27:2068–2075.
    5. Blackburne BP, Whelan S. Class of multiple sequence alignment algorithm affects genomic analysis. Mol Biol Evol. 2012a. Advance access published December 4, 2012, doi:10.1093/molbev/mss256.
