Nucleic Acids Res. 2006;34(16):4364-74.
doi: 10.1093/nar/gkl514. Epub 2006 Aug 26.

MUMMALS: Multiple Sequence Alignment Improved by Using Hidden Markov Models With Local Structural Information

Free PMC article

Jimin Pei et al. Nucleic Acids Res. 2006.

Erratum in

  • Nucleic Acids Res. 2006;34(20):6064


We have developed MUMMALS, a program that constructs multiple protein sequence alignments using probabilistic consistency. MUMMALS improves alignment quality by using pairwise alignment hidden Markov models (HMMs) with multiple match states that describe local structural information without relying on explicit structure predictions. Parameters for these models were estimated from a large library of structure-based alignments. We show that (i) on remote homologs, MUMMALS achieves statistically the best accuracy among several leading aligners, such as ProbCons, MAFFT and MUSCLE, although the average improvement is small, on the order of several percent; (ii) a large collection (>10 000) of automatically computed pairwise structure alignments of divergent protein domains is superior to smaller but carefully curated datasets for estimating alignment parameters and testing performance; (iii) reference-independent evaluation of alignment quality, based on structure superpositions that depend on the sequence alignment, correlates well with reference-dependent evaluation, which compares sequence-based alignments to structure-based reference alignments.


Figure 1
(a) An illustration of structure-based sequence alignment and hidden state paths. In Sequences 1 and 2, uppercase letters and lowercase letters represent aligned core blocks and unaligned regions, respectively. If two corresponding unaligned regions bounded by the same two core blocks are of different length, we split the shorter one into two pieces and introduce contiguous gaps in the middle. For both N- and C-terminal ends, the shorter unaligned region is pushed toward the core blocks. Secondary structure (ss) types (helix, ‘h’; strand, ‘e’; coil, ‘c’) are shown for Sequence 1. The hidden state paths for three models are shown below the amino acid sequences. (b) Model structure of HMM_1_1_0. Residue pairs in unaligned regions are modeled using the same match state (‘M’) as those in the aligned blocks. Insertions in the first sequence and second sequence are modeled using states ‘X’ and ‘Y’, respectively. (c) Model structure of HMM_1_1_1. Residue pairs in the unaligned regions are modeled using a different match state (‘U’) than the match state in the core blocks (‘M’). (d) Model structure of HMM_1_3_1. Residue pairs in aligned core blocks are modeled using three match states (‘H’, ‘S’, ‘C’) according to three secondary structure types of the first sequence. In (b), (c) and (d), match states are shown as squares and insertion states are shown as diamonds. Begin state, end state, and transitions from or to them are present in these models, but are not shown.
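The gap placement and state labelling described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MUMMALS implementation: the function names, the segment-list input format, and the toy sequences are hypothetical; the split point of the shorter unaligned region (its midpoint) is an assumption, since the caption says only that it is split into two pieces; and the use of state 'U' for unaligned residue pairs in HMM_1_3_1 is inferred from the model name rather than stated in the caption.

```python
# Hypothetical sketch of the caption's rules; not the authors' code.
SS_TO_STATE = {"h": "H", "e": "S", "c": "C"}  # helix, strand, coil


def pad_pair(u1, u2, where):
    """Pad two unaligned (lowercase) regions to equal length with '-'.

    where is "N", "C" or "middle". At the termini the shorter region is
    pushed toward the neighbouring core block; between two core blocks
    the shorter region is split and the gaps are placed contiguously
    inside it (split at the midpoint: an assumption).
    """
    first_is_short = len(u1) <= len(u2)
    short, long_ = (u1, u2) if first_is_short else (u2, u1)
    gaps = "-" * (len(long_) - len(short))
    if where == "N":
        padded = gaps + short      # residues sit next to the following core
    elif where == "C":
        padded = short + gaps      # residues sit next to the preceding core
    else:
        half = len(short) // 2
        padded = short[:half] + gaps + short[half:]
    return (padded, long_) if first_is_short else (long_, padded)


def build_alignment(segs1, segs2):
    """Join alternating [unaligned, core, unaligned, ..., unaligned] segments."""
    row1, row2 = [], []
    last = len(segs1) - 1
    for k, (s1, s2) in enumerate(zip(segs1, segs2)):
        if k % 2 == 0:  # unaligned region
            where = "N" if k == 0 else "C" if k == last else "middle"
            s1, s2 = pad_pair(s1, s2, where)
        row1.append(s1)
        row2.append(s2)
    return "".join(row1), "".join(row2)


def state_path(row1, row2, ss1, model):
    """Label each alignment column with a hidden state for the given model."""
    path = []
    i = 0  # position in the ungapped first sequence, used to index ss1
    for a, b in zip(row1, row2):
        if b == "-":
            state = "X"            # residue only in the first sequence
        elif a == "-":
            state = "Y"            # residue only in the second sequence
        elif a.islower():          # residue pair in an unaligned region
            state = "M" if model == "HMM_1_1_0" else "U"
        elif model == "HMM_1_3_1":
            state = SS_TO_STATE[ss1[i]]
        else:
            state = "M"
        path.append(state)
        if a != "-":
            i += 1
    return "".join(path)


# Toy input: lowercase = unaligned regions, uppercase = core blocks.
segs1 = ["ab", "CDE", "fghi", "KLM", "n"]
segs2 = ["q", "RST", "uv", "WYA", "pqr"]
ss1 = "cchhhcccceeec"  # secondary structure of the ungapped first sequence

row1, row2 = build_alignment(segs1, segs2)
print(row1)
print(row2)
for model in ("HMM_1_1_0", "HMM_1_1_1", "HMM_1_3_1"):
    print(model, state_path(row1, row2, ss1, model))
```

The three printed paths differ only in how columns with a residue in both sequences are labelled, which is the distinction the three model structures in panels (b) to (d) encode.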


