Empirical profile mixture models for phylogenetic reconstruction

Le Si Quang; Olivier Gascuel; Nicolas Lartillot

doi:10.1093/bioinformatics/btn445

Bioinformatics. 2008 Oct 15;24(20):2317-23. doi: 10.1093/bioinformatics/btn445. Epub 2008 Aug 21.

Authors

Le Si Quang¹, Olivier Gascuel, Nicolas Lartillot

Affiliation

¹ Méthodes et Algorithmes pour la Bioinformatique, LIRMM, CNRS-UM2, Montpellier Cedex 5, France.

PMID: 18718941

Abstract

Motivation: Previous studies have shown that accounting for site-specific amino acid replacement patterns using mixtures of stationary probability profiles offers a promising approach for improving the robustness of phylogenetic reconstructions in the presence of saturation. However, such profile mixture models were introduced only in a Bayesian context, and are not yet available in a maximum likelihood (ML) framework. In addition, these mixture models only perform well on large alignments, from which they can reliably learn the shapes of the profiles and their associated weights.

Results: In this work, we introduce an expectation-maximization algorithm for estimating amino acid profile mixtures from alignment databases. We apply it, learning on the HSSP database, and observe that a set of 20 profiles is enough to provide a better statistical fit than currently available empirical matrices (WAG, JTT), in particular on saturated data.
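
To illustrate the general idea of estimating profiles and weights by expectation-maximization, here is a minimal sketch. It is not the algorithm of the paper, whose details the abstract does not give. As a simplifying assumption, it ignores the phylogeny entirely and treats each alignment column as an independent multinomial draw from one of a fixed number of profiles. The function name em_profile_mixture, its parameters, and the pseudocount smoothing are all hypothetical choices made for this example.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def em_profile_mixture(site_counts, n_profiles, n_iter=50, pseudocount=1e-3, seed=0):
    """Fit a mixture of multinomial profiles to per-site state counts.

    ASSUMPTION: columns are independent multinomial samples (no tree);
    this is a stand-in for illustration, not the published method.

    site_counts[i][a] is how often state a (e.g. an amino acid) is
    observed in alignment column i. Returns (weights, profiles).
    """
    rng = random.Random(seed)
    n_sites = len(site_counts)
    n_states = len(site_counts[0])

    # Random initial profiles, normalised to sum to 1; uniform weights.
    profiles = []
    for _ in range(n_profiles):
        raw = [rng.random() + 0.1 for _ in range(n_states)]
        total = sum(raw)
        profiles.append([x / total for x in raw])
    weights = [1.0 / n_profiles] * n_profiles

    for _ in range(n_iter):
        # E-step: posterior responsibility of each profile for each site,
        # computed in log space. The multinomial coefficient is omitted
        # because it is the same for every profile and cancels out.
        resp = []
        for counts in site_counts:
            log_joint = [
                math.log(weights[k])
                + sum(c * math.log(profiles[k][a]) for a, c in enumerate(counts))
                for k in range(n_profiles)
            ]
            norm = log_sum_exp(log_joint)
            resp.append([math.exp(v - norm) for v in log_joint])

        # M-step: re-estimate weights and profiles from expected counts.
        # The pseudocount keeps every weight and frequency strictly positive.
        for k in range(n_profiles):
            mass = sum(r[k] for r in resp)
            weights[k] = (mass + pseudocount) / (n_sites + n_profiles * pseudocount)
            expected = [pseudocount] * n_states
            for r, counts in zip(resp, site_counts):
                for a, c in enumerate(counts):
                    expected[a] += r[k] * c
            total = sum(expected)
            profiles[k] = [e / total for e in expected]

    return weights, profiles
```

For amino acid data, each count vector would have 20 entries and n_profiles would be set to the desired number of profiles (20 in the result reported above). Like any EM procedure, this sketch converges to a local optimum that depends on the random initialisation, so several seeds would normally be compared.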

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Sequence
Amino Acid Substitution*
Animals
Bayes Theorem
Computational Biology / methods*
Databases, Protein
Humans
Likelihood Functions
Phylogeny*
Sequence Alignment
Sequence Analysis, Protein