Parameter estimation in multiple-hidden i.i.d. models from biological multiple alignment

Stat Appl Genet Mol Biol. 2010:9:Article 10. doi: 10.2202/1544-6115.1510. Epub 2010 Jan 26.

Abstract

In this work we deal with parameter estimation in a latent variable model, namely the multiple-hidden i.i.d. model, which is derived from multiple alignment algorithms. We first provide a rigorous formalism for the homology structure of k sequences related by a star-shaped phylogenetic tree in the context of multiple alignment based on indel evolution models. We discuss possible definitions of likelihoods and compare them to the criterion used in multiple alignment algorithms. We establish the existence of two different information divergence rates and show a divergence property under additional assumptions. This property would yield consistent estimation of the parameter in parametrization schemes for which it holds. We finally extend the definition of the multiple-hidden i.i.d. model, and the results obtained, to the case in which the sequences are related by an arbitrary phylogenetic tree. Simulations illustrate several cases that are not covered by our results.
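
As a rough illustration of the kind of latent variable model described above, the following Python sketch simulates a toy hidden i.i.d. alignment process for k sequences. It is a minimal sketch under stated assumptions, not the model defined in the article: the hidden state space (all nonzero vectors in {0,1}^k), the emission scheme (a shared ancestral character copied with probability `match_prob`), and the names `simulate_hidden_iid_alignment`, `step_probs` and `match_prob` are all hypothetical choices made for this example.

```python
import itertools
import random


def simulate_hidden_iid_alignment(k, n_steps, step_probs=None,
                                  match_prob=0.8, alphabet="ACGT", seed=0):
    """Simulate a toy hidden i.i.d. alignment of k sequences.

    Assumption (not taken from the article): each hidden step is a nonzero
    vector in {0,1}^k, where a 1 in position i means that sequence i emits
    a character in this alignment column and a 0 means a gap.
    """
    rng = random.Random(seed)

    # Hidden state space: every presence/absence pattern except "all absent".
    states = [s for s in itertools.product((0, 1), repeat=k) if any(s)]
    if step_probs is None:
        # Hypothetical default: uniform distribution over hidden states.
        step_probs = [1.0 / len(states)] * len(states)

    sequences = [[] for _ in range(k)]
    columns = []
    for _ in range(n_steps):
        # Hidden steps are drawn independently from the same distribution.
        state = rng.choices(states, weights=step_probs, k=1)[0]

        # Toy emission: one shared ancestral character per column; each
        # present sequence copies it with probability match_prob, otherwise
        # it draws a character uniformly from the alphabet.
        ancestral = rng.choice(alphabet)
        column = ["-"] * k
        for i in range(k):
            if state[i]:
                if rng.random() < match_prob:
                    ch = ancestral
                else:
                    ch = rng.choice(alphabet)
                column[i] = ch
                sequences[i].append(ch)
        columns.append("".join(column))

    # In a latent variable setting only the ungapped sequences would be
    # observed; the alignment columns play the role of the hidden structure.
    return ["".join(s) for s in sequences], columns


if __name__ == "__main__":
    seqs, cols = simulate_hidden_iid_alignment(k=3, n_steps=20)
    for i, seq in enumerate(seqs):
        row = "".join(col[i] for col in cols)
        print(f"aligned row {i}: {row}   observed: {seq}")
```

In this toy setup, parameter estimation would mean recovering `step_probs` and `match_prob` from the ungapped sequences alone, without access to the simulated columns.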

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biostatistics
  • Evolution, Molecular
  • INDEL Mutation
  • Likelihood Functions
  • Markov Chains
  • Models, Genetic
  • Models, Statistical*
  • Phylogeny
  • Sequence Alignment / statistics & numerical data*
  • Stochastic Processes