Statistical alignment based on fragment insertion and deletion models

Bioinformatics. 2003 Mar 1;19(4):490-9. doi: 10.1093/bioinformatics/btg026.

Abstract

Motivation: The topic of this paper is the estimation of alignments and mutation rates based on stochastic sequence-evolution models that allow insertions and deletions of subsequences ('fragments') and not just single bases. The model we propose is a variant of a model introduced by Thorne et al. (J. Mol. Evol., 34, 3-16, 1992). The computational tractability of the model depends on certain restrictions on the insertion/deletion process; we discuss the possible effects of these restrictions.

Results: The process of fragment insertion and deletion in the sequence-evolution model induces a hidden Markov structure at the level of alignments and thus makes efficient statistical alignment algorithms possible. As an example, we apply a sampling procedure to assess the variability in alignment and mutation parameter estimates for HVR1 sequences of human and orangutan, improving on the results of previous work. Simulation studies give evidence that estimation methods based on the proposed model also give satisfactory results when applied to data for which the restrictions on the insertion/deletion process do not hold.
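
To illustrate why a hidden Markov structure at the alignment level allows efficient computation, the following is a minimal sketch of the forward algorithm for a generic three-state pair HMM. It is not the model or the software of this paper: the state layout (match, gap in the second sequence, gap in the first sequence), the parameter names `delta`, `eps`, `tau` and `p_match`, and all default values are assumptions chosen for illustration. Gap runs in this sketch have geometrically distributed lengths, which is only loosely analogous to inserting and deleting whole fragments.

```python
def pair_hmm_forward(a, b, delta=0.05, eps=0.4, tau=0.01, p_match=0.9):
    """Sum the probability of (a, b) over all alignments of a generic pair HMM.

    Hypothetical parameters (not taken from the paper):
      delta   probability of opening a gap from the match state
      eps     probability of extending an open gap
      tau     probability of ending the alignment
      p_match probability that an aligned pair of bases is identical
    """
    q = 0.25  # uniform emission probability for a single unaligned base

    def pair_prob(x, y):
        # Joint emission probability of an aligned pair; sums to 1 over 16 pairs.
        return q * (p_match if x == y else (1.0 - p_match) / 3.0)

    n, m = len(a), len(b)
    # f_match[i][j]: prefixes a[:i], b[:j] emitted, last column is a match.
    # f_gap_b[i][j]: last column aligns a[i-1] to a gap in b.
    # f_gap_a[i][j]: last column aligns b[j-1] to a gap in a.
    f_match = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_gap_b = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_gap_a = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_match[0][0] = 1.0  # the start state behaves like the match state

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                f_match[i][j] = pair_prob(a[i - 1], b[j - 1]) * (
                    (1.0 - 2.0 * delta - tau) * f_match[i - 1][j - 1]
                    + (1.0 - eps - tau)
                    * (f_gap_b[i - 1][j - 1] + f_gap_a[i - 1][j - 1])
                )
            if i > 0:
                f_gap_b[i][j] = q * (
                    delta * f_match[i - 1][j] + eps * f_gap_b[i - 1][j]
                )
            if j > 0:
                f_gap_a[i][j] = q * (
                    delta * f_match[i][j - 1] + eps * f_gap_a[i][j - 1]
                )

    # Terminate from any of the three states.
    return tau * (f_match[n][m] + f_gap_b[n][m] + f_gap_a[n][m])


if __name__ == "__main__":
    print(pair_hmm_forward("ACGT", "ACGT"))
    print(pair_hmm_forward("ACGT", "TGCA"))
```

The dynamic programme fills three tables of size (n+1) x (m+1), so the sum over all alignments costs time proportional to the product of the sequence lengths instead of requiring an enumeration of alignments. The same tables could in principle be traced back stochastically to sample alignments, which is the general idea behind sampling-based assessment of alignment variability.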

Availability: The source code of the software for sampling alignments and mutation rates for a pair of DNA sequences according to the fragment insertion and deletion model is freely available from http://www.math.uni-frankfurt.de/~stoch/software/mcmcsalut under the terms of the GNU General Public License (GPL, 2000).

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Animals
  • Base Sequence
  • Computer Simulation
  • DNA Mutational Analysis / methods*
  • DNA Transposable Elements / genetics
  • Evolution, Molecular
  • Gene Deletion
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / genetics
  • Humans
  • Models, Genetic*
  • Models, Statistical
  • Molecular Sequence Data
  • Oligodeoxyribonucleotides / genetics
  • Pongo pygmaeus
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Stochastic Processes
  • Viral Proteins / genetics

Substances

  • DNA Transposable Elements
  • HVR1 protein, Hepatitis C virus
  • Oligodeoxyribonucleotides
  • Viral Proteins