LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system

BMC Bioinformatics. 2016 Jul 7;17(1):271. doi: 10.1186/s12859-016-1146-y.

Abstract

Background: A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation and prediction of molecular interactions. These applications, however sophisticated, are generally highly sensitive to the alignment used, and failing to account for non-homologous or uncertain regions in the alignment can significantly bias the subsequent inferences.

Results: Here, we present a new method, LEON-BIS, which uses a robust Bayesian framework to estimate the homologous relations between sequences in a protein multiple alignment. Sequences are clustered into subfamilies and relations are predicted at different levels, including 'core blocks', 'regions' and full-length proteins. The accuracy and reliability of the predictions are demonstrated in large-scale comparisons using well-annotated alignment databases, where the homologous sequence segments are detected with very high sensitivity and specificity.
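
As a generic illustration of the kind of Bayesian reasoning the abstract refers to, the sketch below applies Bayes' theorem to a single aligned segment. It is not the LEON-BIS model, which the abstract does not specify: the function name, the two likelihood inputs and the default prior are all hypothetical, chosen only to show how a prior belief in homology is updated by alignment evidence.

```python
def posterior_homologous(lik_homologous, lik_unrelated, prior_homologous=0.5):
    """Return P(homologous | evidence) for one segment using Bayes' theorem.

    All three arguments are illustrative assumptions, not LEON-BIS parameters:
      lik_homologous   -- P(evidence | segment is homologous)
      lik_unrelated    -- P(evidence | segment is not homologous)
      prior_homologous -- prior probability that the segment is homologous
    """
    if not 0.0 <= prior_homologous <= 1.0:
        raise ValueError("prior_homologous must lie in [0, 1]")
    if lik_homologous < 0.0 or lik_unrelated < 0.0:
        raise ValueError("likelihoods must be non-negative")

    # Numerator: joint probability of the evidence and the homology hypothesis.
    joint_h = lik_homologous * prior_homologous
    # Denominator: total probability of the evidence under both hypotheses.
    evidence = joint_h + lik_unrelated * (1.0 - prior_homologous)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return joint_h / evidence


if __name__ == "__main__":
    # Evidence four times more likely under homology, with an uninformative prior.
    print(posterior_homologous(0.8, 0.2, prior_homologous=0.5))
```

In a sketch like this, a segment would be kept as homologous when its posterior exceeds a chosen threshold; how LEON-BIS defines its evidence, priors and decision rule is described in the full paper rather than in this abstract.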

Conclusions: LEON-BIS uses robust Bayesian statistics to distinguish the portions of multiple sequence alignments that are conserved either across the whole family or within subfamilies. LEON-BIS should thus be useful for automatic, high-throughput genome annotation, 2D/3D structure prediction, protein-protein interaction prediction and similar applications.

Keywords: Bayesian statistics; Homology-based methods; Multiple sequence alignment; Sequence homology.

MeSH terms

  • Amino Acid Sequence
  • Bayes Theorem*
  • Computational Biology / methods*
  • Humans
  • Proteins / chemistry*
  • Proteins / genetics
  • Sequence Alignment / methods*
  • Sequence Homology, Amino Acid

Substances

  • Proteins