Self-Organizing Hidden Markov Model Map (SOHMMM): Biological Sequence Clustering and Cluster Visualization

Methods Mol Biol. 2017:1552:83-101. doi: 10.1007/978-1-4939-6753-7_6.

Abstract

The present study devises mapping methodologies and projection techniques that visualize the results of clustering biological sequence data. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolic sequences in a lower-dimensional space so that the clusters and the relations among data elements are depicted graphically. Both operate in combination with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework can analyze raw sequence data automatically and directly, with little or even no prior information or domain knowledge.
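
The abstract does not state how SDDD or SLP are computed, so the following is only a minimal sketch of the general idea of placing sequences on a map of HMMs by likelihood, not the authors' method. It assumes a small lattice whose nodes each hold a discrete HMM, scores a sequence under every node with the standard forward algorithm, assigns it to the highest-scoring node, and counts assignments per node as a crude density. All names (`forward_log_likelihood`, `best_matching_node`, `density_counts`, `LATTICE`) and all toy parameters are hypothetical.

```python
import math
from collections import Counter

NEG_INF = float("-inf")

def _log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM) using the forward algorithm in log space.

    start[i]    : initial probability of state i
    trans[i][j] : transition probability from state i to state j
    emit[i]     : dict mapping a symbol to its emission probability in state i
    Assumes seq is non-empty.
    """
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i].get(seq[0], 0.0)) for i in range(n)]
    for sym in seq[1:]:
        alpha = [
            _log_sum_exp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j].get(sym, 0.0))
            for j in range(n)
        ]
    return _log_sum_exp(alpha)

def best_matching_node(seq, lattice):
    """Return the lattice coordinate whose HMM gives seq the highest likelihood."""
    return max(lattice, key=lambda node: forward_log_likelihood(seq, *lattice[node]))

def density_counts(seqs, lattice):
    """Count how many sequences are assigned to each lattice node."""
    return Counter(best_matching_node(s, lattice) for s in seqs)

# Toy 1x2 lattice with made-up (hypothetical) two-state HMMs over DNA symbols.
STICKY = [[0.9, 0.1], [0.1, 0.9]]
LATTICE = {
    (0, 0): ([0.5, 0.5], STICKY, [  # AT-rich node
        {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        {"A": 0.3, "T": 0.5, "C": 0.1, "G": 0.1},
    ]),
    (0, 1): ([0.5, 0.5], STICKY, [  # GC-rich node
        {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
        {"A": 0.1, "T": 0.1, "C": 0.5, "G": 0.3},
    ]),
}

if __name__ == "__main__":
    sequences = ["ATATTA", "GCGGCC", "TTATAA"]
    for s in sequences:
        print(s, "->", best_matching_node(s, LATTICE))
    print(dict(density_counts(sequences, LATTICE)))
```

In this toy setting the per-node counts could be drawn as a heat map over the lattice, which is one simple way to obtain a two-dimensional picture of how symbolic sequences group together. A trained SOHMMM would supply the node HMMs; here they are fixed by hand purely for illustration.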

Keywords: Biological chain molecule; Clustering; DNA/RNA/protein sequence data; Hidden Markov model (HMM); Mapping; Nonlinear projection; Self-organizing map (SOM); Visualization.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Protein
  • Humans
  • Markov Chains*
  • Models, Molecular
  • Models, Statistical*
  • Neural Networks, Computer
  • Protein Conformation
  • Proteins / chemistry*

Substances

  • Proteins