Simultaneous alignment and clustering of peptide data using a Gibbs sampling approach

Bioinformatics. 2013 Jan 1;29(1):8-14. doi: 10.1093/bioinformatics/bts621. Epub 2012 Oct 24.


Motivation: Proteins that recognize short peptide fragments play a central role in cellular signaling. High-throughput technologies now allow peptide-binding protein specificities to be studied with large peptide libraries at dramatically reduced cost and time. Interpreting such large peptide datasets, however, is a complex task, especially when the data contain multiple receptor binding motifs and/or the motifs occur at different locations within distinct peptides.

Results: The algorithm presented in this article, based on Gibbs sampling, identifies multiple specificities in peptide data by performing two essential tasks simultaneously: alignment and clustering of the peptides. We apply the method to deconvolute binding motifs in a panel of peptide datasets of varying complexity, ranging from the simplest case of pre-aligned fixed-length peptides to unaligned peptide datasets of variable length. Example applications described in this article include mixtures of binders to different MHC class I and class II alleles, distinct classes of ligands for SH3 domains, and sub-specificities of the HLA-A*02:01 molecule.
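
To make the idea of simultaneous alignment and clustering concrete, the following is a minimal illustrative sketch and not the published implementation. The abstract does not describe the scoring function or sampling moves, so everything here is an assumption: the function names (build_log_odds, score, gibbs_cluster), the log-odds scoring against a uniform background, the pseudocount, and the single move type that resamples one peptide's cluster and core offset together. Peptides are assumed to contain only the 20 standard amino acids and to be at least core_len residues long.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds(cores, core_len, pseudocount=1.0):
    """Position-specific log-odds matrix from aligned cores.

    Assumption: uniform amino acid background and a flat pseudocount.
    """
    matrix = []
    for pos in range(core_len):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for core in cores:
            counts[core[pos]] += 1.0
        total = sum(counts.values())
        matrix.append(
            {aa: math.log(counts[aa] / total * len(AMINO_ACIDS))
             for aa in AMINO_ACIDS}
        )
    return matrix


def score(core, matrix):
    """Sum of log-odds over the positions of one candidate core."""
    return sum(matrix[pos][aa] for pos, aa in enumerate(core))


def gibbs_cluster(peptides, n_clusters, core_len,
                  n_sweeps=200, temperature=1.0, seed=0):
    """Return one (cluster, offset) pair per peptide.

    Each step holds one peptide out, rebuilds every cluster's matrix
    from the remaining peptides, and resamples that peptide's cluster
    and core offset jointly, with probability proportional to
    exp(score / temperature).
    """
    rng = random.Random(seed)
    # Random initial state: a cluster and a core start for each peptide.
    state = [
        (rng.randrange(n_clusters), rng.randrange(len(p) - core_len + 1))
        for p in peptides
    ]
    for _ in range(n_sweeps):
        for i, pep in enumerate(peptides):
            # One matrix per cluster, built without peptide i.
            matrices = []
            for k in range(n_clusters):
                cores = [
                    peptides[j][o:o + core_len]
                    for j, (c, o) in enumerate(state)
                    if j != i and c == k
                ]
                matrices.append(build_log_odds(cores, core_len))
            # Every (cluster, offset) combination is a candidate move.
            candidates = [
                (k, o)
                for k in range(n_clusters)
                for o in range(len(pep) - core_len + 1)
            ]
            log_w = [
                score(pep[o:o + core_len], matrices[k]) / temperature
                for k, o in candidates
            ]
            top = max(log_w)  # subtract the maximum for numerical stability
            weights = [math.exp(w - top) for w in log_w]
            state[i] = rng.choices(candidates, weights=weights, k=1)[0]
    return state
```

Calling gibbs_cluster(peptides, n_clusters=2, core_len=5) on a list of peptide strings returns a list of (cluster, offset) tuples in input order. Because the sampler is stochastic, different seeds can give different partitions, and cluster labels are arbitrary. The sketch rebuilds every matrix at each step for clarity rather than speed, so it is only practical for small datasets.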

Availability: The Gibbs clustering method is available online as a web server at

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Alleles
  • Cluster Analysis
  • Genes, MHC Class I
  • Genes, MHC Class II
  • HLA Antigens / chemistry
  • HLA Antigens / genetics
  • HLA Antigens / metabolism
  • Humans
  • Ligands
  • Peptides / chemistry*
  • Peptides / metabolism
  • Position-Specific Scoring Matrices
  • Protein Binding
  • Protein Interaction Domains and Motifs*
  • Sequence Alignment*
  • Sequence Analysis, Protein*
  • src Homology Domains


Substances

  • HLA Antigens
  • Ligands
  • Peptides