KAPPA, a simple algorithm for discovery and clustering of proteins defined by a key amino acid pattern: a case study of the cysteine-rich proteins

Bioinformatics. 2015 Jun 1;31(11):1716-23. doi: 10.1093/bioinformatics/btv047. Epub 2015 Jan 31.


Motivation: Proteins defined by a key amino acid pattern are key players in the exchange of signals between bacteria, animals and plants, as well as important mediators of cell-cell communication within a single organism. Describing and characterizing them opens the way to a better understanding of molecular signalling in a broad range of organisms, and to possible applications in medical and agricultural research. The contrasting pattern of evolution in these proteins, in which the key residues are conserved while the rest of the sequence diverges, makes it difficult to detect and cluster them with classical sequence-based search tools. Here, we introduce Key Aminoacid Pattern-based Protein Analyzer (KAPPA), a new multi-platform program that detects these proteins in a given set of sequences, analyzes their pattern and clusters them either by comparison to reference patterns (ab initio search) or by internal pairwise comparison (de novo search).
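The first step described above, reducing a protein to its key amino acid pattern, can be illustrated with a minimal sketch. It assumes that a pattern is simply the ordered list of gaps between consecutive occurrences of the key residue (cysteine by default), written in a PROSITE-like C-x(n)-C notation. The function names `cysteine_spacings` and `pattern_string` are hypothetical and are not part of KAPPA, whose internal pattern representation is not described in this abstract.

```python
def cysteine_spacings(seq: str, key: str = "C") -> list[int]:
    """Return the number of residues separating consecutive `key` residues."""
    positions = [i for i, aa in enumerate(seq.upper()) if aa == key]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def pattern_string(seq: str, key: str = "C") -> str:
    """Format the key-residue pattern in a PROSITE-like notation, e.g. C-x(3)-C.

    Adjacent key residues (gap of 0) are written as C-C.
    Returns "" when the key residue does not occur at all.
    """
    if key not in seq.upper():
        return ""
    parts = [key]
    for gap in cysteine_spacings(seq, key):
        if gap > 0:
            parts.append(f"x({gap})")
        parts.append(key)
    return "-".join(parts)


if __name__ == "__main__":
    # Toy sequence, not a real protein: cysteines at positions 2, 6 and 11.
    print(cysteine_spacings("MACKLGCAAAAC"))  # [3, 4]
    print(pattern_string("MACKLGCAAAAC"))     # C-x(3)-C-x(4)-C
```

Working on gaps rather than on full sequences is what lets two proteins with unrelated intervening residues still be recognized as sharing the same cysteine framework.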

Results: In this study, we use the concrete example of cysteine-rich proteins (CRPs) to show that the similarity of two cysteine patterns can be precisely and efficiently assessed by a quantitative tool created for KAPPA: the κ-score. We also demonstrate the clear advantage of KAPPA over other classical sequence search tools for ab initio search of new CRPs. Finally, we present de novo clustering and subclustering functionalities that make it possible to rapidly generate consistent groups of CRPs without a seed reference.
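The abstract does not give the definition of the κ-score, so the sketch below uses a deliberately simple stand-in to illustrate the general idea of scoring two cysteine patterns and then grouping proteins de novo, without a seed reference. The score `spacing_similarity` is hypothetical and is NOT the κ-score: it assumes that patterns with different numbers of cysteines score 0, and that otherwise similarity is 1 minus the summed gap differences divided by the summed larger gaps. The clustering step is plain single-linkage grouping at an assumed threshold, implemented with union-find; it is not claimed to be KAPPA's clustering method.

```python
from itertools import combinations


def spacing_similarity(a: list[int], b: list[int]) -> float:
    """Hypothetical stand-in score (NOT the κ-score) between two gap lists.

    0.0 if the patterns have different numbers of gaps; otherwise
    1 - sum|a_i - b_i| / sum max(a_i, b_i), so identical patterns score 1.0.
    """
    if len(a) != len(b):
        return 0.0
    total = sum(max(x, y) for x, y in zip(a, b))
    if total == 0:
        return 1.0  # both patterns are empty or all gaps are 0
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - diff / total


def cluster(patterns: dict[str, list[int]], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: any pair scoring >= threshold is merged."""
    parent = {name: name for name in patterns}

    def find(x: str) -> str:
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All-against-all comparison, as in a de novo (internal pairwise) search.
    for p, q in combinations(patterns, 2):
        if spacing_similarity(patterns[p], patterns[q]) >= threshold:
            parent[find(p)] = find(q)

    groups: dict[str, set[str]] = {}
    for name in patterns:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Toy gap lists (hypothetical proteins A-D), not data from the study.
    toy = {"A": [3, 4], "B": [3, 5], "C": [10, 2, 7], "D": [10, 2, 6]}
    print(spacing_similarity(toy["A"], toy["B"]))  # 0.875
    print(cluster(toy, threshold=0.8))             # two groups: {A, B} and {C, D}
```

Subclustering could, under the same assumptions, be obtained by re-running `cluster` inside each group at a stricter threshold.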

Availability and implementation: KAPPA executables are available for Linux, Windows and Mac OS at http://kappa-sequence-search.sourceforge.net.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acids
  • Animals
  • Cluster Analysis
  • Cysteine / analysis*
  • Humans
  • Plant Proteins / chemistry
  • Proteins / chemistry
  • Sequence Analysis, Protein / methods*


Substances

  • Amino Acids
  • Plant Proteins
  • Proteins
  • Cysteine