Multiple instance learning of Calmodulin binding sites

Bioinformatics. 2012 Sep 15;28(18):i416-i422. doi: 10.1093/bioinformatics/bts416.


Motivation: Calmodulin (CaM) is a ubiquitous and conserved protein that acts as a calcium sensor and interacts with a large number of proteins. Experimental detection of CaM-binding proteins and their interaction sites requires significant effort, so accurate methods for their prediction are important.

Results: We present a novel algorithm (MI-1 SVM) for binding site prediction and evaluate its performance on a set of CaM-binding proteins extracted from the Calmodulin Target Database. Our approach directly models binding site prediction as a large-margin classification problem and can take into account uncertainty in binding site location. We show that the proposed algorithm performs better than the standard SVM formulation, and illustrate its ability to recover known CaM-binding motifs. We also present a highly accurate cascaded classification approach that uses the proposed binding site prediction method to predict CaM-binding proteins in Arabidopsis thaliana.
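
The idea of large-margin learning under uncertain binding site location can be illustrated with a minimal Python sketch. This is not the authors' MI-1 SVM (their Matlab code is the reference); it is a generic multiple-instance heuristic that alternates between training a linear hinge-loss classifier and re-selecting, within each annotated region (a "bag" of windows), the window the current model scores highest. The amino-acid composition features, the window width of 10, the helper names and the toy sequences are all assumptions made for illustration only, and the sequences are synthetic rather than real CaM targets.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(window):
    """Fraction of each amino acid in a window (a deliberately simple feature map)."""
    return [window.count(a) / len(window) for a in AMINO_ACIDS]

def windows(seq, width):
    """All contiguous windows of the given width."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]

def score(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * score(w, b, X[i]) < 1
            w = [wi * (1 - eta * lam) for wi in w]  # regularization shrinkage
            if violated:  # hinge term is active for this example
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def train_mi_svm(pos_bags, neg_instances, rounds=5):
    """Each positive bag is assumed to contain at least one true binding window."""
    # Start from the mean feature vector of each bag as its representative.
    witnesses = [[sum(col) / len(bag) for col in zip(*bag)] for bag in pos_bags]
    for _ in range(rounds):
        X = witnesses + neg_instances
        y = [1] * len(witnesses) + [-1] * len(neg_instances)
        w, b = train_linear_svm(X, y)
        # Re-select the highest-scoring window in each bag for the next round.
        witnesses = [max(bag, key=lambda x: score(w, b, x)) for bag in pos_bags]
    return w, b

def predict_site(seq, width, w, b):
    """Return the top-scoring window of a sequence under the trained model."""
    return max(windows(seq, width), key=lambda s: score(w, b, composition(s)))

# Synthetic toy data (hypothetical): basic/hydrophobic segments in acidic flanks.
WIDTH = 10
pos_seqs = ["DEGSD" + "KRLWKKLRRW" + "SGEDE",
            "GSEDG" + "RKWLRKLWKR" + "DEGSS",
            "EDSGE" + "LKRWKLRKWL" + "GDSEG"]
neg_seqs = ["DEGSDEGSSGEDDEGS", "GSEDGSEDDGESGSED"]

pos_bags = [[composition(s) for s in windows(seq, WIDTH)] for seq in pos_seqs]
neg_instances = [composition(s) for seq in neg_seqs for s in windows(seq, WIDTH)]

w, b = train_mi_svm(pos_bags, neg_instances)
print(predict_site("GSDEG" + "WKRLKRWLKK" + "SEDGS", WIDTH, w, b))
```

In this sketch the exact site position inside each positive region is never given to the learner; only the region is labeled, and the model chooses the window that best supports the margin. A realistic setup would use richer sequence features and real annotated regions, for example from the Calmodulin Target Database.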

Availability: Matlab code for training MI-1 SVM and the cascaded classification approach is available on request.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arabidopsis Proteins / chemistry
  • Arabidopsis Proteins / metabolism
  • Binding Sites
  • Calmodulin / chemistry
  • Calmodulin / metabolism*
  • Calmodulin-Binding Proteins / chemistry*
  • Calmodulin-Binding Proteins / metabolism
  • Protein Interaction Domains and Motifs
  • Support Vector Machine*


Substances

  • Arabidopsis Proteins
  • Calmodulin
  • Calmodulin-Binding Proteins