Bioinformatics, 28 (18), i416-i422

Multiple Instance Learning of Calmodulin Binding Sites



Fayyaz ul Amir Afsar Minhas et al. Bioinformatics.


Motivation: Calmodulin (CaM) is a ubiquitously conserved protein that acts as a calcium sensor and interacts with a large number of proteins. Experimentally detecting CaM-binding proteins and their interaction sites requires significant effort, so accurate computational prediction methods are important.

Results: We present a novel algorithm (MI-1 SVM) for binding site prediction and evaluate its performance on a set of CaM-binding proteins extracted from the Calmodulin Target Database. Our approach directly models the problem of binding site prediction as a large-margin classification problem, and is able to take into account uncertainty in binding site location. We show that the proposed algorithm performs better than the standard SVM formulation, and illustrate its ability to recover known CaM binding motifs. A highly accurate cascaded classification approach using the proposed binding site prediction method to predict CaM binding proteins in Arabidopsis thaliana is also presented.
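
The idea of a large-margin objective that tolerates uncertainty in binding-site location can be illustrated with a small sketch. This is not the authors' MI-1 SVM formulation or their Matlab solver. It assumes a linear score w·x per window, treats a protein's candidate binding-site windows as a bag, and applies a hinge loss that asks the highest-scoring bag window to outscore the highest-scoring negative window of the same protein by a margin of 1 (the ranking property described for Fig. 1). The names `protein_loss` and `train`, the L2 weight `lam`, the learning rate and the plain stochastic subgradient loop are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
# Hypothetical data layout: (bag of binding-site windows, negative windows), each window a feature vector.
Protein = Tuple[List[Vector], List[Vector]]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear discriminant f(x) = w . x (a bias term would cancel in the ranking margin)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def protein_loss(w: Sequence[float], bag: List[Vector],
                 negatives: List[Vector]) -> Tuple[float, Vector, Vector]:
    """Hinge loss on the gap between the best bag window and the best negative window."""
    best_pos = max(bag, key=lambda x: score(w, x))
    best_neg = max(negatives, key=lambda x: score(w, x))
    margin = score(w, best_pos) - score(w, best_neg)
    return max(0.0, 1.0 - margin), best_pos, best_neg


def train(proteins: List[Protein], dim: int, lam: float = 0.01, lr: float = 0.1,
          epochs: int = 100, seed: int = 0) -> Vector:
    """Stochastic subgradient descent on lam/2 * ||w||^2 + per-protein hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(proteins)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            bag, negatives = proteins[i]
            loss, x_pos, x_neg = protein_loss(w, bag, negatives)
            grad = [lam * wi for wi in w]  # gradient of the L2 regularizer
            if loss > 0.0:
                # Subgradient of the active hinge term: -(x_pos - x_neg)
                grad = [g - (p - n) for g, p, n in zip(grad, x_pos, x_neg)]
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w
```

Because the bag score is a maximum, only the currently best-scoring window of each bag drives an update, so the model is free to pick which window inside the annotated region it treats as the true binding site.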

Availability: Matlab code for training MI-1 SVM and the cascaded classification approach is available on request.



Fig. 1.
CaM binding site prediction with MIL. The annotated binding site is shown as a box and is represented by a ‘bag’ composed of the windows indicated in red above the binding site. The remaining windows, which do not overlap the binding site, are negative examples (shown in blue below the protein). The bottom panel illustrates the desired behavior of the classifier's discriminant function: the dots indicate the scores of individual examples (positives as solid red circles, negatives as hollow blue circles). The score that the trained discriminant function assigns to one window in a binding site should be higher than the scores it assigns to the non-binding-site windows within that protein
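
The bag construction described for Fig. 1 can be sketched as follows. The sketch assumes fixed-length sliding windows, a 0-based, end-exclusive site interval, and that any window sharing at least one residue with the annotated site joins the bag; the function name `make_bag`, the `window` and `step` parameters and this overlap rule are hypothetical and may differ from the paper's actual choices.

```python
from typing import List, Tuple


def make_bag(sequence: str, site_start: int, site_end: int,
             window: int, step: int = 1) -> Tuple[List[str], List[str]]:
    """Split a protein into windows: overlapping the site -> bag, otherwise -> negatives.

    site_start/site_end follow Python slicing (0-based, end-exclusive).
    """
    bag, negatives = [], []
    for i in range(0, len(sequence) - window + 1, step):
        j = i + window
        # Assumed rule: any shared residue with [site_start, site_end) puts the window in the bag.
        if i < site_end and j > site_start:
            bag.append(sequence[i:j])
        else:
            negatives.append(sequence[i:j])
    return bag, negatives
```

For example, with `window=3`, the sequence `MKTAYIAKQR` and a site covering positions 4 and 5 (`site_start=4`, `site_end=6`), the bag is `TAY, AYI, YIA, IAK` and the negatives are `MKT, KTA, AKQ, KQR`.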
Fig. 2.
MI-1 discriminant values along the length of a held-out protein with the position-independent (top) and the position-dependent (bottom) 1-spectrum features
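
A sketch of the two 1-spectrum representations named in Fig. 2, under stated assumptions: the position-independent version counts each of the 20 standard amino acids in a window (its composition), while the position-dependent version one-hot encodes the residue at each window position, giving 20 × window-length features. The function names, the use of raw counts rather than normalized frequencies, and silently skipping non-standard residues such as X are illustrative choices, not details taken from the paper.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed feature order)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def spectrum1(window: str) -> List[float]:
    """Position-independent 1-spectrum: count of each amino acid in the window."""
    v = [0.0] * len(AMINO_ACIDS)
    for aa in window.upper():
        if aa in INDEX:  # non-standard residues (e.g. X) are skipped
            v[INDEX[aa]] += 1.0
    return v


def spectrum1_positional(window: str) -> List[float]:
    """Position-dependent 1-spectrum: a one-hot block of 20 features per window position."""
    n = len(AMINO_ACIDS)
    v = [0.0] * (n * len(window))
    for pos, aa in enumerate(window.upper()):
        if aa in INDEX:
            v[n * pos + INDEX[aa]] = 1.0
    return v
```

The position-independent vector cannot tell where in the window a residue occurs, which is the distinction the two panels of Fig. 2 compare; the position-dependent vector keeps that information at the cost of a 20-fold larger feature space per position.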
Fig. 3.
(a) Weights of different amino acids in the (position-independent) 1-spectrum feature representation; (b) heat map of the weights of different amino acids versus their position in the MI-1 SVM position-dependent 1-spectrum feature representation; and (c) the 100 highest-weighted motifs from the position-dependent gappy triplet kernel. The last (numeric) column shows the actual weight values
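
Panel (b) reads the learned weights of a linear model back onto (position, amino acid) pairs. A minimal sketch of that readout follows, assuming the weight vector is laid out position by position with 20 amino-acid slots each; this layout and the function names `weight_matrix` and `top_features` are hypothetical, and the gappy-triplet motifs of panel (c) are not reproduced here.

```python
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed feature order within each position


def weight_matrix(w: Sequence[float], window: int) -> List[List[float]]:
    """Reshape a flat weight vector into `window` rows x 20 amino-acid columns (heat-map layout)."""
    n = len(AMINO_ACIDS)
    if len(w) != n * window:
        raise ValueError(f"expected {n * window} weights, got {len(w)}")
    return [list(w[n * p:n * (p + 1)]) for p in range(window)]


def top_features(w: Sequence[float], window: int, k: int = 10) -> List[Tuple[int, str, float]]:
    """Return the k highest-weighted (position, amino acid, weight) triples."""
    rows = weight_matrix(w, window)
    entries = [(rows[p][a], p, AMINO_ACIDS[a])
               for p in range(window) for a in range(len(AMINO_ACIDS))]
    entries.sort(reverse=True)  # highest weight first
    return [(p, aa, weight) for weight, p, aa in entries[:k]]
```

Plotting `weight_matrix(w, window)` as an image gives a heat map of the kind shown in panel (b), and `top_features` gives a ranked list analogous in spirit to the weight column of panel (c).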



