Background: Metalloproteins are proteins capable of binding one or more metal ions, which may be required for their biological function, for the regulation of their activity, or for structural purposes. Metal-binding properties remain difficult both to predict and to investigate experimentally at the whole-proteome level. Consequently, current knowledge about metalloproteins is only partial.
Results: The present work reports on the development of a machine learning method for predicting the zinc-binding state of pairs of nearby amino acids, using predictors based on support vector machines. The predictor was trained on chains containing zinc-binding sites and on non-metalloproteins, which provide positive and negative examples respectively. Results based on strong non-redundancy tests show that (1) zinc-binding residues can be predicted and (2) modelling the correlation between the binding states of nearby residues significantly improves performance. The trained predictor was then applied to the human proteome. The results were in good agreement with the outcomes of previous, heavily manually curated efforts to identify human zinc-binding proteins. Some previously unidentified zinc-binding sites were found and were further validated through structural modelling. The software implementing the predictor is freely available at: http://zincfinder.dsi.unifi.it
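As an illustration of what "pairs of nearby amino acids" could mean in practice, the following minimal sketch enumerates candidate residue pairs from a sequence. It is not the published implementation: the candidate residue set (Cys, His, Asp, Glu), the `max_gap` parameter and its default value, and the function name `candidate_pairs` are all assumptions introduced here for illustration.

```python
from typing import List, Tuple

# Assumption: restrict candidates to residues commonly found coordinating
# zinc (Cys, His, Asp, Glu). The abstract does not specify this set.
CANDIDATES = frozenset("CHDE")


def candidate_pairs(sequence: str, max_gap: int = 7) -> List[Tuple[int, int]]:
    """Return 0-based index pairs (i, j), i < j, of candidate residues
    separated by at most `max_gap` intervening residues.

    `max_gap` is a hypothetical parameter; its default is illustrative only.
    """
    positions = [i for i, aa in enumerate(sequence) if aa in CANDIDATES]
    pairs = []
    for a, i in enumerate(positions):
        for j in positions[a + 1:]:
            # Positions are sorted, so later ones are even farther away.
            if j - i - 1 > max_gap:
                break
            pairs.append((i, j))
    return pairs
```

Each returned pair would then be described by a feature vector and classified, for example by a support vector machine, as jointly zinc-binding or not.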
Conclusion: The proposed approach constitutes a highly automated tool for the identification of metalloproteins, providing results comparable in quality to heavily manually refined predictions. The ability to model correlations between the binding states of residue pairs yields a significant improvement over standard 1D-based approaches. In addition, the method permits the identification of previously unidentified metal sites, providing important hints for the work of experimentalists.