Machine Learnable Fold Space Representation based on Residue Cluster Classes

Ricardo Corral-Corral; Edgar Chavez; Gabriel Del Rio

doi:10.1016/j.compbiolchem.2015.07.010


Comput Biol Chem. 2015 Dec;59 Pt A:1-7. doi: 10.1016/j.compbiolchem.2015.07.010. Epub 2015 Jul 30.

Authors

Ricardo Corral-Corral¹, Edgar Chavez², Gabriel Del Rio³

Affiliations

¹ Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, México D. F., México.
² Centro de Investigación Científica y de Educación Superior de Ensenada, México.
³ Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, México D. F., México. Electronic address: gdelrio@ifc.unam.mx.

PMID: 26366526
DOI: 10.1016/j.compbiolchem.2015.07.010

Abstract

Motivation: Protein fold space is a conceptual framework where all possible protein folds exist and ideas about protein structure, function and evolution may be analyzed. Classification of protein folds in this space is commonly achieved by using similarity indexes and/or machine learning approaches, each with different limitations.

Results: We propose a method for constructing a compact vector space model of protein fold space by representing each protein structure by the local contacts of its residues. We developed an efficient method to statistically test the separability of points in a vector space and showed that our protein fold space representation is learnable by any machine-learning algorithm.
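The abstract does not spell out how residue cluster classes are computed, so the Python sketch below only illustrates the general pipeline under stated assumptions. It reduces a structure, given as C-alpha coordinates, to a fixed-length vector summarizing local residue contacts. It then tests whether labelled points in that space are separable. The `contact_vector` histogram (hypothetical 8 Å cutoff, binning by sequence separation) is an illustrative stand-in, not the RCC definition. The leave-one-out 1-nearest-neighbour permutation test is a generic technique swapped in for illustration and is not necessarily the authors' statistical test. Neither function is part of the pyrcc API.

```python
import numpy as np


def contact_vector(ca_coords, cutoff=8.0, max_sep=10):
    """Hypothetical fixed-length contact summary (NOT the RCC definition).

    Counts residue pairs whose C-alpha atoms lie within `cutoff` angstroms,
    binned by sequence separation |i - j| (2..max_sep individually, plus one
    bin for all longer-range contacts), normalized by the number of residues.
    """
    coords = np.asarray(ca_coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between all C-alpha atoms.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=2)  # skip self pairs and chain neighbours
    sep = j - i
    in_contact = dists[i, j] <= cutoff
    # Separations above max_sep are pooled into a single final bin.
    clipped = np.minimum(sep[in_contact], max_sep + 1)
    counts = np.bincount(clipped, minlength=max_sep + 2)[2:]
    return counts / max(n, 1)


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point may not be its own neighbour
    return float(np.mean(y[np.argmin(d, axis=1)] == y))


def separability_pvalue(X, y, n_perm=999, seed=0):
    """Permutation test: is 1-NN accuracy higher than with shuffled labels?"""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    observed = loo_1nn_accuracy(X, y)
    null = [loo_1nn_accuracy(X, rng.permutation(y)) for _ in range(n_perm)]
    # Add-one correction so the p-value is never exactly zero.
    p = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, p
```

A small p-value only indicates that the labels are recoverable from the geometry of the vectors. Drawing any conclusion about fold space would require real coordinates and real fold labels in place of toy data.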

Availability: An API is freely available at https://code.google.com/p/pyrcc/.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cluster Analysis
Machine Learning*
Protein Folding*
Proteins / chemistry*

Substances

Proteins