PeerJ. 2017 Mar 29;5:e3095.
doi: 10.7717/peerj.3095. eCollection 2017.

Automatic Single- And Multi-Label Enzymatic Function Prediction by Machine Learning

Free PMC article

Shervine Amidi et al. PeerJ. 2017.

Abstract

The number of protein structures in the PDB database has increased more than 15-fold since 1999. Computational models that predict enzymatic function are of major importance because they provide the means to better understand how newly discovered enzymes behave when catalyzing chemical reactions. Until now, enzymatic function has mostly been predicted with single-label classification, which limits the application to enzymes performing a single reaction and introduces errors when multi-functional enzymes are examined. Indeed, some enzymes perform several different reactions and can hence be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (at the feature level and at the decision level) and assess the methodology for general enzymatic function prediction, indicated by the first digit of the enzyme commission (EC) code (six main classes), on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models correctly predict the actual functional activities in 97.8% and 95.5% (based on Hamming loss) of the cases, respectively. In addition, the multi-label model predicts all possible enzymatic reactions for 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available at https://figshare.com/s/a63e0bafa9b71fc7cbd7.
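
The difference between the two fusion approaches named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the classifiers or the fusion rule, so the weighted average, the 0.5 threshold, and all function names below are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def feature_level_fusion(structural: Sequence[float],
                         sequence: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate both descriptors into one vector
    that a single classifier would then consume."""
    return list(structural) + list(sequence)


def decision_level_fusion(p_struct: Sequence[float],
                          p_seq: Sequence[float],
                          weight: float = 0.5) -> List[float]:
    """Decision-level fusion: combine per-class scores produced by two
    separate classifiers. A weighted average is assumed here."""
    if len(p_struct) != len(p_seq):
        raise ValueError("both score vectors must cover the same classes")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_struct, p_seq)]


def predict_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Multi-label decision: keep every EC class (numbered from 1) whose
    score reaches the threshold; fall back to the top class if none does."""
    labels = [i + 1 for i, s in enumerate(scores) if s >= threshold]
    if not labels:
        labels = [max(range(len(scores)), key=lambda i: scores[i]) + 1]
    return labels


# Hypothetical per-class scores for the six main EC classes.
p_struct = [0.9, 0.1, 0.6, 0.0, 0.0, 0.1]
p_seq = [0.7, 0.2, 0.5, 0.1, 0.0, 0.0]

fused_scores = decision_level_fusion(p_struct, p_seq)
fused_labels = predict_labels(fused_scores)  # classes 1 and 3 pass the threshold
fused_features = feature_level_fusion([0.2, 0.4], [0.1])  # one 3-element vector
```

In this sketch the thresholding step is what allows an enzyme to receive more than one EC class without the number of reactions being known in advance.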

Keywords: Amino acid sequence; Enzyme classification; Multi-label; Single-label; Smith-Waterman algorithm; Structural information.

Conflict of interest statement

Nikos Paragios and Evangelia Zacharaki are employees of Equipe GALEN, INRIA Saclay, France.

Figures

Figure 1. Overview of feature-level fusion.
Figure 2. Decision-level fusion for single- and multi-label classification.
Figure 3. Testing subset accuracy for dataset II.
Figure 4. Repartition of correctly predicted enzymes with respect to subset accuracy.
Figure 5. Testing 1-Hamming-loss for dataset II.
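
The two evaluation measures named in the figure captions can be sketched as follows, using their standard multi-label definitions; the paper's exact formulation may differ. The label matrices are made-up toy data (three enzymes, six EC classes), not results from the study.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual (enzyme, class) label slots predicted wrongly."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("rows must have the same number of classes")
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def subset_accuracy(y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of enzymes whose full label set is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Toy binary indicator matrices: one row per enzyme, one column per EC class.
y_true = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
]
y_pred = [
    [1, 0, 0, 0, 0, 0],  # exact match
    [0, 1, 0, 0, 0, 0],  # misses one of two functions
    [0, 0, 0, 1, 0, 1],  # exact match
]

one_minus_hl = 1.0 - hamming_loss(y_true, y_pred)  # 17 of 18 slots correct
exact_match = subset_accuracy(y_true, y_pred)      # 2 of 3 enzymes exact
```

Subset accuracy is the stricter measure: a single missed function makes the whole enzyme count as wrong, whereas 1-Hamming-loss still credits the label slots that were predicted correctly.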

Grant support

This research was partially supported by European Research Council Grant Diocles (ERC-STG-259112). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
