Motifs tree: a new method for predicting post-translational modifications

Bioinformatics. 2014 Jul 15;30(14):1974-82. doi: 10.1093/bioinformatics/btu165. Epub 2014 Mar 28.

Abstract

Motivation: Post-translational modifications (PTMs) are important steps in the maturation of proteins. Several models exist to predict specific PTMs, ranging from manually detected patterns to machine learning methods. On the one hand, manual pattern detection does not yield the most efficient classifiers and requires a substantial workload; on the other hand, models built by machine learning methods are hard to interpret and do not increase biological knowledge. We therefore developed a novel method based on pattern discovery and decision trees to predict PTMs. The proposed algorithm builds a decision tree by coupling the C4.5 algorithm with genetic algorithms, producing high-performance white-box classifiers. Our method was tested on initiator methionine cleavage (IMC) and Nα-terminal acetylation (N-Ac), two of the most common PTMs.
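
To make the decision-tree side of this concrete, the sketch below scores one candidate sequence motif as a binary split using the gain ratio, the split criterion used by C4.5. It is an illustration only: the abstract does not say how motifs are encoded or scored, so the motif representation (a mapping from 1-based position to allowed residues), the function names, and the toy sequences and labels are all assumptions. A genetic algorithm could, under the same assumptions, search over candidate motifs using such a score as its fitness; that search is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def matches(seq, motif):
    """True if seq satisfies the motif.

    motif is a hypothetical encoding: {1-based position: set of allowed residues}.
    """
    return all(
        len(seq) >= pos and seq[pos - 1] in allowed
        for pos, allowed in motif.items()
    )


def gain_ratio(seqs, labels, motif):
    """C4.5-style gain ratio of the binary split 'sequence matches motif?'."""
    yes = [lab for s, lab in zip(seqs, labels) if matches(s, motif)]
    no = [lab for s, lab in zip(seqs, labels) if not matches(s, motif)]
    if not yes or not no:
        return 0.0  # the motif does not split the data at all
    n = len(labels)
    conditional = len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
    info_gain = entropy(labels) - conditional
    split_info = entropy(["yes"] * len(yes) + ["no"] * len(no))
    return info_gain / split_info


# Toy data, invented for illustration (1 = modified, 0 = not modified).
toy_seqs = ["MAS", "MGK", "MKL", "MLE"]
toy_labels = [1, 1, 0, 0]
small_at_2 = {2: set("AGSTCPV")}   # hypothetical motif on position 2
other_motif = {3: {"S", "L"}}      # hypothetical motif on position 3

print(gain_ratio(toy_seqs, toy_labels, small_at_2))
print(gain_ratio(toy_seqs, toy_labels, other_motif))
```

On this toy set the first motif separates the two classes perfectly while the second carries no information, so a tree builder using this criterion would prefer the first as a split node. Each such node remains a readable sequence pattern, which is what makes the resulting tree interpretable.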

Results: The resulting classifiers perform well when compared with existing models. On a set of eukaryotic proteins, they achieve a cross-validated Matthews correlation coefficient of 0.83 (IMC) and 0.65 (N-Ac). When used to predict potential substrates of N-terminal acetyltransferase B and N-terminal acetyltransferase C, our classifiers outperform the state of the art. Moreover, we present an analysis of the model predicting IMC for Homo sapiens proteins and demonstrate that we are able to extract experimentally known facts without prior knowledge. These results support the claim that our method produces white-box models.
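
For readers unfamiliar with the metric reported above, the Matthews correlation coefficient (MCC) for a binary classifier is computed from the confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). A minimal sketch follows; the function name and the example counts are illustrative and are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention when one
    row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only, not results from the paper.
print(round(mcc(tp=90, tn=80, fp=20, fn=10), 3))
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when the modified and unmodified classes are of very different sizes.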

Availability and implementation: Predictors for IMC and N-Ac and all datasets are freely available at http://terminus.unige.ch/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acetylation
  • Acetyltransferases / metabolism
  • Algorithms
  • Amino Acid Motifs
  • Artificial Intelligence
  • Humans
  • Methionine / metabolism
  • Protein Processing, Post-Translational*
  • Proteins / metabolism
  • Sequence Analysis, Protein / methods*
  • Software

Substances

  • Proteins
  • Methionine
  • Acetyltransferases