iPFPi: A System for Improving Protein Function Prediction through Cumulative Iterations

IEEE/ACM Trans Comput Biol Bioinform. 2015 Jul-Aug;12(4):825-36. doi: 10.1109/TCBB.2014.2344681.

Abstract

We propose a classifier system called iPFPi that predicts the functions of un-annotated proteins. iPFPi assigns an un-annotated protein P the functions of the GO annotation terms that are semantically similar to P. An un-annotated protein P and a GO annotation term T are each represented by their characteristics. The characteristics of P are the GO terms found within the abstracts of biomedical literature associated with P. The characteristics of T are the GO terms found within the abstracts of biomedical literature associated with the proteins annotated with the function of T. Let F and F′ be the important (dominant) sets of characteristic terms representing T and P, respectively. iPFPi annotates P with the function of T if F and F′ are semantically similar. We constructed a novel semantic similarity measure that takes several factors into consideration, such as the dominance degree of each characteristic term t in set F based on its score. The score is a value that reflects the dominance status of t relative to the other characteristic terms and is computed using a pairwise beats-and-loses procedure. Every time a protein P is annotated with the function of T, iPFPi updates and optimizes the current scores of the characteristic terms for T based on the weights of the characteristic terms for P, and set F is updated accordingly. Thus, the accuracy of predicting the function of T as the function of subsequent proteins improves. This prediction accuracy keeps improving iteratively over time through the cumulative weights of the characteristic terms representing the proteins that are successively annotated with the function of T. We evaluated the quality of iPFPi by comparing it experimentally with two recent protein function prediction systems. Results showed marked improvement.
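The loop described in the abstract (score terms, select dominant sets, compare, then accumulate weights) can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: term weights are taken to be occurrence counts, the beats-and-loses score is taken to be (terms beaten minus terms lost to) divided by the number of rivals, the dominant set is taken to be the terms with a positive score, the similarity is a weighted overlap standing in for the paper's measure, and the function names, the 0.5 threshold, and the GO identifiers are hypothetical.

```python
from collections import Counter


def beats_loses_scores(weights):
    """Assumed scoring: (beats - loses) / number of rivals, compared by weight."""
    terms = list(weights)
    rivals = max(len(terms) - 1, 1)  # avoid division by zero for one term
    scores = {}
    for t in terms:
        beats = sum(1 for u in terms if u != t and weights[t] > weights[u])
        loses = sum(1 for u in terms if u != t and weights[t] < weights[u])
        scores[t] = (beats - loses) / rivals
    return scores


def dominant_set(weights):
    """Assumed cutoff: keep only terms with a strictly positive score."""
    return {t: s for t, s in beats_loses_scores(weights).items() if s > 0}


def similarity(f_t, f_p):
    """Score-weighted overlap in [0, 1]; a stand-in, not the paper's measure."""
    union = set(f_t) | set(f_p)
    if not union:
        return 0.0
    shared = sum(min(f_t.get(t, 0.0), f_p.get(t, 0.0)) for t in union)
    total = sum(max(f_t.get(t, 0.0), f_p.get(t, 0.0)) for t in union)
    return shared / total


def annotate_and_update(term_weights, protein_weights, threshold=0.5):
    """Annotate P with T if the dominant sets match, then accumulate weights."""
    f_t = dominant_set(term_weights)      # F in the abstract
    f_p = dominant_set(protein_weights)   # F' in the abstract
    if similarity(f_t, f_p) >= threshold:
        # Counter.update adds counts, so T's weights grow cumulatively.
        term_weights.update(protein_weights)
        return True
    return False


# Illustrative counts of GO terms found in abstracts (made-up numbers).
t_weights = Counter({"GO:0006412": 5, "GO:0003735": 4, "GO:0005840": 1})
p_weights = Counter({"GO:0006412": 3, "GO:0005840": 1})
annotated = annotate_and_update(t_weights, p_weights)
```

Because the weights for T accumulate after each accepted annotation, the dominant set F is recomputed from more evidence on the next call, which is the cumulative effect the abstract describes.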

MeSH terms

  • Computational Biology / methods*
  • Databases, Protein*
  • Molecular Sequence Annotation
  • Proteins / chemistry
  • Proteins / classification*
  • Proteins / metabolism*
  • Reproducibility of Results
  • Semantics

Substances

  • Proteins