Literature mining and database annotation of protein phosphorylation using a rule-based system

Bioinformatics. 2005 Jun 1;21(11):2759-65. doi: 10.1093/bioinformatics/bti390. Epub 2005 Apr 6.

Abstract

Motivation: A large volume of experimental data on protein phosphorylation is buried in the fast-growing PubMed literature. Although such information is of great value, its coverage in databases is limited owing to the laborious process of literature-based curation. Computational literature mining holds promise to facilitate database curation.

Results: A rule-based system, RLIMS-P (Rule-based LIterature Mining System for Protein Phosphorylation), was used to extract protein phosphorylation information from MEDLINE abstracts. An annotation-tagged literature corpus developed at PIR was used to evaluate the system for finding phosphorylation papers and extracting phosphorylation objects (kinases, substrates and sites) from abstracts. RLIMS-P achieved a precision and recall of 91.4% and 96.4%, respectively, for paper retrieval, and of 97.9% and 88.0%, respectively, for extraction of substrates and sites. By coupling high recall for paper retrieval with high precision for information extraction, RLIMS-P facilitates literature mining and database annotation of protein phosphorylation.
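
As a rough illustration of the two ideas in the Results (rule-based extraction of kinase, substrate and site, and scoring by precision and recall), the following minimal Python sketch may help. It is not the RLIMS-P implementation: the single regular-expression rule, the function names `extract` and `precision_recall`, and the example sentence are all hypothetical, chosen only to show the general shape of such a pipeline.

```python
import re

# Hypothetical toy rule (NOT an RLIMS-P rule): matches sentences of the form
# "<kinase> phosphorylates <substrate> [at|on <site>]".
PATTERN = re.compile(
    r"(?P<kinase>[A-Za-z0-9-]+) phosphorylates (?P<substrate>[A-Za-z0-9-]+)"
    r"(?: (?:at|on) (?P<site>(?:Ser|Thr|Tyr)-?\d+))?"
)


def extract(sentence):
    """Return (kinase, substrate, site) if the toy rule matches, else None.

    The site is None when the sentence does not mention one.
    """
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return (match.group("kinase"), match.group("substrate"), match.group("site"))


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted items against a gold set.

    precision = true positives / number predicted
    recall    = true positives / number in the gold standard
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For example, under these assumptions `extract("PKA phosphorylates CREB at Ser133")` yields the triple `("PKA", "CREB", "Ser133")`, and `precision_recall` can be applied either to sets of retrieved paper identifiers or to sets of extracted triples, mirroring the two evaluations reported above.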

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Abstracting and Indexing / methods
  • Algorithms*
  • Artificial Intelligence*
  • Database Management Systems
  • Information Storage and Retrieval / methods*
  • MEDLINE*
  • Natural Language Processing*
  • Periodicals as Topic
  • Phosphorylation*
  • Proteins / classification*
  • Semantics
  • Vocabulary, Controlled

Substances

  • Proteins