BMC Bioinformatics. 2013 Mar 22;14:104.
doi: 10.1186/1471-2105-14-104.

Application of Text-Mining for Updating Protein Post-Translational Modification Annotation in UniProtKB

Free PMC article

Anne-Lise Veuthey et al. BMC Bioinformatics.


Background: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information, we have developed a system that extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB.

Results: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, depending on the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments.
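As an illustration of what pattern-matching extraction of this kind can look like, the following is a minimal sketch in Python. It is not the authors' implementation: the keyword list, the residue patterns, the naive sentence splitter, and the function name `extract_ptm_sentences` are all assumptions made for this example. Like the behaviour described for Figure 2, it keeps only sentences that mention both a modification type and a site.

```python
import re

# Hypothetical keyword patterns; the vocabulary used by the real system
# is not given in the abstract.
PTM_KEYWORDS = {
    "phosphorylation": r"phosphorylat\w*",
    "acetylation": r"acetylat\w*",
    "glycosylation": r"glycosylat\w*",
}

# Assumed site notations: "Lys-382", "Thr 18", "serine 15" (case-insensitive
# names) or a single-letter code directly followed by a position, e.g. "S15".
SITE_RE = re.compile(
    r"\b(?:((?i:serine|threonine|tyrosine|lysine|asparagine"
    r"|ser|thr|tyr|lys|asn))[\s-]?|([STYKN]))(\d+)\b"
)


def extract_ptm_sentences(text):
    """Return sentences that mention both a PTM type and at least one site."""
    results = []
    # Naive sentence splitting; abbreviations such as "et al." will break it.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for ptm, pattern in PTM_KEYWORDS.items():
            if not re.search(pattern, sentence, re.IGNORECASE):
                continue
            sites = [
                (m.group(1) or m.group(2), int(m.group(3)))
                for m in SITE_RE.finditer(sentence)
            ]
            # Sentences without site information are skipped.
            if sites:
                results.append({"ptm": ptm, "sites": sites, "sentence": sentence})
    return results


if __name__ == "__main__":
    example = (
        "p53 is acetylated at Lys-382 by p300. "
        "Acetylation is important for activity."
    )
    for hit in extract_ptm_sentences(example):
        print(hit["ptm"], hit["sites"])
```

A sketch like this covers only the sentence-level step; ranking protein candidates, as the abstract describes, would need gene/protein mention recognition and a scoring scheme that the abstract does not specify.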

Conclusions: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at


Figure 1
A typical sentence with information on protein glycosylation: Boxes indicate the information that is extracted from the sentence.
Figure 2
An abstract containing information relevant to protein acetylation: the extracted sentences are highlighted in orange, PTM and site information in yellow, and gene/protein mentions in blue. The list of extracted sites and proteins, with scores, is also provided. The last two sentences, which mention acetylation, are not highlighted because they contain no site information.
Figure 3
Phosphosite information retrieval: pipeline for the retrieval of documents that potentially provide supporting evidence for existing phosphosite annotations in UniProtKB/Swiss-Prot, where such annotations were made on the basis of information from high-throughput mass spectrometry-based proteomics experiments.



