Database (Oxford). 2019 Jan 1;2019:bay137.
doi: 10.1093/database/bay137.

PubTerm: A Web Tool for Organizing, Annotating and Curating Genes, Diseases, Molecules and Other Concepts From PubMed Records

Free PMC article

José Garcia-Pelaez et al. Database (Oxford).


Background and objective: Analysis, annotation and curation of biomedical scientific literature are recurrent tasks in biomedical research, database curation and clinical practice. Commonly, the reading is centered on concepts such as genes, diseases or molecules. Database curators may also need to annotate published abstracts related to a specific topic. However, few free and intuitive tools exist to assist users in this context. Therefore, we developed PubTerm, a web tool to organize, categorize, curate and annotate a large number of PubMed abstracts related to biological entities such as genes, diseases, chemicals, species, sequence variants and other related information.

Methods: A variety of interfaces were implemented to facilitate curation and annotation, including the organization of abstracts by terms, by the co-occurrence of terms or by specific phrases. Information includes statistics on the occurrence of terms. The abstracts, terms and other related information can be annotated and categorized using user-defined categories. The session information can be saved and restored, and the data can be exported to other formats.
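As an illustration of what organizing abstracts by terms and by co-occurrence of terms could look like, here is a minimal Python sketch. It is not PubTerm's implementation, which is not described at this level of detail here. The record identifiers, the term names and both helper functions (`group_by_term`, `cooccurrence_counts`) are hypothetical, and the sketch assumes the terms in each abstract have already been recognized by some upstream step.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: record ID -> set of terms recognized in that abstract.
# IDs and terms are placeholders invented for this example.
annotations = {
    "1001": {"GENE_A", "disease X"},
    "1002": {"GENE_A", "GENE_B", "disease X"},
    "1003": {"GENE_B"},
}

def group_by_term(annotations):
    """Return a mapping term -> sorted list of record IDs mentioning it."""
    index = defaultdict(list)
    for record_id, terms in annotations.items():
        for term in terms:
            index[term].append(record_id)
    return {term: sorted(ids) for term, ids in index.items()}

def cooccurrence_counts(annotations):
    """Count the abstracts in which each unordered pair of terms appears together."""
    counts = Counter()
    for terms in annotations.values():
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts

term_index = group_by_term(annotations)
pair_counts = cooccurrence_counts(annotations)

# Occurrence statistics: number of abstracts per term.
term_frequency = {term: len(ids) for term, ids in term_index.items()}
```

Grouping by term gives a per-term list of abstracts to review, and the pair counts give a simple statistic for ranking term connections; a real tool would also need to handle synonyms and term normalization.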

Results: The pipeline in PubTerm starts by specifying a PubMed query or a list of PubMed identifiers. Then, the user can specify three lists of categories and choose which information will be highlighted in which colors. The user then utilizes the 'term view' to organize the abstracts by gene, disease, species or other information to facilitate the annotation and categorization of terms or abstracts. Other views also facilitate the exploration of abstracts and connections between terms. We have used PubTerm to quickly and efficiently curate collections of more than 400 abstracts that mention more than 350 genes to generate revised lists of susceptibility genes for diseases. An example is provided for pulmonary arterial hypertension.
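The two input modes mentioned above (a PubMed query or a list of PubMed identifiers) can be sketched with the public NCBI E-utilities endpoints. This is only an assumption for illustration: the text does not say how PubTerm itself retrieves records. The function names `search_url` and `fetch_url` are hypothetical, and the sketch only builds request URLs without contacting any server.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoints (assumed here; not stated to be PubTerm's backend).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def search_url(query, max_records=100):
    """Build an ESearch URL for a free-text PubMed query, capped at max_records."""
    params = {"db": "pubmed", "term": query, "retmax": max_records}
    return f"{ESEARCH}?{urlencode(params)}"

def fetch_url(pmids):
    """Build an EFetch URL for a list of PubMed IDs, skipping non-numeric entries."""
    cleaned = [p.strip() for p in pmids if p.strip().isdigit()]
    params = {"db": "pubmed", "id": ",".join(cleaned), "retmode": "xml"}
    return f"{EFETCH}?{urlencode(params)}"

# Input mode A: a query plus a record limit.
url_a = search_url("pulmonary arterial hypertension", max_records=50)

# Input mode B: a pasted list of identifiers (placeholder IDs, one malformed).
url_b = fetch_url(["1001", " 1002 ", "abc"])
```

Filtering out non-numeric entries before building the request is a small guard against pasted lists that contain stray text.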

Conclusions: PubTerm saves time in literature review by assisting with annotation, organization and knowledge acquisition.


Figure 1. Implementation of PubTerm. Each box represents a server for computational services or a user browser. Dashed lines represent requests/responses and arrowheads represent flows of data.
Figure 2. Summary of PubTerm. The top scheme shows the typical pipeline. Below, the forms, views and options are shown.
Figure 3. Input methods for PubTerm. (A) Using a PubMed query in (1), specifying the number of records (2) and loading them up (3). (B) Using a list of PubMed IDs in (1) then loading them in (2).
Figure 4. Annotation and categorization. The terms (left) can be annotated by categories (1) and notes (2). The text within abstracts (right) can be marked (3), but the abstract itself can also be categorized (4) and annotated (5).
Figure 5. Views of the abstracts and terms. (A) Record view. (B) Term view. (C) Co-occurrence view. (D) Sentence view.

Similar articles

  • LiverCancerMarkerRIF: a liver cancer biomarker interactive curation system combining text mining and expert annotations.
    Dai HJ, Wu JC, Lin WS, Reyes AJ, Dela Rosa MA, Syed-Abdul S, Tsai RT, Hsu WL. Database (Oxford). 2014 Aug 27;2014:bau085. doi: 10.1093/database/bau085. Print 2014. PMID: 25168057 Free PMC article.
  • Assisting manual literature curation for protein-protein interactions using BioQRator.
    Kwon D, Kim S, Shin SY, Chatr-aryamontri A, Wilbur WJ. Database (Oxford). 2014 Jul 22;2014:bau067. doi: 10.1093/database/bau067. Print 2014. PMID: 25052701 Free PMC article.
  • MET network in PubMed: a text-mined network visualization and curation system.
    Dai HJ, Su CH, Lai PT, Huang MS, Jonnagaddala J, Rose Jue T, Rao S, Chou HJ, Milacic M, Singh O, Syed-Abdul S, Hsu WL. Database (Oxford). 2016 May 30;2016:baw090. doi: 10.1093/database/baw090. Print 2016. PMID: 27242035 Free PMC article.
  • HEALTH GeoJunction: place-time-concept browsing of health publications.
    MacEachren AM, Stryker MS, Turton IJ, Pezanowski S. Int J Health Geogr. 2010 May 18;9:23. doi: 10.1186/1476-072X-9-23. PMID: 20482806 Free PMC article. Review.
  • Mitochondrial Disease Sequence Data Resource (MSeqDR): a global grass-roots consortium to facilitate deposition, curation, annotation, and integrated analysis of genomic data for the mitochondrial disease clinical and research communities.
    Falk MJ, Shen L, Gonzalez M, Leipzig J, Lott MT, Stassen AP, Diroma MA, Navarro-Gomez D, Yeske P, Bai R, Boles RG, Brilhante V, Ralph D, DaRe JT, Shelton R, Terry SF, Zhang Z, Copeland WC, van Oven M, Prokisch H, Wallace DC, Attimonelli M, Krotoski D, Zuchner S, Gai X; MSeqDR Consortium Participants: Sherri Bale, Jirair Bedoyan, Doron Behar, Penelope Bonnen, Lisa Brooks, Claudia Calabrese, Sarah Calvo, Patrick Chinnery, John Christodoulou, Deanna Church, Rosanna Clima, Bruce H. Cohen, Richard G. Cotton, IFM de Coo, Olga Derbenevoa, Johan T. den Dunnen, David Dimmock, Gregory Enns, Giuseppe Gasparre, Amy Goldstein, Iris Gonzalez, Katrina Gwinn, Sihoun Hahn, Richard H. Haas, Hakon Hakonarson, Michio Hirano, Douglas Kerr, Dong Li, Maria Lvova, Finley Macrae, Donna Maglott, Elizabeth McCormick, Grant Mitchell, Vamsi K. Mootha, Yasushi Okazaki, Aurora Pujol, Melissa Parisi, Juan Carlos Perin, Eric A. Pierce, Vincent Procaccio, Shamima Rahman, Honey Reddi, Heidi Rehm, Erin Riggs, Richard Rodenburg, Yaffa Rubinstein, Russell Saneto, Mariangela Santorsola, Curt Scharfe, Claire Sheldon, Eric A. Shoubridge, Domenico Simone, Bert Smeets, Jan A. Smeitink, Christine Stanley, Anu Suomalainen, Mark Tarnopolsky, Isabelle Thiffault, David R. Thorburn, Johan Van Hove, Lynne Wolfe, and Lee-Jun Wong. Mol Genet Metab. 2015 Mar;114(3):388-96. doi: 10.1016/j.ymgme.2014.11.016. Epub 2014 Dec 4. PMID: 25542617 Free PMC article. Review.
