Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network

BMC Immunol. 2011 Aug 26:12:49. doi: 10.1186/1471-2172-12-49.


Background: Vaccine literature indexing is poorly performed in PubMed because the Medical Subject Headings (MeSH) annotation hierarchy is limited in the vaccine field. The Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that applying VO in SciMiner will aid vaccine literature indexing and the mining of vaccine-gene interaction networks. As a test case, we examined vaccines for Brucella, the causative agent of brucellosis in humans and animals.

Results: The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for expanding VO terms was learned from training data consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) when tested on a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing retrieved more Brucella vaccine-related papers than a MeSH-based PubMed literature search. For example, a VO-SciMiner search for "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search with the same query returned only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes was identified. The asserted and inferred VO hierarchies provide semantic support for inferring novel knowledge about associations between vaccines and genes from the retrieved data. New hypotheses were generated based on this analysis approach.
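
The recall and precision figures reported above follow the standard definitions for retrieval evaluation. As a minimal sketch of how such figures can be computed from a manually curated article set, the Python snippet below compares a set of retrieved article identifiers against a gold-standard set. The function name `precision_recall` and the identifiers are hypothetical illustrations and are not data or code from the study.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of article identifiers.

    precision = correctly retrieved / all retrieved
    recall    = correctly retrieved / all relevant (gold standard)
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical identifiers for illustration only (not from the study).
gold = {"a1", "a2", "a3", "a4"}        # articles a curator marked as relevant
predicted = {"a1", "a2", "a3", "a9"}   # articles the indexer retrieved

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these made-up sets, three of the four retrieved articles are relevant and three of the four relevant articles are retrieved, so both measures come out at 0.75.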

Conclusion: VO-SciMiner can be used to improve the efficiency of PubMed searching in the vaccine domain.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Abstracting and Indexing
  • Animals
  • Brucella / immunology*
  • Brucellosis / genetics*
  • Brucellosis / immunology*
  • Brucellosis / microbiology
  • Data Mining / methods
  • Gene Regulatory Networks* / immunology
  • Host-Pathogen Interactions
  • Humans
  • Medical Informatics
  • Medical Subject Headings
  • Natural Language Processing
  • Proteins / genetics
  • Proteins / immunology
  • Proteins / metabolism*
  • PubMed
  • Terminology as Topic

Substances

  • Proteins