DLAD4U: deriving and prioritizing disease lists from PubMed literature

BMC Bioinformatics. 2018 Dec 28;19(Suppl 17):495. doi: 10.1186/s12859-018-2463-0.

Abstract

Background: Owing to recent technological advances, disease-related knowledge is growing rapidly. It has become nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and lifestyle factors, disease symptoms, and treatment strategies. Here we report DLAD4U (Disease List Automatically Derived For You), an efficient, accurate, and easy-to-use disease search engine based on PubMed literature.

Results: DLAD4U uses the eSearch and eFetch APIs from the National Center for Biotechnology Information (NCBI) to find publications related to a query and to identify diseases from the retrieved publications. A hypergeometric test is used to prioritize the identified diseases for display. DLAD4U accepts any valid PubMed query, and its output includes a ranked disease list, information associated with each disease, chronologically ordered supporting publications, a summary of the run, and links for file export. DLAD4U outperformed other disease search engines in our comparative evaluation, which used selected genes and drugs as query terms and manually curated data as the gold standard. For 100 genes that are each associated with only one disease in the gold standard, DLAD4U achieved a Mean Average Precision (MAP) of 0.77, clearly higher than that of the other tools. For 10 genes that are associated with multiple diseases in the gold standard, the mean precision, recall, and F-measure scores from DLAD4U were consistently higher than those from the other tools. The superior performance of DLAD4U was further confirmed using 100 drugs as queries, with a MAP of 0.90.
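
The prioritization and evaluation steps described in the Results can be illustrated with a minimal Python sketch. It is not the DLAD4U implementation: the abstract does not specify how the hypergeometric test is parameterized, so the sketch assumes the common enrichment setup, in which N is the number of publications in the background corpus, K the number of background publications annotated with a disease, n the number of publications retrieved for the query, and k the number of retrieved publications annotated with that disease. The function names (hypergeom_sf, rank_diseases, average_precision, mean_average_precision) and the toy counts are hypothetical, and retrieval through eSearch and eFetch is omitted.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    Assumed meaning: N background publications, K of them annotated with
    the disease, n retrieved by the query, k retrieved and annotated.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def rank_diseases(retrieved_counts, background_counts, N, n):
    """Return (disease, p-value) pairs sorted by ascending p-value."""
    scored = [
        (disease, hypergeom_sf(k, N, background_counts[disease], n))
        for disease, k in retrieved_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


def average_precision(ranked, relevant):
    """Average precision of one ranked list against a gold-standard set."""
    hits = 0
    total = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position  # precision at this relevant item
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Toy numbers, invented purely for illustration.
    N, n = 1000, 50
    background = {"disease A": 20, "disease B": 500}
    retrieved = {"disease A": 10, "disease B": 25}
    for disease, p in rank_diseases(retrieved, background, N, n):
        print(disease, p)
```

In this toy setup, a disease that is rare in the background but frequent among the retrieved publications receives a smaller p-value, and therefore a higher rank, than a common disease that appears at roughly its background rate. For a query with a single gold-standard disease, average precision reduces to the reciprocal of the rank at which that disease appears.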

Conclusions: DLAD4U is a new, intuitive disease search engine that takes advantage of existing resources at NCBI to provide computational efficiency and uses statistical analyses to ensure accuracy. DLAD4U is publicly available at http://dlad4u.zhang-lab.org .

Keywords: Drug-disease association; Gene-disease association; Information retrieval; Literature mining; Web application.

MeSH terms

  • Disease / genetics
  • Genetic Association Studies
  • Humans
  • Information Storage and Retrieval*
  • Internet
  • Nitric Oxide Synthase Type III / metabolism
  • PubMed*
  • Publications*
  • Search Engine*
  • Tumor Necrosis Factor-alpha / metabolism

Substances

  • Tumor Necrosis Factor-alpha
  • NOS3 protein, human
  • Nitric Oxide Synthase Type III