Discovery of novel biomarkers and phenotypes by semantic technologies
- PMID: 23402646
- PMCID: PMC3605201
- DOI: 10.1186/1471-2105-14-51
Abstract
Background: Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, and thus constitute an important aspect of modern pharmaceutical research and development. Increasingly, the discovery of relevant biomarkers is aided by in silico techniques that apply data mining and computational chemistry to large molecular databases. However, an even larger source of valuable information can potentially be tapped for such discoveries: repositories of research documents.
Results: This paper reports on a pilot experiment to discover potential novel biomarkers and phenotypes for diabetes and obesity by self-organized text mining of about 120,000 PubMed abstracts, public clinical trial summaries, and internal Merck research documents. The InfoCodex semantic engine analyzed these documents directly, without prior human manipulation such as parsing. Measured against established but different benchmarks, recall ranged up to 30% and precision up to 50%. The engine retrieved known entities that other, traditional approaches had missed. Finally, it was shown to discover new diabetes and obesity biomarkers and phenotypes. Among these were many interesting, high-potential candidates, although noticeable noise (uninteresting or obvious terms) was also generated.
Conclusions: The reported approach of employing autonomous self-organising semantic engines to aid biomarker discovery, supplemented by appropriate manual curation processes, shows promise. Conservatively, it offers a faster alternative to vocabulary processes that depend on humans having to read and analyze all the texts. More optimistically, it could impact pharmaceutical research, for example by shortening the time-to-market of novel drugs or by speeding up early recognition of dead ends and adverse reactions.
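The recall and precision figures quoted in the Results can be read against the standard set-based definitions. The abstract does not state how the authors computed them, so the following is only a minimal sketch under that assumption: the function name `precision_recall`, the case-insensitive exact matching, and the toy term lists are all hypothetical and are not taken from the study.

```python
def precision_recall(extracted, benchmark):
    """Return (precision, recall) of extracted terms against a benchmark list.

    Assumption: terms match when they are equal after trimming whitespace
    and lowercasing. The study may have used a different matching rule.
    """
    extracted = {term.strip().lower() for term in extracted}
    benchmark = {term.strip().lower() for term in benchmark}
    true_positives = extracted & benchmark

    # Precision: share of extracted terms that appear in the benchmark.
    precision = len(true_positives) / len(extracted) if extracted else 0.0
    # Recall: share of benchmark terms that the engine extracted.
    recall = len(true_positives) / len(benchmark) if benchmark else 0.0
    return precision, recall


# Hypothetical toy data for illustration only.
extracted_terms = ["HbA1c", "leptin", "adiponectin", "blood"]
benchmark_terms = ["HbA1c", "leptin", "C-peptide", "insulin", "ghrelin"]

precision, recall = precision_recall(extracted_terms, benchmark_terms)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these definitions, a term that is absent from the benchmark lowers precision whether it is noise or a genuinely novel candidate, which is one reason such scores need manual curation to interpret.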
