Recently, to speed up the differential-diagnosis process based on symptoms and signs observed from an affected individual in the diagnosis of rare diseases, researchers have developed and implemented phenotype-driven differential-diagnosis systems. The performance of those systems relies on the quantity and quality of underlying databases of disease-phenotype associations (DPAs). Although such databases are often developed by manual curation, they inherently suffer from limited coverage. To address this problem, we propose a text-mining approach to increase the coverage of DPA databases and consequently improve the performance of differential-diagnosis systems. Our analysis showed that a text-mining approach using one million case reports obtained from PubMed could increase the coverage of manually curated DPAs in Orphanet by 125.6%. We also present PubCaseFinder (see Web Resources), a new phenotype-driven differential-diagnosis system in a freely available web application. By utilizing automatically extracted DPAs from case reports in addition to manually curated DPAs, PubCaseFinder improves the performance of automated differential diagnosis. Moreover, PubCaseFinder helps clinicians search for relevant case reports by using phenotype-based comparisons and confirm the results with detailed contextual information.
Keywords: Human Phenotype Ontology; PubCaseFinder; case report; differential diagnosis; disease-phenotype association; rare disease.
Copyright © 2018 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.