A medical specialty indicates the skills needed by health care providers to conduct key procedures or make critical judgments. However, documentation about specialties may be lacking or inaccurately specified in a health care institution. Thus, we propose to leverage diagnosis histories to recognize medical specialties that exist in practice. Such specialties that are highly recognizable through diagnosis histories are de facto diagnosis specialties. We aim to recognize de facto diagnosis specialties that are listed in the Health Care Provider Taxonomy Code Set (HPTCS) and discover those that are unlisted. First, to recognize the former, we use similarity and supervised learning models. Next, to discover de facto diagnosis specialties unlisted in the HPTCS, we introduce a general discovery-evaluation framework. In this framework, we use a semi-supervised learning model and an unsupervised learning model, from which the discovered specialties are subsequently evaluated by the similarity and supervised learning models used in recognition. To illustrate the potential for these approaches, we collect 2 data sets of 1 year of diagnosis histories from a large academic medical center: One is a subset of the other except for additional information useful for network analysis. The results indicate that 12 core de facto diagnosis specialties listed in the HPTCS are highly recognizable. Additionally, the semi-supervised learning model discovers a specialty for breast cancer on the smaller data set based on network analysis, while the unsupervised learning model confirms this discovery and suggests an additional specialty for Obesity on the larger data set. The potential correctness of these 2 specialties is reinforced by the evaluation results that they are highly recognizable by similarity and supervised learning models in comparison with 12 core de facto diagnosis specialties listed in the HPTCS.
Keywords: diagnosis specialty; electronic health record; machine learning; medical informatics.