Concept-based semi-automatic classification of drugs

J Chem Inf Model. 2009 Aug;49(8):1986-92. doi: 10.1021/ci9000844.

Abstract

The Anatomical Therapeutic Chemical (ATC) classification system, maintained by the World Health Organization, provides a global standard for classifying medical substances and serves as a source for drug repurposing research. Nevertheless, it lacks several drugs that are major players in the global drug market. To establish classifications for as yet unclassified drugs, this paper presents a newly developed approach that combines information extraction (IE) and machine learning (ML) techniques. Most information about drugs is published in scientific articles. Therefore, an IE-based framework is employed to extract terms from free text that express a drug's chemical, pharmacological, therapeutic, and systemic effects. The extracted terms are used as features within an ML framework to predict putative ATC class labels for unclassified drugs. The system was tested on a portion of the ATC containing drugs indicated for the cardiovascular system. Class prediction was successful, with a best predictive accuracy of 89.47%, validated by 100-fold bootstrapping of the training set, and an accuracy of 77.12% on an independent test set. The presented concept-based classification system outperformed state-of-the-art classification methods based on chemical structure properties.
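
The pipeline described above (extracted terms as features, a classifier predicting ATC class labels, accuracy estimated by bootstrapping) can be illustrated with a minimal sketch. The abstract does not name the classifier or the exact bootstrap protocol, so everything here is an assumption made for illustration: a multinomial naive Bayes model stands in for the ML framework, accuracy is measured on out-of-bag samples, and the function names (`train_naive_bayes`, `predict`, `bootstrap_accuracy`) and the toy term lists are hypothetical and are not data from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model (an assumed stand-in classifier).

    docs   -- list of term lists, one list of extracted terms per drug
    labels -- list of class labels (e.g. ATC subgroup codes), same length
    """
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for terms, label in zip(docs, labels):
        term_counts[label].update(terms)
        vocab.update(terms)
    return class_counts, term_counts, vocab


def predict(model, terms):
    """Return the class with the highest log-posterior for a list of terms."""
    class_counts, term_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(term_counts[label].values()) + len(vocab)
        for term in terms:
            if term in vocab:  # terms never seen in training are ignored
                score += math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def bootstrap_accuracy(docs, labels, rounds=100, seed=0):
    """Mean out-of-bag accuracy over `rounds` bootstrap resamples."""
    rng = random.Random(seed)
    n = len(docs)
    accuracies = []
    for _ in range(rounds):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        in_bag = set(sample)
        out_of_bag = [i for i in range(n) if i not in in_bag]
        if not out_of_bag:
            continue  # nothing left to evaluate on in this round
        model = train_naive_bayes([docs[i] for i in sample],
                                  [labels[i] for i in sample])
        hits = sum(predict(model, docs[i]) == labels[i] for i in out_of_bag)
        accuracies.append(hits / len(out_of_bag))
    return sum(accuracies) / len(accuracies) if accuracies else float("nan")


# Toy, made-up term lists for two cardiovascular ATC subgroups
# (C07 = beta blocking agents, C03 = diuretics). Illustration only.
toy_docs = [
    ["beta", "adrenergic", "blockade", "heart_rate"],
    ["beta", "blockade", "hypertension"],
    ["adrenergic", "antagonist", "heart_rate"],
    ["diuresis", "sodium", "excretion"],
    ["diuresis", "renal", "hypertension"],
    ["sodium", "renal", "excretion"],
]
toy_labels = ["C07", "C07", "C07", "C03", "C03", "C03"]

model = train_naive_bayes(toy_docs, toy_labels)
print(predict(model, ["diuresis", "sodium"]))
print(bootstrap_accuracy(toy_docs, toy_labels, rounds=100))
```

With real data, each term list would come from the IE step applied to the literature on one drug, and the labels would be the known ATC classes of already classified drugs. The separate independent test set mentioned in the abstract would be scored with `predict` alone, without resampling.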

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Databases, Factual
  • Natural Language Processing
  • Pharmaceutical Preparations / classification*

Substances

  • Pharmaceutical Preparations