Clinical data mining: a review

Yearb Med Inform. 2009:121-33.

Abstract

Objective: Clinical data mining is the application of data mining techniques to clinical data. We review the literature to provide a general overview, identifying the state of practice and the challenges ahead.

Methods: The nine data mining steps proposed by Fayyad in 1996 [4] were used as the main themes of the review. MEDLINE was used as the primary source, and 84 papers were retained based on our inclusion criteria.

Results: Clinical data mining has three objectives: understanding clinical data, assisting healthcare professionals, and developing a data analysis methodology suited to medical data. Classification is the most frequently used data mining function, and Bayesian classifiers, neural networks, and SVMs (Support Vector Machines) are its predominant implementations. A myriad of quantitative performance measures were proposed, predominantly accuracy, sensitivity, specificity, and ROC curves. The latter are usually associated with qualitative evaluation.
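To make the most frequently reported measures concrete, here is a minimal sketch in plain Python of how accuracy, sensitivity, specificity, and an ROC curve with its area under the curve (AUC) can be computed for a binary classifier. It is not taken from the review or from any reviewed study. The function names are hypothetical, and the sketch assumes labels coded as 1 (positive, e.g. disease present) and 0 (negative), with both classes present. The trapezoidal AUC shown here is one common way to summarize an ROC curve, not necessarily the method used in the reviewed papers.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def basic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from hard (0/1) predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by sweeping the decision threshold over every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, _, _ = confusion_counts(y_true, y_pred)
        points.append((fp / neg, tp / pos))
    return points  # the lowest threshold yields (1.0, 1.0)


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy data invented for illustration only.
    y_true = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical cut-off of 0.5
    print(basic_metrics(y_true, y_pred))
    print(auc_trapezoid(roc_points(y_true, scores)))
```

On the toy data, the 0.5 cut-off gives an accuracy of 5/6, a sensitivity of 1.0, and a specificity of 2/3, while the AUC is 8/9. This illustrates why studies often report the ROC curve alongside threshold-dependent measures: sensitivity and specificity change with the chosen cut-off, whereas the AUC summarizes performance across all thresholds.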

Conclusion: Clinical data mining fulfills its commitment to extracting new and previously unknown knowledge from clinical databases. More effort is still needed to gain wider acceptance among healthcare professionals and to improve both the generalizability of the extracted knowledge and the reproducibility of the extraction process. Specifically, this requires a better description of variables, systematic reporting of algorithm parameters (including the method used to obtain them), the use of easy-to-understand models, and comparisons of the efficiency of clinical data mining with traditional statistical analyses. As more and more data become available, data miners will have to develop new methodologies and infrastructures to analyze increasingly complex medical data.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Algorithms
  • Bibliometrics*
  • Clinical Medicine
  • Data Mining / methods
  • Data Mining / statistics & numerical data*