Data mining and clinical data repositories: Insights from a 667,000 patient data set

Comput Biol Med. 2006 Dec;36(12):1351-77. doi: 10.1016/j.compbiomed.2005.08.003. Epub 2005 Dec 22.


Clinical repositories containing large amounts of biological, clinical, and administrative data are becoming increasingly available as health care systems integrate patient information for research and utilization objectives. To investigate the potential value of searching these databases for novel insights, we applied a new data mining approach, HealthMiner, to a large cohort of 667,000 inpatient and outpatient digital records from an academic medical system. HealthMiner approaches knowledge discovery using three unsupervised methods: CliniMiner, Predictive Analysis, and Pattern Discovery. The initial results from this study suggest that these approaches may expand research capabilities by identifying potentially novel clinical disease associations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Clinical Chemistry Tests
  • Cohort Studies
  • Data Interpretation, Statistical
  • Databases, Factual*
  • Humans
  • Medical Informatics Computing*
  • Medical Records Systems, Computerized*
  • Predictive Value of Tests