A tree-based scan statistic for database disease surveillance

Biometrics. 2003 Jun;59(2):323-31. doi: 10.1111/1541-0420.00039.


Many databases make it possible to study the relationship between health events and various potential risk factors. Some of these databases contain variables that naturally form a hierarchical tree structure, such as pharmaceutical drugs and occupations. It is of great interest to use such databases for surveillance, in order to detect unsuspected relationships to disease risk. We propose a tree-based scan statistic, with which surveillance can be conducted with a minimum of prior assumptions about the group of occupations or drugs that increase risk, and which adjusts for the multiple testing inherent in the many potential combinations. The method is illustrated using data from the National Center for Health Statistics Multiple Cause of Death Database, looking at the relationship between occupation and death from silicosis.
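
The abstract does not give the test statistic or the inference procedure, so the following is only a minimal sketch of how a tree-based scan could work, under stated assumptions: a Poisson log-likelihood ratio is evaluated for every node (cut) of the tree, the maximum over all cuts is the test statistic, and a Monte Carlo p-value for that maximum adjusts for the multiple testing. The function names, the toy occupation tree, and all counts below are hypothetical and are not taken from the paper or its data.

```python
import math
import random

def llr(c, n, total):
    """Assumed Poisson log-likelihood ratio for one cut:
    c observed cases, n expected cases, `total` cases in the whole tree.
    Only excess risk (c > n) is scored; otherwise the value is 0."""
    if n <= 0 or c <= n:
        return 0.0
    inside = c * math.log(c / n)
    # When every case falls inside the cut, the outside term vanishes.
    outside = (total - c) * math.log((total - c) / (total - n)) if c < total else 0.0
    return inside + outside

def aggregate(parent, leaf_values):
    """Add each leaf's value to the leaf itself and to all of its ancestors."""
    totals = {}
    for leaf, value in leaf_values.items():
        node = leaf
        while node is not None:
            totals[node] = totals.get(node, 0) + value
            node = parent.get(node)
    return totals

def max_llr(parent, cases, expected, total):
    """Return the cut with the largest log-likelihood ratio and its value."""
    obs = aggregate(parent, cases)
    exp = aggregate(parent, expected)
    best = max(obs, key=lambda g: llr(obs[g], exp[g], total))
    return best, llr(obs[best], exp[best], total)

def tree_scan(parent, cases, expected, n_sim=999, seed=0):
    """Scan all cuts and estimate a Monte Carlo p-value for the maximum."""
    rng = random.Random(seed)
    total = sum(cases.values())
    # Rescale the expected counts so that they sum to the observed total.
    scale = total / sum(expected.values())
    expected = {leaf: e * scale for leaf, e in expected.items()}
    leaves = list(expected)
    weights = [expected[leaf] for leaf in leaves]
    best, t_obs = max_llr(parent, cases, expected, total)
    exceed = 0
    for _ in range(n_sim):
        # Null hypothesis: cases fall on leaves in proportion to expectation.
        sim = dict.fromkeys(leaves, 0)
        for leaf in rng.choices(leaves, weights=weights, k=total):
            sim[leaf] += 1
        if max_llr(parent, sim, expected, total)[1] >= t_obs:
            exceed += 1
    return best, t_obs, (exceed + 1) / (n_sim + 1)

# Hypothetical toy tree (child -> parent) and made-up counts.
parent = {"root": None, "mining": "root", "office": "root",
          "coal": "mining", "quarry": "mining",
          "clerk": "office", "manager": "office"}
cases = {"coal": 9, "quarry": 8, "clerk": 2, "manager": 1}
expected = {"coal": 5.0, "quarry": 5.0, "clerk": 5.0, "manager": 5.0}

best_cut, statistic, p_value = tree_scan(parent, cases, expected)
print(best_cut, round(statistic, 3), p_value)
```

In this toy example neither leaf occupation under "mining" stands out as strongly as the branch as a whole, so the scan selects the parent node. This illustrates why scanning every cut of the tree, rather than leaves only, needs few prior assumptions about which grouping carries the excess risk.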

MeSH terms

  • Databases, Factual
  • Decision Trees*
  • Humans
  • Information Storage and Retrieval / methods*
  • National Center for Health Statistics, U.S.
  • Occupational Exposure / adverse effects
  • Population Surveillance / methods*
  • Silicosis / epidemiology
  • Statistics as Topic / methods*
  • United States / epidemiology
  • Vital Statistics