Cancer Inform. 2007 May 4;3:149-58.

Development of Query Strategies to Identify a Histologic Lymphoma Subtype in a Large Linked Database System

Free PMC article

Michael Graiser et al. Cancer Inform.

Background: Large linked databases (LLDBs) represent a novel resource for cancer outcomes research. However, accurately identifying a patient population of interest within an LLDB can be challenging. Our research group developed a fully integrated platform that combines independent legacy databases into a single cancer-focused LLDB system. We compared the sensitivity and specificity of several SQL-based query strategies for identifying a histologic lymphoma subtype in this LLDB to determine the most accurate legacy data source for identifying a specific cancer patient population.

Methods: Query strategies were developed to identify patients with follicular lymphoma from an LLDB of cancer registry data, electronic medical records (EMR), laboratory, administrative, pharmacy, and other clinical data. Queries were performed using common diagnostic codes (ICD-9), cancer registry histology codes (ICD-O), and text searches of EMRs. We reviewed medical records and pathology reports to confirm each diagnosis and calculated the sensitivity and specificity for each query strategy.
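The three query strategies described in the Methods can be illustrated with a small sketch. Everything below is hypothetical: the table names (registry, billing, emr_notes), the column names, the sample rows, and the specific code lists are assumptions chosen for illustration, not the schema or the queries the authors used. The ICD-O-3 morphology codes 9690/3, 9691/3, 9695/3, and 9698/3 and the ICD-9-CM prefix 202.0 are commonly associated with follicular (nodular) lymphoma, but the abstract does not state which codes or text strings were actually queried.

```python
import sqlite3

# Hypothetical, in-memory stand-in for a linked database; not the authors' schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE registry  (patient_id INTEGER, icdo_histology TEXT);
CREATE TABLE billing   (patient_id INTEGER, icd9_code TEXT);
CREATE TABLE emr_notes (patient_id INTEGER, note_text TEXT);
""")

# Toy rows, invented purely so the queries have something to match.
conn.executemany("INSERT INTO registry VALUES (?, ?)",
                 [(1, "9690/3"), (2, "9680/3")])
conn.executemany("INSERT INTO billing VALUES (?, ?)",
                 [(1, "202.00"), (3, "202.80")])
conn.executemany("INSERT INTO emr_notes VALUES (?, ?)",
                 [(2, "Diffuse large B-cell lymphoma."),
                  (4, "Biopsy consistent with follicular lymphoma, grade 2.")])

# One SQL statement per strategy; code lists and search string are assumptions.
QUERIES = {
    "icdo": """SELECT DISTINCT patient_id FROM registry
               WHERE icdo_histology IN ('9690/3', '9691/3', '9695/3', '9698/3')""",
    "icd9": """SELECT DISTINCT patient_id FROM billing
               WHERE icd9_code LIKE '202.0%'""",
    "text": """SELECT DISTINCT patient_id FROM emr_notes
               WHERE LOWER(note_text) LIKE '%follicular lymphoma%'""",
}

def run_strategy(name):
    """Return the set of patient IDs flagged by one query strategy."""
    return {row[0] for row in conn.execute(QUERIES[name])}

candidates = {name: run_strategy(name) for name in QUERIES}

# The union of all strategies gives the pool of potential cases to review.
all_candidates = set().union(*candidates.values())
```

Keeping each strategy as a separate query that returns a set of patient IDs makes it straightforward to compare every strategy against the same chart-review reference standard.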

Results: Together, the queries identified 1538 potential cases of follicular lymphoma. Review of pathology and other medical reports confirmed 415 cases of follicular lymphoma: 300 pathology-verified and 115 verified from other medical reports. The query using ICD-O codes was highly specific (96%). Queries using text strings varied in sensitivity (range 7-92%) and specificity (range 86-99%). Queries using ICD-9 codes were both less sensitive (34-44%) and less specific (35-87%).
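For readers less familiar with the two metrics reported above, the following is a minimal sketch of how sensitivity and specificity can be computed for one query strategy. It assumes that the reference population is the pool of all reviewed candidates and that chart review supplies the set of confirmed cases; the function name and the toy patient IDs are hypothetical and do not reproduce the study data.

```python
def sensitivity_specificity(flagged, confirmed, pool):
    """Compute (sensitivity, specificity) for one query strategy.

    flagged   -- patient IDs returned by the query
    confirmed -- patient IDs confirmed as true cases by record review
    pool      -- all patient IDs that were reviewed (assumed reference population)
    """
    pool = set(pool)
    flagged = set(flagged) & pool
    confirmed = set(confirmed) & pool

    tp = len(flagged & confirmed)          # true cases the query found
    fn = len(confirmed - flagged)          # true cases the query missed
    fp = len(flagged - confirmed)          # non-cases the query flagged
    tn = len(pool - flagged - confirmed)   # non-cases the query left alone

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example with invented IDs: 10 reviewed patients, 4 confirmed cases.
pool = range(1, 11)
confirmed = {1, 2, 3, 4}
flagged = {1, 2, 3, 5}   # finds 3 of 4 true cases, plus 1 false positive

sens, spec = sensitivity_specificity(flagged, confirmed, pool)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case the query recovers 3 of the 4 confirmed cases (sensitivity 0.75) and correctly excludes 5 of the 6 non-cases (specificity about 0.83).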

Conclusions: Queries of linked cancer databases that include cancer registry data should use ICD-O codes or structured free-text searches to identify patient populations with a precise histologic diagnosis.

Keywords: Large linked database; cancer epidemiology; cancer outcomes research; cancer registry.


Figure 1. System architecture for the GeneSys SI oncology database application
Figure 2. Receiver-operator plot of query strategies to identify a pathology-confirmed histologic diagnosis of follicular lymphoma



