Tumor taxonomy for the developmental lineage classification of neoplasms

BMC Cancer. 2004 Nov 30:4:88. doi: 10.1186/1471-2407-4-88.

Abstract

Background: The new "Developmental lineage classification of neoplasms" was described in a prior publication. The classification is simple (the entire hierarchy is described with just 39 classifiers), comprehensive (providing a place for every tumor of man), and consistent with recent attempts to characterize tumors by cytogenetic and molecular features. A taxonomy is a list of the instances that populate a classification. The taxonomy of neoplasia attempts to list every known term for every known tumor of man.

Methods: The taxonomy provides each concept with a unique code and groups synonymous terms under the same concept. A Perl script validated successive drafts of the taxonomy ensuring that: 1) each term occurs only once in the taxonomy; 2) each term occurs in only one tumor class; 3) each concept code occurs in one and only one hierarchical position in the classification; and 4) the file containing the classification and taxonomy is a well-formed XML (eXtensible Markup Language) document.

Results: The taxonomy currently contains 122,632 different terms encompassing 5,376 neoplasm concepts. Each concept has, on average, 23 synonyms. The taxonomy populates "The developmental lineage classification of neoplasms," and is available as an XML file, currently 9+ Megabytes in length. A representation of the classification/taxonomy listing each term followed by its code, followed by its full ancestry, is available as a flat-file, 19+ Megabytes in length. The taxonomy is the largest nomenclature of neoplasms, with more than twice the number of neoplasm names found in other medical nomenclatures, including the 2004 version of the Unified Medical Language System, the Systematized Nomenclature of Medicine Clinical Terminology, the National Cancer Institute's Thesaurus, and the International Classification of Diseases Oncolology version.

Conclusions: This manuscript describes a comprehensive taxonomy of neoplasia that collects synonymous terms under a unique code number and assigns each tumor to a single class within the tumor hierarchy. The entire classification and taxonomy are available as open access files (in XML and flat-file formats) with this article.

MeSH terms

  • Cell Lineage*
  • Databases, Factual
  • Female
  • Humans
  • Male
  • Neoplasms / classification*
  • Neoplastic Stem Cells / classification