Somatic mutation databases as tools for molecular epidemiology and molecular pathology of cancer: proposed guidelines for improving data collection, distribution, and integration

Hum Mutat. 2009 Mar;30(3):275-82. doi: 10.1002/humu.20832.


There are currently fewer than 40 locus-specific databases (LSDBs) and one large general database that curate data on somatic mutations in human cancer genes. These databases differ in scope and use different annotation standards and database systems, which duplicates curation effort and makes it difficult for users to find clear and consistent information. As data on somatic mutations are generated at an increasing pace, there is an urgent need for a framework that improves the collection of this information and makes it more accessible to clinicians, scientists, and epidemiologists, in order to facilitate research on biomarkers. Here we propose a data flow for improving the connectivity between existing databases, and we provide practical guidelines for data reporting, database contents, and annotation standards. These proposals are based on common standards recommended by the Human Genome Variation Society (HGVS), with additions that address the specific requirements of somatic mutations in cancer. Indeed, somatic mutations may be used in molecular pathology and clinical studies to characterize tumor types, guide treatment choice, and predict response to treatment and patient outcome, or in epidemiological studies as markers for tumor etiology or exposure assessment. Specific annotations are therefore required to cover these diverse research topics. This initiative is meant to promote collaboration and discussion on these issues, and to encourage the development of adequate resources that would prevent the loss of extremely valuable information generated by years of basic and clinical research.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Collection / methods
  • Databases, Genetic / standards*
  • Guidelines as Topic
  • Humans
  • Information Dissemination
  • Internet
  • Molecular Epidemiology / methods
  • Molecular Epidemiology / statistics & numerical data
  • Mutation*
  • Neoplasms / epidemiology
  • Neoplasms / genetics*
  • Neoplasms / pathology
  • Pathology, Clinical / methods
  • Pathology, Clinical / statistics & numerical data
  • Systems Integration