The center for expanded data annotation and retrieval

Mark A Musen; Carol A Bean; Kei-Hoi Cheung; Michel Dumontier; Kim A Durante; Olivier Gevaert; Alejandra Gonzalez-Beltran; Purvesh Khatri; Steven H Kleinstein; Martin J O'Connor; Yannick Pouliot; Philippe Rocca-Serra; Susanna-Assunta Sansone; Jeffrey A Wiser; CEDAR team

doi:10.1093/jamia/ocv048

J Am Med Inform Assoc. 2015 Nov;22(6):1148-52. doi: 10.1093/jamia/ocv048. Epub 2015 Jun 25.

Affiliations

¹ Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA USA; musen@stanford.edu.
² Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA USA.
³ Interdepartmental Program in Computational Biology and Bioinformatics, Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT USA.
⁴ Stanford University Libraries, Stanford University, Stanford, CA USA.
⁵ Oxford e-Research Centre, University of Oxford, Oxford, UK.
⁶ Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA USA; Stanford Institute for Immunity, Transplantation, and Infection, Stanford, CA USA.
⁷ Interdepartmental Program in Computational Biology and Bioinformatics, Departments of Pathology and Immunobiology, Yale University School of Medicine, New Haven, CT USA.
⁸ Northrop Grumman Corporation, West Falls Church, VA USA.

Abstract

The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators assemble templates and fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository will not only capture annotations specified when experimental datasets are initially created, but also incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.

Keywords: biological ontologies; data collection; data curation; datasets as topic; standards.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Biological Ontologies
Biomedical Research*
Data Mining*
Datasets as Topic*
Humans
Information Storage and Retrieval
United States