BMC Bioinformatics, 10 Suppl 2, S4

Metadata Mapping and Reuse in caBIG

Isaac Kunz et al. BMC Bioinformatics.

Abstract

Background: This paper proposes that interoperability across biomedical databases can be improved by using a repository of Common Data Elements (CDEs), UML model class-attributes, and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG). The goal is to demonstrate the deployment of open source tools that can map models effectively and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs so that their systems interoperate when they develop within the caBIG framework or other frameworks that use metadata repositories.

Results: The Dice (di-grams) and Dynamic algorithms are compared, and both perform similarly when matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG framework and potentially within any framework that uses a metadata repository.
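
The Dice (di-grams) comparison named in the Results can be illustrated with a short sketch. This is not the authors' implementation: it assumes the standard Dice coefficient over character bigrams, and the function names and the candidate CDE labels are hypothetical, chosen to echo the Race class from Figure 1.

```python
from collections import Counter


def bigrams(text):
    """Return the character bigrams (di-grams) of a lowercased string."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a, b):
    """Dice coefficient on bigram multisets: 2 * |shared| / (|A| + |B|)."""
    grams_a, grams_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0  # neither string is long enough to have a bigram
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2.0 * shared / total


def rank_candidates(class_attribute, candidates):
    """Order candidate labels from most to least similar (hypothetical helper)."""
    return sorted(candidates, key=lambda c: dice(class_attribute, c), reverse=True)


# Hypothetical UML class-attribute and CDE object-property labels.
ranked = rank_candidates(
    "Race raceDesc",
    ["Person Birth Date", "Race Description", "Specimen Type"],
)
```

Here the UML class and attribute names are joined into one string and compared with each candidate label; how the actual tools tokenize and normalize names is not stated in the abstract.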

Conclusion: This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG. This effort helps facilitate the development of interoperable systems within caBIG as well as other metadata frameworks. Such efforts are critical to meeting the need for systems that can handle the enormous amounts of diverse data generated by new biomedical methodologies.

Figures

Figure 1
caBIG™ UML model. This is an example of a portion of a UML model of a system available in caBIG™. The model describes the classes and attributes of the system and information about function and relationships. Note the class Race has attributes id (a string identifier of a race) and raceDesc (a string description of a race). This Race class is mapped to a CDE within caBIG™ to give a semantic definition and allow reuse of this type of data element.
Figure 2
Visual representation of several caBIG™ UML models. An example of several UML models available in caBIG™ for reuse.
Figure 3
UML and ISO/IEC 11179. The mapping of UML elements to the ISO 11179 Common Data Elements (CDE) within the caDSR. UML Class maps to Object Class, UML Attribute to Property, and UML Data Type to Property. Object Class and Property components of the Data Element Concept are then mapped to Terminology concepts stored in EVS.
Figure 4
Per project Dice vs Dynamic. Total percentage of "Gold Standard" matches per cumulative rank, per project.
Figure 5
Combined project Dice vs Dynamic. Total percentage of "Gold Standard" matches per cumulative rank for all "RELEASED" CDEs.
Figure 6
Dice per project. This graph shows 20 of the 66 projects mapped to a restricted set of CDEs using the Dice algorithm. Restriction is made by only mapping to corresponding CDEs as indicated in caDSR.
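
The measure plotted in Figures 4 and 5, the percentage of "Gold Standard" matches per cumulative rank, can be read as follows. This is a sketch under an assumption: for each class-attribute, the rank at which its gold-standard CDE appears in the algorithm's ordered candidate list is known (or None if it never appears), and the value at rank k is the share of items whose gold-standard match sits at rank k or better. The function name and input layout are hypothetical.

```python
def cumulative_match_percentages(gold_ranks, max_rank):
    """Percentage of items whose gold-standard match appears at rank <= k,
    for k = 1..max_rank. A rank of None means the match was not found."""
    total = len(gold_ranks)
    if total == 0:
        return [0.0] * max_rank
    percentages = []
    for k in range(1, max_rank + 1):
        hits = sum(1 for r in gold_ranks if r is not None and r <= k)
        percentages.append(100.0 * hits / total)
    return percentages


# Four hypothetical class-attributes: two matched at rank 1, one at rank 2,
# and one with no gold-standard match among the candidates.
curve = cumulative_match_percentages([1, 1, 2, None], max_rank=3)
```

A curve that rises quickly at low ranks means the correct CDE is usually near the top of the suggestion list, which is the condition under which automatic mapping saves curator effort.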
