Definition and validation of SNOMED CT subsets using the expression constraint language

J Biomed Inform. 2021 May:117:103747. doi: 10.1016/j.jbi.2021.103747. Epub 2021 Mar 19.

Abstract

Background: SNOMED CT Expression Constraint Language (ECL) is a declarative language developed by SNOMED International for the definition of SNOMED CT Expression Constraints (ECs). ECs are executable expressions that define intensional subsets of clinical meanings by stating constraints over the logical definitions of concepts. Executing an EC on a SNOMED CT substrate yields the intended subset, and requires an execution engine able to receive an EC as input, execute it, and return the matching concepts. An important use of subsets of clinical concepts is terminology binding between clinical information models and terminologies, where a subset defines the set of valid values of coded data.
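To make the idea of executing an EC on a substrate concrete, the following is a minimal sketch, not the engine described in this paper. It assumes a toy substrate given as a list of is-a edges with hypothetical identifiers ("A" to "E" are not real SNOMED CT codes), and it supports only three forms: a bare concept (self), "<" (descendants of) and "<<" (descendants or self of). The names IS_A, build_children, descendants and evaluate are illustrative.

```python
from collections import defaultdict

# Toy substrate: (child, parent) is-a edges.
# Identifiers are hypothetical placeholders, not real SNOMED CT codes.
IS_A = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C"), ("E", "D")]


def build_children(edges):
    """Index the is-a edges as parent -> set of direct children."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return children


def descendants(concept, children):
    """Return all transitive descendants of `concept` (excluding itself)."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:  # a DAG can reach a node by several paths
                seen.add(child)
                stack.append(child)
    return seen


def evaluate(ec, children):
    """Evaluate a very small fragment of ECL: 'X', '< X' and '<< X'."""
    ec = ec.strip()
    if ec.startswith("<<"):
        focus = ec[2:].strip()
        return descendants(focus, children) | {focus}
    if ec.startswith("<"):
        focus = ec[1:].strip()
        return descendants(focus, children)
    return {ec}


children = build_children(IS_A)
subset = evaluate("<< B", children)
```

In this sketch the EC is the intensional definition and `subset` is the resulting set of matching concepts; re-running the same EC against a newer substrate would yield an updated subset without editing the definition.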

Objective: To define and implement methods for the simplification, semantic validation and execution of ECs over a graph-oriented SNOMED CT database; to provide a method for the visual representation of subsets in order to explore, understand and validate their content; and to develop an EC execution platform, called SNQuery, that makes use of these methods.

Methods: Since SNOMED CT is a directed acyclic graph, we used a graph-oriented database to represent its content, in which the schema and instances are represented as graphs and data manipulation is expressed through graph-oriented operations. To execute ECs over the graph database, a translation process converts each EC into a set of Cypher Query Language queries. We defined EC simplification methods that leverage the logical structure underlying SNOMED CT. The purpose of these methods is to reduce the complexity of ECs and, in turn, their execution time, as well as to validate them from the points of view of the SNOMED CT Concept Model and of the logical definitions of concepts. We also developed a graphical representation based on the geometrical concept of circle packing, which allows validating subsets, as well as pre-defined refsets and the terminology itself.
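As an illustration of what translating an EC into Cypher can look like, here is a minimal sketch under stated assumptions; it is not the translation process implemented in SNQuery. The graph schema is hypothetical: concepts are assumed to be nodes labelled Concept with an sctid property, and is-a relationships are assumed to be edges of type ISA pointing from child to parent. The function name ecl_to_cypher is also illustrative, and only the self, "<" and "<<" forms are handled.

```python
def ecl_to_cypher(ec):
    """Translate a tiny ECL fragment into a parameterised Cypher query.

    Assumed (hypothetical) schema: (:Concept {sctid})-[:ISA]->(:Concept).
    Returns a (query, parameters) pair.
    """
    ec = ec.strip()
    if ec.startswith("<<"):
        focus, hops = ec[2:], "*0.."   # zero or more ISA hops: descendants or self
    elif ec.startswith("<"):
        focus, hops = ec[1:], "*1.."   # one or more ISA hops: descendants only
    else:
        focus, hops = ec, None         # the focus concept itself

    # Drop the optional |term| that ECL allows after the identifier.
    focus = focus.split("|")[0].strip()
    if not focus.isdigit():
        raise ValueError(f"unsupported expression constraint: {ec!r}")

    if hops is None:
        query = "MATCH (c:Concept {sctid: $focus}) RETURN c.sctid AS sctid"
    else:
        query = (
            f"MATCH (c:Concept)-[:ISA{hops}]->(:Concept {{sctid: $focus}}) "
            "RETURN DISTINCT c.sctid AS sctid"
        )
    return query, {"focus": focus}


query, params = ecl_to_cypher("< 73211009 |Diabetes mellitus|")
print(query)
print(params)
```

Passing the focus concept as a query parameter rather than interpolating it into the query text keeps the generated Cypher reusable across concepts; a refinement or a compound EC would, in this style, map to additional MATCH clauses or to several queries whose results are combined.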

Results: We have developed SNQuery, a platform for the definition of intensional subsets of SNOMED CT concepts by means of the execution of ECs over a graph-oriented SNOMED CT database. Additionally, we have incorporated methods for the simplification and semantic validation of ECs, as well as for the visualization of subsets as a mechanism to understand and validate them. SNQuery has been evaluated in terms of EC execution times.

Conclusion: In this paper, we provide methods to simplify, semantically validate and execute ECs over a graph-oriented database. We also offer a method to visualize the intensional subsets obtained by executing ECs to explore, understand and validate them, as well as refsets and the terminology itself. The definition of intensional subsets is useful to bind content between clinical information models and clinical terminologies, which is a necessary step to achieve semantic interoperability between EHR systems.

Keywords: Expression constraint language; Expression constraint simplification; Graph database; SNOMED CT; Subset visualization.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Factual
  • Semantics*
  • Systematized Nomenclature of Medicine*
  • Translating