Reuse of termino-ontological resources and text corpora for building a multilingual domain ontology: an application to Alzheimer's disease

J Biomed Inform. 2014 Apr:48:171-82. doi: 10.1016/j.jbi.2013.12.013. Epub 2013 Dec 29.

Abstract

Ontologies are useful tools for sharing and exchanging knowledge. However, ontology construction is complex and often time-consuming. In this paper, we present a method for building a bilingual domain ontology from textual and termino-ontological resources, intended for semantic annotation and information retrieval of textual documents. This method combines two approaches: ontology learning from texts and the reuse of existing terminological resources. It consists of four steps: (i) term extraction from domain-specific corpora (in French and English) using textual analysis tools, (ii) clustering of terms into concepts organized according to the UMLS Metathesaurus, (iii) ontology enrichment through the alignment of French and English terms using parallel corpora and the integration of new concepts, and (iv) refinement and validation of results by domain experts. These validated results are formalized into a domain ontology dedicated to Alzheimer's disease and related syndromes, which is available online (http://lesim.isped.u-bordeaux2.fr/SemBiP/ressources/ontoAD.owl). The ontology currently includes 5,765 concepts linked by 7,499 taxonomic relationships and 10,889 non-taxonomic relationships. Of these concepts, 439 are absent from the UMLS and were newly created, and 608 new synonymous French terms were added. The proposed method is sufficiently flexible to be applied to other domains.
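As an illustration of step (iii), the sketch below shows one simple way French and English terms could be aligned from a sentence-aligned parallel corpus. It is not the alignment procedure used in the paper, which the abstract does not specify: it swaps in a plain co-occurrence heuristic (the Dice coefficient), and the function name `dice_alignments`, the toy sentences, the term lists, and the 0.6 threshold are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def dice_alignments(pairs, fr_terms, en_terms, threshold=0.6):
    """Score French/English term pairs by sentence-level co-occurrence.

    pairs: list of (french_sentence, english_sentence) tuples.
    Returns {(fr_term, en_term): dice} for scores at or above the threshold.
    """
    fr_count, en_count, joint = Counter(), Counter(), Counter()
    for fr_sent, en_sent in pairs:
        # Naive substring matching; a real pipeline would use a term extractor.
        fr_hits = {t for t in fr_terms if t in fr_sent.lower()}
        en_hits = {t for t in en_terms if t in en_sent.lower()}
        fr_count.update(fr_hits)
        en_count.update(en_hits)
        joint.update(product(fr_hits, en_hits))

    scores = {}
    for (fr, en), n in joint.items():
        # Dice coefficient: 2 * joint occurrences / sum of individual occurrences.
        dice = 2 * n / (fr_count[fr] + en_count[en])
        if dice >= threshold:
            scores[(fr, en)] = dice
    return scores


# Hypothetical toy corpus, not taken from the paper's data.
pairs = [
    ("La maladie d'Alzheimer entraîne des troubles de la mémoire.",
     "Alzheimer's disease causes memory disorders."),
    ("Les troubles de la mémoire sont fréquents.",
     "Memory disorders are frequent."),
    ("La maladie d'Alzheimer est une démence.",
     "Alzheimer's disease is a dementia."),
]
fr_terms = ["maladie d'alzheimer", "troubles de la mémoire"]
en_terms = ["alzheimer's disease", "memory disorders"]

alignments = dice_alignments(pairs, fr_terms, en_terms)
for (fr, en), score in sorted(alignments.items()):
    print(f"{fr} <-> {en}: {score:.2f}")
```

On this toy input, only the two correct pairs reach the threshold; the cross pairs co-occur in a single sentence and score 0.5. In a workflow like the one described, such candidate alignments would still go to domain experts for validation, as in step (iv).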

Keywords: Alzheimer’s disease; Ontological resource reuse; Ontology development; Parallel corpus; Term alignment.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alzheimer Disease / diagnosis*
  • Alzheimer Disease / physiopathology*
  • Classification
  • Humans
  • Information Storage and Retrieval
  • Language*
  • Medical Informatics / methods*
  • Reproducibility of Results
  • Semantics
  • Software
  • Unified Medical Language System
  • Vocabulary, Controlled