Automatic Extraction and Post-coordination of Spatial Relations in Consumer Language

AMIA Annu Symp Proc. 2015 Nov 5;2015:1083-92. eCollection 2015.


To incorporate ontological concepts in natural language processing (NLP), it is often necessary to combine simple concepts into complex concepts (post-coordination). This is especially true in consumer language, where a more limited vocabulary forces consumers to use highly productive language that is almost impossible to pre-coordinate in an ontology. Our work focuses on recognizing an important case for post-coordination in natural language: spatial relations between disorders and anatomical structures. Consumers typically use such spatial relations when describing symptoms. We describe an annotated corpus of 2,000 sentences with 1,300 spatial relations, and a second corpus of 500 of these relations manually normalized to UMLS concepts. We use machine learning techniques to recognize these relations, obtaining good performance. Further, we experiment with methods to normalize the relations to an existing ontology. This two-step process is analogous to the combination of concept recognition and normalization, and achieves comparable results.
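
The two-step process described above (recognize a spatial relation, then normalize it to an ontology concept) can be illustrated with a minimal sketch. This is not the authors' method: the paper uses machine learning for recognition, whereas the sketch swaps in a simple keyword rule so that it stays self-contained. The vocabularies, the `SpatialRelation` type, the function names, and the placeholder concept identifiers are all hypothetical and are not taken from the paper or from UMLS.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy vocabularies (hypothetical; not drawn from the paper or from UMLS).
DISORDERS = {"pain", "rash", "swelling"}
ANATOMY = {"knee", "arm", "back"}
SPATIAL_INDICATORS = {"in", "on", "behind"}

# Hypothetical lookup from (disorder, anatomy) to a placeholder concept ID.
# Real UMLS identifiers are deliberately not used here.
PRECOORDINATED = {
    ("pain", "knee"): "CONCEPT_KNEE_PAIN",
    ("pain", "back"): "CONCEPT_BACK_PAIN",
}


@dataclass(frozen=True)
class SpatialRelation:
    """A disorder linked to an anatomical structure by a spatial indicator."""
    disorder: str
    indicator: str
    anatomy: str


def extract_relation(sentence: str) -> Optional[SpatialRelation]:
    """Step 1 (rule-based stand-in for a learned recognizer).

    Finds a spatial indicator, then the nearest disorder term before it
    and the first anatomy term after it.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    for i, token in enumerate(tokens):
        if token not in SPATIAL_INDICATORS:
            continue
        disorder = next((t for t in reversed(tokens[:i]) if t in DISORDERS), None)
        anatomy = next((t for t in tokens[i + 1:] if t in ANATOMY), None)
        if disorder and anatomy:
            return SpatialRelation(disorder, token, anatomy)
    return None


def normalize(relation: SpatialRelation) -> Optional[str]:
    """Step 2: map the relation to an existing concept, if one is listed."""
    return PRECOORDINATED.get((relation.disorder, relation.anatomy))


if __name__ == "__main__":
    rel = extract_relation("I have a sharp pain in my left knee")
    print(rel, normalize(rel) if rel else None)
```

In this sketch, a relation with no entry in the lookup (for example a rash on an arm) is still extracted but normalizes to `None`, which loosely mirrors the case where a consumer's phrasing has no pre-coordinated counterpart in the ontology.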

MeSH terms

  • Consumer Health Informatics*
  • Humans
  • Natural Language Processing*
  • Unified Medical Language System*
  • Vocabulary
  • Vocabulary, Controlled*