Literature mining for context-specific molecular relations using multimodal representations (COMMODAR)

Jaehyun Lee; Doheon Lee; Kwang Hyung Lee

doi:10.1186/s12859-020-3396-y

BMC Bioinformatics. 2020 Oct 26;21(Suppl 5):250. doi: 10.1186/s12859-020-3396-y.

Authors

Jaehyun Lee¹, Doheon Lee²,³, Kwang Hyung Lee⁴

Affiliations

¹ Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.
² Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea. dhlee@kaist.ac.kr.
³ Bio-Synergy Research Center, Daejeon, South Korea. dhlee@kaist.ac.kr.
⁴ Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea. khlee@kaist.ac.kr.

Abstract

Biological contextual information helps explain the phenomena that arise in biological systems composed of complex molecular relations. Constructing context-specific relational resources relies heavily on laborious manual extraction from unstructured literature. In this paper, we propose COMMODAR, a machine learning-based literature mining framework for context-specific molecular relations using multimodal representations. The main idea of COMMODAR is feature augmentation through the cooperation of multimodal representations for relation extraction. We leveraged biomedical domain knowledge as well as canonical linguistic information to build more comprehensive representations of textual sources. Models based on multiple modalities outperformed those based solely on the linguistic modality. We applied COMMODAR to 14 million PubMed abstracts and extracted 9214 context-specific molecular relations. All corpora, extracted data, evaluation results, and the implementation code are downloadable from the project repository (https://github.com/jae-hyun-lee/commodar).

CCS CONCEPTS: • Computing methodologies~Information extraction • Computing methodologies~Neural networks • Applied computing~Biological networks.

Keywords: Biological context; Literature mining; Natural language processing; Representation learning.

Publication types

Review

MeSH terms

Data Mining / methods*
Machine Learning*
PubMed*
Publications*

Grants and funding

NRF-2012M3A9C4048758/National Research Foundation of Korea