Integration of Unstructured Data into a Clinical Data Warehouse for Kidney Transplant Screening - Challenges & Solutions

Stud Health Technol Inform. 2020 Jun 16;270:272-276. doi: 10.3233/SHTI200165.

Abstract

After kidney transplantation, graft rejection must be prevented. Therefore, a multitude of patient parameters is monitored pre- and postoperatively. To support this process, the Screen Reject research project is developing a data warehouse optimized for kidney rejection diagnostics. In the course of this project, it was discovered that important information is only available as free text rather than as structured data. It therefore cannot be processed by standard ETL tools, although such processing is necessary to establish a digital expert system for rejection diagnostics. For this reason, data integration has been improved by combining methods from natural language processing with methods from image processing. Based on state-of-the-art data warehousing technology (Microsoft SSIS), a generic data integration tool has been developed. The tool was evaluated by extracting the Banff classification from 218 pathology reports and HLA mismatches from about 1,700 PDF files, both written in German.
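To illustrate the kind of rule-based extraction step such a tool might perform, the following Python sketch pulls Banff lesion scores (for example i2, t1, ptc2) out of a German free-text pathology report with a regular expression. It is not the authors' SSIS-based implementation, which the abstract does not describe in detail. The list of lesion codes, the pattern, the function name `extract_banff_scores`, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical subset of Banff lesion score abbreviations (not the authors' list).
# Longer codes come first so "ptc" is tried before "t" and "ci" before "i".
BANFF_CODES = ("ptc", "ci", "ct", "cv", "cg", "mm", "ah", "i", "t", "v", "g")

# Match a lesion code, an optional "=" or ":" separator, and a score from 0 to 3,
# e.g. "i2", "t 1", "ptc = 0". Word boundaries keep "t" from matching inside words.
BANFF_PATTERN = re.compile(
    r"\b(" + "|".join(BANFF_CODES) + r")\s*[=:]?\s*([0-3])\b",
    re.IGNORECASE,
)


def extract_banff_scores(report: str) -> dict[str, int]:
    """Return the first score found for each Banff lesion code in a free-text report."""
    scores: dict[str, int] = {}
    for match in BANFF_PATTERN.finditer(report):
        code = match.group(1).lower()
        # Keep only the first occurrence of each code.
        scores.setdefault(code, int(match.group(2)))
    return scores


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the study data.
    sample = "Nierenbiopsie: Banff-Läsionen i2, t1, v0, g1, ptc2, ci1, ct0, cv1."
    print(extract_banff_scores(sample))
    # {'i': 2, 't': 1, 'v': 0, 'g': 1, 'ptc': 2, 'ci': 1, 'ct': 0, 'cv': 1}
```

A single regular expression like this only covers well-formatted scores; real reports would also require handling of negations, score ranges, and OCR errors in scanned documents, which is where the combination of NLP and image processing mentioned above comes in.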

Keywords: NLP; data warehouse; graft rejection; image processing; information extraction; kidney transplant.

MeSH terms

  • Data Warehousing*
  • Graft Rejection
  • Humans
  • Information Storage and Retrieval
  • Kidney
  • Kidney Transplantation*
  • Natural Language Processing