Natural language processing enabling COVID-19 predictive analytics to support data-driven patient advising and pooled testing

J Am Med Inform Assoc. 2021 Dec 28;29(1):12-21. doi: 10.1093/jamia/ocab186.


Objective: The COVID-19 (coronavirus disease 2019) pandemic response at the Medical University of South Carolina included virtual care visits for patients with suspected severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. The telehealth system used for these visits exports only a text note for integration with the electronic health record, but structured and coded information about COVID-19 (eg, exposure, risk factors, symptoms) was needed to support clinical care and early research, as well as predictive analytics for data-driven patient advising and pooled testing.

Materials and methods: To capture COVID-19 information from multiple sources, a new data mart and a new natural language processing (NLP) application prototype were developed. The NLP application combined reused components with dictionaries and rules crafted by domain experts. It was deployed as a Web service for hourly processing of new data from patients assessed or treated for COVID-19. The extracted information was then used to develop algorithms predicting SARS-CoV-2 diagnostic test results based on symptoms and exposure information.
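
Illustrative sketch (not the authors' implementation): the Python snippet below shows one minimal way that a term dictionary can be combined with a hand-written rule, here a simple negation check, to turn a free-text note into structured items. The dictionary entries, the negation cues, and the function name extract_concepts are all hypothetical and are not taken from the application described above.

```python
import re

# Hypothetical dictionary: surface term -> (category, normalized concept).
DICTIONARY = {
    "fever": ("symptom", "fever"),
    "cough": ("symptom", "cough"),
    "shortness of breath": ("symptom", "dyspnea"),
    "loss of smell": ("symptom", "anosmia"),
    "diabetes": ("risk_factor", "diabetes"),
    "contact with a confirmed case": ("exposure", "close_contact"),
}

# Hypothetical negation cues used by the single rule in this sketch.
NEGATION_CUES = re.compile(
    r"\b(no|denies|denied|without|negative for)\b", re.IGNORECASE
)


def extract_concepts(note):
    """Return a list of dicts with category, concept, and negation status."""
    results = []
    # Split the note into rough sentences so a cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]+", note):
        lowered = sentence.lower()
        for term, (category, concept) in DICTIONARY.items():
            match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
            if match is None:
                continue
            # Rule: a mention is negated if a cue precedes it in the sentence.
            negated = bool(NEGATION_CUES.search(lowered[: match.start()]))
            results.append(
                {"category": category, "concept": concept, "negated": negated}
            )
    return results


if __name__ == "__main__":
    example = (
        "Patient reports fever and cough. Denies shortness of breath. "
        "Contact with a confirmed case last week."
    )
    for item in extract_concepts(example):
        print(item)
```

A production system would need far richer dictionaries, context handling, and validation than this toy example suggests.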

Results: The dedicated data mart and NLP application were developed and deployed in a 10-day sprint in March 2020. In evaluation, the NLP application achieved good accuracy (85.8% recall and 81.5% precision). The SARS-CoV-2 testing predictive analytics algorithms were configured to provide patients with data-driven COVID-19 testing advice with a sensitivity of 81% to 92% and to enable pooled testing with a negative predictive value of 90% to 91%, reducing the number of tests required to about 63%.
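
Illustrative sketch: for readers less familiar with the metrics named above, the Python snippet below spells out their standard definitions. The F1 value it prints is simply the harmonic mean of the recall and precision quoted above, derived here by arithmetic rather than reported in this abstract. The confusion-matrix counts in the second part are made-up toy numbers, not study data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sensitivity(tp, fn):
    """Share of truly positive cases that the algorithm flags (same as recall)."""
    return tp / (tp + fn)


def negative_predictive_value(tn, fn):
    """Share of negative predictions that are truly negative."""
    return tn / (tn + fn)


if __name__ == "__main__":
    # Precision and recall as quoted in the Results paragraph.
    print(f"F1 = {f1_score(0.815, 0.858):.3f}")

    # Toy counts for illustration only (hypothetical, not from the study).
    tp, fn, tn = 8, 2, 18
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"NPV = {negative_predictive_value(tn, fn):.2f}")
```

Sensitivity matters for patient advising because it limits missed infections, while a high negative predictive value is what makes pooling predicted-negative samples worthwhile.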

Conclusions: SARS-CoV-2 testing predictive analytics and NLP successfully enabled data-driven patient advising and pooled testing.

Keywords: data science [L01.305]; machine learning [G17.035.250.500]; medical informatics [L01.313.500]; natural language processing (NLP) [L01.224.050.375.580].

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19 Testing
  • COVID-19*
  • Humans
  • Natural Language Processing
  • Pandemics
  • SARS-CoV-2