Natural Language Processing Enabling COVID-19 Predictive Analytics to Support Data-Driven Patient Advising and Pooled Testing

J Am Med Inform Assoc. 2021 Aug 20;ocab186. doi: 10.1093/jamia/ocab186. Online ahead of print.


Objective: The COVID-19 pandemic response at the Medical University of South Carolina (MUSC) included virtual care visits for patients with suspected SARS-CoV-2 infection. The telehealth system used for these visits exports only a text note for integration with the electronic health record (EHR), but structured and coded information about COVID-19 (e.g., exposure, risk factors, symptoms) was needed to support clinical care and early research, as well as predictive analytics for data-driven patient advising and pooled testing.

Methods: To capture COVID-19 information from multiple sources, a new data mart and a new Natural Language Processing (NLP) application prototype were developed. The NLP application combined reused components with dictionaries and rules crafted by domain experts. It was deployed as a web service for hourly processing of new data from patients assessed or treated for COVID-19. The extracted information was then used to develop algorithms predicting SARS-CoV-2 diagnostic test results based on symptoms and exposure information.
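
As a rough illustration of how dictionaries and rules can turn a free-text note into structured fields, the following minimal Python sketch looks up symptom phrases and applies a single negation rule. It is not the application described above: the term list, the negation cues, the concept labels, and the function name extract_symptoms are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-dictionary mapping surface phrases to concept labels.
SYMPTOM_TERMS = {
    "fever": "fever",
    "cough": "cough",
    "shortness of breath": "dyspnea",
    "loss of smell": "anosmia",
}

# Hypothetical negation cues; an expert-crafted rule set would be far larger.
NEGATION_CUES = ("no", "denies", "without")


def extract_symptoms(note):
    """Return {concept: True if asserted, False if only negated} for a note."""
    findings = {}
    # Split into rough sentences so a negation cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]+", note.lower()):
        for phrase, concept in SYMPTOM_TERMS.items():
            match = re.search(r"\b" + re.escape(phrase) + r"\b", sentence)
            if match is None:
                continue
            # Rule: a cue appearing before the term in the same sentence negates it.
            preceding = sentence[:match.start()]
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", preceding)
                for cue in NEGATION_CUES
            )
            # An asserted mention anywhere in the note wins over a negated one.
            findings[concept] = findings.get(concept, False) or not negated
    return findings


example = extract_symptoms(
    "Patient reports fever and cough. Denies shortness of breath."
)
```

In this sketch, concepts that never appear in the note are simply absent from the result, which keeps "not mentioned" distinct from "explicitly denied".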

Results: The dedicated data mart and NLP application were developed and deployed in a 10-day sprint in March 2020. In evaluation, the NLP application achieved good accuracy (85.8% recall and 81.5% precision). The SARS-CoV-2 testing predictive analytics algorithms were configured to provide patients with data-driven COVID-19 testing advice with a sensitivity of 81-92%, and to enable pooled testing with a negative predictive value of 90-91%, reducing the number of tests required to about 63%.
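
For readers less familiar with the metrics reported above, the following short Python sketch shows their standard definitions in terms of confusion-matrix counts. The function name confusion_metrics and the counts in the example are hypothetical; they are not data from this study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive/negative counts."""
    return {
        # Sensitivity (also called recall): share of actual positives detected.
        "sensitivity": tp / (tp + fn),
        # Precision (positive predictive value): share of positive calls that are correct.
        "precision": tp / (tp + fp),
        # Negative predictive value: share of negative calls that are correct.
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = confusion_metrics(tp=80, fp=20, tn=180, fn=20)
```

A high negative predictive value matters for pooled testing because samples predicted negative are the ones combined into pools, and a pool is cleared with a single test only if every sample in it is truly negative.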

Conclusion: SARS-CoV-2 testing predictive analytics and NLP successfully enabled data-driven patient advising and pooled testing.

Keywords: Data Science [L01.305]; Machine Learning [G17.035.250.500]; Medical Informatics [L01.313.500]; Natural Language Processing (NLP) [L01.224.050.375.580].