Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data

Jeffrey G Klann; Hossein Estiri; Griffin M Weber; Bertrand Moal; Paul Avillach; Chuan Hong; Amelia L M Tan; Brett K Beaulieu-Jones; Victor Castro; Thomas Maulhardt; Alon Geva; Alberto Malovini; Andrew M South; Shyam Visweswaran; Michele Morris; Malarkodi J Samayamuthu; Gilbert S Omenn; Kee Yuan Ngiam; Kenneth D Mandl; Martin Boeker; Karen L Olson; Danielle L Mowery; Robert W Follett; David A Hanauer; Riccardo Bellazzi; Jason H Moore; Ne-Hooi Will Loh; Douglas S Bell; Kavishwar B Wagholikar; Luca Chiovato; Valentina Tibollo; Siegbert Rieg; Anthony L L J Li; Vianney Jouhet; Emily Schriver; Zongqi Xia; Meghan Hutch; Yuan Luo; Isaac S Kohane; Consortium for Clinical Characterization of COVID-19 by EHR (4CE) (CONSORTIA AUTHOR); Gabriel A Brat; Shawn N Murphy

doi:10.1093/jamia/ocab018

J Am Med Inform Assoc. 2021 Jul 14;28(7):1411-1420. doi: 10.1093/jamia/ocab018.

Authors

Jeffrey G Klann¹, Hossein Estiri¹, Griffin M Weber², Bertrand Moal³, Paul Avillach⁴, Chuan Hong⁴, Amelia L M Tan⁴, Brett K Beaulieu-Jones⁴, Victor Castro⁵, Thomas Maulhardt⁶, Alon Geva⁷,⁸, Alberto Malovini⁹, Andrew M South¹⁰, Shyam Visweswaran¹¹, Michele Morris¹¹, Malarkodi J Samayamuthu¹¹, Gilbert S Omenn¹², Kee Yuan Ngiam¹³, Kenneth D Mandl⁸, Martin Boeker⁶, Karen L Olson⁸, Danielle L Mowery¹⁴, Robert W Follett¹⁵, David A Hanauer¹⁶, Riccardo Bellazzi⁹,¹⁷, Jason H Moore¹⁴, Ne-Hooi Will Loh¹⁸, Douglas S Bell¹⁵, Kavishwar B Wagholikar¹⁹, Luca Chiovato⁹,²⁰, Valentina Tibollo⁹, Siegbert Rieg²¹, Anthony L L J Li²², Vianney Jouhet²³, Emily Schriver²⁴, Zongqi Xia²⁵, Meghan Hutch²⁶, Yuan Luo²⁶, Isaac S Kohane⁴; Consortium for Clinical Characterization of COVID-19 by EHR (4CE) (CONSORTIA AUTHOR); Gabriel A Brat⁴, Shawn N Murphy²⁷,²⁸

Affiliations

¹ Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA.
² Department of Biomedical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA.
³ IAM Unit, Public Health Department, Bordeaux University Hospital, Bordeaux, France.
⁴ Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.
⁵ Research Information Science and Computing, Mass General Brigham, Boston, Massachusetts, USA.
⁶ Institute of Medical Biometry and Statistics, Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
⁷ Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA.
⁸ Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA.
⁹ Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy.
¹⁰ Section of Nephrology, Department of Pediatrics, Brenner Children's Hospital, Wake Forest School of Medicine, Winston Salem, North Carolina, USA.
¹¹ Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
¹² Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.
¹³ Department of Biomedical Informatics-WisDM, National University Health System, Singapore.
¹⁴ Department of Biostatistics, Epidemiology, and Informatics, Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.
¹⁵ Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California, USA.
¹⁶ Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA.
¹⁷ Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy.
¹⁸ Division of Critical Care, National University Health System, Singapore.
¹⁹ Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA.
²⁰ Department of Internal Medicine and Medical Therapy, University of Pavia, Pavia, Italy.
²¹ Division of Infectious Diseases, Department of Medicine II, Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
²² National Center for Infectious Diseases, Tan Tock Seng Hospital, Singapore.
²³ ERIAS-INSERM U1219 BPH, Bordeaux University Hospital, Bordeaux, France.
²⁴ Data Analytics Center, Penn Medicine, Philadelphia, Pennsylvania, USA.
²⁵ Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
²⁶ Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA.
²⁷ Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA.
²⁸ Research Information Science and Computing, Mass General Brigham, Boston, Massachusetts, USA.

Abstract

Objective: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity.

Materials and methods: Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site.

Results: The full 4CE severity phenotype had a pooled sensitivity of 0.73 and a pooled specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity varied widely, by up to 0.65 across sites. At one pilot site, the expert-derived phenotype had a mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared with chart review.
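
For readers unfamiliar with these metrics, the following is a minimal Python sketch of how sensitivity and specificity can be computed when a binary phenotype flag is compared against a binary outcome such as ICU admission and/or death. It is an illustration only, not the study's code: the function name `sensitivity_specificity` and the toy data are hypothetical, and the sketch does not reproduce the pooling across sites reported above.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    flagged: Sequence[bool], outcome: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a binary flag against a binary outcome.

    flagged[i] -- True if patient i meets the phenotype (hypothetical input)
    outcome[i] -- True if patient i had the outcome (e.g., ICU admission and/or death)
    """
    if len(flagged) != len(outcome):
        raise ValueError("flagged and outcome must have the same length")

    # Count the four cells of the 2x2 confusion matrix.
    tp = sum(1 for f, o in zip(flagged, outcome) if f and o)
    fn = sum(1 for f, o in zip(flagged, outcome) if not f and o)
    fp = sum(1 for f, o in zip(flagged, outcome) if f and not o)
    tn = sum(1 for f, o in zip(flagged, outcome) if not f and not o)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative outcome")

    sensitivity = tp / (tp + fn)  # share of true outcomes that were flagged
    specificity = tn / (tn + fp)  # share of non-outcomes that were not flagged
    return sensitivity, specificity


# Toy data, invented purely for illustration (not from the study).
toy_flagged = [True, True, False, True, False, False, True, False]
toy_outcome = [True, True, True, False, False, False, False, False]
toy_sens, toy_spec = sensitivity_specificity(toy_flagged, toy_outcome)
```

In this toy example, 2 of the 3 patients with the outcome are flagged and 3 of the 5 patients without it are not flagged, so the sketch yields a sensitivity of 2/3 and a specificity of 3/5.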

Discussion: We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions.

Conclusions: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.

Keywords: computable phenotype; data interoperability; data networking; disease severity; medical informatics; novel coronavirus.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

COVID-19* / classification
Electronic Health Records*
Hospitalization
Humans
Machine Learning
Prognosis
ROC Curve
Sensitivity and Specificity
Severity of Illness Index*
