Aim: To develop a machine learning model to predict the diagnosis of pulmonary embolism (PE).
Methods and results: We undertook a derivation and internal validation study to develop a risk prediction model for use in patients being investigated for possible PE. The machine learning technique, generalized logistic regression using elastic net, was chosen following an assessment of seven machine learning techniques, on the basis that it optimized the area under the receiver operating characteristic curve (AUC) and the Brier score. Models were developed both with and without the addition of D-dimer. A total of 3347 patients were included in the study, of whom 219 (6.5%) had PE. Four clinical variables (O2 saturation, previous deep venous thrombosis or PE, immobilization or surgery, and alternative diagnosis equally or more likely than PE) plus D-dimer contributed to the machine learning models. The addition of D-dimer improved the AUC by 0.16 (95% confidence interval 0.13-0.19), from 0.73 to 0.89 (0.87-0.91), and decreased the Brier score by 14% (10-18%). More patients could be ruled out, with a higher positive likelihood ratio, than by the Wells score combined with D-dimer, the revised Geneva score combined with D-dimer, or the Pulmonary Embolism Rule-out Criteria score. Machine learning with D-dimer maintained a low false-negative rate at a true-negative rate of nearly 53%, which was better performance than any of the alternatives.
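For readers less familiar with the two metrics used for model selection, the following is a minimal sketch of how the AUC and Brier score can be computed from predicted probabilities. It is not the study's code: the function names and the six-patient toy data are invented purely for illustration and do not reproduce any result reported here.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1).

    Lower is better; 0 means perfectly confident, correct predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher predicted probability than a randomly chosen negative case,
    with ties counted as half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = PE confirmed, 0 = no PE); not from the study.
y_true = [0, 0, 0, 1, 0, 1]
y_prob = [0.05, 0.10, 0.50, 0.40, 0.20, 0.80]

print("AUC:", auc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
```

The AUC measures discrimination (how well cases are ranked above non-cases), whereas the Brier score also penalizes poorly calibrated probabilities, which is why the two are often reported together.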
Conclusion: A machine learning model outperformed traditional risk scores for the risk stratification of PE in the emergency department. However, external validation is needed.
Keywords: Diagnosis; Machine learning; Risk scores; Pulmonary embolism.
Published on behalf of the European Society of Cardiology. All rights reserved. © The Author(s) 2021. For permissions, please email: firstname.lastname@example.org.