Early Diagnosis of Primary Immunodeficiency Disease Using Clinical Data and Machine Learning

J Allergy Clin Immunol Pract. 2022 Nov;10(11):3002-3007.e5. doi: 10.1016/j.jaip.2022.08.041. Epub 2022 Sep 13.


Background: Primary immunodeficiency diseases (PIDD) are a group of immune-related disorders that currently have a median diagnostic delay of 6 to 9 years. Early diagnosis and treatment of PIDD have been associated with improved patient outcomes.

Objective: To develop a machine learning model that predicts PIDD from electronic health record (EHR) data elements related to prior symptomatic treatment.

Methods: We conducted a retrospective study of patients with PIDD, identified using inclusion criteria of PIDD-related diagnoses, immunodeficiency-specific medications, and low immunoglobulin levels. We constructed a control group of age-, sex-, and race-matched patients with asthma. The primary outcome was the diagnosis of PIDD. As features, we used comorbidities, laboratory tests, medications, and radiological orders, all recorded before diagnosis and indicative of symptom-related treatment. These feature groups were presented sequentially to logistic regression, elastic net, and random forest classifiers, which were trained using a nested cross-validation approach.
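The abstract names nested cross-validation and sequential presentation of feature groups but gives no implementation details. The pure-Python sketch below illustrates the general idea under stated assumptions: a plain L2-penalized logistic regression stands in for the study's classifiers (elastic net and random forest are not shown), the penalty is tuned over a hypothetical grid in the inner folds, performance is scored by the c-statistic in the outer folds, and the whole procedure is rerun as each hypothetical feature group is added. Stratified folds, the grid, the data, and the helper names (`stratified_folds`, `nested_cv`, `select`) are illustrative choices, not taken from the study.

```python
import math
import random

def stratified_folds(y, k, seed=0):
    """Split row indices into k folds, spreading cases and controls evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def c_statistic(y, scores):
    """Share of (case, control) pairs ranked correctly; ties count one half."""
    pos = [s for s, v in zip(scores, y) if v == 1]
    neg = [s for s, v in zip(scores, y) if v == 0]
    hits = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return hits / (len(pos) * len(neg))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, l2, epochs=100, lr=0.5):
    """L2-penalized logistic regression fitted by batch gradient descent."""
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [l2 * wj for wj in w], 0.0  # penalty gradient; intercept is unpenalized
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj / m
            gb += err / m
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(model, X):
    # Linear predictor; monotone in the predicted probability, so enough for ranking.
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]

def nested_cv(X, y, l2_grid, outer_k=5, inner_k=3, seed=0):
    """Outer folds estimate the c-statistic; inner folds pick l2 from outer-training rows only."""
    outer_scores = []
    for test_idx in stratified_folds(y, outer_k, seed):
        test = set(test_idx)
        Xtr = [X[i] for i in range(len(y)) if i not in test]
        ytr = [y[i] for i in range(len(y)) if i not in test]
        best_l2, best_score = None, -1.0
        for l2 in l2_grid:
            inner = []
            for val_idx in stratified_folds(ytr, inner_k, seed + 1):
                val = set(val_idx)
                fit_X = [Xtr[i] for i in range(len(ytr)) if i not in val]
                fit_y = [ytr[i] for i in range(len(ytr)) if i not in val]
                model = fit_logistic(fit_X, fit_y, l2)
                inner.append(c_statistic([ytr[i] for i in val_idx],
                                         predict(model, [Xtr[i] for i in val_idx])))
            if sum(inner) / len(inner) > best_score:
                best_l2, best_score = l2, sum(inner) / len(inner)
        model = fit_logistic(Xtr, ytr, best_l2)  # refit on the full outer-training split
        outer_scores.append(c_statistic([y[i] for i in test_idx],
                                        predict(model, [X[i] for i in test_idx])))
    return outer_scores

def select(X, cols):
    # Keep only the columns of the feature groups included so far.
    return [[row[c] for c in cols] for row in X]

# Synthetic demo, not study data: column 0 stands in for a comorbidity flag,
# column 1 for a laboratory value; only column 1 carries signal here.
rng = random.Random(42)
y = [1 if rng.random() < 0.2 else 0 for _ in range(150)]
X = [[float(rng.random() < 0.3), rng.gauss(1.0 if v else 0.0, 1.0)] for v in y]
for name, cols in [("comorbidities", [0]), ("+ laboratory", [0, 1])]:
    scores = nested_cv(select(X, cols), y, l2_grid=[0.0, 0.1, 1.0])
    print(name, round(sum(scores) / len(scores), 2))
```

Because the penalty is chosen inside each outer-training split, the outer-fold c-statistics are not inflated by the hyperparameter search, which is the main reason to nest the loops. Stratifying the folds keeps some cases in every split, a practical concern when, as in this cohort, only about 4% of patients are cases.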

Results: Our cohort consisted of 6422 patients, of whom 247 (4%) were diagnosed with PIDD. Our logistic regression model with comorbidities demonstrated good discrimination between patients with PIDD and those with asthma (c-statistic: 0.62 [0.58-0.65]). Adding laboratory results, medications, and radiological orders improved discrimination (c-statistic: 0.70 vs 0.62, P < .001) as well as sensitivity and specificity. Extending to the more complex elastic net and random forest models did not further improve performance.
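The abstract reports each c-statistic with a bracketed interval but does not say how the interval was obtained. Purely as an illustration, the sketch below computes the c-statistic with the rank-based (Mann-Whitney) formula and attaches a percentile-bootstrap interval by resampling patients with replacement. The bootstrap, the default 95% level (`alpha=0.05`), the function names, and the synthetic scores are all assumptions, not the authors' method or data.

```python
import random

def c_statistic(y, scores):
    """Rank-based (Mann-Whitney) c-statistic; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    rank_sum = sum(r for r, v in zip(ranks, y) if v == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(y, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap interval over resampled patients."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # the c-statistic needs both cases and controls
            stats.append(c_statistic(yb, [scores[i] for i in idx]))
    stats.sort()
    k = int(round(alpha / 2 * n_boot))  # number of resamples cut from each tail
    return c_statistic(y, scores), stats[k], stats[n_boot - 1 - k]

# Synthetic demo, not study data.
rng = random.Random(7)
y = [1 if rng.random() < 0.1 else 0 for _ in range(300)]
scores = [rng.gauss(0.6 if v else 0.0, 1.0) for v in y]
est, lo, hi = bootstrap_ci(y, scores)
print(f"c-statistic: {est:.2f} [{lo:.2f}-{hi:.2f}]")
```

The rank formula gives the same value as counting correctly ordered case-control pairs, but it needs only a sort, which keeps a thousand resamples cheap.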

Conclusions: We developed a prediction model for the early diagnosis of PIDD based on historical data related to symptomatic care. The model has the potential to meet an important need by reducing the time to diagnose PIDD, which could lead to better outcomes for patients with immunodeficiency.

Keywords: Common variable immunodeficiency (CVID); Electronic health record (EHR); Immunodeficiency; Machine learning; Primary immunodeficiency diseases (PIDD); Specific antibody deficiency.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Asthma* / complications
  • Asthma* / diagnosis
  • Early Diagnosis
  • Humans
  • Immunologic Deficiency Syndromes* / therapy
  • Machine Learning
  • Primary Immunodeficiency Diseases* / diagnosis
  • Retrospective Studies