Methods for Using Race and Ethnicity in Prediction Models for Lung Cancer Screening Eligibility

JAMA Netw Open. 2023 Sep 5;6(9):e2331155. doi: 10.1001/jamanetworkopen.2023.31155.


Importance: Using race and ethnicity in clinical prediction models can reduce or inadvertently increase racial and ethnic disparities in medical decisions.

Objective: To compare eligibility for lung cancer screening in a contemporary representative US population by refitting the life-years gained from screening-computed tomography (LYFS-CT) model to exclude race and ethnicity vs a counterfactual eligibility approach that recalculates life expectancy for racial and ethnic minority individuals using the same covariates but substitutes White race and uses the higher predicted life expectancy, ensuring that historically underserved groups are not penalized.
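The counterfactual eligibility rule described in the Objective can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `predict` callable, its signature, and the function name `counterfactual_life_expectancy` are hypothetical stand-ins for the refit life-expectancy model, which the abstract does not specify.

```python
from typing import Callable, Mapping

# Hypothetical signature (an assumption, not from the paper):
# (covariates, race_ethnicity) -> predicted life expectancy in years.
Predictor = Callable[[Mapping[str, float], str], float]


def counterfactual_life_expectancy(
    predict: Predictor,
    covariates: Mapping[str, float],
    race_ethnicity: str,
) -> float:
    """Return the higher of two predictions made with identical covariates:
    one using the person's recorded race and ethnicity, one substituting White."""
    own = predict(covariates, race_ethnicity)
    if race_ethnicity == "White":
        # Nothing to substitute for White individuals.
        return own
    as_white = predict(covariates, "White")
    # Taking the maximum means the substitution can only raise the prediction.
    return max(own, as_white)
```

Because the rule takes a maximum, substituting White race can never lower a person's predicted life expectancy.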

Design, setting, and participants: The 2 submodels composing LYFS-CT NoRace were refit and externally validated without race and ethnicity: the lung cancer death submodel in participants of a large clinical trial (recruited 1993-2001; followed up until December 31, 2009) who ever smoked (n = 39 180) and the all-cause mortality submodel in the National Health Interview Survey (NHIS) 1997-2001 participants aged 40 to 80 years who ever smoked (n = 74 842, followed up until December 31, 2006). Screening eligibility was examined in NHIS 2015-2018 participants aged 50 to 80 years who ever smoked. Data were analyzed from June 2021 to September 2022.

Exposure: Including and removing race and ethnicity (African American, Asian American, Hispanic American, White) in each LYFS-CT submodel.

Main outcomes and measures: By race and ethnicity: calibration of the LYFS-CT NoRace model and the counterfactual approach (ratio of expected to observed [E/O] outcomes), number of US individuals eligible for screening, and predicted days of life gained from screening by LYFS-CT.
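The E/O calibration measure can be illustrated with a short Python sketch. It assumes an unweighted sample in which each person has a model-predicted risk and a 0/1 observed outcome; the function name and inputs are hypothetical, and an analysis of survey data such as NHIS would typically also apply sampling weights, which this sketch omits.

```python
from typing import Sequence


def expected_observed_ratio(
    predicted_risks: Sequence[float],
    observed_events: Sequence[int],
) -> float:
    """Ratio of expected events (sum of predicted risks) to observed events.

    Values below 1 mean the model underestimates risk in the group;
    values above 1 mean it overestimates risk.
    """
    if len(predicted_risks) != len(observed_events):
        raise ValueError("predicted_risks and observed_events must have equal length")
    observed = sum(observed_events)
    if observed == 0:
        raise ValueError("E/O is undefined when no events were observed")
    expected = sum(predicted_risks)
    return expected / observed
```

Computing this ratio separately within each racial and ethnic group shows whether a model is miscalibrated for that group.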

Results: The NHIS 2015-2018 included 25 601 individuals aged 50 to 80 years who ever smoked (2769 African American, 649 Asian American, 1855 Hispanic American, and 20 328 White individuals). Removing race and ethnicity from the submodels underestimated lung cancer death risk (E/O, 0.72; 95% CI, 0.52-1.00) and all-cause mortality (E/O, 0.90; 95% CI, 0.86-0.94) in African American individuals. It also overestimated mortality in Hispanic American (E/O, 1.08; 95% CI, 1.00-1.16) and Asian American individuals (E/O, 1.14; 95% CI, 1.01-1.30). Consequently, the LYFS-CT NoRace model increased Hispanic American and Asian American eligibility by 108% and 73%, respectively, while reducing African American eligibility by 39%. Using LYFS-CT with the counterfactual all-cause mortality model better maintained calibration across groups and increased African American eligibility by 13% without reducing eligibility for Hispanic American and Asian American individuals.

Conclusions and relevance: In this study, removing race and ethnicity miscalibrated LYFS-CT submodels and substantially reduced African American eligibility for lung cancer screening. Under counterfactual eligibility, no one became ineligible, and African American eligibility increased, demonstrating the potential for maintaining model accuracy while reducing disparities.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.
  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Asian
  • Black or African American
  • Early Detection of Cancer* / statistics & numerical data
  • Eligibility Determination* / statistics & numerical data
  • Ethnicity
  • Hispanic or Latino
  • Humans
  • Life Expectancy
  • Lung Neoplasms* / diagnosis
  • Lung Neoplasms* / epidemiology
  • Lung Neoplasms* / ethnology
  • Mass Screening* / statistics & numerical data
  • Middle Aged
  • Minority Groups
  • Models, Statistical
  • Race Factors
  • Risk Assessment
  • White