Machine learning-based automated sponge cytology for screening of oesophageal squamous cell carcinoma and adenocarcinoma of the oesophagogastric junction: a nationwide, multicohort, prospective study

Lancet Gastroenterol Hepatol. 2023 May;8(5):432-445. doi: 10.1016/S2468-1253(23)00004-3. Epub 2023 Mar 14.


Background: Oesophageal squamous cell carcinoma and adenocarcinoma of the oesophagogastric junction have a dismal prognosis, and early detection is key to reducing mortality. However, early detection depends on upper gastrointestinal endoscopy, which is not feasible to implement at a population level. We aimed to develop and validate a fully automated machine learning-based prediction tool integrating a minimally invasive sponge cytology test and epidemiological risk factors for screening of oesophageal squamous cell carcinoma and adenocarcinoma of the oesophagogastric junction before endoscopy.

Methods: For this multicohort prospective study, we enrolled participants aged 40-75 years undergoing upper gastrointestinal endoscopy screening at 39 tertiary or secondary hospitals in China for model training and testing, and included community-based screening participants for further validation. All participants underwent questionnaire surveys, sponge cytology testing, and endoscopy in a sequential manner. We trained machine learning models to predict a composite outcome of high-grade lesions, defined as histology-confirmed high-grade intraepithelial neoplasia and carcinoma of the oesophagus and oesophagogastric junction. The predictive features included 105 cytological and 15 epidemiological features. Model performance was primarily measured with the area under the receiver operating characteristic curve (AUROC) and average precision. The performance of cytologists with AI assistance was also assessed.
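The two primary performance measures named above can be illustrated with a minimal sketch. It is not the study's code: the function names `auroc` and `average_precision` and the toy labels and scores are assumptions made for illustration only. AUROC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as one half), and average precision as the recall-weighted sum of precision over the distinct score thresholds.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Sum of (increase in recall) x (precision) over distinct thresholds."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    ap, tp, i = 0.0, 0, 0
    while i < len(ranked):
        # Group cases that share the same score into one threshold step.
        j, tp_group = i, 0
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp_group += ranked[j][1]
            j += 1
        tp += tp_group
        ap += (tp_group / total_pos) * (tp / j)  # j = cases at or above threshold
        i = j
    return ap


if __name__ == "__main__":
    # Toy data, purely illustrative (1 = high-grade lesion, 0 = no lesion).
    labels = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print(f"AUROC = {auroc(labels, risk):.3f}")
    print(f"Average precision = {average_precision(labels, risk):.3f}")
```

Reporting both measures is informative when the outcome is rare: AUROC does not depend on prevalence, whereas average precision falls as positives become scarcer, so a high AUROC can coexist with a moderate average precision.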

Findings: Between Jan 1, 2021, and June 30, 2022, 17 498 eligible participants were involved in model training and validation. In the testing set, the AUROC of the final model was 0·960 (95% CI 0·937 to 0·977) and the average precision was 0·482 (0·470 to 0·494). The model achieved similar performance to consensus of cytologists with AI assistance (AUROC 0·955 [95% CI 0·933 to 0·975]; p=0·749; difference 0·005 [95% CI -0·011 to 0·020]). If the model-defined moderate-risk and high-risk groups were referred for endoscopy, the sensitivity was 94·5% (95% CI 88·8 to 97·5), the specificity was 91·9% (91·2 to 92·5), and the positive predictive value was 18·4% (15·6 to 21·6); 90·3% of endoscopies could be avoided. Further validation in community-based screening showed that the AUROC of the model was 0·964 (95% CI 0·920 to 0·990), and 92·8% of endoscopies could be avoided after risk stratification.
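The referral figures above are linked by simple arithmetic, which the following sketch works through. It is an illustration, not the authors' analysis: the function names are hypothetical, and it assumes that the reported sensitivity, specificity, and positive predictive value all come from the same testing set and that every participant not referred avoids endoscopy. The prevalence it derives is implied by those three numbers and is not a figure reported in the abstract.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def referral_fraction(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of all screened participants flagged for endoscopy."""
    return sensitivity * prevalence + (1.0 - specificity) * (1.0 - prevalence)


def implied_prevalence(sensitivity: float, specificity: float, ppv_value: float) -> float:
    """Invert Bayes' rule: the prevalence consistent with a given PPV."""
    odds = ppv_value * (1.0 - specificity) / (sensitivity * (1.0 - ppv_value))
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Point estimates quoted in the Findings for the testing set.
    sens, spec, reported_ppv = 0.945, 0.919, 0.184

    prev = implied_prevalence(sens, spec, reported_ppv)  # derived, not reported
    avoided = 1.0 - referral_fraction(sens, spec, prev)

    print(f"Implied prevalence of high-grade lesions: {prev:.1%}")
    print(f"Endoscopies avoided: {avoided:.1%}")
```

Under these assumptions the implied prevalence is a little under 2%, and the share of endoscopies avoided comes out close to the reported 90·3%. The same arithmetic explains why the positive predictive value is modest despite high sensitivity and specificity: when the outcome is rare, false positives from the large lesion-free group outnumber true positives.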

Interpretation: We developed a prediction tool with favourable performance for screening of oesophageal squamous cell carcinoma and adenocarcinoma of the oesophagogastric junction. This approach could prevent the need for endoscopy screening in many low-risk individuals and ensure resource optimisation by prioritising high-risk individuals.

Funding: Science and Technology Commission of Shanghai Municipality.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma* / diagnosis
  • Adenocarcinoma* / epidemiology
  • Adenocarcinoma* / pathology
  • China / epidemiology
  • Esophageal Neoplasms* / diagnosis
  • Esophageal Neoplasms* / epidemiology
  • Esophageal Squamous Cell Carcinoma* / diagnosis
  • Esophageal Squamous Cell Carcinoma* / epidemiology
  • Esophagogastric Junction / pathology
  • Humans
  • Machine Learning
  • Prospective Studies