GasHisSDB: A new gastric histopathology image dataset for computer aided diagnosis of gastric cancer

Comput Biol Med. 2022 Mar:142:105207. doi: 10.1016/j.compbiomed.2021.105207. Epub 2022 Jan 6.

Abstract

Background and objective: Gastric cancer is the fifth most common cancer globally, and early detection is essential to save lives. Histopathological examination is the gold standard for its diagnosis. However, computer-aided diagnostic techniques are challenging to evaluate due to the scarcity of publicly available gastric histopathology image datasets.

Methods: In this paper, a novel publicly available Gastric Histopathology Sub-size Image Database (GasHisSDB) is published to evaluate classifier performance. Specifically, two types of data are included: normal and abnormal, with a total of 245,196 tissue case images. To show that image classification methods from different periods perform differently on GasHisSDB, we select a variety of classifiers for evaluation. Seven classical machine learning classifiers, three Convolutional Neural Network classifiers, and a novel transformer-based classifier are selected for testing on image classification tasks.

Results: Extensive experiments with traditional machine learning and deep learning methods show that methods from different periods perform differently on GasHisSDB. Among the traditional machine learning classifiers, the best accuracy was 86.08% and the lowest was just 41.12%. Among the deep learning classifiers, the best accuracy was 96.47% and the lowest was 86.21%. Accuracy rates vary significantly across classifiers.
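The accuracy rates reported above are, in the usual sense, the fraction of test images whose predicted class matches the reference label. The abstract does not spell out the exact evaluation protocol, so the following is only a minimal sketch under that assumption. The function name `accuracy`, the label encoding (0 = normal, 1 = abnormal), and the toy label lists are hypothetical and are not taken from the paper or from GasHisSDB.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty label list")
    # Count positions where the predicted class equals the reference class.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels (assumed encoding: 0 = normal, 1 = abnormal).
# These are illustrative values only, not GasHisSDB data or results.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

toy_accuracy = accuracy(y_true, y_pred)
print(f"Toy accuracy: {toy_accuracy:.2%}")
```

Applying the same function to each classifier's predictions on a shared test split would give directly comparable accuracy figures, which is the kind of comparison the results describe.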

Conclusions: To the best of our knowledge, GasHisSDB is the first publicly available gastric cancer histopathology dataset containing a large number of images for weakly supervised learning. We believe that GasHisSDB can attract researchers to explore new algorithms for the automated diagnosis of gastric cancer, which can help physicians and patients in the clinical setting.

Keywords: Database; Gastric histopathology; Image classification; Sub-size image.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Diagnosis, Computer-Assisted
  • Humans
  • Machine Learning
  • Neural Networks, Computer
  • Stomach Neoplasms* / diagnostic imaging