Evaluation of two semi-supervised learning methods and their combination for automatic classification of bone marrow cells

Iori Nakamura; Haruhi Ida; Mayu Yabuta; Wataru Kashiwa; Maho Tsukamoto; Shigeki Sato; Syuichi Ota; Naoki Kobayashi; Hiromi Masauzi; Kazunori Okada; Sanae Kaga; Keiko Miwa; Hiroshi Kanai; Nobuo Masauzi

doi:10.1038/s41598-022-20651-4


Sci Rep. 2022 Oct 6;12(1):16736. doi: 10.1038/s41598-022-20651-4.

Authors

Iori Nakamura¹, Haruhi Ida¹, Mayu Yabuta¹, Wataru Kashiwa², Maho Tsukamoto¹, Shigeki Sato³, Syuichi Ota⁴, Naoki Kobayashi⁴, Hiromi Masauzi⁵, Kazunori Okada⁵, Sanae Kaga⁵, Keiko Miwa⁵, Hiroshi Kanai⁶, Nobuo Masauzi⁷,⁸

Affiliations

¹ Graduate School of Health Sciences, Hokkaido University, Sapporo, Japan.
² Graduate School of Medicine, Hokkaido University, Sapporo, Japan.
³ Department of Clinical Laboratory, Sapporo Hokuyu Hospital, Sapporo, Japan.
⁴ Department of Hematology, Sapporo Hokuyu Hospital, Sapporo, Japan.
⁵ Faculty of Health Sciences, Hokkaido University, Sapporo, Japan.
⁶ Graduate School of Biomedical Engineering, Tohoku University, Sendai, Japan.
⁷ Faculty of Health Sciences, Hokkaido University, Sapporo, Japan. nobmas@sc4.so-net.ne.jp.
⁸ Graduate School of Biomedical Engineering, Tohoku University, Sendai, Japan. nobmas@sc4.so-net.ne.jp.

Abstract

Differential bone marrow (BM) cell counting is an important test for the diagnosis of various hematological diseases. However, BM cells are difficult to classify accurately, and differential counts lack uniformity and reproducibility. Automatic classification systems based on deep learning have therefore been developed, but these systems require large, accurately labeled training datasets. To reduce this labeling burden, we used semi-supervised learning (SSL), in which the training dataset is labeled progressively as learning proceeds. We compared three approaches, self-training (ST), active learning (AL), and a combination of the two, for automatically classifying 16 types of BM cell images. In our ST, data are verified, as in AL, before being added to the training dataset; we call this confirmed self-training (CST). After 25 rounds of CST, AL, and CST + AL, the number of training images increased from an initial 425 to 40,518, 3,682, and 47,843, respectively. Accuracies on a test set of 50 images per cell type were 0.944, 0.941, and 0.976, respectively. The data added by CST or AL alone showed some imbalance between classes, whereas CST + AL showed less. We suggest that CST + AL, which combines the two SSL methods, is an efficient way to increase training data for developing automatic BM cell classification systems.
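The abstract does not state how images are selected in each round. As a rough illustration only, the sketch below shows one plausible round of CST, AL, and CST + AL. The selection rules are our assumptions, not the authors' method: CST keeps pseudo-labels whose predicted probability reaches a threshold and that a human confirms, and AL sends the least-confident images (least-confidence uncertainty sampling) to a human annotator up to a fixed budget. The functions `cst_select`, `al_select`, and `cst_al_round`, the `threshold` and `budget` parameters, and the `confirm`/`annotate` callbacks standing in for the human examiner are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical representation: class probabilities predicted by the current
# classifier for one unlabeled cell image (one entry per cell class).
Probs = Sequence[float]


def top_class(probs: Probs) -> Tuple[int, float]:
    """Return (predicted class index, its probability) for one image."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best, probs[best]


def cst_select(pool: Dict[str, Probs], threshold: float,
               confirm: Callable[[str, int], bool]) -> Dict[str, int]:
    """Confirmed self-training (assumed rule): keep confident pseudo-labels
    only if the human verifier confirms them."""
    added = {}
    for image_id, probs in pool.items():
        label, conf = top_class(probs)
        if conf >= threshold and confirm(image_id, label):
            added[image_id] = label
    return added


def al_select(pool: Dict[str, Probs], budget: int,
              annotate: Callable[[str], int]) -> Dict[str, int]:
    """Active learning (assumed rule): the `budget` least-confident images
    are labeled by the human annotator."""
    ranked = sorted(pool, key=lambda image_id: top_class(pool[image_id])[1])
    return {image_id: annotate(image_id) for image_id in ranked[:budget]}


def cst_al_round(pool: Dict[str, Probs], threshold: float, budget: int,
                 confirm: Callable[[str, int], bool],
                 annotate: Callable[[str], int]) -> Dict[str, int]:
    """One CST + AL round: CST first, then AL on the images CST did not add
    (including ones the verifier rejected). Added images leave the pool."""
    added = cst_select(pool, threshold, confirm)
    remaining = {i: p for i, p in pool.items() if i not in added}
    added.update(al_select(remaining, budget, annotate))
    for image_id in added:
        del pool[image_id]
    return added


# Toy usage with three classes; `truth` plays the role of the examiner.
truth = {"a": 0, "b": 1, "c": 2, "d": 0}
pool = {
    "a": [0.97, 0.02, 0.01],  # confident and correct: CST adds it
    "b": [0.95, 0.03, 0.02],  # confident but wrong: verifier rejects it
    "c": [0.40, 0.25, 0.35],  # least confident: AL sends it for labeling
    "d": [0.60, 0.30, 0.10],
}
added = cst_al_round(pool, threshold=0.9, budget=1,
                     confirm=lambda i, label: truth[i] == label,
                     annotate=lambda i: truth[i])
print(added)         # {'a': 0, 'c': 2}
print(sorted(pool))  # ['b', 'd']
```

In this toy round, CST contributes the confidently classified image and AL contributes the hardest one, which is one intuition for why combining the two could add more data than either alone; the real balance between them depends on thresholds and budgets the abstract does not report.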

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bone Marrow Cells*
Reproducibility of Results
Supervised Machine Learning*