A Novel Computer-Aided Diagnosis Scheme on Small Annotated Set: G2C-CAD

Biomed Res Int. 2019 Apr 15:2019:6425963. doi: 10.1155/2019/6425963. eCollection 2019.

Abstract

Purpose: Computer-aided diagnosis (CAD) can aid in improving diagnostic level; however, the main problem currently faced by CAD is that it cannot obtain sufficient labeled samples. To solve this problem, in this study, we adopt a generative adversarial network (GAN) approach and design a semisupervised learning algorithm, named G2C-CAD.

Methods: From the National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) dataset, we extracted four types of pulmonary nodule sign images closely related to lung cancer: noncentral calcification, lobulation, spiculation, and nonsolid/ground-glass opacity (GGO) texture, obtaining a total of 3,196 samples. In addition, we randomly selected 2,000 non-lesion image blocks as negative samples. We split the data 90% for training and 10% for testing. We designed a DCGAN generative adversarial framework and trained it on the small sample set. We also trained our designed CNN-based fuzzy Co-forest on the labeled small sample set and obtained a preliminary classifier. Then, coupled with the simulated unlabeled samples generated by the trained DCGAN, we conducted iterative semisupervised learning, which continually improved the classification performance of the fuzzy Co-forest until the termination condition was reached. Finally, we tested the fuzzy Co-forest and compared its performance with that of a C4.5 random decision forest and the G2C-CAD system without the fuzzy scheme, using ROC and confusion matrix for evaluation.

Results: Four different types of lung cancer-related signs were used in the classification experiment: noncentral calcification, lobulation, spiculation, and nonsolid/ground-glass opacity (GGO) texture, along with negative image samples. For these five classes, the G2C-CAD system obtained AUCs of 0.946, 0.912, 0.908, 0.887, and 0.939, respectively. The average accuracy of G2C-CAD exceeded that of the C4.5 random decision tree by 14%. G2C-CAD also obtained promising test results on the LISS signs dataset; its AUCs for GGO, lobulation, spiculation, pleural indentation, and negative image samples were 0.972, 0.964, 0.941, 0.967, and 0.953, respectively.

Conclusion: The experimental results show that G2C-CAD is an appropriate method for addressing the problem of insufficient labeled samples in the medical image analysis field. Moreover, our system can be used to establish a training sample library for CAD classification diagnosis, which is important for future medical image analysis.

MeSH terms

  • Algorithms
  • Databases, Factual*
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Lung / diagnostic imaging
  • Lung Neoplasms / diagnosis
  • Lung Neoplasms / diagnostic imaging
  • Lung Neoplasms / pathology
  • Radiographic Image Interpretation, Computer-Assisted / methods*
  • Solitary Pulmonary Nodule / diagnostic imaging*
  • Tomography, X-Ray Computed / methods