Synthetic microbleeds generation for classifier training without ground truth

Comput Methods Programs Biomed. 2021 Aug:207:106127. doi: 10.1016/j.cmpb.2021.106127. Epub 2021 May 5.

Abstract

Background and objective: Cerebral microbleeds (CMB) are important biomarkers of cerebrovascular diseases and cognitive dysfunctions. Susceptibility weighted imaging (SWI) is a common MRI sequence in which CMB appear as small hypointense blobs. The prevalence of CMB in the population and in each scan is low, which makes visual assessment tedious and time-consuming. Automated detection methods would be of value but are challenged by the low prevalence of CMB, the presence of mimics such as blood vessels, and the difficulty of obtaining sufficient ground truth for training and testing. In this paper, synthetic CMB (sCMB) generation using an analytical model is proposed for training and testing machine learning methods. The main aim is to create perfect synthetic ground truth that closely resembles real CMB, in high number and with a high diversity of shape, volume, intensity, and location, to improve the training of supervised methods.

Method: sCMB were modelled with a random Gaussian shape and added to healthy brain locations. We compared training on our synthetic data to standard augmentation techniques. We performed a validation experiment using sCMB and report results for whole brain detection using a 10-fold cross-validation design with an ensemble of 10 neural networks.
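The idea of a Gaussian-shaped synthetic lesion placed at a healthy brain location can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the model parameters, so the multiplicative attenuation, the sigma and contrast ranges, and the function names `add_synthetic_cmb` and `random_cmb` are all assumptions made for illustration.

```python
import numpy as np


def add_synthetic_cmb(volume, center, sigmas, contrast):
    """Return a copy of `volume` with one Gaussian hypointense blob, plus its mask.

    Assumption: the blob attenuates intensity multiplicatively, so the voxel at
    `center` is scaled by (1 - contrast). The paper's exact model may differ.
    """
    vol = volume.astype(np.float64, copy=True)
    # Voxel coordinate grids, one per axis, in (z, y, x) order.
    zz, yy, xx = np.meshgrid(*[np.arange(n) for n in vol.shape], indexing="ij")
    cz, cy, cx = center
    sz, sy, sx = sigmas
    # Anisotropic 3D Gaussian profile with peak value 1 at the center.
    g = np.exp(-0.5 * (((zz - cz) / sz) ** 2
                       + ((yy - cy) / sy) ** 2
                       + ((xx - cx) / sx) ** 2))
    vol *= 1.0 - contrast * g
    # Ground-truth mask: voxels above half of the Gaussian peak (hypothetical choice).
    mask = g > 0.5
    return vol, mask


def random_cmb(volume, brain_mask, rng):
    """Draw a random location, shape, and contrast, then insert one sCMB.

    The sampling ranges below are illustrative assumptions, not published values.
    """
    candidates = np.argwhere(brain_mask)           # allowed (healthy) voxel positions
    center = candidates[rng.integers(len(candidates))]
    sigmas = rng.uniform(0.8, 2.0, size=3)         # per-axis spread, in voxels
    contrast = rng.uniform(0.4, 0.9)               # fraction of signal removed at the peak
    return add_synthetic_cmb(volume, center, sigmas, contrast)
```

Because the location, per-axis spread, and contrast are drawn at random, repeated calls to `random_cmb` yield lesions that vary in shape, volume, intensity, and position, each with an exact voxel mask that can serve as ground truth.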

Results: Performance was close to the state of the art (~9 false positives per scan) when a random forest was trained on synthetic data only and tested on real lesions. Other experiments showed that top detection performance could be achieved when training on synthetic CMB only. Our dataset is made available, including a version with 37,000 synthetic lesions that could be used for benchmarking and training.

Conclusion: Our proposed synthetic microbleeds model is a powerful data augmentation approach for CMB classification and should be considered for training automated lesion detection systems from MRI SWI.

Keywords: Data augmentation; Gaussian modeling; Microbleeds detection; Neural network; Synthetic data generation.

MeSH terms

  • Brain
  • Cerebral Hemorrhage* / diagnostic imaging
  • Humans
  • Machine Learning
  • Magnetic Resonance Imaging*
  • Neural Networks, Computer