OCT5k: A dataset of multi-disease and multi-graded annotations for retinal layers

Sci Data. 2025 Feb 14;12(1):267. doi: 10.1038/s41597-024-04259-z.

Abstract

Publicly available, open-access OCT datasets for retinal layer segmentation have been limited in scope: they are often small, specific to a single disease, or contain only one grading. This dataset addresses these limitations with multi-grader, multi-disease labels for training machine learning-based algorithms. It covers three subsets of scans (age-related macular degeneration, diabetic macular edema, and healthy) and provides annotations for two types of tasks (semantic segmentation and object detection). It comprises 5016 pixel-wise manual labels for 1672 OCT scans, featuring 5 layer boundaries across the three scan classes, to support the development of automatic techniques. A subset of the data (566 scans) was subsequently labeled for disease features, yielding 4698 bounding box annotations across 9 classes of disease biomarkers. To minimize bias, images were shuffled and distributed among graders. Retinal layer annotations were corrected, and outliers were identified using the interquartile range (IQR). This step was repeated three times, improving annotation quality with each iteration and ensuring a reliable dataset for automated retinal image analysis.
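
The abstract names the interquartile range as the outlier criterion but does not say which quantity it was applied to. As a minimal sketch only, the following assumes the common Tukey's-fences rule (values beyond 1.5 × IQR from the quartiles) applied to a hypothetical per-scan disagreement score; the function name `iqr_outliers`, the factor `k`, and the example scores are illustrative assumptions, not details taken from the dataset.

```python
from statistics import quantiles


def iqr_outliers(scores, k=1.5):
    """Return one boolean per score: True if it lies outside Tukey's fences.

    Assumption: outliers are values below Q1 - k*IQR or above Q3 + k*IQR.
    The abstract does not state the exact rule or the factor k.
    """
    # Quartiles with linear interpolation between the sorted data points.
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [s < lower or s > upper for s in scores]


# Hypothetical per-scan scores, e.g. mean boundary disagreement in pixels.
scores = [1.2, 0.9, 1.1, 1.4, 1.0, 6.5, 1.3, 0.8]
mask = iqr_outliers(scores)

# Indices of scans that would be sent back for another round of correction.
flagged = [i for i, is_outlier in enumerate(mask) if is_outlier]
print(flagged)
```

Under these assumptions, the flagged scans would be re-examined and corrected, and the check rerun on the updated scores; the abstract reports three such rounds.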

Publication types

  • Dataset

MeSH terms

  • Algorithms
  • Diabetic Retinopathy / diagnostic imaging
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Machine Learning
  • Macular Degeneration* / diagnostic imaging
  • Macular Edema / diagnostic imaging
  • Retina* / diagnostic imaging
  • Tomography, Optical Coherence*