MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports

Sci Data. 2019 Dec 12;6(1):317. doi: 10.1038/s41597-019-0322-0.


Chest radiography is a powerful imaging modality that allows detailed inspection of a patient's chest, but it requires specialized training to interpret properly. With the advent of high-performance, general-purpose computer vision algorithms, accurate automated analysis of chest radiographs is of increasing interest to researchers. Here we describe MIMIC-CXR, a large dataset of 227,835 imaging studies for 65,379 patients presenting to the Beth Israel Deaconess Medical Center Emergency Department between 2011 and 2016. Each imaging study can contain one or more images, usually a frontal view and a lateral view. A total of 377,110 images are available in the dataset. Each study is accompanied by a semi-structured free-text radiology report that describes the radiological findings of the images, written contemporaneously by a practicing radiologist during routine clinical care. All images and reports have been de-identified to protect patient privacy. The dataset is made freely available to facilitate and encourage a wide range of research in computer vision, natural language processing, and clinical data mining.
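The abstract describes a three-level hierarchy: patients have imaging studies, and each study has one report and one or more images. The following is a minimal Python sketch of that hierarchy, together with the average ratios implied by the reported counts. The class and field names (`Patient`, `Study`, `Image`, `view`, and so on) are hypothetical illustrations and are not the dataset's actual file layout or schema. With the stated counts, the averages work out to about 3.48 studies per patient and about 1.66 images per study. An average below 2 is consistent with "usually" a frontal and a lateral view, because some studies contain only a single image.

```python
from dataclasses import dataclass, field

# Hypothetical record types illustrating the hierarchy described in the abstract:
# a patient has one or more imaging studies; each study has one free-text report
# and one or more images. Names are illustrative, NOT the dataset's real schema.


@dataclass
class Image:
    image_id: str
    view: str  # e.g. "frontal" or "lateral"


@dataclass
class Study:
    study_id: str
    report: str  # semi-structured free-text radiology report
    images: list[Image] = field(default_factory=list)


@dataclass
class Patient:
    patient_id: str
    studies: list[Study] = field(default_factory=list)


# Counts as reported in the abstract.
N_PATIENTS = 65_379
N_STUDIES = 227_835
N_IMAGES = 377_110


def summary_ratios(n_patients: int, n_studies: int, n_images: int) -> tuple[float, float]:
    """Return (average studies per patient, average images per study)."""
    return n_studies / n_patients, n_images / n_studies


if __name__ == "__main__":
    studies_per_patient, images_per_study = summary_ratios(N_PATIENTS, N_STUDIES, N_IMAGES)
    print(f"{studies_per_patient:.2f} studies per patient")
    print(f"{images_per_study:.2f} images per study")

    # Toy record (placeholder IDs and text) for one study with a frontal and a lateral image.
    study = Study("s1", "(report text)", [Image("i1", "frontal"), Image("i2", "lateral")])
    patient = Patient("p1", [study])
    print(sum(len(s.images) for s in patient.studies), "images for toy patient")
```

Running the script prints the two averages followed by the image count for the toy patient (2).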

Publication types

  • Dataset
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Data Mining
  • Databases, Factual*
  • Humans
  • Image Interpretation, Computer-Assisted
  • Natural Language Processing
  • Radiography, Thoracic*