Data Curation for Preclinical and Clinical Multimodal Imaging Studies

Mol Imaging Biol. 2019 Dec;21(6):1034-1043. doi: 10.1007/s11307-019-01339-0.


Purpose: In biomedical research, imaging modalities help discover pathological mechanisms to develop and evaluate novel diagnostic and theranostic approaches. However, while standards for data storage in the clinical medical imaging field exist, data curation standards for biomedical research are yet to be established. This work aimed at developing a free secure file format for multimodal imaging studies, supporting common in vivo imaging modalities up to five dimensions as a step towards establishing data curation standards for biomedical research.

Procedures: Images are compressed using lossless compression algorithm. Cryptographic hashes are computed on the compressed image slices. The hashes and compressions are computed in parallel, speeding up computations depending on the number of available cores. Then, the hashed images with digitally signed timestamps are cryptographically written to file. Fields in the structure, compressed slices, hashes, and timestamps are serialized for writing and reading from files. The C++ implementation is tested on multimodal data from six imaging sites, well-documented, and integrated into a preclinical image analysis software.

Results: The format has been tested with several imaging modalities including fluorescence molecular tomography/x-ray computed tomography (CT), positron emission tomography (PET)/CT, single-photon emission computed tomography/CT, and PET/magnetic resonance imaging. To assess performance, we measured the compression rate, ratio, and time spent in compression. Additionally, the time and rate of writing and reading on a network drive were measured. Our findings demonstrate that we achieve close to 50 % reduction in storage space for μCT data. The parallelization speeds up the hash computations by a factor of 4. We achieve a compression rate of 137 MB/s for file of size 354 MB.

Conclusions: The development of this file format is a step to abstract and curate common processes involved in preclinical and clinical multimodal imaging studies in a standardized way. This work also defines better interface between multimodal imaging modalities and analysis software.

Keywords: Compression; Credibility; Cryptographic hashing; Data curation; File format; Metadata; Multimodal imaging; Reproducibility; Serialization; Timestamp.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Data Compression
  • Data Curation*
  • Image Processing, Computer-Assisted
  • Multimodal Imaging*