Visualizing and clustering high throughput sub-cellular localization imaging

BMC Bioinformatics. 2008 Feb 4;9:81. doi: 10.1186/1471-2105-9-81.

Abstract

Background: The expansion of automatic imaging technologies has created a need to compare and review large sets of image data efficiently. To enable comparison of image data between samples, we need to define the normal variation among distinct images of the same sample. Even under tightly controlled experimental conditions, protein expression can vary widely between cells, and because large image sets are difficult to view and compare, this variation may go unobserved. Here we introduce a novel methodology, iCluster, for visualizing, clustering and comparing large sub-cellular localization image sets. For each member of an image set, iCluster generates statistics that have been found to be useful in distinguishing sub-cellular localization. The statistics are mapped into two or three dimensions so as to preserve distances between the statistics vectors. The complete image set is then visualized in two or three dimensions using the coordinates so determined. As a result, statistically similar images are spatially close in the visualization, which allows easy comparison of similar images and separation of dissimilar images into distinct clusters.
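
The distance-preserving mapping described above can be sketched as follows. This is only a minimal illustration, assuming a plain gradient-descent form of metric multidimensional scaling; the abstract does not name the algorithm iCluster uses. The function names (`pairwise_distances`, `stress`, `embed`), the toy statistics vectors, and the step-size settings are all hypothetical.

```python
import math
import random

def pairwise_distances(vectors):
    """Euclidean distance matrix between equal-length statistics vectors."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def stress(coords, target):
    """Sum of squared differences between embedded and target distances."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def embed(vectors, dims=2, steps=500, rate=0.02, seed=0):
    """Assumed technique: gradient descent on the stress, from a random start."""
    target = pairwise_distances(vectors)
    rng = random.Random(seed)
    n = len(vectors)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # Guard against division by zero for coincident points.
                d = math.dist(coords[i], coords[j]) or 1e-12
                scale = 2.0 * (d - target[i][j]) / d
                for k in range(dims):
                    grads[i][k] += scale * (coords[i][k] - coords[j][k])
        for i in range(n):
            for k in range(dims):
                coords[i][k] -= rate * grads[i][k]
    return coords

# Hypothetical statistics vectors: two groups of three similar "images".
features = [
    [0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.1, 0.0], [0.0, 0.1, 0.0, 0.1],
    [3.0, 3.0, 0.0, 0.0], [3.1, 3.0, 0.1, 0.0], [3.0, 3.1, 0.0, 0.1],
]
coords = embed(features, dims=2)
```

Each image would then be drawn at its entry in `coords`, so that images with similar statistics vectors land near one another. Passing `dims=3` gives the three-dimensional layout.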

Results: The methodology was tested on a set of 502 previously published images containing 10 known sub-cellular localizations. The clustering of images of like type was evaluated both by examining the classes of the nearest neighbors of each image and by visual inspection. In three dimensions, 3-neighbor classification accuracy was 83.2%. Visually, each class clustered well, with the majority of classes localizing to distinct regions of the space. In two dimensions, 3-neighbor classification accuracy was 68.9%, though clustering into classes could still be readily discerned visually. Computational expense was found to be relatively low, and sets of up to 1400 images could be visualized and interacted with in real time.
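
The nearest-neighbor evaluation described above can be illustrated with a short sketch. It assumes a leave-one-out majority vote over the k nearest neighbors in the embedded space; the abstract does not spell out the exact protocol, and the function name `knn_accuracy`, the toy coordinates, and the labels are hypothetical.

```python
import math
from collections import Counter

def knn_accuracy(coords, labels, k=3):
    """Leave-one-out k-nearest-neighbor accuracy in the embedded space."""
    correct = 0
    for i, point in enumerate(coords):
        # Rank every other image by its distance to image i.
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: math.dist(point, coords[j]))
        votes = Counter(labels[j] for j in others[:k])
        # most_common breaks ties by first-seen order, so the nearer neighbor wins.
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(coords)

# Hypothetical 2-D layout: two tight classes plus one "b" image placed inside class "a".
toy_coords = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (0.5, 0.5)]
toy_labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]
accuracy = knn_accuracy(toy_coords, toy_labels, k=3)
```

In this toy layout only the misplaced image is classified incorrectly, which is also how such a measure can flag outliers in a visualization.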

Conclusion: The feasibility of automated spatial layout to allow comparison and discrimination of high throughput sub-cellular imaging has been demonstrated. There are many potential applications, such as image database curation, semi-automated interactive classification, outlier detection and reference image comparison. By allowing observation of the full range of imaging data available using modern microscopes, these methods will provide an invaluable tool for cell biologists.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Cluster Analysis
  • Computer Graphics
  • Fluorescent Dyes
  • HeLa Cells
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Imaging, Three-Dimensional
  • Microscopy, Fluorescence
  • Organelles / ultrastructure*
  • Pattern Recognition, Automated / methods
  • Reference Values
  • Reproducibility of Results
  • User-Computer Interface

Substances

  • Fluorescent Dyes