Background: Medical research produces large multivariable datasets that are difficult to visualise and interpret intuitively. We describe a novel growing cell structure (GCS) technique that compresses multidimensional datasets into two-dimensional maps with colour overlays that can be visually interpreted.
Methods: The two-dimensional map is self-discovered from the training set: cases are distributed among nodes so that the cases at each node are similar to one another. Nodes are added to the map until there is no further significant reduction in error. The Parzen window method is used to estimate the probability distribution of the training cases, and this estimate is converted to posterior class probabilities using Bayes' theorem. Classification performance is assessed by means of receiver operating characteristic (ROC) curves. Colour maps of the values of each input variable at each node are constructed to illustrate the relation between each input variable and the overall distribution of cases in the network map.
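The Parzen window and Bayes' theorem step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `h`, the use of class frequencies as priors, the function names, and the toy two-dimensional data are all assumptions made for illustration, and the estimate is applied directly to feature vectors rather than to positions on a GCS map.

```python
import math

def parzen_density(x, samples, h):
    """Gaussian Parzen-window estimate of the density at point x.

    h is the kernel bandwidth (an assumed, hand-picked value here).
    """
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (d / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(samples) * norm)

def posterior(x, samples_by_class, h):
    """Posterior class probabilities from Bayes' theorem.

    Priors are assumed to be the class frequencies in the training set.
    """
    n_total = sum(len(s) for s in samples_by_class.values())
    priors = {c: len(s) / n_total for c, s in samples_by_class.items()}
    joint = {c: priors[c] * parzen_density(x, s, h)
             for c, s in samples_by_class.items()}
    evidence = sum(joint.values())
    if evidence == 0.0:
        # Query point is too far from every training case; fall back to priors.
        return priors
    return {c: j / evidence for c, j in joint.items()}

# Hypothetical toy training cases (not data from the study).
toy = {
    "benign": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    "malignant": [(1.0, 1.0), (0.9, 1.2), (1.1, 0.8)],
}
post = posterior((0.1, 0.1), toy, h=0.3)
print(post)
```

In this sketch a query point close to the benign cluster receives a posterior probability of benign close to 1. The bandwidth controls how smooth the density estimate is, so in practice it would need to be chosen with care, for example by cross-validation.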
Findings: From a dataset of 11 input variables from 692 fine-needle aspirate samples from breast lesions, a 32-node network produced an area under the ROC curve of 0.96, which was not significantly different from that for logistic regression (0.98, z=1.09, p>0.05). Colour maps of the input variables showed that some variables had discrete distributions over exclusively benign or malignant areas of the network, and were thus discriminant, whereas others, such as foamy macrophages, covered both benign and malignant regions.
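For readers unfamiliar with the area under the ROC curve, the following sketch computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the scores are hypothetical and are not taken from the study; the sketch does not reproduce the significance test used to compare the two classifiers.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the normalised Mann-Whitney U statistic.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical posterior probabilities of malignancy.
malignant_scores = [0.9, 0.8, 0.4]
benign_scores = [0.5, 0.3, 0.2]
auc = roc_auc(malignant_scores, benign_scores)
print(round(auc, 3))
```

An area of 1.0 means the scores separate the two classes perfectly, and 0.5 means they do no better than chance.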
Interpretation: This technique compresses multidimensional data so that they can be displayed as two-dimensional colour images. This envisioning of information allows human observers to use their highly developed visuospatial abilities to perceive subtle inter-relations in the dataset.