In recent years, growing interest in deep learning neural networks has raised a question on how they can be used for effective processing of high-dimensional datasets produced by hyperspectral imaging (HSI). HSI, traditionally viewed as being within the scope of remote sensing, is used in non-invasive substance classification. One of the areas of potential application is forensic science, where substance classification on the scenes is important. An example problem from that area-blood stain classification-is a case study for the evaluation of methods that process hyperspectral data. To investigate the deep learning classification performance for this problem we have performed experiments on a dataset which has not been previously tested using this kind of model. This dataset consists of several images with blood and blood-like substances like ketchup, tomato concentrate, artificial blood, etc. To test both the classic approach to hyperspectral classification and a more realistic application-oriented scenario, we have prepared two different sets of experiments. In the first one, Hyperspectral Transductive Classification (HTC), both a training and a test set come from the same image. In the second one, Hyperspectral Inductive Classification (HIC), a test set is derived from a different image, which is more challenging for classifiers but more useful from the point of view of forensic investigators. We conducted the study using several architectures like 1D, 2D and 3D convolutional neural networks (CNN), a recurrent neural network (RNN) and a multilayer perceptron (MLP). The performance of the models was compared with baseline results of Support Vector Machine (SVM). We have also presented a model evaluation method based on t-SNE and confusion matrix analysis that allows us to detect and eliminate some cases of model undertraining. Our results show that in the transductive case, all models, including the MLP and the SVM, have comparative performance, with no clear advantage of deep learning models. The Overall Accuracy range across all models is 98-100% for the easier image set, and 74-94% for the more difficult one. However, in a more challenging inductive case, selected deep learning architectures offer a significant advantage; their best Overall Accuracy is in the range of 57-71%, improving the baseline set by the non-deep models by up to 9 percentage points. We have presented a detailed analysis of results and a discussion, including a summary of conclusions for each tested architecture. An analysis of per-class errors shows that the score for each class is highly model-dependent. Considering this and the fact that the best performing models come from two different architecture families (3D CNN and RNN), our results suggest that tailoring the deep neural network architecture to hyperspectral data is still an open problem.
Keywords: convolutional neural networks; deep learning; deep neural networks; forensic science; hyperspectral classification; recurrent neural network.