Hiding a plane with a pixel: examining shape-bias in CNNs and the benefit of building in biological constraints

Vision Res. 2020 Sep:174:57-68. doi: 10.1016/j.visres.2020.04.013. Epub 2020 Jun 28.

Abstract

When deep convolutional neural networks (CNNs) are trained "end-to-end" on raw data, some of the feature detectors they develop in their early layers resemble the representations found in early visual cortex. This result has been used to draw parallels between deep learning systems and human visual perception. In this study, we show that when CNNs are trained end-to-end they learn to classify images based on whatever feature is predictive of a category within the dataset. This can lead to bizarre results in which CNNs learn idiosyncratic features such as high-frequency noise-like masks. In the extreme case, our results demonstrate image categorisation on the basis of a single pixel. Such features are extremely unlikely to play any role in human object recognition, where experiments have repeatedly shown a strong preference for shape. Through a series of empirical studies with standard high-performance CNNs, we show that these networks do not develop a shape-bias merely through regularisation methods or more ecologically plausible training regimes. These results cast doubt on the assumption that simply learning end-to-end in standard CNNs leads to the emergence of representations similar to those in the human visual system. In the second part of the paper, we show that CNNs are less reliant on these idiosyncratic features when we forgo end-to-end learning and introduce hard-wired Gabor filters designed to mimic early visual processing in V1.
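The hard-wired front end described above replaces a learned first convolutional layer with fixed Gabor filters. A minimal numpy sketch of such a filter bank is given below; the specific parameter choices (7x7 kernels, four orientations, the sigma/lambda/gamma values) are illustrative assumptions for this sketch, not the configuration used in the paper.

```python
import numpy as np

def gabor_kernel(size=7, theta=0.0, sigma=2.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Return a size x size Gabor kernel: a cosine carrier at orientation
    theta under a Gaussian envelope (the classic V1 simple-cell model)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along orientation theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    kernel = envelope * np.cos(2 * np.pi * x_t / lambd + psi)
    # Zero-mean so uniform regions produce no response.
    return kernel - kernel.mean()

def gabor_bank(n_orientations=4, **kwargs):
    """A fixed bank of Gabor kernels at evenly spaced orientations,
    usable as frozen (non-trainable) first-layer convolution weights."""
    thetas = np.linspace(0.0, np.pi, n_orientations, endpoint=False)
    return np.stack([gabor_kernel(theta=t, **kwargs) for t in thetas])
```

In a CNN framework, such a bank would be loaded as the first layer's weights with learning disabled for that layer, so orientation- and frequency-tuned edge responses are built in rather than learned from the dataset.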

Keywords: Biological constraints; Convolutional neural networks; End-to-end learning; Gabor filters; Image classification; Object recognition; Shape-bias; V1.

MeSH terms

  • Humans
  • Neural Networks, Computer*
  • Visual Perception*