Parsing the visual scene into objects is paramount to survival. Yet how the nervous system accomplishes this remains largely unknown, even in the comparatively well-understood visual system. It is especially unclear how the detailed signal representations of the periphery are transformed into the position-invariant, object-oriented representations found at the final stages of visual processing. This perspective discusses advances in computational algorithms for fitting large-scale models, which make it possible to reconstruct the intermediate steps of visual processing from neural responses to natural stimuli. In particular, it is now possible to characterize how different types of position invariance, from local (also known as phase invariance) to more global, are interleaved with nonlinear operations to support the coding of curved contours. Neurons in the mid-level visual area V4 exhibit selectivity for pairs of even- and odd-symmetric profiles along curved contours. Such pairing is reminiscent of the response properties of complex cells in the primary visual cortex (V1) and suggests specific ways in which V1 signals are transformed within subsequent visual cortical areas. These examples illustrate that large-scale models fitted to neural responses to natural stimuli can provide generative models of successive stages of sensory processing.
Keywords: Convolutional Neural Networks (CNN); area V4; auditory system; curvature; object recognition; phase invariance; quadrature model; visual system.
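The even/odd pairing and the phase invariance of V1 complex cells mentioned above are classically described by the quadrature (energy) model. The sketch below shows that textbook model in one dimension. Note that it is not the model-fitting method discussed in this perspective. The 1D Gabor profiles, the parameter values (`sigma=1.0`, `freq=0.5`), and the function names are illustrative assumptions. Each linear filter's response to a grating changes with the grating's phase, but the summed squared responses of the even/odd pair stay essentially constant.

```python
import math

def gabor_pair(x, sigma=1.0, freq=0.5):
    """Hypothetical 1D even- (cosine) and odd- (sine) symmetric Gabor profiles at x."""
    envelope = math.exp(-x * x / (2 * sigma * sigma))
    return (envelope * math.cos(2 * math.pi * freq * x),
            envelope * math.sin(2 * math.pi * freq * x))

def grating(x, freq, phase):
    """Sinusoidal grating stimulus with a given spatial frequency and phase."""
    return math.cos(2 * math.pi * freq * x + phase)

def quadrature_responses(phase, freq=0.5, sigma=1.0, n=401, half_width=6.0):
    """Return (even, odd, energy) responses of a quadrature pair to a grating.

    The linear responses are numerical integrals of filter * stimulus over
    [-half_width, half_width]; energy = even**2 + odd**2 (complex-cell model).
    """
    dx = 2 * half_width / (n - 1)
    even = odd = 0.0
    for i in range(n):
        x = -half_width + i * dx
        g_even, g_odd = gabor_pair(x, sigma, freq)
        s = grating(x, freq, phase)
        even += g_even * s * dx
        odd += g_odd * s * dx
    return even, odd, even ** 2 + odd ** 2

if __name__ == "__main__":
    for k in range(5):
        phase = k * math.pi / 4
        e, o, energy = quadrature_responses(phase)
        print(f"phase={phase:.2f}  even={e:+.4f}  odd={o:+.4f}  energy={energy:.4f}")
```

Running the script shows the even and odd outputs trading off as the grating phase shifts. Their squared sum stays near pi/2 for these parameters. This is the local (phase) invariance that the abstract contrasts with more global forms of position invariance.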