Sci Rep. 2019 Mar 7;9(1):3791.
doi: 10.1038/s41598-019-40535-4.

Characterisation of nonlinear receptive fields of visual neurons by convolutional neural network

Jumpei Ukita et al. Sci Rep.

Abstract

A comprehensive understanding of the stimulus-response properties of individual neurons is necessary to crack the neural code of sensory cortices. However, a barrier to achieving this goal is the difficulty of analysing the nonlinearity of neuronal responses. Here, by incorporating a convolutional neural network (CNN) into encoding models of neurons in the visual cortex, we developed a new method of nonlinear response characterisation, especially nonlinear estimation of receptive fields (RFs), without assumptions regarding the type of nonlinearity. Briefly, after training a CNN to predict the visual responses to natural images, we synthesised the RF image such that the image would be predicted to evoke a maximum response. We first demonstrated the proof-of-principle using a dataset of simulated cells with various types of nonlinearity. We could visualise RFs with various types of nonlinearity, such as shift-invariant RFs or rotation-invariant RFs, suggesting that the method may be applicable to neurons with complex nonlinearities in higher visual areas. Next, we applied the method to a dataset of neurons in mouse V1. We could visualise simple-cell-like or complex-cell-like (shift-invariant) RFs and quantify the degree of shift-invariance. These results suggest that the CNN encoding model is useful in nonlinear response analyses of visual neurons and potentially of any sensory neurons.
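The RF synthesis step described above (finding an image that a trained model predicts will evoke a maximum response) can be illustrated with a minimal sketch. This is not the authors' procedure: the finite-difference gradient, the unit-norm constraint, the step size, and the toy model `toy_predict` with its weights are all assumptions made for illustration. Any callable that maps a flat image to a predicted response could stand in for the trained CNN.

```python
import math
import random


def synthesise_rf(predict, n_pixels, steps=200, lr=0.5, eps=1e-4, rng=None):
    """Gradient ascent on the input image to maximise predict(image).

    Assumptions (not from the paper): gradients are estimated by forward
    finite differences, and the image is kept inside the unit ball.
    """
    rng = rng or random.Random(0)
    image = [rng.gauss(0.0, 0.01) for _ in range(n_pixels)]
    for _ in range(steps):
        base = predict(image)
        grad = []
        for k in range(n_pixels):
            bumped = image[:]
            bumped[k] += eps
            grad.append((predict(bumped) - base) / eps)
        # Step uphill, then project back onto the unit ball if needed.
        image = [v + lr * g for v, g in zip(image, grad)]
        norm = math.sqrt(sum(v * v for v in image))
        if norm > 1.0:
            image = [v / norm for v in image]
    return image


# Hypothetical stand-in for a trained encoding model: a sigmoid of a
# weighted sum, so the optimal image should align with the weights.
TOY_WEIGHTS = [1.0, -2.0, 0.5, 3.0]


def toy_predict(image):
    drive = sum(w * v for w, v in zip(TOY_WEIGHTS, image))
    return 1.0 / (1.0 + math.exp(-drive))


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


rf_image = synthesise_rf(toy_predict, n_pixels=4)
```

For this toy model the synthesised image should point in nearly the same direction as `TOY_WEIGHTS`, which can be checked with `cosine(rf_image, TOY_WEIGHTS)`.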

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Scheme of the CNN encoding model. The Ca2+ response to a natural image was predicted by a convolutional neural network (CNN) consisting of 4 successive convolutional layers, one pooling layer, one fully connected layer, and the output layer (magenta circle). See Methods for details. Briefly, a convolutional layer calculates a 3 × 3 convolution of the previous layer followed by a rectified linear (ReLU) transformation. The pooling layer calculates max-pooling of 2 × 2 regions in the previous layer. The fully connected layer calculates the weighted sum of the previous layer followed by a ReLU transformation. The output layer calculates the weighted sum of the previous layer followed by a sigmoidal transformation. During training, parameters were updated by backpropagation to reduce the mean squared error between the predicted responses and actual responses.
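The layer sequence in this caption can be sketched as a plain forward pass. The sketch below is only an illustration under simplifying assumptions that are not stated in the caption: one channel per convolutional layer, no padding, random initial weights, and arbitrary image and hidden-layer sizes. Training by backpropagation is omitted; only the mean squared error that it would reduce is shown. All function and parameter names are hypothetical.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv3x3_relu(image, kernel, bias=0.0):
    """3 x 3 convolution without padding, followed by ReLU."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            s = bias
            for di in range(3):
                for dj in range(3):
                    s += kernel[di][dj] * image[i + di][j + dj]
            row.append(relu(s))
        out.append(row)
    return out


def max_pool2x2(image):
    """Max-pooling over non-overlapping 2 x 2 regions."""
    h, w = len(image), len(image[0])
    return [[max(image[i][j], image[i][j + 1],
                 image[i + 1][j], image[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]


def dense(vector, weights, biases, activation):
    """Weighted sum of the previous layer followed by an activation."""
    return [activation(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, biases)]


def predict(image, params):
    x = image
    for kernel in params["kernels"]:  # four successive convolutional layers
        x = conv3x3_relu(x, kernel)
    x = max_pool2x2(x)                # one pooling layer
    flat = [v for row in x for v in row]
    hidden = dense(flat, params["fc_w"], params["fc_b"], relu)
    return dense(hidden, [params["out_w"]], [params["out_b"]], sigmoid)[0]


def mse(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def init_params(image_size, hidden_units, rng):
    # Four unpadded 3 x 3 convolutions remove 8 pixels per side length;
    # 2 x 2 pooling then halves it.
    side = (image_size - 8) // 2

    def small():
        return rng.gauss(0.0, 0.1)

    return {
        "kernels": [[[small() for _ in range(3)] for _ in range(3)]
                    for _ in range(4)],
        "fc_w": [[small() for _ in range(side * side)]
                 for _ in range(hidden_units)],
        "fc_b": [0.0] * hidden_units,
        "out_w": [small() for _ in range(hidden_units)],
        "out_b": 0.0,
    }
```

Calling `predict(image, init_params(12, 5, random.Random(1)))` on a 12 × 12 image returns a value between 0 and 1, because the output unit is sigmoidal.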
Figure 2
Nonlinear RFs could be estimated by CNN encoding models for simulated simple cells and complex cells. (a,b) Scheme of response generation for simulated simple cells (a) and simulated complex cells (b) (see Methods for details). The Gabor-shaped filters of simulated simple cell A and complex cell B are displayed. (c) Left: comparison of the response predictions among the following encoding models: the L1-regularised linear regression model (Lasso), L2-regularised linear regression model (Ridge), support vector regression model (SVR), hierarchical structural model (HSM), and CNN. Data are presented as the mean ± s.e.m. (N = 30 simulated simple cells and N = 70 simulated complex cells). Right: cumulative distribution of CNN prediction similarity. Simulated cells with a CNN prediction similarity ≤0.3 (indicated as the red arrow) were removed from the following receptive field (RF) analysis. (d,f) Results of iterative CNN RF estimations for simulated simple cell A (d) and complex cell B (f). Only 20 of the 100 generated RF images are shown in these panels. Grids are depicted in cyan. Whereas the simulated simple cell A had RFs in nearly identical positions, the simulated complex cell B had RFs in shifted positions. (e,g) Linearly estimated RFs (linear RFs) of simulated simple cell A (e) and complex cell B (g), using a regularised pseudoinverse method. (h) Gabor-fitting similarity of CNN RFs, defined as the Pearson correlation coefficient between the CNN RF and fitted Gabor kernel. (i) Maximum similarity between each generator filter and 100 CNN RFs. (j) Maximum similarity between linear RFs and CNN RFs. Similarity was defined as the normalised pixelwise dot product between the linear RF and CNN RF. (k) Relationship of the Gabor orientations between generator filters and CNN RFs. (l) Distribution of complexness. Only cells with a CNN prediction similarity >0.3 were analysed in (h–l) (N = 19 simple cells and N = 47 complex cells).
Figure 3
Nonlinear RFs could be estimated by CNN encoding models for simulated rotation-invariant cells. (a) Scheme of response generation for simulated rotation-invariant cells. The response to a stimulus was defined as the maximum of the outputs of 36 subunits followed by additive Gaussian noise. Each subunit, which had a Gabor-shaped filter with a different orientation, calculated the dot product between the stimulus image and the filter (see Methods for details). The filters of simulated cell C are displayed in this panel. (b) Cumulative distribution of CNN prediction similarity (N = 10 cells). Simulated cells with a CNN prediction similarity ≤0.3 (indicated as the red arrow) were removed from the following RF analysis. (c) Results of iterative CNN RF estimations for simulated cell C. Only 20 of the 1,000 generated RF images are shown in this panel. RF images had Gabor-like shapes but their orientations were different in different iterations. (d) Maximum similarity between each generator filter and 1,000 CNN RFs. Only cells with a CNN prediction similarity >0.3 were analysed (N = 9 cells).
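The response rule in panel (a) (maximum over 36 Gabor subunit outputs, plus Gaussian noise) can be sketched as follows. The Gabor parameters (patch size, wavelength, envelope width, phase) and the even spacing of orientations over 180° are assumptions for illustration; the caption defers those details to Methods.

```python
import math
import random


def gabor(size, theta, wavelength=4.0, sigma=2.0, phase=0.0):
    """Flat list holding a size x size Gabor patch (parameters are assumed)."""
    c = (size - 1) / 2.0
    patch = []
    for y in range(size):
        for x in range(size):
            xr = (x - c) * math.cos(theta) + (y - c) * math.sin(theta)
            yr = -(x - c) * math.sin(theta) + (y - c) * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + phase)
            patch.append(envelope * carrier)
    return patch


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rotation_invariant_response(stimulus, filters, noise_sd, rng):
    """Maximum subunit output followed by additive Gaussian noise."""
    drive = max(dot(stimulus, f) for f in filters)
    return drive + rng.gauss(0.0, noise_sd)


# 36 subunits; spacing the orientations evenly over 180 degrees is an assumption.
filters = [gabor(9, k * math.pi / 36) for k in range(36)]
```

Because the maximum is taken over all orientations, a grating-like stimulus drives such a cell strongly whichever subunit it happens to match, which is what makes the response rotation-invariant.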
Figure 4
Prediction of the CNN for V1 neurons. (a) Comparison of the response predictions among various encoding models: the L1-regularised linear regression model (Lasso), L2-regularised linear regression model (Ridge), SVR, HSM, and CNN. Data are presented as the mean ± s.e.m. (N = 2455 neurons). (b) Cumulative distribution of CNN prediction similarity. Neurons with a CNN prediction similarity ≤0.3 (indicated as the red arrow) were removed from the following RF analysis. (c) Distributions of actual responses and predicted responses of the neuron with the best prediction similarity in a plane (top) and the neuron with the median prediction similarity in a plane (bottom). Each dot in the right panel indicates data for each stimulus image. Solid lines in the right panels are the linear least-squares fit lines. Only data for 200 images are shown.
Figure 5
Estimating RFs of V1 neurons from trained CNNs. (a) Linearly estimated RFs (linear RFs) of two representative neurons (neurons D and E), using a regularised pseudoinverse method. (b) RFs estimated from the trained CNNs (CNN RFs) of the two representative neurons. (c) Gabor kernels fitted to CNN RFs of the two representative neurons. (d) Similarity between linear RFs and CNN RFs. Similarity was defined as the normalised pixelwise dot product between the linear RF and the CNN RF. (e) Gabor fitting similarity of CNN RFs, defined as the Pearson correlation coefficient between the CNN RF and the fitted Gabor kernel. Only neurons with a CNN prediction similarity >0.3 were analysed in (d,e) (N = 1160 neurons). (f,g) Results of iterative CNN RF estimations for neuron D (f) and neuron E (g). Only 20 out of the 100 generated RF images are shown in this figure. The number above each RF image indicates the shift pixel distance between the RF image and the top left RF image. The shift distance between two images was calculated as the maximum distance of pixel shifts for which the zero-mean normalised cross correlation (ZNCC) was >0.95, projected orthogonally to the Gabor orientation. “NA” indicates that the ZNCC was not above 0.95 for any shift. While shift distances were zero or NA for RF images of neuron D, some RF images of neuron E were shifted relative to one another by one pixel.
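The shift-distance measure in panels (f,g) can be sketched as below. Several details are assumptions rather than the authors' implementation: shifts are integer pixel offsets up to a hypothetical `max_shift`, the ZNCC is computed over the overlapping region only, and `normal_angle` denotes the angle of the axis orthogonal to the Gabor stripes, onto which each shift is projected. A return value of `None` plays the role of "NA".

```python
import math


def zncc(a, b):
    """Zero-mean normalised cross correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def overlap_after_shift(img_a, img_b, dx, dy):
    """Pair img_a[y][x] with img_b[y + dy][x + dx] wherever both exist."""
    h, w = len(img_a), len(img_a[0])
    a_vals, b_vals = [], []
    for y in range(h):
        for x in range(w):
            if 0 <= y + dy < h and 0 <= x + dx < w:
                a_vals.append(img_a[y][x])
                b_vals.append(img_b[y + dy][x + dx])
    return a_vals, b_vals


def shift_distance(img_a, img_b, normal_angle, max_shift=3, threshold=0.95):
    """Largest projected shift whose ZNCC exceeds the threshold, else None."""
    nx, ny = math.cos(normal_angle), math.sin(normal_angle)
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a_vals, b_vals = overlap_after_shift(img_a, img_b, dx, dy)
            if zncc(a_vals, b_vals) > threshold:
                distance = abs(dx * nx + dy * ny)
                if best is None or distance > best:
                    best = distance
    return best
```

Projecting onto the axis orthogonal to the stripes means that sliding an RF along its own stripes, which barely changes the image, does not count as a shift.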
Figure 6
Schemes of the simple model and complex model. Schemes of the simple model and complex model are illustrated using RFs and actual responses of neuron E. (a) The simple model is a linear predictive model, which predicts the neuronal response as the normalised dot product between the stimulus image and one RF image (RF 4). (b) The complex model predicts the neuronal response as the maximum of the normalised dot products of the stimulus image and several RF images (RF 1–4). Note that the complex model predicted the neuronal response to Stim 2 better than the simple model for this neuron.
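The two models in this figure differ only in whether one RF or the maximum over several RFs is used. A minimal sketch follows, assuming that the "normalised dot product" is the dot product divided by the product of the two vector norms (the caption does not spell out the normalisation); the function names and the 4-pixel toy vectors are hypothetical.

```python
import math


def normalised_dot(a, b):
    """Dot product divided by both norms (assumed form of normalisation)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def simple_model(stimulus, rf):
    """Linear prediction from a single RF image."""
    return normalised_dot(stimulus, rf)


def complex_model(stimulus, rfs):
    """Prediction as the maximum over several (for example shifted) RF images."""
    return max(normalised_dot(stimulus, rf) for rf in rfs)


# Toy 1-D "images": rf_b is rf_a shifted by one pixel.
rf_a = [1.0, -1.0, 0.0, 0.0]
rf_b = [0.0, 1.0, -1.0, 0.0]
stimulus = [0.0, 2.0, -2.0, 0.0]  # matches the shifted RF

simple_prediction = simple_model(stimulus, rf_a)
complex_prediction = complex_model(stimulus, [rf_a, rf_b])
```

In this toy case the stimulus matches only the shifted RF, so the complex model predicts a strong response while the simple model built on `rf_a` does not, mirroring the Stim 2 example noted in the caption.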
Figure 7
Simple cells and complex-like cells. (a) Cumulative distributions of prediction errors of the simple model (green) and the complex model (magenta) for neuron E. Prediction error was defined as the difference between the predicted response and actual response. (b) Relationship of similarities between the simple model and complex model (N = 997 neurons). Neurons with the Gabor fitting similarity ≤0.6, similarity of the simple model <0, or similarity of the complex model <0 were omitted from this analysis. (c) Distribution of complexness. Simple cells (green) and complex-like cells (magenta) were classified with threshold = 0 (black arrow). (d) Proportion of classified cells, simple cells, and complex-like cells among neurons with the CNN response prediction similarity >0.3. Classified cells were neurons with the Gabor fitting similarity >0.6, the response prediction similarity of the simple model >0, and the response prediction similarity of the complex model >0. Simple cells were neurons with complexness ≤0. Complex-like cells were neurons with complexness >0. (e–g) Relationships between complexness and linear (Lasso) prediction similarity (e), similarity between linear RFs and CNN RFs (f), and the nonlinearity index (g). Data of simple cells are presented as the mean ± s.d. (N = 739 neurons, green). Solid lines are the robust fit lines for complex-like cells. Both linear prediction similarity and RF similarity of complex-like cells (magenta) negatively correlated with complexness (r = −0.35, p < 0.001, N = 258 neurons: e and r = −0.29, p < 0.001, N = 258 neurons: f), while the nonlinearity index of complex-like cells positively correlated with complexness (r = 0.34, p < 0.001, N = 258 neurons: g), suggesting that complexness defined here indeed reflected nonlinearity.
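The classification criteria in panels (c,d) amount to a short decision rule, sketched below. The thresholds are those given in the caption; the function name, argument names, and returned labels are hypothetical, and complexness is taken as an input because its definition is not given here.

```python
def classify_cell(cnn_similarity, gabor_similarity,
                  simple_similarity, complex_similarity, complexness):
    """Apply the caption's thresholds to one neuron.

    Returns "excluded", "unclassified", "simple", or "complex-like".
    """
    # Only neurons with a CNN response prediction similarity > 0.3 are considered.
    if cnn_similarity <= 0.3:
        return "excluded"
    # Classified cells need a good Gabor fit and positive similarity
    # for both the simple and the complex model.
    if not (gabor_similarity > 0.6
            and simple_similarity > 0.0
            and complex_similarity > 0.0):
        return "unclassified"
    # Threshold of 0 on complexness separates the two classes.
    return "simple" if complexness <= 0.0 else "complex-like"
```

For example, `classify_cell(0.5, 0.8, 0.4, 0.6, 0.2)` falls in the complex-like class, whereas the same neuron with complexness of 0 or below would be labelled simple.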
Figure 8
Spatial organisations of simple cells and complex-like cells. (a) Left: cortical distribution of complexness for the representative plane. The position of each neuron is represented as the circle annotated by the complexness (cyan to magenta for complex-like cells (complexness >0) and white for simple cells (complexness ≤0)). Right: cortical distribution of simple cells (N = 238 neurons, green) and complex-like cells (N = 70 neurons, magenta) for the representative plane. (b) Relationship between cortical distances and differences of complexness for all simple cells and complex-like cells. (c) Cumulative distributions of the number of simple cell-simple cell pairs (left) or complex-like cell-complex-like cell pairs (right) as a function of the cortical distance, normalised by the area. Dark shadows indicate the range from the first to 99th percentile of 1,000 position-permuted simulations for each plane. The cumulative distributions were both within the first and 99th percentiles of simulations, indicating no distinct spatial arrangements of simple cells or complex-like cells.
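The permutation analysis in panel (c) can be sketched as follows. This is a simplified illustration, not the authors' analysis: it counts same-class pairs within a single radius instead of building the full cumulative curve, omits the normalisation by area, assumes that "position-permuted" means shuffling which cell positions carry which class label, and uses a nearest-rank percentile. All names are hypothetical.

```python
import math
import random


def pairs_within(positions, radius):
    """Number of cell pairs whose cortical distance is at most radius."""
    count = 0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= radius:
                count += 1
    return count


def permutation_band(positions, labels, target, radius, n_perm=1000, rng=None):
    """Observed same-class pair count and the 1st-99th percentile null band."""
    rng = rng or random.Random(0)

    def count_for(current_labels):
        chosen = [p for p, lab in zip(positions, current_labels) if lab == target]
        return pairs_within(chosen, radius)

    observed = count_for(labels)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign class labels to positions at random
        null_counts.append(count_for(shuffled))
    null_counts.sort()
    low = null_counts[round(0.01 * (n_perm - 1))]
    high = null_counts[round(0.99 * (n_perm - 1))]
    return observed, low, high
```

An observed count that lies between `low` and `high` is consistent with no distinct spatial arrangement of that class at the chosen radius, which is the pattern the caption reports.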
