Closing the gap between single-unit and neural population codes: Insights from deep learning in face recognition

Connor J Parde; Y Ivette Colón; Matthew Q Hill; Carlos D Castillo; Prithviraj Dhar; Alice J O'Toole

doi:10.1167/jov.21.8.15

J Vis. 2021 Aug 2;21(8):15. doi: 10.1167/jov.21.8.15.

Authors

Connor J Parde^{1,2}, Y Ivette Colón^{1,3}, Matthew Q Hill^{1,4}, Carlos D Castillo^{5,6}, Prithviraj Dhar^{5,7}, Alice J O'Toole^{1,8}

Affiliations

¹ School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA.
² connor.parde@utdallas.edu.
³ ycolon@wisc.edu.
⁴ matthew.hill@utdallas.edu.
⁵ University of Maryland Institute of Advanced Computer Studies, University of Maryland, College Park, MD, USA.
⁶ carlos.d.castillo@gmail.com.
⁷ pdhar@cs.umd.edu.
⁸ otoole@utdallas.edu.

Abstract

Single-unit responses and population codes differ in the "read-out" information they provide about high-level visual representations. Diverging local and global read-outs can be difficult to reconcile with in vivo methods. To bridge this gap, we studied the relationship between single-unit and ensemble codes for identity, gender, and viewpoint, using a deep convolutional neural network (DCNN) trained for face recognition. Analogous to the primate visual system, DCNNs develop representations that generalize over image variation, while retaining subject (e.g., gender) and image (e.g., viewpoint) information. At the unit level, we measured the number of single units needed to predict attributes (identity, gender, viewpoint) and the predictive value of individual units for each attribute. Identification was remarkably accurate using random samples of only 3% of the network's output units, and all units had substantial identity-predicting power. Cross-unit responses were minimally correlated, indicating that single units code non-redundant identity cues. Gender and viewpoint classification required large-scale pooling of units; individual units had weak predictive power. At the ensemble level, principal component analysis of face representations showed that identity, gender, and viewpoint separated into high-dimensional subspaces, ordered by explained variance. Unit-based directions in the representational space were compared with the directions associated with the attributes. Identity, gender, and viewpoint contributed to all individual unit responses, undercutting a neural tuning analogy. Instead, single-unit responses carry superimposed, distributed codes for face identity, gender, and viewpoint. This undermines confidence in the interpretation of neural representations from unit response profiles for both DCNNs and, by analogy, high-level vision.
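The unit-subsampling analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: the descriptors are synthetic Gaussian vectors, the 512-unit width, noise level, and number of identities are assumptions chosen for illustration, and nearest-neighbor matching by cosine similarity stands in for whatever identification measure the study actually used. All function names are hypothetical.

```python
import math
import random


def make_synthetic_descriptors(n_ids, n_units, noise, rng):
    """Build two noisy descriptors (gallery and probe) per synthetic identity."""
    gallery, probes = [], []
    for _ in range(n_ids):
        prototype = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        gallery.append([p + rng.gauss(0.0, noise) for p in prototype])
        probes.append([p + rng.gauss(0.0, noise) for p in prototype])
    return gallery, probes


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identification_accuracy(gallery, probes, unit_idx):
    """Fraction of probes whose nearest gallery item (restricted to the
    units in unit_idx) belongs to the same identity."""
    unit_idx = list(unit_idx)
    sub_gallery = [[g[i] for i in unit_idx] for g in gallery]
    correct = 0
    for true_id, probe in enumerate(probes):
        sub_probe = [probe[i] for i in unit_idx]
        best = max(range(len(sub_gallery)),
                   key=lambda k: cosine(sub_probe, sub_gallery[k]))
        correct += int(best == true_id)
    return correct / len(probes)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
N_UNITS = 512            # assumed descriptor width, for illustration only
N_IDS = 50               # assumed number of synthetic identities

gallery, probes = make_synthetic_descriptors(N_IDS, N_UNITS, 0.5, rng)

# Random sample of roughly 3% of the units, as in the abstract.
n_sampled = max(1, round(0.03 * N_UNITS))
subset = rng.sample(range(N_UNITS), n_sampled)

acc_subset = identification_accuracy(gallery, probes, subset)
acc_full = identification_accuracy(gallery, probes, range(N_UNITS))

print(f"units sampled: {n_sampled} of {N_UNITS}")
print(f"accuracy with sampled units: {acc_subset:.2f}")
print(f"accuracy with all units:     {acc_full:.2f}")
```

Because every synthetic unit carries independent identity signal by construction, a small random subset already identifies well above chance. That built-in property mirrors the non-redundancy the abstract reports, but it is assumed here rather than demonstrated.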

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Animals
Deep Learning*
Face
Facial Recognition*
Neural Networks, Computer

Grants and funding

R01 EY029692/EY/NEI NIH HHS/United States