Face detection in untrained deep neural networks

Seungdae Baek et al. Nat Commun. 2021 Dec 16;12(1):7328. doi: 10.1038/s41467-021-27606-9.

Abstract

Face-selective neurons are observed in the primate visual pathway and are considered as the basis of face detection in the brain. However, it has been debated as to whether this neuronal selectivity can arise innately or whether it requires training from visual experience. Here, using a hierarchical deep neural network model of the ventral visual stream, we suggest a mechanism in which face-selectivity arises in the complete absence of training. We found that units selective to faces emerge robustly in randomly initialized networks and that these units reproduce many characteristics observed in monkeys. This innate selectivity also enables the untrained network to perform face-detection tasks. Intriguingly, we observed that units selective to various non-face objects can also arise innately in untrained networks. Our results imply that the random feedforward connections in early, untrained deep neural networks may be sufficient for initializing primitive visual selectivity.
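The core claim above — that primitive visual selectivity can arise from random feedforward connections alone — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the layer below simply convolves an image with Gaussian-initialized kernels and applies a rectification, and every function name and parameter here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_conv_layer(image, n_kernels=8, k=5, sigma=0.05):
    """Apply n_kernels randomly initialized k x k kernels (valid convolution),
    with weights drawn from a Gaussian, followed by ReLU rectification."""
    h, w = image.shape
    kernels = rng.normal(0.0, sigma, size=(n_kernels, k, k))
    out = np.zeros((n_kernels, h - k + 1, w - k + 1))
    for i, ker in enumerate(kernels):
        for y in range(h - k + 1):
            for x in range(w - k + 1):
                out[i, y, x] = np.sum(image[y:y + k, x:x + k] * ker)
    return np.maximum(out, 0.0)  # ReLU

image = rng.random((32, 32))        # stand-in for an input stimulus
responses = random_conv_layer(image)
```

Even without any training, such a layer already produces structured, nonnegative feature maps; the paper's question is whether, stacked into a deep hierarchy, these random filters suffice for face-selective responses.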


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Spontaneous emergence of face-selectivity in untrained networks.
a Face-selective neurons and their response observed in monkey experiments. The response was normalized to the maximum value as 1. The face image shown is not the original stimulus set due to copyright. The image shown is available at [https://www.shutterstock.com] (see Methods for details). b The architecture of the untrained AlexNet. The untrained AlexNet was devised using a random initialization method, in which the values in each weight kernel were randomly sampled from a Gaussian distribution. c A stimulus set was designed to control the degree of intra-class image similarity. Stimulus images were selected and modified from a publicly available dataset that has been used in a human fMRI study. The original images are available at [http://vpnl.stanford.edu/fLoc/]. d Responses of individual face-selective units in the untrained AlexNet (P < 0.001, two-sided rank-sum test, uncorrected). e The number of face-selective units in each convolutional layer in untrained networks (n = 100). f Face-selectivity index (FSI) of face-selective neurons in the primate IT (n = 158), and of face units in each convolutional layer in the untrained AlexNet. The control FSI was measured from the shuffled responses of face-selective units in the untrained network. g (Left) Examples of texform and scrambled face images. (Right) Responses of face-selective units to the original face (n = 200), scrambled face (n = 200), and texform face images (n = 100). h Responses of face-selective units to four different sets of novel face images: (1) 50 face images from our original dataset (images not used for finding face-selective units), (2) 16 images used in Tsao et al., (3) 50 images used in Cao et al. in color and gray scale, and (4) 50 face images artificially generated by the FaceGen simulator (Singular Inversions) in color and gray scale.
i The number of face-selective units when the weight variation was changed from 5 to 200% of the original value, using two different initialization methods with a Gaussian (red) and a uniform distribution (blue). j FSI of face-selective units across the same changes in weight variation. Dashed lines indicate the mean and shaded areas indicate the standard deviation of 30 random networks. All box plots indicate the inter-quartile range (IQR, between Q1 and Q3) of the dataset, the horizontal line depicts the median, and the whiskers correspond to the rest of the distribution (Q1 − 1.5*IQR, Q3 + 1.5*IQR).
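Panels d and f describe selecting units by a two-sided rank-sum test and quantifying their preference with a face-selectivity index. The sketch below shows how such statistics could be computed; the FSI form (Rface − Rnonface)/(Rface + Rnonface) is an assumed convention from the selectivity literature, not taken from this paper, and all response data are synthetic.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)

def face_selectivity_index(face_resp, nonface_resp):
    """Assumed FSI form: (mean face response - mean non-face response)
    normalized by their sum."""
    rf, rn = np.mean(face_resp), np.mean(nonface_resp)
    return (rf - rn) / (rf + rn)

def is_face_selective(face_resp, nonface_resp, alpha=0.001):
    """Two-sided rank-sum test at P < 0.001, uncorrected (as in Fig. 1d),
    requiring the unit to respond more strongly to faces."""
    stat, p = ranksums(face_resp, nonface_resp)
    return bool(p < alpha and np.mean(face_resp) > np.mean(nonface_resp))

# Synthetic unit: stronger responses to face stimuli than to other classes.
face_resp = rng.normal(1.0, 0.2, 200)
nonface_resp = rng.normal(0.4, 0.2, 1000)
fsi = face_selectivity_index(face_resp, nonface_resp)
```

With these synthetic response distributions the unit passes the rank-sum criterion and yields an FSI of roughly 0.43.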
Fig. 2
Fig. 2. Preferred feature images of face-selective units in untrained networks.
a Measurements of preferred feature images (PFI) of target units in Conv5 from the reverse-correlation analysis. Bright and dark 2D Gaussian filters were generated at random positions as an input stimulus set. The PFI was obtained as the summation of stimuli weighted by the corresponding responses. The initial preferred feature image was calculated from the local Gaussian stimulus set by the classical reverse-correlation method. Then, a new stimulus set was generated as the summation of the obtained PFI and local Gaussian stimuli, with the second preferred feature image then obtained from this new stimulus set. This procedure was repeated to obtain the preferred feature image. b Schematics of the process used to obtain a preferred feature image (PFI) using a generative adversarial network (GAN) and a genetic algorithm (X-Dream). Synthesized images are generated by the GAN from image codes and are fed into an untrained network as input. The genetic algorithm finds a new image code that maximizes the response of the target unit. The PFI of a target unit is obtained after 100 iterations of this procedure. c The obtained preferred feature images, using the reverse-correlation method and X-Dream, of a face-selective unit, units selective to a non-face class (flower), and units selective to none of the classes. d Illustration of the face-configuration index (FCI) of a face unit's PFI. The FCI was defined as the pixel-wise correlation between the original face stimuli and the generated PFIs. e FCI of PFIs, using the reverse-correlation method, of units selective to each class (nFace = 465, nHand = 7, nHorn = 772, nFlower = 107, nChair = 63). f FCI of PFIs, using X-Dream, of the same units as in (e). All box plots indicate the inter-quartile range (IQR, between Q1 and Q3) of the dataset, the horizontal line depicts the median, and the whiskers correspond to the rest of the distribution (Q1 − 1.5*IQR, Q3 + 1.5*IQR).
The face images shown in panels (d)–(f) are selected examples from the publicly available dataset. The original images are available at [http://vpnl.stanford.edu/fLoc/].
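The classical reverse-correlation step in panel a — presenting random local Gaussian stimuli and summing them weighted by the unit's response — can be sketched as below. The `unit_response` function is an invented stand-in for a real network unit (here, one that simply prefers brightness in the upper-left quadrant), so the recovered preferred feature image should concentrate its mass there; stimulus sizes and counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
SIZE = 20

def gaussian_blob(cy, cx, sign, sigma=2.0):
    """A bright (+1) or dark (-1) 2D Gaussian stimulus at (cy, cx)."""
    ys, xs = np.mgrid[0:SIZE, 0:SIZE]
    return sign * np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def unit_response(stim):
    """Hypothetical unit: driven by brightness in the upper-left quadrant."""
    return stim[:SIZE // 2, :SIZE // 2].sum()

# Classical reverse correlation: response-weighted sum of random stimuli.
pfi = np.zeros((SIZE, SIZE))
for _ in range(2000):
    stim = gaussian_blob(rng.integers(SIZE), rng.integers(SIZE),
                         rng.choice([-1.0, 1.0]))
    pfi += unit_response(stim) * stim

upper_left = pfi[:SIZE // 2, :SIZE // 2].mean()
lower_right = pfi[SIZE // 2:, SIZE // 2:].mean()
```

Dark blobs in the preferred region produce negative responses, so their negative pixels also contribute positively to the PFI — which is why the response-weighted sum recovers the unit's preference regardless of stimulus polarity.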
Fig. 3
Fig. 3. Detection of face images using the response of face units in untrained networks.
a Design of the face detection task and SVM classifier using the responses of the untrained AlexNet. During this task, face or non-face images were randomly presented to the networks and the observed response of the final layer was used to train a support vector machine (SVM) to classify whether the given image was a face or not. Among 60 images from each class (face, hand, horn, flower, chair, and scrambled face) that were not used for face unit selection, 40 images were randomly sampled for the training of the SVM, and the other 20 images were used for testing. The images shown are selected examples from the publicly available dataset. The original images are available at [http://vpnl.stanford.edu/fLoc/]. b Performance on the face detection task using a single unit randomly sampled from face-selective units (n = 465) and from units without selective responses to any image class (n = 7776). The chance level was measured from the shuffled responses of face-selective units in the untrained network. Each bar indicates the mean and the error bar indicates the standard deviation of performance across units. c Performance on the face detection task using face-selective units and non-selective units when varying the number of units from 1 to 456. The dashed line indicates the detection performance using all units in Conv5 (n = 43,264). Each line indicates the mean and the shaded area indicates the standard deviation for 100 repeated trials of the random sampling of units. d Performance on the face detection task using face-selective units (n = 465) and then using all units in Conv5 (n = 43,264). Each bar indicates the mean and the error bar indicates the standard deviation for 100 repeated trials of the random sampling of units.
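Panel a's classifier pipeline — a linear SVM trained on final-layer responses, with 40 training and 20 test images per class — can be sketched with scikit-learn. Only the split sizes follow the caption; the "unit responses" below are synthetic (a subset of units given a response bias to faces), and the two-class setup simplifies the six stimulus classes to face vs. non-face.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_units = 50

def make_responses(n_images, face):
    """Synthetic rectified unit responses; 10 hypothetical face-selective
    units respond more strongly when the stimulus is a face."""
    base = rng.normal(0.0, 1.0, (n_images, n_units))
    if face:
        base[:, :10] += 1.5
    return np.maximum(base, 0.0)  # ReLU-like nonnegativity

X = np.vstack([make_responses(60, True), make_responses(60, False)])
y = np.array([1] * 60 + [0] * 60)

# 40 train / 20 test images per class, as in the caption.
idx = rng.permutation(60)
train = np.concatenate([idx[:40], idx[:40] + 60])
test = np.concatenate([idx[40:], idx[40:] + 60])

clf = SVC(kernel="linear").fit(X[train], y[train])
acc = clf.score(X[test], y[test])
```

Because the face signal is carried by a small subset of units, even this linear readout separates the classes well above chance — the same logic behind panel b's single-unit comparison.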
Fig. 4
Fig. 4. Effect of training on face-selectivity in untrained networks.
a Three different datasets modified from the publicly available ImageNet were used for training the network for image classification: (1) face-reduced ImageNet, (2) original ImageNet, and (3) ImageNet with added face images. For copyright reasons, the face image shown here is not the actual image used in the experiments. The original images are replaced with images with similar contents for display purposes. The original images are available at [https://www.image-net.org/download]. Images shown are available at [https://www.shutterstock.com, http://vpnl.stanford.edu/fLoc/]. See Methods for details. b Face-selectivity index of face-selective units in untrained networks and in networks trained with the three datasets (nUntrained = 4267, nReduced = 2452, nOriginal = 3561, nFace = 3585). c The number of face-selective units in untrained networks and in networks trained with the three datasets (nNet = 10). d Face detection performance of untrained networks and of networks trained with the three datasets (ntrial = 1000). e The obtained preferred feature images (PFI), using the reverse-correlation method, of a face-selective unit in each network. All box plots indicate the inter-quartile range (IQR, between Q1 and Q3) of the dataset, the horizontal line depicts the median, and the whiskers correspond to the rest of the distribution (Q1 − 1.5*IQR, Q3 + 1.5*IQR).
Fig. 5
Fig. 5. ImageNet category-selective units in untrained networks.
a The responses of units in untrained networks to the images of 1000 ImageNet classes and to face images (VGGFace2). b Average tuning curve of gazania-selective units. (Inset) Responses of gazania-selective units to the original gazania (n = 100), the scrambled gazania (n = 100), and texform gazania images (n = 100). c The number of selective units for the 39 classes in which selective units are observed. The error bar indicates the standard deviation of 50 random networks. d Sample preferred feature images obtained by reverse-correlation analysis, with stimulus images (inset). e Visualization of the principal component analysis (PCA) results (only two principal components (PC) are shown) using the Conv5 unit responses to each class in untrained networks. The analysis was performed using 3999 principal components, and the top 140 ± 32 components contained 75% of the variance. f The silhouette index of the Conv5 unit responses was measured using all principal components to estimate the consistency of data clustering. Each dot indicates the mean and the error bar indicates the standard deviation of 50 simulations of randomly initialized networks. g Correlation between the silhouette index and the number of selective units observed (Pearson correlation). Each dot indicates the mean and the error bar indicates the standard deviation of 50 random networks. All box plots indicate the inter-quartile range (IQR, between Q1 and Q3) of the dataset, the horizontal line depicts the median, and the whiskers correspond to the rest of the distribution (Q1 − 1.5*IQR, Q3 + 1.5*IQR). For copyright reasons, the images in panels (a) and (d) are not the actual images used in the experiments. The original images are replaced with images with similar contents for display purposes. The original images are available at [https://www.image-net.org/download, https://arxiv.org/abs/1710.08092].
Images shown are available at [https://www.shutterstock.com] (see Methods for details).
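Panels e and f combine a PCA projection of unit responses with a silhouette index measuring cluster consistency. A minimal scikit-learn sketch of that analysis on synthetic, class-clustered responses (all data, cluster counts, and noise levels here are invented; the caption's analysis uses all principal components, while this sketch projects to two for brevity):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)

# Synthetic unit responses: 3 classes x 50 images, each class forming
# a cluster in the 30-dimensional unit-response space.
n_per_class, n_units = 50, 30
centers = rng.normal(0.0, 2.0, (3, n_units))
X = np.vstack([c + rng.normal(0.0, 0.5, (n_per_class, n_units))
               for c in centers])
labels = np.repeat([0, 1, 2], n_per_class)

# Project responses onto principal components, then score how consistently
# images of the same class cluster together (silhouette in [-1, 1]).
pcs = PCA(n_components=2).fit_transform(X)
sil = silhouette_score(pcs, labels)
```

A silhouette index near 1 means class responses form tight, well-separated clusters; the paper's panel g relates this clustering quality to how many selective units emerge.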

References

    1. Desimone R. Face-selective cells in the temporal cortex of monkeys. J. Cogn. Neurosci. 1991;3:1–8.
    2. Tsao DY, Moeller S, Freiwald WA. Comparing face patch systems in macaques and humans. Proc. Natl Acad. Sci. USA. 2008;105:19514–19519.
    3. Afraz A, Boyden ES, DiCarlo JJ. Optogenetic and pharmacological suppression of spatial clusters of face neurons reveal their causal role in face gender discrimination. Proc. Natl Acad. Sci. USA. 2015;112:6730–6735.
    4. Sadagopan S, Zarco W, Freiwald WA. A causal relationship between face-patch activity and face-detection behavior. Elife. 2017;6:1–14.
    5. Tsao DY, Freiwald WA, Tootell RBH, Livingstone MS. A cortical region consisting entirely of face-selective cells. Science. 2006;311:670–674.
