PLoS Comput Biol. 2017 Jul 25;13(7):e1005667. doi: 10.1371/journal.pcbi.1005667. eCollection 2017 Jul.

A mixture of sparse coding models explaining properties of face neurons related to holistic and parts-based processing


Haruo Hosoya et al. PLoS Comput Biol.

Abstract

Experimental studies have revealed evidence of both parts-based and holistic representations of objects and faces in the primate visual system. However, it remains a mystery how such seemingly contradictory types of processing can coexist within a single system. Here, we propose a novel theory called the mixture of sparse coding models, inspired by the formation of category-specific subregions in the inferotemporal (IT) cortex. We developed a hierarchical network that constructed a mixture of two sparse coding submodels on top of a simple Gabor analysis. The submodels were each trained with face or non-face object images, which resulted in separate representations of facial parts and object parts. Importantly, evoked neural activities were modeled by Bayesian inference, which had a top-down explaining-away effect that made the recognition of an individual part depend strongly on the category of the whole input. We show that this explaining-away effect was indeed crucial for the units in the face submodel to exhibit significant selectivity to face images over object images, in a manner similar to actual face-selective neurons in the macaque IT cortex. Furthermore, the model explained, qualitatively and quantitatively, several tuning properties to facial features found in the middle patch of face processing in IT, as documented by Freiwald, Tsao, and Livingstone (2009). These included, in particular, tuning to only a small number of facial features, often related to geometrically large parts such as the face outline and hair; preference and anti-preference for extreme facial features (e.g., very large/small inter-eye distance); and a reduced gain of feature tuning for partial face stimuli compared with whole face stimuli. Thus, we hypothesize that the coding principle of facial features in the middle patch of face processing in the macaque IT cortex may be closely related to the mixture of sparse coding models.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
(A) The architecture of our hierarchical model. It starts with an energy detector bank and proceeds to two sparse coding submodels for faces and objects, which are then combined into a mixture model. Inset: an energy detector model. (B) Cartoon face and boat. Note that the mouth of the face and the base of the boat have the same shape. (C) Learning scheme. We assume that explicit class information, either “face” or “object,” is given for each input image during training, which allows us to apply standard sparse coding learning to each submodel with the corresponding dataset. (D) Inference scheme. For testing response properties, the network first interprets the input separately by the sparse code of each submodel (step 1), then compares the goodnesses of the obtained interpretations as posterior probabilities (step 2), and finally modulates the responses in each submodel multiplicatively by the corresponding posterior probability (step 3). Note that the normalization of the probabilities in step 2 leads to competition between the submodels in step 3.
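To make the three steps in (D) concrete, here is a minimal NumPy sketch of the mixture computation. This is a simplified illustration, not the paper's implementation: the function name, the soft-threshold encoder (a crude stand-in for MAP inference of the sparse code), and the parameters sigma2, lam, and priors are all hypothetical.

```python
import numpy as np

def mixture_inference(x, bases, sigma2=0.1, lam=1.0, priors=None):
    """Illustrative sketch of the three inference steps in Fig 1D.

    x     : input vector (e.g., energy-detector outputs)
    bases : list of basis matrices, one per submodel (face, object)
    """
    K = len(bases)
    priors = np.full(K, 1.0 / K) if priors is None else np.asarray(priors)

    # Step 1: interpret the input separately with each submodel's sparse
    # code. A soft-thresholded projection stands in for the MAP sparse
    # code under a Laplace prior (a simplification).
    codes = []
    for A in bases:
        y = A.T @ x
        codes.append(np.sign(y) * np.maximum(np.abs(y) - lam, 0.0))

    # Step 2: score each interpretation by an (unnormalized) log
    # posterior: Gaussian reconstruction error plus Laplace sparsity
    # penalty, then normalize. The normalization is what produces the
    # competition between submodels noted in the caption.
    log_post = np.empty(K)
    for k, (A, y) in enumerate(zip(bases, codes)):
        recon = np.sum((x - A @ y) ** 2) / (2.0 * sigma2)
        log_post[k] = np.log(priors[k]) - recon - lam * np.sum(np.abs(y))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Step 3: modulate each submodel's responses multiplicatively by its
    # posterior probability (the explaining-away gating).
    return [p * y for p, y in zip(post, codes)], post

# Toy usage with random bases (illustrative only).
rng = np.random.default_rng(0)
A_face, A_obj = rng.normal(size=(64, 20)), rng.normal(size=(64, 20))
responses, posterior = mixture_inference(rng.normal(size=64), [A_face, A_obj])
```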
Fig 2. The basis representations of three sample model face units.
Each panel depicts the weighting pattern (basis vector) from a face unit to energy detectors by a set of ellipses, where each ellipse corresponds to the energy detector at the indicated x-y position, orientation, and frequency (inverse of the ellipse size); see the top right legend. The color shows the normalized weight value (color bar). Only the maximum positive and the minimum negative weights are shown at each position for readability.
Fig 3. The basis representations of (A) 32 example model face units and (B) 32 example model object units.
Fig 4
(A) The responses of model face units (1–400) and model object units (401–800) to face images (left) and object images (right). The images are sorted by response magnitudes (color bar) for each unit. (B) The responses in the case of removing mixture computation. (C) The distribution of face posterior probabilities for face image inputs and for object image inputs. (D) The distribution of face-selectivity indices for the face units in the case of the mixture model (blue) or the case of the sparse coding model (yellow). The broken lines indicate the values −1/3 and 1/3. (E) The distribution of the number of face images in the top 10 (face or object) images that elicited the largest responses of each face unit.
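The broken lines at ±1/3 in (D) match the convention of the standard face-selectivity index; assuming the usual definition (an assumption, as the caption does not define it):

$$\mathrm{FSI} = \frac{\bar{R}_{\text{face}} - \bar{R}_{\text{object}}}{\bar{R}_{\text{face}} + \bar{R}_{\text{object}}}$$

so that FSI = 1/3 corresponds to a mean face response twice the mean object response, and FSI = −1/3 to the reverse.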
Fig 5. The tuning curves (red) of the model face units shown in Fig 2 to 19 feature parameters of cartoon faces.
The mean (blue) as well as the maximum and minimum (green) of the tuning curves estimated from surrogate data are also shown (see the section on Simulation details in Methods).
Fig 6
(A) The distribution of the numbers of significantly tuned features per unit, overlaid with a replot of [Freiwald et al. 2009, Fig. 3c]. (B) The distribution of the numbers of significantly tuned units for each feature parameter, overlaid with a replot (red boxes) of [Freiwald et al. 2009, Fig. 3d].
Fig 7
(A) All significant tuning curves of all model face units, sorted by the peak parameter value. Each tuning curve (row) was mean-subtracted and divided by the maximum. (B) The distributions of peak parameter values (top) and of trough parameter values (bottom). The overlaid red boxes are replots of [Freiwald et al. 2009, Fig. 4a] averaged over three monkeys. (C) The distribution of minimal values of the significant tuning curves peaked at +5 and of the flipped tuning curves peaked at −5, overlaid with an averaged replot of [Freiwald et al. 2009, Fig. 4d]. (D) The average of the tuning curves for each minimal value in (C) (in the same color).
Fig 8
(A) Full-variation versus single-variation tuning curves. (B) Full-variation versus partial face tuning curves. (C) Single-variation versus partial face tuning curves. (D) Single-variation versus partial face tuning curves in the case of removing mixture computation. (E) The distributions of face posterior probabilities for the full variation, the single variation, the partial face, and the inverted face conditions. (F) The distribution of the numbers of tuned units per feature for inverted faces (left) and the mean correlation coefficient between the tunings for upright faces and for inverted faces for each facial feature (right).
Fig 9. The distributions of correlation coefficients between 2D tuning functions and additive (blue) or multiplicative predictors (red).
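For orientation, additive and multiplicative predictors of a two-parameter tuning function are conventionally built from the marginal one-parameter tunings; the paper's exact construction is given in its Methods, so the following generic form is an assumption:

$$f_{\text{add}}(a, b) = g(a) + h(b), \qquad f_{\text{mult}}(a, b) = g(a)\, h(b)$$

where g and h denote the one-dimensional tunings to the two feature parameters a and b.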
Fig 10. The distributions of (A) the number of tuned features per unit (cf. Fig 6A), (B) the number of tuned units per feature (cf. Fig 6B), and (C) the peak (top) and the trough (bottom) feature values (cf. Fig 7B), in different model variations.
The color of each curve indicates the model variation (see legend).
Fig 11. The distribution of face posterior probabilities for face images (solid curve) or for object images (broken curve) in different model variations (cf. Fig 4C).
The color of each curve indicates the model variation (see legend).
Fig 12. The basis representations of 32 example model units from (A) the face submodel and (B) the object submodel, in the network trained with 300 reduced dimensions.
Fig 13. The graphical diagram for a mixture of sparse coding models.
The variable k is first drawn from its prior; then each variable yh is drawn from a Laplace distribution that depends on whether h = k; finally, the variable x is generated from a Gaussian distribution depending on yk. (Note that, when k is not given, x depends on all of y1, y2, …, yK.)
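Written out, the generative process described above takes the following form (the symbols π, b1, b0, A_k, and σ² are assumed notation, not necessarily the paper's parameterization):

$$k \sim \operatorname{Categorical}(\pi), \qquad y_h \mid k \sim \operatorname{Laplace}\!\left(0,\ b_{\mathbf{1}[h=k]}\right) \ \text{for } h = 1, \dots, K, \qquad x \mid y, k \sim \mathcal{N}\!\left(A_k y_k,\ \sigma^2 I\right)$$

where b1 (for the active submodel h = k) and b0 (otherwise) are the two Laplace scales, and A_k is the basis matrix of submodel k. Marginalizing over k couples x to all of y1, …, yK, which is the source of the explaining-away effect.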


References

    1. Tanaka JW, Farah MJ. Parts and wholes in face recognition. The Quarterly Journal of Experimental Psychology. 1993;46A(2):225–245. doi: 10.1080/14640749308401045
    2. McKone E, Kanwisher N, Duchaine BC. Can generic expertise explain special processing for faces? Trends in Cognitive Sciences. 2007 Jan;11(1):8–15. doi: 10.1016/j.tics.2006.11.002
    3. Tsunoda K, Yamane Y, Nishizaki M, Tanifuji M. Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns. Nature Neuroscience. 2001 Aug;4(8):832–838. doi: 10.1038/90547
    4. Freiwald WA, Tsao DY, Livingstone MS. A face feature space in the macaque temporal lobe. Nature Neuroscience. 2009 Aug;12(9):1187–1196. doi: 10.1038/nn.2363
    5. Schiltz C, Rossion B. Faces are represented holistically in the human occipito-temporal cortex. NeuroImage. 2006 Sep;32(3):1385–1394. doi: 10.1016/j.neuroimage.2006.05.037

Grants and funding

HH was supported by the New Energy and Industrial Technology Development Organization (P15009; www.nedo.go.jp). AH was supported by Academy of Finland (291538 and 250215; www.aka.fi) and Gatsby Charitable Foundation (GAT3528; www.gatsby.org.uk). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
