J Neurosci. 2024 Apr 24;44(17):e0296232024. doi: 10.1523/JNEUROSCI.0296-23.2024.

A Unifying Model for Discordant and Concordant Results in Human Neuroimaging Studies of Facial Viewpoint Selectivity


Cambria Revsine et al. J Neurosci. 2024.

Abstract

Recognizing faces regardless of their viewpoint is critical for social interactions. Traditional theories hold that view-selective early visual representations gradually become tolerant to viewpoint changes along the ventral visual hierarchy. Newer theories, based on single-neuron monkey electrophysiological recordings, suggest a three-stage architecture including an intermediate face-selective patch abruptly achieving invariance to mirror-symmetric face views. Human studies combining neuroimaging and multivariate pattern analysis (MVPA) have provided convergent evidence of view selectivity in early visual areas. However, contradictory conclusions have been reached concerning the existence in humans of a mirror-symmetric representation like that observed in macaques. We believe these contradictions arise from low-level stimulus confounds and data analysis choices. To probe for low-level confounds, we analyzed images from two face databases. Analyses of image luminance and contrast revealed biases across face views described by even polynomials, i.e., mirror-symmetric. To explain major trends across neuroimaging studies, we constructed a network model incorporating three constraints: cortical magnification, convergent feedforward projections, and interhemispheric connections. Given the identified low-level biases, we show that a gradual increase of interhemispheric connections across network layers is sufficient to replicate view-tuning in early processing stages and mirror-symmetry in later stages. Data analysis decisions (pattern dissimilarity measure and data recentering) accounted for the inconsistent observation of mirror-symmetry across prior studies. Pattern analyses of human fMRI data (participants of either sex) revealed biases compatible with our model. The model provides a unifying explanation of MVPA studies of viewpoint selectivity and suggests observations of mirror-symmetry originate from ineffectively normalized signal imbalances across different face views.

Keywords: MVPA; RSA; fMRI; face recognition; symmetry; viewpoint.


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1.
Commonalities and inconsistencies across fMRI-MVPA studies investigating viewpoint representations in humans. Four regions of interest are depicted on a sagittal view of the brain. In EVC (shown in green), 5/5 studies reported marked view-tuning, as depicted by the dissimilarity matrix shown in red (viewpoint model) and the unimodal neural tuning function shown immediately above. In OFA (in orange), 5/5 studies reported evidence of view-tuning. One study, however, reported additional evidence of some degree of mirror-symmetry, represented by the blue dissimilarity matrix (symmetry model) and bimodal tuning function immediately above. In pSTS (in purple), 5/5 studies reported evidence of view-tuning, with 2/5 observing some degree of mirror-symmetry. Finally, in the FFA (in yellow), 6/6 studies reported evidence of view-tuning, while 4/6 of these studies also reported evidence of mirror-symmetry, ranging from weak to strong. In sum, while marked view-tuning was consistently observed in posterior brain regions, mirror-symmetry was inconsistently observed, albeit with increasing frequency, in increasingly anterior areas along the ventral stream. EVC, early visual cortex; OFA, occipital face area; pSTS, posterior superior temporal sulcus; FFA, fusiform face area.
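The viewpoint and symmetry model dissimilarity matrices referred to in this caption can be illustrated with a minimal sketch for the five views used elsewhere in the paper (−90°, −45°, 0°, 45°, 90°); the specific templates below are an assumption for illustration, not necessarily the exact matrices used by the authors.

```python
import numpy as np

# Hypothetical model dissimilarity matrices for five face views; the exact
# templates used in the study may differ.
views = np.array([-90, -45, 0, 45, 90])  # degrees

# Viewpoint model: dissimilarity grows with the angular difference between views.
viewpoint_model = np.abs(views[:, None] - views[None, :])

# Symmetry model: mirror-symmetric views (e.g., -45 and +45) are treated as
# equivalent, so dissimilarity depends only on the difference in |angle|.
symmetry_model = np.abs(np.abs(views)[:, None] - np.abs(views)[None, :])

print(viewpoint_model)
print(symmetry_model)
```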
Figure 2.
Proposed account of commonalities and inconsistencies across fMRI-MVPA studies. a, Schematic of visual hemifields and their mapping onto cerebral hemispheres, axial view. Locations in left and right visual fields map onto primary visual cortex of the contralateral cerebral hemisphere. A known property of visual cortex is increasingly bilateral hemifield representations, due to interhemispheric connections, as one proceeds along the visual hierarchy. This property is central to the model proposed here. b, (Low-level image properties) + (interhemispheric crossings) ≈ RSA results. Faces in different views exhibit different distributions of luminance and contrast. These properties have been found to exhibit symmetric distributions about the frontal view. Full circles (top row) indicate the mean luminance of the image of the face view shown immediately above. Anterior brain areas, which integrate input from both hemifields, are expected to exhibit quadratic (i.e., symmetric) biases of the form illustrated in the bar plot shown at the top right. In contrast, responses across views for half-images (bottom row) are expected to exhibit antisymmetric biases. Black and white circles indicate the rough luminance distribution of each half-image for each face view (note the dark hair and bright skin). For right V1, this would imply a roughly linear (i.e., antisymmetric) trend if responses were proportional to the luminance of the left side of the stimulus (see bar plot at bottom right). If RSA outcomes reflect such trends for luminance and contrast, as we propose here, then dissimilarity matrices at earlier processing stages would exhibit marked view-tuning regardless of pattern dissimilarity measure. In turn, representations in later processing stages would exhibit mirror-symmetry with the Euclidean distance or, if the data are mean-centered across conditions prior to RSA, with angular distances (e.g., correlation distance). Instead, for RSA with the correlation distance on non-centered data, a viewpoint-specific representation would be expected throughout cortex, as shown by the dissimilarity matrices in the rightmost column.
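The key reasoning in panel b, that antisymmetric (roughly linear) half-image biases combine into a symmetric (roughly quadratic) bias once both hemifields are integrated, can be checked with a small numerical sketch; the bias values below are invented solely for illustration.

```python
import numpy as np

views = np.array([-90, -45, 0, 45, 90], dtype=float)

# Hypothetical antisymmetric bias for the left half-image (e.g., mean
# luminance of the left half rising roughly linearly with view angle).
left_half_bias = 0.01 * views + 0.0005 * views**2

# The right half-image carries the mirror-image bias.
right_half_bias = left_half_bias[::-1]

# A unit integrating both hemifields sums the two contributions: the odd
# (antisymmetric) parts cancel and only the even (symmetric) part survives.
combined = left_half_bias + right_half_bias

# Fit a second-order polynomial to each profile to compare trends.
for name, y in [("left half", left_half_bias),
                ("right half", right_half_bias),
                ("combined", combined)]:
    quad, lin, const = np.polyfit(views, y, 2)
    print(f"{name:>10}: linear={lin:+.4f}, quadratic={quad:+.6f}")
```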
Figure 3.
Cocktail-mean subtraction changes correlations among brain patterns. a, fMRI spatial activity pattern for a visual stimulus in an ROI. The spatial pattern is also represented as a vector by concatenating the response magnitude in each voxel. Regression coefficients obtained from a GLM (see Materials and Methods) for one example experimental condition are shown to the right. b, Regression coefficients for nine voxels and five experimental conditions are shown arranged as a matrix. To the right, the row-wise mean (i.e., cocktail mean) is shown in red for each voxel. Demeaning the data across voxels (column-wise), shown in green, is not the problematic form of demeaning relevant here. Demeaning the data across conditions (i.e., cocktail demeaning), shown in red, is the problematic form of data recentering relevant here. c, Representation in N-dimensional space of fMRI pattern vectors for two conditions (c1 and c2). The coordinates of the origin in this space are specified by the zero vector. The Euclidean distance, d, between the endpoints of these vectors is shown with a gray line, and the angle θ subtended between c1 and c2 is shown in blue. d, Representation of the same experimental conditions after cocktail demeaning, which shifts the origin of the coordinate system. Critically, the angle between c1 and c2 has markedly changed after cocktail demeaning (compare θ in c, d) and hence so has their correlation, since the latter is the cosine of the angle between c1 and c2 after zero-centering each vector (i.e., demeaning each condition pattern across voxels, shown in green in panel b). In contrast, note that the Euclidean distance between c1 and c2 remained unchanged after cocktail demeaning.
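A toy numerical sketch of the point made in panels c and d: subtracting the cocktail mean (the across-condition mean per voxel) leaves Euclidean distances between condition patterns unchanged but can alter their correlations substantially. The data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 9 voxels x 5 conditions, with a large shared (cocktail) component
# so that all condition patterns are highly correlated before recentering.
shared = rng.normal(size=(9, 1))
data = shared + 0.2 * rng.normal(size=(9, 5))

def euclidean(a, b):
    return np.linalg.norm(a - b)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

c1, c2 = data[:, 0], data[:, 1]

# Cocktail demeaning: subtract the across-condition mean from each voxel.
demeaned = data - data.mean(axis=1, keepdims=True)
d1, d2 = demeaned[:, 0], demeaned[:, 1]

print("Euclidean before/after:", euclidean(c1, c2), euclidean(d1, d2))  # identical
print("Correlation before/after:", corr(c1, c2), corr(d1, d2))          # can change markedly
```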
Figure 4.
Feedforward, randomly-connected, two-hemisphere network architecture. a, Probability distributions used to specify image locations sampled by layer 1 units. Top row, Distribution used to model CM of central image locations in V1. Middle row, Distributions used to model left-hemifield (LH) and right-hemifield (RH) representations. Bottom row, Product of CM with LH and RH distributions. These distributions served to specify, by random sampling, image locations providing input to units in each hemisphere of layer 1. b, Full network, consisting of eight layers: layer 1 (4,096 units) and layers 2–8 (each 1,024 units). Feedforward connections between units in consecutive layers define this architecture. Input to the left hemisphere of layer 1 (shown in purple) originates from RFs located almost exclusively on the right side of each image. The opposite is observed for the right hemisphere of layer 1 (in yellow). Ipsilateral projections within each network hemisphere are indicated by solid lines. Contralateral projections are indicated by dashed lines. The probability of a contralateral projection increases in steps of 0.08, beginning at 0.02 between layers 1 and 2, and reaching 0.5 between layers 7 and 8.
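A minimal sketch of the connectivity rule described here, assuming each unit draws a fixed number of feedforward inputs and each input is sampled from the contralateral hemisphere with the stated layer-dependent probability; the fan-in value and the exact sampling scheme are assumptions, as the caption does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

layer_sizes = [4096] + [1024] * 7          # layer 1 plus layers 2-8 (per hemisphere)
# Probability that a feedforward connection crosses hemispheres, rising from
# 0.02 (layers 1->2) in steps of 0.08 up to 0.5 (layers 7->8).
p_contra = [0.02 + 0.08 * i for i in range(7)]
fan_in = 16                                 # assumed number of inputs per unit

def sample_sources(n_next, n_prev, p_cross):
    """For each unit in the next layer (one hemisphere), sample fan_in source
    units; each input is drawn from the contralateral hemisphere with
    probability p_cross."""
    sources = rng.integers(0, n_prev, size=(n_next, fan_in))
    crosses = rng.random(size=(n_next, fan_in)) < p_cross
    return sources, crosses                 # 'crosses' marks contralateral inputs

# Build connectivity for one hemisphere of every layer (the other is symmetric).
network = []
for layer in range(7):
    srcs, crosses = sample_sources(layer_sizes[layer + 1], layer_sizes[layer],
                                   p_contra[layer])
    network.append((srcs, crosses))
    print(f"layers {layer + 1}->{layer + 2}: contralateral fraction "
          f"{crosses.mean():.3f} (target {p_contra[layer]:.2f})")
```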
Figure 5.
Distribution of mean luminance and contrast as a function of viewpoint for two face databases. a, Top row, Example face identity shown in five orientations (−90°, −45°, 0°, 45°, 90°). Second row, The images above shown after pooling local orientation and frequency filters from the S1 layer of the HMAX model. b, c, Median and interquartile range are shown for mean luminance (mean pixel value) or contrast (pixel variance) of face identities, always as a function of viewpoint. Bar plots in light gray correspond to the pixel-level representation. d, e, Plots in black correspond to the S1-level representation. The best-fitting second-order polynomial is shown in red only if the linear or quadratic regression coefficient is significantly different from zero at the population level and the absolute values of the two coefficients also differ significantly. p.v., pixel value; f.o., filter output. Please note that all face images shown throughout this paper were computer generated and used for illustration purposes only; they are not photographs from the analyzed databases.
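The image-level measures described in this caption (mean luminance as the mean pixel value, contrast as the pixel variance, followed by a second-order polynomial fit across views) can be sketched as follows; load_face_image is a hypothetical placeholder for whatever loader returns a grayscale image for a given identity and view, and the half-image variant in Figure 6 only requires restricting each image to its left half.

```python
import numpy as np

views = [-90, -45, 0, 45, 90]  # degrees

def load_face_image(identity, view):
    """Hypothetical loader: return a 2-D grayscale array for one face image."""
    raise NotImplementedError

def luminance_and_contrast(img, left_half_only=False):
    if left_half_only:                       # Figure 6 variant: left half-image
        img = img[:, : img.shape[1] // 2]
    return img.mean(), img.var()             # mean pixel value, pixel variance

def view_profile(identity, measure_index, left_half_only=False):
    """Luminance (index 0) or contrast (index 1) of one identity across views."""
    return np.array([luminance_and_contrast(load_face_image(identity, v),
                                            left_half_only)[measure_index]
                     for v in views])

def quadratic_fit(profile):
    """Second-order polynomial fit across view angle; returns (quad, lin, const)."""
    return np.polyfit(views, profile, 2)
```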
Figure 6.
Distribution of mean luminance and contrast as a function of viewpoint for half-images for two face databases. Layout as in Figure 5. Here, however, only the left half of each image was analyzed. In contrast to Figure 5, where stronger quadratic than linear trends of mean luminance and contrast were usually observed across face views, linear trends proved dominant here regardless of database and representational format. Best-fitting second-order polynomials are shown in red following the criteria in Figure 5.
Figure 7.
Impact of CM and network density on layer 1 activation profiles. a, b, Results are organized according to the portion of the image (whole, or left half) or the network hemispheres analyzed (both, or only right). Image- and network-level analyses are shown, respectively, in the left and right columns. Analyses computed on pixel- and S1-level representations are shown, respectively, in the top and bottom rows. a, Top left, Median image contrast across face identities for each viewpoint. Error bars indicate interquartile ranges. Best-fitting second-order polynomials are overlaid on each plot. Top right, Median variance of activation patterns associated with each face identity in layer 1. Unlike the image-level analyses shown to the left, the network-level analyses shown to the right consider CM of central image locations. b, Median contrast across face identities as a function of viewpoint for half-images. The panel is organized as panel a. Note the differences in form and direction of trends when contrasting image- and network-level analyses. c, Difference in partial R2 of symmetric (quadratic and quartic) and antisymmetric (linear and cubic) trends for layer 1 activation patterns as a function of network density (x-axis) and CM (green, CM; yellow, no CM). As in panel a, activation profiles for each face identity were formed by concatenating the variance of activation patterns for each face view. Shaded areas indicate 95% confidence intervals. Positive values indicate stronger symmetric than antisymmetric trends. Note the consistency in the direction of dominant trends regardless of database, CM, and number of hemispheres. p.v., pixel value; f.o., filter output; n.u., network unit.
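The difference in partial R2 between symmetric (quadratic and quartic) and antisymmetric (linear and cubic) trends summarized in panel c can be computed in several ways; the sketch below uses nested least-squares fits pooled across identities and is only one plausible implementation, not necessarily the authors' exact estimator.

```python
import numpy as np

def partial_r2_difference(view_angles, profiles):
    """Partial R^2 of even (quadratic, quartic) minus odd (linear, cubic) terms.

    `profiles` is an (identities x views) array of activation variances; the
    regression pools all identities so the full polynomial model is not
    saturated. Each partial R^2 is the proportional drop in residual variance
    when the terms of interest are added to the model containing the others.
    """
    v = np.tile(np.asarray(view_angles, dtype=float), profiles.shape[0])
    y = np.asarray(profiles, dtype=float).ravel()
    const = np.ones_like(v)
    even = np.column_stack([const, v**2, v**4])   # symmetric terms
    odd = np.column_stack([v, v**3])              # antisymmetric terms

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.sum((y - X @ beta) ** 2)

    rss_full = rss(np.column_stack([even, odd]))
    r2_even = 1.0 - rss_full / rss(np.column_stack([const, odd]))
    r2_odd = 1.0 - rss_full / rss(even)
    return r2_even - r2_odd

# Toy example: ten identities whose profiles are dominated by a quadratic trend.
angles = np.array([-90.0, -45.0, 0.0, 45.0, 90.0])
rng = np.random.default_rng(0)
toy = 0.001 * angles**2 + rng.normal(scale=0.5, size=(10, 5))
print(partial_r2_difference(angles, toy))   # positive: symmetric trends dominate
```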
Figure 8.
RSA results across network layers for density level = 16. a, Each subplot summarizes RSA outcomes for network layers 1–8 for a single combination of image database, model variant, and RSA type. Rows indicate image database (KDEF, RaFD) and model variant (pixel, S1) combinations. Columns indicate RSA type: correlation distance (RSAcorr), correlation distance on demeaned data (RSAcorrDem), and Euclidean distance (RSAEuc). For each network layer (x-axis), median correlations are shown between simulated empirical dissimilarity matrices and the view-tuned (in red) and mirror-symmetric (in blue) models. Shaded areas indicate interquartile ranges. Note the consistently view-tuned representations across layers for RSAcorr (except for RaFD-S1 at layer 1). In contrast, for RSAcorrDem and RSAEuc a relative decrease of view-tuning and increase of mirror-symmetry is observed along the hierarchy. b, Average correlation difference between the symmetry and viewpoint models (y-axis) in early (1 and 2) and late (7 and 8) network layers. “V” and “S” indicate that correlations with the viewpoint and symmetry models, respectively, are significantly above zero. Note for RSAcorrDem and RSAEuc a clear shift toward mirror-symmetry in later network layers. Gray lines indicate statistically significant mean increments or decrements when comparing early and late network layers (sign-permutation tests; all p < 0.001).
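The three RSA variants compared here (RSAcorr, RSAcorrDem, RSAEuc) can be sketched as follows, assuming a conditions-by-units activation matrix per layer and model templates like those sketched after Figure 1; summarizing fit as the correlation between vectorized upper triangles is a common choice but is assumed here rather than taken from the paper.

```python
import numpy as np

def dissimilarity_matrix(patterns, metric="correlation", cocktail_demean=False):
    """patterns: (conditions x units) activation matrix for one layer or ROI."""
    X = np.asarray(patterns, dtype=float)
    if cocktail_demean:                        # subtract across-condition mean per unit
        X = X - X.mean(axis=0, keepdims=True)
    n = X.shape[0]
    dsm = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if metric == "correlation":
                dsm[i, j] = 1.0 - np.corrcoef(X[i], X[j])[0, 1]
            else:                               # Euclidean distance
                dsm[i, j] = np.linalg.norm(X[i] - X[j])
    return dsm

def model_fit(dsm, model):
    """Correlation between the upper triangles of an empirical DSM and a model DSM."""
    iu = np.triu_indices_from(dsm, k=1)
    return np.corrcoef(dsm[iu], model[iu])[0, 1]

# The three variants compared across layers:
# RSAcorr    -> dissimilarity_matrix(X, "correlation")
# RSAcorrDem -> dissimilarity_matrix(X, "correlation", cocktail_demean=True)
# RSAEuc     -> dissimilarity_matrix(X, "euclidean")
```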
Figure 9.
RSA results for multiple network densities. Layout as in Figure 8. Here, however, the y-axis denotes the density level of the analyzed patterns. Each plot color codes the paired t-statistic comparing correlation coefficients of the simulated empirical DSMs with the viewpoint and symmetry models. Areas in red indicate consistently higher correlations with the viewpoint model. Areas in blue indicate consistently higher correlations with the symmetry model. Results are broadly concordant with those shown in Figure 8 for a network density of 16 (see text for one exception). Dark gray and light gray triangles indicate, respectively for the lowest and highest network densities (d = 1 and 32), the point at which a shift occurs from a predominantly view-tuned to a predominantly mirror-symmetric representation. Triangles are shown when the mean difference in the point of zero-crossing for the two network densities was significantly larger than zero (sign-permutation tests).
Figure 10.
Empirical evaluation of model predictions for face stimuli at multiple processing stages along the posterior-anterior axis of the visual hierarchy. The outcome of RSA analyses based on the response patterns derived from our model for face images in different viewpoints (−90°, −45°, 0°, 45°, 90°) is shown on the left side of this figure (see text for details). Each row shows the mean correlation with the mirror-symmetric and viewpoint model templates in layers 1, 3, 5, and 7 according to the three approaches to RSA probed in this paper, namely, using the correlation distance as the measure of pattern dissimilarity (RSAcorr), the correlation distance on demeaned data (RSAcorrDem), or the Euclidean distance (RSAEuc). On the right side of the figure, the outcome of identical RSA analyses of empirical fMRI data for face stimuli in the same five rotational angles is shown for EVC, the LO portion of the LOC, the OFA, and the FFA. Error bars correspond to standard errors of the mean. Stars indicate values significantly different from zero. In turn, bars with stars indicate that the difference between the two models is statistically significant. The significance level for all tests is α = 0.05. See Materials and Methods and Table 4 for details.
