J Neurosci. 2016 Aug 10;36(32):8399-415.
doi: 10.1523/JNEUROSCI.0396-16.2016.

3D Visual Response Properties of MSTd Emerge from an Efficient, Sparse Population Code


Michael Beyeler et al. J Neurosci. .

Abstract

Neurons in the dorsal subregion of the medial superior temporal (MSTd) area of the macaque respond to large, complex patterns of retinal flow, implying a role in the analysis of self-motion. Some neurons are selective for the expanding radial motion that occurs as an observer moves through the environment ("heading"), and computational models can account for this finding. However, ample evidence suggests that MSTd neurons exhibit a continuum of visual response selectivity to large-field motion stimuli. Furthermore, the underlying computational principles by which these response properties are derived remain poorly understood. Here we describe a computational model of macaque MSTd based on the hypothesis that neurons in MSTd efficiently encode the continuum of large-field retinal flow patterns on the basis of inputs received from neurons in MT with receptive fields that resemble basis vectors recovered with non-negative matrix factorization. These assumptions are sufficient to quantitatively simulate neurophysiological response properties of MSTd cells, such as 3D translation and rotation selectivity, suggesting that these properties might simply be a byproduct of MSTd neurons performing dimensionality reduction on their inputs. At the population level, model MSTd accurately predicts eye velocity and heading using a sparse distributed code, consistent with the idea that biological MSTd might be well equipped to efficiently encode various self-motion variables. The present work aims to add some structure to the often contradictory findings about macaque MSTd, and offers a biologically plausible account of a wide range of visual response properties ranging from single-unit selectivity to population statistics.

Significance statement: Using a dimensionality reduction technique known as non-negative matrix factorization, we found that a variety of medial superior temporal (MSTd) neural response properties could be derived from MT-like input features. The responses that emerge from this technique, such as 3D translation and rotation selectivity, spiral tuning, and heading selectivity, can account for a number of empirical results. These findings (1) provide a further step toward a scientific understanding of the often nonintuitive response properties of MSTd neurons; (2) suggest that response properties, such as complex motion tuning and heading selectivity, might simply be a byproduct of MSTd neurons performing dimensionality reduction on their inputs; and (3) imply that motion perception in the cortex is consistent with ideas from the efficient-coding and free-energy principles.

Keywords: MSTd; heading selectivity; non-negative matrix factorization; optic flow; visual motion processing.


Figures

Figure 1.
Overall model architecture. A number S of 2D flow fields depicting observer translations and rotations in a 3D world were processed by an array of F MT-like motion sensors, each tuned to a specific direction and speed of motion. MT-like activity values were then arranged into the columns of a data matrix, V, which served as input for NMF. The output of NMF was two reduced-rank matrices, W (containing B non-negative basis vectors) and H (containing hidden coefficients). Columns of W (basis vectors) were then interpreted as weight vectors of MSTd-like model units.
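The decomposition this caption describes (V ≈ WH) can be sketched with Lee and Seung's multiplicative update rules. The dimensions and the random V below are illustrative placeholders, not the paper's actual MT population responses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions (illustrative only): F MT-like sensors,
# S flow-field stimuli, B non-negative basis vectors.
F, S, B = 200, 300, 16
V = rng.random((F, S))              # stand-in for non-negative MT activity

# Lee and Seung (1999) multiplicative updates minimizing ||V - WH||_F;
# both factors stay non-negative throughout.
W = rng.random((F, B))
H = rng.random((B, S))
eps = 1e-9
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

# Columns of W play the role of MSTd-like weight vectors; columns of H
# hold the hidden coefficients for each stimulus.
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Non-negativity is what makes the recovered basis vectors parts-based rather than holistic, which is the property the model exploits.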
Figure 2.
Example flow fields generated with the motion field model (Longuet-Higgins and Prazdny, 1980; figure is a derivative of FigExamplesOfOpticFlow by Raudies (2013), used under CC BY). A, B, We sampled flow fields that mimic natural viewing conditions during upright locomotion toward a back plane (A) and over a ground plane (B), generated from a pinhole camera with image plane x, y ∈ [−1 cm, 1 cm] and focal length f = 1 cm. Gray arrows indicate the axes of the 3D coordinate system, and bold black arrows indicate self-movement (translation, straight arrows; rotation, curved arrows). Crosses indicate the direction of self-movement (i.e., heading), and squares indicate the COM. In the absence of rotation, the COM indicates heading (B). A, Example of forward/sideward translation (vx = 0.45 m/s, vz = 0.89 m/s) toward a back plane located at a distance of Z(x, y) = 10 m. B, Example of curvilinear motion (vx = vz = 0.71 m/s) and yaw rotation (ωy = 3°/s) over a ground plane located at distance Z(y) = df/(y cos(α) + f sin(α)), where d = −10 m and α = −30°.
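The pinhole motion field of Longuet-Higgins and Prazdny (1980) can be sketched as follows. Sign conventions vary across the literature, so this is one common form, not necessarily the exact convention used in the paper:

```python
import numpy as np

def motion_field(x, y, Z, v, omega, f=1.0):
    """Image velocity of a point at image coordinates (x, y) with depth Z,
    for observer translation v = (vx, vy, vz) and rotation
    omega = (wx, wy, wz), after Longuet-Higgins and Prazdny (1980)."""
    vx, vy, vz = v
    wx, wy, wz = omega
    # Translational flow scales with inverse depth 1/Z ...
    u_t = (-f * vx + x * vz) / Z
    v_t = (-f * vy + y * vz) / Z
    # ... whereas rotational flow is independent of depth.
    u_r = (x * y / f) * wx - (f + x ** 2 / f) * wy + y * wz
    v_r = (f + y ** 2 / f) * wx - (x * y / f) * wy - x * wz
    return u_t + u_r, v_t + v_r

# Back-plane example from the caption: vx = 0.45 m/s, vz = 0.89 m/s, Z = 10 m.
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
u, w = motion_field(xs, ys, 10.0, (0.45, 0.0, 0.89), (0.0, 0.0, 0.0))
```

With zero rotation, the flow vanishes at the focus of expansion (f·vx/vz, f·vy/vz), which is why the COM coincides with heading only in the absence of rotation.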
Figure 3.
Applying NMF to MT-like patterns of activity led to a sparse, parts-based representation of retinal flow, similar to the parts-based representation of faces in Lee and Seung (1999), their Figure 1. Using NMF, a particular instance of an input stimulus (corresponding to the ith column of data matrix V), shown on the left, can be approximated by a linear superposition of basis vectors (i.e., the columns of W), here visualized as basis flow fields using population vector decoding (Georgopoulos et al., 1982) in an 8 × 8 montage. The coefficients of the linear superposition (corresponding to the ith column of H) are shown next to W as an 8 × 8 matrix, where darker colors correspond to higher activation values. The reconstructed stimulus (with a reconstruction error of 0.0824) is shown on the right. The input stimulus contained both translational and rotational components (v⃗ = [0.231, −0.170, 0.958]t m/s and ω⃗ = [0.481, 0.117, 9.99]t °/s) toward a back plane located at d = 2 m from the observer, with heading indicated by a small cross and the COM indicated by a small square.
Figure 4.
Schematic of the 26 translational and rotational directions used to test MSTd-like model units (modified with permission from Takahashi et al., 2007, their Figure 2). A, Illustration of the 26 movement vectors, spaced 45° apart on the sphere, in both azimuth and elevation. B, Top view: definition of azimuth angles. C, Side view: definition of elevation angles. Straight arrows illustrate the direction of movement in the translation protocol, whereas curved arrows indicate the direction of rotation (according to the right-hand rule) about each of the movement vectors.
Figure 5.
A–D, Example of 3D direction tuning for an MSTd neuron (rotation, A; translation, C), reprinted with permission from Takahashi et al. (2007), their Fig. 4, and a similarly tuned MSTd-like model unit (rotation, B; translation, D). Color contour maps show the mean firing rate or model activation as a function of azimuth and elevation angles. Each contour map shows the Lambert cylindrical equal-area projection of the original data, where the abscissa corresponds to the azimuth angle and the ordinate is a sinusoidally transformed version of elevation angle (Snyder, 1987).
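The projection named in this caption is simple to state: the abscissa is the azimuth angle unchanged, and the ordinate is the sine of the elevation angle, so equal areas on the sphere map to equal areas in the plane. A minimal sketch:

```python
import numpy as np

def lambert_cylindrical(azimuth_deg, elevation_deg):
    """Lambert cylindrical equal-area projection (Snyder, 1987):
    azimuth passes through unchanged; elevation is sinusoidally
    transformed so that equal solid angles plot with equal area."""
    return np.asarray(azimuth_deg), np.sin(np.radians(elevation_deg))
```

This is why the contour maps' ordinate compresses near the poles: elevations of ±90° map to ±1 regardless of azimuth.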
Figure 6.
A–D, Distribution of 3D direction preferences of MSTd neurons (rotation, A; translation, C), reprinted with permission from Takahashi et al. (2007), their Figure 6, and the population of MSTd-like model units (rotation, B; translation, D). Each data point in the scatter plot corresponds to the preferred azimuth (abscissa) and elevation (ordinate) of a single neuron or model unit. Histograms along the top and right sides of each scatter plot show the marginal distributions. Also shown are 2D projections (front view, side view, and top view) of unit-length 3D preferred direction vectors (each radial line represents one neuron or model unit). The neuron in Figure 5 is represented as open circles in each panel.
Figure 7.
A–F, Direction differences between rotation and translation, for MSTd neurons (A, C, E), reprinted with permission from Takahashi et al. (2007), their Figure 7, and the population of MSTd-like model units (B, D, F). A, B, Histograms of the absolute differences in 3D preferred direction (| Δ preferred direction |) between rotation and translation. C, D, Distributions of preferred direction differences as projected onto each of the three cardinal planes, corresponding to front view, side view, and top view. E, F, The ratio of the lengths of the 2D and 3D preferred direction vectors is plotted as a function of the corresponding 2D projection of the difference in preferred direction (red, green, and blue circles for each of the front view, side view, and top view data, respectively).
Figure 8.
Population code underlying the encoding of perceptual variables such as heading (FOE) and eye velocity (pursuit, P). A, Distribution of FOE and pursuit selectivities in MSTd (dark gray), adapted with permission from Ben Hamed et al. (2003), their Figure 3, and in the population of MSTd-like model units (light gray). Neurons or model units were involved in encoding heading (FOE), eye velocity (P), both (FOE and P), or neither (none). B, Heading prediction (generalization) error as a function of the number of basis vectors, using 10-fold cross-validation. Vertical bars indicate the SD. C, Population and lifetime sparseness as a function of the number of basis vectors. Operating the sparse decomposition model with B = 64 basis vectors co-optimizes for both accuracy and efficiency of the encoding and leads to basis vectors that resemble MSTd receptive fields.
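The paper's exact sparseness definitions are given in its Methods; as a sketch, a widely used measure (the Rolls–Tovee / Vinje–Gallant form) computes, for a non-negative response vector, a value of 0 for uniform responses and 1 when a single unit is active:

```python
import numpy as np

def sparseness(r):
    """Rolls-Tovee / Vinje-Gallant sparseness of a non-negative response
    vector r. Applied across units for one stimulus this gives population
    sparseness; across stimuli for one unit, lifetime sparseness.
    (A common formulation, offered as illustration.)"""
    r = np.asarray(r, dtype=float)
    n = r.size
    num = (r.sum() / n) ** 2        # squared mean response
    den = (r ** 2).sum() / n        # mean squared response
    return (1.0 - num / den) / (1.0 - 1.0 / n)
```

A one-hot response vector scores 1 (maximally sparse), a constant vector scores 0.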
Figure 9.
A–I, Heading perception during observer translation in the presence of eye movements. Shown are three different scene geometries (back plane, A–C; ground plane, D–F; dot cloud, G–I), reprinted with permission from Royden et al. (1994), their Figures 6, 8, 9, 12, and 13. Observer translation was parallel to the ground and was in one of three directions (open circles), coinciding with the fixation point. Real and simulated eye movements were presented at rates of 0, ±1°/s, ±2.5°/s, or ±5°/s. B, E, H, Perceived heading reported by human subjects for real and simulated eye movements (open and closed symbols, respectively). C, F, I, Behavioral performance of model MSTd for simulated eye movements. Horizontal dotted lines indicate the actual headings of −4° (blue triangles), 0° (green squares), and +4° (red circles) relative to straight ahead.
Figure 10.
A–D, Heading discriminability based on population activity in macaque MSTd (A, C), reprinted with permission from Gu et al. (2010), their Figure 4, and in model MSTd (B, D). For the sake of clarity, we abandon our previously used coordinate system in favor of the one used by Gu et al. (2010), where lateral headings correspond to ±90° and 0° corresponds to straight ahead. A, B, Distribution of the direction of maximal discriminability, showing a bimodal distribution with peaks around the forward (0°) and backward (±180°) directions. C, D, Scatter plot of the tuning width at half-maximum vs. the preferred direction of each neuron or model unit. The top histogram shows the marginal distribution of heading preferences. Also highlighted is a subpopulation of neurons or model units with direction preferences within 45° of straight ahead and tuning widths <115° (open symbols).
Figure 11.
A, B, Population Fisher information of macaque MSTd (A), reprinted with permission from Gu et al. (2010), their Figure 5, and of model MSTd (B). Error bands in A represent 95% confidence intervals derived from a bootstrap procedure.
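For independent Poisson neurons, population Fisher information takes the textbook form I(θ) = Σᵢ fᵢ′(θ)² / fᵢ(θ), where fᵢ is neuron i's tuning curve. A minimal numerical sketch (the paper's analysis, following Gu et al., 2010, additionally uses a bootstrap for confidence intervals, which is omitted here):

```python
import numpy as np

def poisson_fisher_info(rates, thetas):
    """Population Fisher information at each heading for independent
    Poisson neurons: I(theta) = sum_i f_i'(theta)^2 / f_i(theta).
    rates: array of shape (n_neurons, n_thetas) of mean firing rates;
    thetas: heading values in degrees. A sketch of the standard formula."""
    rates = np.asarray(rates, dtype=float)
    d = np.gradient(rates, thetas, axis=1)   # numerical tuning-curve slope
    return (d ** 2 / rates).sum(axis=0)      # info contributed per heading

thetas = np.linspace(-180.0, 180.0, 37)
# Toy cosine-bump tuning curves (illustrative, not fitted to data).
prefs = np.array([-90.0, 0.0, 90.0])
rates = 10.0 + 8.0 * np.cos(np.radians(thetas[None, :] - prefs[:, None]))
info = poisson_fisher_info(rates, thetas)
```

Information peaks where tuning curves are steepest, not at their peaks, which is why discriminability clusters around forward and backward headings in the figure.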
Figure 12.
Gaussian tuning in spiral space. A, Gaussian tuning of a sample of 57 neurons across the full range of rotational flow fields, reprinted with permission from Graziano et al. (1994), their Figure 9. Each arrow indicates the peak response (the mean of the Gaussian fit) of a neuron in spiral space. B, The distribution of preferred spiral directions for a sample of 112 MSTd-like model units whose tuning curves were well fit by a Gaussian. The more strongly a model unit responded to an expansion stimulus, the more likely it was to be included in the sample. C, The distribution of preferred spiral directions for the entire population of 896 MSTd-like model units, of which 677 had smooth Gaussian fits. D, Bar plot of the data in A, for better comparison, adapted with permission from Clifford et al. (1999), their Figure 5. E, Bar plot of the data in B. F, Bar plot of the data in C.
Figure 13.
Continuum of response selectivity. A, Response classification of a sample of 268 MSTd neurons, reprinted with permission from Duffy and Wurtz (1995), their Figure 3. B, Response classification of a sample of 188 MSTd-like model units. The more strongly a model unit responded to an expansion stimulus, the more likely it was to be included in the sample. C, Response classification of the entire population of 896 MSTd-like model units. Triple-component cells: PCR, NSE, and NSI. Double-component cells: PR, PC, and CR. Single-component cells: P, R, and C. Eighty-one percent of neurons in A, 88% of model units in B, and 77% of model units in C responded to more than one type of motion.

References

    1. Attneave F. Some informational aspects of visual perception. Psychol Rev. 1954;61:183–193. doi: 10.1037/h0054663.
    2. Barlow HB. Possible principles underlying the transformation of sensory messages. In: Rosenblith WA, editor. Sensory communication. Cambridge, MA: MIT; 1961. pp. 217–234.
    3. Beintema JA, van den Berg AV. Heading detection using motion templates and eye velocity gain fields. Vision Res. 1998;38:2155–2179. doi: 10.1016/S0042-6989(97)00428-8.
    4. Beintema JA, van den Berg AV. Perceived heading during simulated torsional eye movements. Vision Res. 2000;40:549–566. doi: 10.1016/S0042-6989(99)00198-4.
    5. Ben Hamed S, Page W, Duffy C, Pouget A. MSTd neuronal basis functions for the population encoding of heading direction. J Neurophysiol. 2003;90:549–558. doi: 10.1152/jn.00639.2002.
