Hebbian Learning in a Random Network Captures Selectivity Properties of the Prefrontal Cortex

Grace W Lindsay et al.

J Neurosci. 2017 Nov 8;37(45):11021-11036. doi: 10.1523/JNEUROSCI.1222-17.2017. Epub 2017 Oct 6.

Abstract

Complex cognitive behaviors, such as context-switching and rule-following, are thought to be supported by the prefrontal cortex (PFC). Neural activity in the PFC must thus be specialized to specific tasks while retaining flexibility. Nonlinear "mixed" selectivity is an important neurophysiological trait for enabling complex and context-dependent behaviors. Here we investigate (1) the extent to which the PFC exhibits computationally relevant properties, such as mixed selectivity, and (2) how such properties could arise via circuit mechanisms. We show that PFC cells recorded from male and female rhesus macaques during a complex task exhibit a moderate level of specialization and structure that is not replicated by a model wherein cells receive random feedforward inputs. While random connectivity can be effective at generating mixed selectivity, the data show significantly more mixed selectivity than predicted by a model with otherwise matched parameters. A simple Hebbian learning rule applied to the random connectivity, however, increases mixed selectivity and enables the model to match the data more accurately. To explain how learning achieves this, we provide analysis along with a clear geometric interpretation of the impact of learning on selectivity. After learning, the model also matches the data on measures of noise, response density, clustering, and the distribution of selectivities. Of two styles of Hebbian learning tested, the simpler and more biologically plausible option better matches the data. These modeling results provide clues about how neural properties important for cognition can arise in a circuit and make clear experimental predictions regarding how various measures of selectivity would evolve during animal training.

Significance Statement

The prefrontal cortex is a brain region believed to support the ability of animals to engage in complex behavior. How neurons in this area respond to stimuli, and in particular to combinations of stimuli ("mixed selectivity"), is a topic of interest. Even though models with random feedforward connectivity are capable of creating computationally relevant mixed selectivity, such a model does not match the levels of mixed selectivity seen in the data analyzed in this study. Adding simple Hebbian learning to the model increases mixed selectivity to the correct level and makes the model match the data on several other relevant measures. This study thus offers predictions on how mixed selectivity and other properties evolve with training.

Keywords: mixed selectivity; prefrontal cortex; random connectivity; theoretical models.

Figures

Figure 1.
Description of PFC data and relevant measures of selectivity. A, Task design. In both task types, the animal fixated as two image cues were shown in sequence. After a delay, the animal either had to indicate whether a second presented sequence matched the first (“Recognition”) or had to saccade to the two images in the correct order from a selection of three images (“Recall”). B, What nonlinear mixed selectivity can look like in neural responses and its impact on computation. The bar graphs on the left depict three hypothetical neurons and their responses to combinations of two task variables, A and B. The black neuron has selectivity only to A, as its responses are invariant to changes in B. The blue neuron has linear mixed selectivity to A and B: its responses to different values of A are affected by the value of B, but in a purely additive way. The red neuron has nonlinear mixed selectivity: its responses to A are affected nonlinearly by a change in the value of B. The figures on the right show how including a cell with nonlinear mixed selectivity in a population increases the dimensionality of the representation. With the nonlinearly selective cell (bottom), the black dot can be separated from the green dots with a line. Without it (top), it cannot. C, A depiction of measures of trial-to-trial noise (FFT) and the distribution of responses across conditions (RV). The x-axis labels the condition. Each dot is the firing rate for an individual trial. The crosses are the condition means used for calculating RV (data from a real neuron; recognition task not shown). D, Conceptual depiction of the clustering measure. Each cell was represented as a vector (blue) in a space wherein the axes (black) represent preference for task-variable identities, as determined by the coefficients from a GLM (only 3 axes are shown here). The clustering measure determines whether these vectors are uniformly distributed.
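
As a minimal illustration of the dimensionality argument in panel B, consider a linear readout of a small population. This sketch is not drawn from the paper's analysis code; the XOR-style dichotomy and the logistic-regression readout are assumptions chosen purely to make the point concrete.

```python
# Illustrative sketch only: two purely selective cells vs. the same cells plus
# one nonlinearly mixed-selective cell (responds only to the A=1, B=1 conjunction).
import numpy as np
from sklearn.linear_model import LogisticRegression

A = np.array([0, 0, 1, 1])
B = np.array([0, 1, 0, 1])
labels = A ^ B                                   # XOR-style dichotomy over the 4 conditions

pure_only = np.stack([A, B], axis=1)             # responses of two purely selective cells
with_mixed = np.column_stack([A, B, A * B])      # add a nonlinearly mixed-selective cell

for name, X in [("pure cells only", pure_only), ("plus mixed cell", with_mixed)]:
    acc = LogisticRegression(C=1e3).fit(X, labels).score(X, labels)
    print(f"{name}: linear readout accuracy = {acc:.2f}")
# Only the population containing the mixed-selective cell reaches perfect accuracy.
```
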
Figure 2.
Signal and noise representation for the toy model shown in Figure 8A. Strengths of the weights from the four input populations are given as arrows in A and B, and the threshold of the Heaviside function is shown as a dotted line. The cell is active for conditions above the threshold (green). Weight arrows are omitted for visibility in C and D. A, Learning causes the representation of conditions to change. This can change selectivity in multiple ways. Here, pure selectivity turns into mixed selectivity (top) and mixed selectivity turns into pure selectivity (bottom). B, Constrained and free learning can lead to different signal changes. Constrained learning (top) guarantees that one population from each task variable is increased. This ensures that the representation spreads out. In this case, the cell goes from no selectivity to mixed selectivity. With these starting weights, free learning increases both populations from T2, and the cell does not gain selectivity. C, Noise robustness can be thought of as the range of thresholds that can sustain a particular type of selectivity. The relative noise robustness of mixed and pure selectivity depends on the shape of the representation. α is the ratio of the differences between the weights from each task variable (top). In the two figures on the bottom, blue dotted lines show the optimal threshold for pure selectivity, red dotted lines show the optimal threshold for mixed selectivity, and shaded areas show the range of thresholds created by trial-wise additive noise that can exist without altering the selectivity. When α < 2, mixed selectivity is robust to larger noise ranges (bottom left). When α > 2, pure selectivity is more robust (bottom right). Given normally distributed weights, α > 2 is more common. D, Two example cells showing how selectivity changes with changing λ. The sets of weights for both cells are drawn from the same distribution. The resulting thresholds at three different λ values (labeled on the right cell but identical for each) are shown for each cell. With the smallest λ, neither example cell has selectivity. With the middle λ value, Cell 1 gains mixed selectivity and Cell 2 gains pure selectivity; Cell 2 retains its pure selectivity at the largest λ, while Cell 1 switches to the other type of mixed selectivity.
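
One way to read this toy model in code, as a hedged sketch based only on the caption: two weights per task variable, a threshold θ = λ × (sum of all weights), and a selectivity label derived from which of the four conditions exceed threshold. The half-normal weight distribution and the exact classification rule below are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the four-condition toy model (two task variables, two
# identities each). Weight distribution and classification rule are assumed.
import numpy as np

def toy_selectivity(w_t1, w_t2, lam):
    """Label a toy cell 'none', 'pure', or 'mixed' at threshold theta = lam * sum(weights)."""
    theta = lam * (sum(w_t1) + sum(w_t2))
    active = np.add.outer(np.asarray(w_t1), np.asarray(w_t2)) > theta  # 2x2 grid of conditions
    n_active = active.sum()
    if n_active in (0, 4):
        return "none"                      # responds to nothing or to everything
    if n_active == 2 and (active.all(axis=0).any() or active.all(axis=1).any()):
        return "pure"                      # a full row/column: depends on one variable only
    return "mixed"                         # 1 or 3 active conditions: a nonlinear conjunction

rng = np.random.default_rng(0)
weights = np.abs(rng.normal(size=(1000, 4)))   # non-negative, Gaussian-like toy weights
for lam in (0.3, 0.5, 0.7):
    labels = [toy_selectivity(w[:2], w[2:], lam) for w in weights]
    print(f"lambda={lam}:", {k: labels.count(k) for k in ("none", "pure", "mixed")})
```
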
Figure 3.
Results from the experimental data. A, Selectivity profile of the 90 cells analyzed. A cell had pure selectivity to a given task variable if the term in the ANOVA associated with that task variable was significant (p < 0.05). A cell had nonlinear mixed selectivity to a combination of task variables if the interaction term for that combination was significant. On the right are the percentage of cells that had ≥1 type of pure selectivity (blue) and the percentage of cells that had ≥1 type of mixed selectivity (red). B, Values of firing rate, FFT, and RV for these data. Each open circle is a neuron and the red markers are the population means. C, β coefficients from GLM fits for each cell. The condition wherein TT = Recognition, C1 = A, and C2 = B was used as the reference condition. These values were used to determine the clustering value. D, Clustering values for the data and comparison populations. The red dot shows the clustering value calculated using the GLM coefficients from the data. The shuffled data come from shuffling the GLM coefficients across cells. The clustered data are derived from populations of synthetic cells designed to fall into three different categories of cell types defined according to selectivity.
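
A sketch of the ANOVA-based classification described in panel A, under clearly stated assumptions: the variable names, cue labels, and synthetic firing rates below are placeholders, and the study's actual trial structure and ANOVA design may differ.

```python
# Illustrative ANOVA classification of pure vs. nonlinear mixed selectivity
# for a single (synthetic) cell; p < 0.05 on a main effect -> pure selectivity,
# p < 0.05 on an interaction term -> nonlinear mixed selectivity.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
n = 200
trials = pd.DataFrame({
    "task_type": rng.choice(["recognition", "recall"], n),
    "cue1": rng.choice(["A", "B", "C"], n),
    "cue2": rng.choice(["A", "B", "C"], n),
})
# Synthetic firing rates with a built-in cue1 x cue2 interaction, for illustration only
trials["rate"] = (5
                  + 2.0 * ((trials["cue1"] == "A") & (trials["cue2"] == "B"))
                  + rng.normal(0, 1, n))

fit = ols("rate ~ C(task_type) * C(cue1) * C(cue2)", data=trials).fit()
table = sm.stats.anova_lm(fit, typ=2)

pure = [t for t in ("C(task_type)", "C(cue1)", "C(cue2)") if table.loc[t, "PR(>F)"] < 0.05]
mixed = [t for t in table.index if ":" in t and table.loc[t, "PR(>F)"] < 0.05]
print("pure selectivity to:", pure)
print("nonlinear mixed selectivity to:", mixed)
```
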
Figure 4.
The full model and how learning occurs in it. A, The model consists of groups of binary input neurons (colored blocks) that each represent a task-variable identity. The number of neurons per group is given in parentheses. Each PFC cell (gray circles) receives random input from the binary cells. The connection probability is 25% and the weights are Gaussian-distributed and non-negative. The sum of inputs from the binary population and an additive noise term are combined as input to a sigmoidal function (bottom). The output of the PFC cell on a given trial is a function of the output of the sigmoidal function, r, and a multiplicative noise term (see Materials and Methods). The threshold, ϴ, is given as a percentage of the sum total of the weights into each cell. B, Two styles of learning in the network, both of which are based on the idea that the input groups that initially give strong input to a PFC cell have their weights increased with learning (the sum of the weights from each population is given next to each block). In “free” learning, the top NL input populations are chosen freely. In this example, that means two groups from the C1 task variable have their weights increased (marked in blue). In “constrained” learning, the top NL populations are chosen with the constraint that they cannot come from the same task variable. In this case, that means that Cue 2D is chosen over Cue 1C despite the latter having a larger summed weight. In both cases, all weights are then normalized. C, Learning curves as a function of learning steps for different values of NL. The strength of the changes in the weight matrix, expressed as a percentage of the sum total of the weight matrix, is plotted for each learning step (a learning step consists of both the weight-increase and normalization steps). Different colors represent different values of NL.
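
The free and constrained rules in panel B can be sketched as follows. This is a hypothetical reading of the caption, not the published implementation: the group sizes, the weight-boost factor, and the divisive normalization are assumptions, and only the selection logic (rank input groups by summed weight, optionally forbid two winners from the same task variable, then renormalize) follows the text.

```python
# Minimal sketch (assumed parameters) of the "free" vs. "constrained"
# Hebbian-style weight updates described in the caption, for one model PFC cell.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder input structure: task variable -> sizes of its input groups
groups = {"task_type": [10, 10], "cue1": [10] * 4, "cue2": [10] * 4}
group_var, group_slice, start = [], [], 0
for var, sizes in groups.items():
    for n in sizes:
        group_var.append(var)
        group_slice.append(slice(start, start + n))
        start += n

# Random feedforward weights: 25% connection probability, non-negative Gaussian-like
w0 = np.abs(rng.normal(1.0, 0.25, start)) * (rng.random(start) < 0.25)

def learning_step(w, n_l, constrained, boost=1.2):
    """Boost the top n_l input groups (ranked by summed weight), then renormalize."""
    totals = np.array([w[s].sum() for s in group_slice])
    chosen, used_vars = [], set()
    for g in np.argsort(totals)[::-1]:
        if constrained and group_var[g] in used_vars:
            continue                       # constrained: at most one winner per task variable
        chosen.append(g)
        used_vars.add(group_var[g])
        if len(chosen) == n_l:
            break
    total_before = w.sum()
    w = w.copy()
    for g in chosen:
        w[group_slice[g]] *= boost         # Hebbian-style potentiation of strong input groups
    return w * total_before / w.sum()      # divisive normalization keeps total input fixed

w_free = learning_step(w0, n_l=3, constrained=False)
w_constrained = learning_step(w0, n_l=3, constrained=True)
print("change per step (% of total weight):",
      100 * np.abs(w_free - w0).sum() / w0.sum())
```
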
Figure 5.
Results from the model without learning. A, FFT and other measures can be controlled by the additive and multiplicative noise parameters. Each circle's color shows the value of the given measure, averaged over 25 networks, for a set of a and m values (see Materials and Methods). FFT scales predictably with both noise parameters. The fraction of cells with mixed selectivity, the fraction of cells with pure selectivity, and clustering scale inversely with the noise parameters. Other model parameters are taken from the arrow locations in B and C. B, How the threshold parameter, λ, affects measures of selectivity. Lines show how the average value of the given measure in the model (in units of SDs calculated over 100 random instantiations of the model) differs from the data as a function of the threshold parameter λ, where ϴ_i = λ Σ_j w_ij. At each point, noise parameters are fit to keep FFT close to the data value. Note that the SD values for mixed selectivity and clustering remain steady across threshold values at ∼4% and 20.7, respectively. RV SD, however, increases from 0.0087 to 4.3 spikes/s, and pure selectivity SD trends toward zero as all cells gain pure selectivity. C, Same as B, but varying the width of the weight distribution rather than the threshold parameter. Here, RV SD increases only slightly, from 0.02 to 0.048 spikes/s, pure selectivity SD decreases slightly from 4.0% to 2.5%, and the mixed selectivity and clustering SDs remain fairly constant around 4.9% and 31.2, respectively. D, Example of the model results at the points given by the black arrows in B and C. On the left, blue and red bars are the data values as in Figure 3. The lines are model values (averaged over 100 networks; error bars, ±1 SD). On the right, histograms of model values over 100 networks. The red markers are data values. This model has no learning.
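
Putting the descriptions in Figures 4A and 5B together, the single-trial response of model cell i can be written roughly as below. The threshold definition ϴ_i = λ Σ_j w_ij is stated in the caption; the placement of the additive noise inside the sigmoid and the multiplicative form of the output noise are assumptions, since the captions defer the exact forms to Materials and Methods.

```latex
\theta_i = \lambda \sum_j w_{ij}, \qquad
r_i = \sigma\!\Big(\sum_j w_{ij}\, x_j + a\,\xi - \theta_i\Big), \qquad
\text{output}_i = r_i \,(1 + m\,\eta)
```

Here x_j are the binary inputs, σ is the sigmoidal nonlinearity, and ξ and η are trial-wise noise draws scaled by the additive (a) and multiplicative (m) noise parameters; the specific combination r_i(1 + mη) is an illustrative assumption.
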
Figure 6.
The model with learning. A, How selectivity measures change with learning. In each plot, color represents the NL value, solid lines are free learning, and dotted lines are constrained learning (only 1 line is shown for NL = 1 because free and constrained learning collapse to the same model in this case). Step 0 is the random network. Black dotted lines are data values and error bars are ±1 SD over 100 networks. In the pure selectivity plot, with constrained learning and when NL = 1, the value maxes out at 100% in essentially all networks, leading to vanishing error bars. B, All measures as a function of learning for the NL = 3 free learning case. Values are given in units of model SDs away from the data value, as in Figure 5B,C. C, The model results at the learning step indicated with the black arrow in B. On the left, blue and red bars are the data values as in Figure 3. The lines are model values (averaged over 100 networks; error bars, ±1 SD). On the right, histograms of model values over 100 networks. The red markers are data values. Here, the model provides a much better match to the data. D, Decoding performance increases with learning. The average performance of classifiers trained to read out linear terms (top left) and higher-order terms (bottom left) from PFC population activity increases after learning compared with the random network (learned model indicated by the arrow in B). Error bars, ±1 SEM over 10 random instantiations of the network. Readout of same versus different cue identities is better when using the PFC population after learning (right).
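
The decoding analysis in panel D can be approximated with a generic linear-classifier pipeline like the one below. This is not the paper's pipeline: the synthetic population activity, the logistic-regression classifier, and the cross-validation scheme are stand-ins used only to make the distinction between reading out a linear term and a higher-order (conjunction) term concrete.

```python
# Illustrative decoding sketch on synthetic population activity (not the paper's data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_cells = 400, 90
cue1 = rng.integers(0, 2, n_trials)
cue2 = rng.integers(0, 2, n_trials)

# Synthetic population: each cell mixes the cues and their conjunction, plus noise
gain = rng.normal(size=(3, n_cells))
activity = (np.stack([cue1, cue2, cue1 * cue2], axis=1) @ gain
            + rng.normal(0, 1.0, (n_trials, n_cells)))

targets = {"linear term (cue 1 identity)": cue1,
           "higher-order term (cue1 x cue2 conjunction)": cue1 * cue2}
for name, y in targets.items():
    acc = cross_val_score(LogisticRegression(max_iter=1000), activity, y, cv=5).mean()
    print(f"{name}: cross-validated decoding accuracy = {acc:.2f}")
```
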
Figure 7.
How noise robustness varies with threshold in a random network using the toy model. A, Schematic of the toy model: four input populations (2 from each task variable) send weighted inputs to a cell with a threshold (ϴ) nonlinearity. B, For a given noise value, the fraction of cells that would lose their selectivity at that noise level. Values are shown separately for cells with pure (blue) and mixed (red) selectivity. Three λ values are shown, where ϴ = λ Σ W. C, Based on plots like those in B, the noise value at which 50% of cells have lost selectivity is calculated (“Noise Robustness” refers to these values normalized by the peak value; higher values indicate greater robustness) and plotted as a function of λ (solid lines). On the same plot, the percentage of cells with each type of selectivity in the absence of noise is shown (dotted lines). The black dotted line marks a λ value at which the probabilities of mixed and pure selectivity are equal, but their noise robustness is not. This plot is mirror-symmetric around λ = 0.5.
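
One way to make the panel C computation concrete, again as a hedged sketch built on the same reading of the toy model as the Figure 2 example: treating the additive-noise level a cell can tolerate as the smallest gap between any condition's summed input and the threshold is my interpretation of "the range of thresholds ... without altering the selectivity", not the authors' exact procedure.

```python
# Hedged sketch: per-cell noise tolerance = smallest |condition input - threshold|;
# the population's robustness is summarized by the noise level at which half of
# the cells of each selectivity type would lose their selectivity.
import numpy as np

def classify(cond_sums, theta):
    active = cond_sums > theta
    n = active.sum()
    if n in (0, 4):
        return "none"
    if n == 2 and (active.all(axis=0).any() or active.all(axis=1).any()):
        return "pure"
    return "mixed"

rng = np.random.default_rng(3)
weights = np.abs(rng.normal(size=(5000, 4)))        # toy cells: [T1a, T1b, T2a, T2b]
lam = 0.45
margins = {"pure": [], "mixed": []}
for w in weights:
    cond_sums = np.add.outer(w[:2], w[2:])          # 2x2 grid of condition inputs
    theta = lam * w.sum()
    kind = classify(cond_sums, theta)
    if kind in margins:
        margins[kind].append(np.abs(cond_sums - theta).min())
for kind, vals in margins.items():
    print(f"{kind}: n={len(vals)}, noise level losing 50% = {np.median(vals):.3f}")
```
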
Figure 8.
How learning affects noise robustness. A, A simple toy cell (left) with two task variables is used to show the effects of learning. The four possible conditions are plotted as dots (green if above threshold, black if not), with the threshold as a dotted black line. Colored arrows represent the weights from each population. Before learning (middle), the cell's input on two of the conditions falls within the range of the shifting threshold created by additive noise (gray area). After learning, all conditions are outside the noise range. B, A third task variable is added to the model and acts as another source of additive noise from the perspective of T1–T2 selectivity. The model's outputs are color-coded according to which T3 population is active. Weight arrows are omitted for visibility. After learning with NL = 2, the input strength from the T3 populations is decreased and the points from the same T1–T2 condition are closer together (less noisy). C, How the percentage of cells with a given selectivity (left) and their noise robustness (right) change with constrained learning as a function of the threshold parameter λ. Learning steps are indicated by increasing color brightness (the darkest line is the random model as displayed in Fig. 7C, and the dashed line shows where the percentages of mixed and pure selectivity are the same in the random model).
