A Theory of How Columns in the Neocortex Enable Learning the Structure of the World

Jeff Hawkins et al. Front Neural Circuits. 2017;11:81.

Abstract

Neocortical regions are organized into columns and layers. Connections between layers run mostly perpendicular to the surface, suggesting a columnar functional organization. Some layers have long-range excitatory lateral connections, suggesting interactions between columns. Similar patterns of connectivity exist in all regions, but their exact role remains a mystery. In this paper, we propose a network model composed of columns and layers that performs robust object learning and recognition. Each column integrates its changing input over time to learn complete predictive models of observed objects. Excitatory lateral connections across columns allow the network to more rapidly infer objects based on the partial knowledge of adjacent columns. Because columns integrate input over time and space, the network learns models of complex objects that extend well beyond the receptive field of individual cells. Our network model introduces a new feature to cortical columns. We propose that a representation of location relative to the object being sensed is calculated within the sub-granular layers of each column. The location signal is provided as an input to the network, where it is combined with sensory data. Our model contains two layers and one or more columns. Simulations show that, using Hebbian-like learning rules, small single-column networks can learn to recognize hundreds of objects, with each object containing tens of features. Multi-column networks recognize objects with significantly fewer movements of the sensory receptors. Given the ubiquity of columnar and laminar connectivity patterns throughout the neocortex, we propose that columns and regions have more powerful recognition and modeling capabilities than previously assumed.

Keywords: cortical columns; cortical layers; hierarchical temporal memory; neocortex; sensorimotor learning.
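
To make the model concrete, here is a minimal sketch in Python of the multi-column inference idea from the abstract. It is not the authors' implementation: it replaces sparse neural activity with explicit candidate sets, and all names (the toy "cup"/"bowl" models and helper functions) are hypothetical. Each column narrows its own object hypotheses from its own (feature, location) sensations, and lateral "voting" intersects hypotheses across columns so the network converges with fewer movements.

```python
# Hypothetical sketch of lateral voting across columns (not the paper's
# SDR-based implementation): each column narrows its own object hypotheses,
# and columns then intersect ("vote on") hypotheses with their neighbors.

def column_candidates(models, sensed_pair):
    """Objects whose learned model contains this (feature, location) pair."""
    return {obj for obj, pairs in models.items() if sensed_pair in pairs}

def infer(models, per_column_sensations):
    """Each inner list holds one column's sequence of (feature, location) pairs."""
    votes = set(models)                                    # network-wide hypotheses
    for sensations in per_column_sensations:
        column = set(models)
        for pair in sensations:
            column &= column_candidates(models, pair)      # within-column narrowing
        votes &= column                                    # lateral voting
    return votes

# Toy models: each object is a set of feature-at-location pairs.
models = {
    "cup":  {("rim", "top"), ("handle", "side"), ("flat", "bottom")},
    "bowl": {("rim", "top"), ("flat", "bottom")},
}

# Three columns, one touch each: two of the three touches are individually
# ambiguous, but lateral voting identifies the cup after a single sensation
# per column.
print(infer(models, [[("rim", "top")], [("handle", "side")], [("flat", "bottom")]]))
# -> {'cup'}
```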

Figures

Figure 1
(A) Our network model contains one or more laterally connected cortical columns (three shown). Each column receives feedforward sensory input from a different sensor array [e.g., different fingers or adjacent areas of the retina (not shown)]. The input layer combines sensory input with a modulatory location input to form sparse representations that correspond to features at specific locations on the object. The output layer receives feedforward inputs from the input layer and converges to a stable pattern representing the object (e.g., a coffee cup). Convergence in the second layer is achieved via two means. One is by integration over time as the sensor moves relative to the object, and the other is via modulatory lateral connections between columns that are simultaneously sensing different locations on the same object (blue arrows in upper layer). Feedback from the output layer to the input layer allows the input layer to predict what feature will be present after the next movement of the sensor. (B) Pyramidal neurons have three synaptic integration zones, proximal (green), basal (blue), and apical (purple). Although individual synaptic inputs onto basal and apical dendrites have a small impact on the soma, co-activation of a small number of synapses on a dendritic segment can trigger a dendritic spike (top right). The HTM neuron model incorporates active dendrites and multiple synaptic integration zones (bottom). Patterns recognized on proximal dendrites generate action potentials. Patterns recognized on the basal and apical dendrites depolarize the soma without generating an action potential. Depolarization is a predictive state of the neuron. Our network model relies on these properties and our simulations use HTM neurons. A detailed walkthrough of the algorithm can be found in the Supplementary Video.
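
The neuron properties in (B) can be caricatured in a few lines of Python. This is a simplified, hypothetical rendering of an HTM-style neuron (the thresholds and helper names are illustrative, not taken from the paper): proximal input alone can fire the cell, while coincident input on a basal or apical dendritic segment only depolarizes it into a predictive state.

```python
# Simplified, hypothetical HTM-style neuron with three synaptic integration
# zones. Proximal input can fire the cell; active basal or apical segments only
# depolarize it into a predictive state, as described in (B).

SEGMENT_THRESHOLD = 3      # coincident synapses needed for a dendritic spike
PROXIMAL_THRESHOLD = 8     # proximal synapses needed for an action potential

class HTMNeuron:
    def __init__(self, proximal, basal_segments, apical_segments):
        self.proximal = proximal                 # set of presynaptic cell ids
        self.basal_segments = basal_segments     # list of sets of cell ids
        self.apical_segments = apical_segments   # list of sets of cell ids

    def _segment_fires(self, segment, active_cells):
        return len(segment & active_cells) >= SEGMENT_THRESHOLD

    def is_predicted(self, lateral_cells, feedback_cells):
        """Depolarized (predictive) if any basal or apical segment fires."""
        return (any(self._segment_fires(s, lateral_cells) for s in self.basal_segments)
                or any(self._segment_fires(s, feedback_cells) for s in self.apical_segments))

    def is_active(self, feedforward_cells):
        """Action potential when enough proximal synapses receive input."""
        return len(self.proximal & feedforward_cells) >= PROXIMAL_THRESHOLD
```
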
Figure 2
Cellular activations in the input and output layers of a single column during a sequence of touches on an object. (A) Two objects (cube and wedge). For each object, three feature-location pairs are shown (f1 and f3 are common to both the cube and wedge). The output layer representations associated with each object, and the sensory representations for each feature, are shown. (B) Cellular activations in both layers caused by a sequence of three touches on the cube (in time order from top to bottom). The first touch (at f1) results in a set of active cells in the input layer (black dots in input layer) corresponding to that feature-location pair. Cells in the output layer receive this representation through their feed-forward connections (black arrow). Since the input is ambiguous, the output layer forms a representation that is the union of both the cube and the wedge (black dots in output layer). Feedback from the output layer to the input layer (red arrow) causes all feature-location pairs associated with both potential objects to become predicted (red dots). The second touch (at f2) results in a new set of active cells in the input layer. Since f2 is not shared with the wedge, the representation in the output layer is reduced to only the cube. The set of predicted cells in the input layer is also reduced to the feature-location pairs of the cube. The third touch (at f3) would be ambiguous on its own; however, due to the past sequence of touches and self-reinforcement in the output layer, the representation of the object in the output layer remains unique to the cube. Note that the number of cells shown is unrealistically small, for clarity of illustration.
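
The walkthrough in (B) can be mirrored with plain sets (a hypothetical caricature; the paper uses sparse cellular representations, not symbolic sets): after each touch the output representation is the union of objects consistent with the evidence so far, and feedback marks as predicted every feature-location pair belonging to any surviving object.

```python
# Set-based caricature of the Figure 2 sequence: ambiguous touches keep a
# union of object hypotheses; feedback predicts the pairs of all survivors.

objects = {
    "cube":  {("f1", "loc1"), ("f2", "loc2"), ("f3", "loc3")},
    "wedge": {("f1", "loc1"), ("f3", "loc3"), ("f4", "loc4")},  # f1, f3 shared
}

candidates = set(objects)                         # output-layer hypotheses
for touch in [("f1", "loc1"), ("f2", "loc2"), ("f3", "loc3")]:
    candidates = {o for o in candidates if touch in objects[o]}
    predicted = set().union(*(objects[o] for o in candidates))   # feedback
    print(touch, "->", sorted(candidates), "| predicted pairs:", len(predicted))
# f1 is ambiguous (cube or wedge); f2 narrows the hypotheses to the cube;
# f3, though shared, leaves the representation unique to the cube.
```
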
Figure 3
(A) The output layer represents each object by a sparse pattern. We tested the network on the first object. (B) Activity in the output layer of a single column network as it touches the object. The network converges after 11 sensations (red rectangle). (C) Activity in the output layer of a three column network as it touches the object. The network converges much faster, after four sensations (red rectangle). In both (B,C) the representation in Column 1 is the same as the target object representation after convergence.
Figure 4
(A) Mean number of sensations needed to unambiguously recognize an object with a single-column network as the set of learned objects increases. We train models on varying numbers of objects, from 1 to 100, and plot the average number of sensations required to unambiguously recognize a single object. The different curves show how convergence varies with the total number of unique features from which objects are constructed. In all cases, the network eventually recognizes the object. Recognition requires fewer sensations when objects are drawn from a larger pool of unique features. (B) Mean number of observations needed to unambiguously recognize an object with multi-column networks as the number of columns increases. We train each network with 100 objects and plot the average number of sensations required to unambiguously recognize an object. The required number of sensations rapidly decreases as the number of columns increases, eventually reaching one. (C) Fraction of objects that can be unambiguously recognized as a function of number of sensations for an ideal observer model with location (blue), without location (orange), and our one-column sensorimotor network (green).
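
For comparison with (C), a brute-force ideal observer can be sketched as candidate elimination over randomly generated objects (the object statistics and parameter values below are illustrative, not the paper's): with the location signal each sensation tests a feature-at-location pair, without it only the feature identity constrains the candidates, so disambiguation takes more sensations.

```python
# Hypothetical ideal observer for Figure 4C: eliminate candidate objects after
# each sensation, with the location signal (feature at location) or without it
# (feature identity only). Object statistics here are illustrative.
import random

def sensations_to_recognize(objects, target, use_location, rng):
    """Sensations until only the target remains, or None if never unambiguous."""
    pairs = list(objects[target])
    rng.shuffle(pairs)                                   # random touch sequence
    candidates = set(objects)
    for n, (feature, location) in enumerate(pairs, start=1):
        if use_location:
            candidates = {o for o in candidates
                          if (feature, location) in objects[o]}
        else:
            candidates = {o for o in candidates
                          if feature in {f for f, _ in objects[o]}}
        if candidates == {target}:
            return n
    return None

rng = random.Random(1)
num_features, num_locations = 30, 10
objects = {f"obj{i}": {(rng.randrange(num_features), loc)
                       for loc in range(num_locations)}
           for i in range(100)}

for use_location in (True, False):
    counts = [sensations_to_recognize(objects, o, use_location, rng)
              for o in objects]
    solved = [c for c in counts if c is not None]
    mean = sum(solved) / max(len(solved), 1)
    print(f"location={use_location}: recognized {len(solved)}/100, "
          f"mean sensations {mean:.1f}")
```
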
Figure 5
Recognition accuracy is plotted as a function of the number of learned objects. (A) Network capacity relative to number of mini-columns in the input layer. The number of output cells is kept at 4,096 with 40 cells active at any time. (B) Network capacity relative to number of cells in the output layer. The number of active output cells is kept at 40. The number of mini-columns in the input layer is 150. (C) Network capacity for one, two, and three cortical columns (CCs). The number of mini-columns in the input layer is 150, and the number of output cells is 4,096.
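
The capacity trends in this figure ultimately come from the combinatorics of sparse codes. A back-of-the-envelope estimate (assuming independent random codes and an illustrative match threshold, not the paper's exact analysis) shows why 40 active cells out of 4,096 support a very large number of distinguishable object representations.

```python
# Rough capacity estimate for the output-layer sizes in Figure 5 (4,096 cells,
# 40 active), assuming independent random sparse codes (illustrative threshold).
from math import comb

n, w = 4096, 40          # cells in the output layer, active cells per object
theta = 20               # overlap at or above which two codes are confused

unique_codes = comb(n, w)                    # distinct possible activity patterns
# Probability that an unrelated random code overlaps a stored code in >= theta
# cells (hypergeometric tail).
p_confusion = sum(comb(w, b) * comb(n - w, w - b)
                  for b in range(theta, w + 1)) / comb(n, w)

print(f"possible codes:        {unique_codes:.2e}")
print(f"chance confusion rate: {p_confusion:.2e}")
```
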
Figure 6
Robustness of a single column network to noise. (A) Recognition accuracy is plotted as a function of the amount of noise in the sensory input (blue) and in the location input (yellow). (B) Recognition accuracy as a function of the number of sensations. Colored lines correspond to noise levels in the location input.
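
The robustness in (A,B) follows from matching sparse patterns by overlap rather than exact identity. The sketch below (with hypothetical layer size and threshold) corrupts a stored sparse code and checks whether its overlap with the original still clears a match threshold.

```python
# Why overlap-based matching of sparse codes tolerates noise: corrupt a stored
# code and test whether it still overlaps the original above a threshold.
import random

def corrupt(code, all_cells, noise, rng):
    """Replace a fraction `noise` of the active cells with other random cells."""
    code = set(code)
    flipped = set(rng.sample(sorted(code), int(noise * len(code))))
    replacements = set(rng.sample(sorted(all_cells - code), len(flipped)))
    return (code - flipped) | replacements

rng = random.Random(42)
cells = set(range(1024))                           # cells in the layer
stored = set(rng.sample(sorted(cells), 20))        # sparse code: 20 active cells
threshold = 13                                     # overlap required to "recognize"

for noise in (0.1, 0.2, 0.3, 0.4, 0.5):
    noisy = corrupt(stored, cells, noise, rng)
    overlap = len(stored & noisy)
    print(f"noise={noise:.0%}  overlap={overlap}  recognized={overlap >= threshold}")
```
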
Figure 7
Mapping of sensorimotor inference network onto experimentally observed cortical connections. Arrows represent documented pathways. (A) First instance of network; L4 is input layer, L2/3 is output layer. Green arrows are feedforward pathway, from thalamo-cortical (TC) relay cells, to L4, to L2/3 cortico-cortical (CC) output cells. Cells in L2/3 also project back to L4 and to adjacent columns (blue arrows); these projections depolarize specific sets of cells that act as predictions (see text). Red arrow is location signal originating in L6a and terminating on basal distal dendrites of L4 cells. (B) Possible second instance of network; L6a is input layer, L5 is output layer. Both instances of the network receive feedforward input from the same TC axons, thus the two networks run in parallel (Constantinople and Bruno, 2013; Markov et al., 2013). The origin and derivation of the location signal (LOC) is unknown but likely involves local processing as well as input from other regions (see text and Discussion). The output of the upper network makes direct cortico-cortical (CC) connections, whereas the output of the lower network projects to thalamic relay cells before projecting to the next region.
