PLoS One. 2019 Aug 1;14(8):e0220161.
Probabilistic Associative Learning Suffices for Learning the Temporal Structure of Multiple Sequences
From memorizing a musical tune to navigating a well-known route, many of our behaviors have a strong temporal component. While the mechanisms behind the sequential nature of the underlying brain activity are likely multifarious and multi-scale, in this work we attempt to characterize to what degree some of these properties can be explained as a consequence of simple associative learning. To this end, we employ a parsimonious firing-rate attractor network equipped with the Hebbian-like Bayesian Confidence Propagating Neural Network (BCPNN) learning rule, which relies on synaptic traces with asymmetric temporal characteristics. The proposed network model is able to encode and reproduce temporal aspects of the input, and offers internal control of the recall dynamics by gain modulation. We provide an analytical characterisation of the relationship between the structure of the weight matrix, the dynamical network parameters and the temporal aspects of sequence recall. We also present a computational study of the performance of the system under the effects of noise for an extensive region of the parameter space. Finally, we show how the inclusion of modularity in our network structure facilitates the learning and recall of multiple overlapping sequences even in a noisy regime.
Conflict of interest statement
The authors have declared that no competing interests exist.
Fig 1. Network architecture and connectivity underlying sequential pattern activation.
(A) Network topology. Units u_i^j are organized into hypercolumns h_1, …, h_H. At each point in time only one unit per hypercolumn is active due to a WTA mechanism. Each memory pattern is formed by a set of H recurrently connected units distributed across hypercolumns. For simplicity, and without compromising generality, we adopt the notation P_i = (u_i^1, …, u_i^H) for patterns. We depict the stereotypical network connectivity by showing all the connections that emanate from unit u_1^1 of pattern P_1. The unit has excitatory projections to the proximate units in the sequence (connections from u_1^1 to u_1^1 and u_2^1, and to the corresponding units in other hypercolumns) and inhibitory projections both to the units that are farther ahead in the sequence (u_1^1 to u_3^1) and to the units that are not in the sequence at all (gray units). (B) Abstract representation of the relevant connectivity for sequence dynamics. Please note that only connections from u_4^1 to P_2 are shown.
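The per-hypercolumn WTA mechanism described above can be sketched in a few lines. This is a minimal illustrative implementation (function and parameter names are ours, not the paper's), assuming unit currents arrive as a flat array grouped by hypercolumn:

```python
import numpy as np

def wta(currents, n_hypercolumns, units_per_hc):
    """Winner-take-all per hypercolumn: exactly one active unit in each.

    currents: flat array of unit currents, grouped by hypercolumn.
    Returns a flat binary activation vector of the same shape.
    """
    s = currents.reshape(n_hypercolumns, units_per_hc)
    out = np.zeros_like(s)
    # activate the unit with the largest current in every hypercolumn
    out[np.arange(n_hypercolumns), s.argmax(axis=1)] = 1.0
    return out.ravel()
```

With two hypercolumns of three units each, `wta` activates one unit per group, so a pattern is always a set of H co-active units, one per hypercolumn, as in the caption.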
Fig 2. An instance of sequence recall in the model.
(A) Sequential activity of units initiated by the cue. (B) The time course of the adaptation current for each unit. (C) The total current s; note that this quantity crossing the value of w_next (depicted here with a dotted line) marks the transition point from one pattern to the next. (D) The connectivity matrix, where we have included pointers to the most important quantities: w_self for the self-excitatory weight, w_next for the connection to the next element, w_rest for the largest connection in the column after w_next, and w_prev for the connection to the last pattern that was active in the sequence.
Fig 3. Systematic study of persistence time T_per.
(A) Dependence of T_per on B. The blue solid line represents the theoretical prediction described in Eq 4 and the orange bullets are the results of simulations. The inset depicts what happens close to B = 0, where we can see that the lower limit is the time constant of the units τ_s. (B) An example of sequence recall where T_per = 100 ms. This example corresponds to the configuration marked with the black star in (A). (C) An example of sequence recall with T_per = 500 ms. This example corresponds to the configuration marked with a black triangle in (A). (D) Recall of a sequence with variable temporal structure (varying T_per). The values of T_per are 500, 200, 1200, 100, and 400 ms, respectively.
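Eq 4 itself is not reproduced in this excerpt, but the mechanism behind a finite persistence time can be illustrated generically: an adaptation current grows while a pattern is active, and the pattern hands over to the next one once adaptation has closed the gap between the self-excitatory weight and the connection to the next element (see Fig 2). A sketch under assumed first-order adaptation dynamics (the symbols g_a, tau_a and delta_w are ours, not the paper's):

```python
import math

def persistence_time(delta_w, g_a, tau_a):
    """Time for an adaptation current a(t) = g_a * (1 - exp(-t / tau_a)),
    starting at a(0) = 0, to close a weight gap delta_w = w_self - w_next,
    i.e. for the total current w_self - a(t) to fall below w_next.

    Solving g_a * (1 - exp(-T / tau_a)) = delta_w for T gives
    T_per = tau_a * ln(g_a / (g_a - delta_w)).
    This is an analogous toy derivation, not the paper's Eq 4.
    """
    assert 0 < delta_w < g_a, "transition requires adaptation to reach the gap"
    return tau_a * math.log(g_a / (g_a - delta_w))
```

The closed form reproduces the qualitative behavior in the figure: a larger weight gap or a slower adaptation time constant both lengthen the persistence time.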
Fig 4. Sequence learning paradigm.
(A) Relationship between the connectivity matrix w and the z-traces. The weight w_ij from unit i to unit j is determined by the probability of co-activation of those units, which in turn is proportional to the overlap between the z-traces (shown in dark red). The symmetric connection w_ji is calculated through the same process but with the traces flipped (here shown in dark blue). Note that the asymmetry of the weights is a direct consequence of the asymmetry of the z-traces. (B) Schematic of the training protocol. At the top we show how the activation of the patterns (in gray) induces the z-traces. At the bottom we show the structure of the training protocol, where the pulse time T_p and the inter-pulse interval (IPI) are shown for further reference. (C) We trained a network with only five units in a single hypercolumn for illustration. The first three epochs (50 in total) of the training protocol are shown for reference. The values of the parameters during training were set to T_p = 100 ms, IPI = 0 ms, τ_z^post = 5 ms and τ_z^pre = 50 ms. (D) The weight matrix at the end of the training (after 50 epochs). (E) Evolution of the probability values during the first three epochs of training. The probability values of the pre (p_i), post (p_j) and joint (p_ij) probabilities evolve with every presentation. Note that the same color code is used in panels C, E and F. (F) Long-term evolution of the probabilities with respect to the number of epochs. The values of the probability traces eventually reach a steady state. (G) Short-term evolution of the weight matrix at the points marked in the first epoch in C. Note that the colors are subject to the same colorbar reference as in D.
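The trace-to-probability pipeline described above can be sketched as follows. This is a simplified rate-based illustration of the BCPNN idea (exponential z-traces low-pass filtered into probability estimates, with the weight given by the log of the joint probability normalized by the marginals); the parameter values and function names are ours, and details such as bias terms are omitted:

```python
import numpy as np

def train_bcpnn(activity, dt=1.0, tau_zpre=50.0, tau_zpost=5.0,
                tau_p=5000.0, eps=1e-4):
    """Sketch of BCPNN trace-based learning (illustrative, simplified).

    activity: (n_steps, n_units) array of binary unit activations.
    Returns the weight matrix w[i, j] = log(p_ij / (p_i * p_j)).
    """
    n_units = activity.shape[1]
    z_pre = np.zeros(n_units)    # presynaptic trace (slow, tau_zpre)
    z_post = np.zeros(n_units)   # postsynaptic trace (fast, tau_zpost)
    p_pre = np.full(n_units, eps)
    p_post = np.full(n_units, eps)
    p_joint = np.full((n_units, n_units), eps ** 2)
    for s in activity:
        # asymmetric z-traces: same input, different time constants
        z_pre += dt * (s - z_pre) / tau_zpre
        z_post += dt * (s - z_post) / tau_zpost
        # slow probability traces estimate marginal and joint activations
        p_pre += dt * (z_pre - p_pre) / tau_p
        p_post += dt * (z_post - p_post) / tau_p
        p_joint += dt * (np.outer(z_pre, z_post) - p_joint) / tau_p
    return np.log(p_joint / np.outer(p_pre, p_post))
```

Because the presynaptic trace outlives the postsynaptic one, a unit active just *before* another produces a larger joint probability than the reverse ordering, yielding the asymmetric weights that drive sequence recall.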
Fig 5. Characterization of the effect of training on the connectivity weights and persistence times.
The equation in the inset in D relates T_per to Δw_next = w_self − w_next, which we show as dashed red lines in each of the top figures (note that here Δβ = 0 as we trained with a homogeneous protocol). When the parameters themselves are not subject to variation, their values are: T_p = 100 ms, IPI = 0 ms, τ_z^post = 20 ms and τ_z^pre = 25 ms for all the units. (A-C) show how the weights depend on the training parameters T_p, the inter-pulse interval and τ_z^pre, respectively, whereas (D-E) illustrate the same effects on T_per. Here we provide the steady-state values of w obtained after 100 epochs of training.
Fig 6. Transition from the sequence regime to a random reactivation regime.
(A) An example of a sequential (ordered) activation of patterns. (B) Unordered reactivation of the learned attractors. (C) The two regimes (sequential in blue and random reactivation of attractors in red) in the relevant parameter space spanned by τ_z^pre and the inter-pulse interval. The examples in (A) and (B) correspond to the black dot and the star, respectively.
Fig 7. Effects of noise reflected in current trajectories and persistence times.
(A) An example of current trajectories subjected to noise. The solid lines indicate the deterministic trajectories the system would follow in the zero-noise case. With dotted, jagged and dashed lines we depict the currents induced by w_self, w_next and w_rest for reference. (B) Change in the average of the actual value of T_per for different levels of noise. We shaded the area between the 25th and the 75th percentiles to convey an idea of the distribution for every value of σ. (C) Success rate vs noise profile dependence on T_per. We ran 1000 simulations of recall and present the ratio of successful recalls as a function of σ. Confidence intervals from the binomial distribution are too small to be seen.
Fig 8. Sensitivity of network performance to noise for different parameters.
The base reference values of the parameters of interest are: T_p = 100 ms, IPI = 0 ms, τ_z^pre = 25 ms, τ_z^post = 15 ms, sequence length = 5, #hypercolumns = 1. (A) Two examples of the success vs noise profiles (T_p = 50 ms, 200 ms). The value of σ_50 is indicated on the abscissa for clarity; note that a smaller σ_50 implies a network that is more sensitive to noise (the success rate decays faster). (B) σ_50 variation with respect to T_p. We also indicate σ_50 for the values of T_p used in (A) with stars of the corresponding colors. (C) σ_50 variation with respect to the inter-pulse interval. (D) σ_50 variation with respect to the value of τ_z^pre. (E) σ_50 variation with respect to sequence length. (F) σ_50 variation with respect to the number of hypercolumns.
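The σ_50 summary statistic used throughout Figs 8 and 10 is simply the noise level at which the success rate crosses 0.5. The paper does not specify how it is extracted, so the sketch below estimates it by linear interpolation on a sampled success-vs-noise profile (a hypothetical helper, not the authors' code):

```python
import numpy as np

def sigma_50(sigmas, success_rates):
    """Noise level at which a (monotonically decreasing) success-vs-noise
    profile crosses 0.5, found by linear interpolation between the two
    samples that bracket the crossing."""
    s = np.asarray(sigmas, dtype=float)
    r = np.asarray(success_rates, dtype=float)
    j = int(np.argmax(r < 0.5))   # first sample with success rate below 0.5
    i = j - 1                     # last sample at or above 0.5
    frac = (r[i] - 0.5) / (r[i] - r[j])
    return s[i] + frac * (s[j] - s[i])
```

For instance, a profile dropping from 0.9 at σ = 1 to 0.1 at σ = 2 crosses 0.5 exactly halfway, giving σ_50 = 1.5.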
Fig 9. Overlapping representations and sequences.
(A1) Schematic of the parameterization framework. Black and gray stand for the representational overlap and the sequential overlap, respectively (see text for details). (A2) Schematic of the sequence disambiguation problem. (B) An example of two sequences with overlap. Here each row is a hypercolumn and each column a pattern (patterns P_1, P_2, P_3, P_4, P_5, and P_6). The single entries represent the particular unit that was activated for that hypercolumn and pattern. (C) The superposition of the recall phase for the sequences in (B). Each sequence recall is highlighted by its corresponding color. Inside the gray area we can appreciate that the second and third columns (sequential overlap of 2) have the same units activated (depicted in black). This reflects the fact that those patterns have a representational overlap of 2/3 (two out of three hypercolumns).
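The two overlap parameters can be made concrete by constructing such sequence pairs programmatically. The sketch below builds two random sequences (one unit index per hypercolumn and pattern, matching the layout of panel B) that share their units in `rep_overlap_hc` hypercolumns for a run of `seq_overlap` consecutive patterns; the construction and names are illustrative, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_overlapping_sequences(n_patterns, n_hc, units_per_hc,
                               seq_overlap, rep_overlap_hc):
    """Two sequences of patterns, each pattern a unit index per hypercolumn.

    In seq_overlap consecutive patterns in the middle of the sequences,
    the second sequence reuses the first sequence's units in
    rep_overlap_hc of the n_hc hypercolumns (representational overlap).
    """
    seq_a = rng.integers(0, units_per_hc, size=(n_patterns, n_hc))
    seq_b = rng.integers(0, units_per_hc, size=(n_patterns, n_hc))
    start = (n_patterns - seq_overlap) // 2  # keep first/last patterns distinct
    for k in range(start, start + seq_overlap):
        seq_b[k, :rep_overlap_hc] = seq_a[k, :rep_overlap_hc]
    return seq_a, seq_b
```

With six patterns, three hypercolumns, a sequential overlap of 2 and two shared hypercolumns, this reproduces the 2/3 representational overlap illustrated in panel C.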
Fig 10. Sequence recall performance for different overlap conditions.
The baseline values of the parameters of interest are T_p = 100 ms, ΔT_p = 0 ms, τ_z^post = 5 ms, τ_z^pre = 25 ms, sequence length = 10, H = 10 and T_per = 50 ms. (A) Success rate for pairs of two sequences with different sequential and representational overlaps. We show here the performance over the parameter space. Success here is determined by correct recall of both sequences. Note that the white corner in the top right is undefined, as it corresponds to a degree of sequential overlap that would include either the first or the last pattern in the sequence. (B) Success rate vs noise level for the sequences with the configurations marked 1, 2, 3, 4 in A. The values of σ_50 are marked for illustration purposes. (C) σ_50 as a function of the sequential overlap. The values of σ_50 are calculated over the sequences with the configurations given by the green horizontal line in A. (D) σ_50 as a function of the representational overlap. The values of σ_50 are calculated over the sequences with the configurations given by the blue vertical line in A. (E) Maximal disambiguation as a function of T_per. The network loses disambiguation power with long-lasting attractors as the memory of the earlier pattern activation reflected in the currents fades. (F) Success rate vs noise profile in the disambiguation regime. The three curves correspond to the overlapping sequence configurations marked with x, y, and z in A. Shaded areas correspond to 95% confidence intervals (1000 trials).
Fig 11. The BCPNN weights temporal co-activations against overall activations.
The significance of temporal associations. (A) Here we compare naive simple Hebbian learning with the BCPNN in terms of the relative weighting of different temporal associations. In the presented example there are three associations, E → F, E → G, and H → G, that have been observed on 99, 1, and 1 occasions, respectively. Simple Hebbian learning weights just the frequency of the associations and, as a consequence, E → G and H → G end up with the same association weight. The BCPNN, on the other hand, differentiates the weights as it takes into account the total activation probability of each unit, rendering the temporal association H → G more significant than E → G.
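The numbers in this example can be worked through directly. Using the standard BCPNN weight, w = log(P(pre, post) / (P(pre) P(post))), with probabilities estimated from the observation counts (the count-based setup here is our own illustration of the caption's argument):

```python
import math

def hebbian_weight(n_co, n_total):
    """Naive Hebbian association strength: just co-activation frequency."""
    return n_co / n_total

def bcpnn_weight(n_pre, n_post, n_co, n_total):
    """BCPNN weight: log of the co-activation probability normalized by
    the marginal activation probabilities of the pre and post units."""
    return math.log((n_co / n_total) /
                    ((n_pre / n_total) * (n_post / n_total)))

# Counts from the example: E->F seen 99 times, E->G once, H->G once,
# so E is active in 100 of 101 observations, G in 2, H in 1.
n_total = 101
w_EG = bcpnn_weight(100, 2, 1, n_total)
w_HG = bcpnn_weight(1, 2, 1, n_total)
```

Hebbian learning assigns E → G and H → G the same strength (both seen once), whereas the BCPNN gives H → G a strongly positive weight, since G fires in both of H's rare activations, and E → G a negative one, since E almost never predicts G.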
Spike-Based Bayesian-Hebbian Learning of Temporal Sequences. PLoS Comput Biol. 2016 May 23;12(5):e1004954. doi: 10.1371/journal.pcbi.1004954.
Network capacity analysis for latent attractor computation. Network. 2003 May;14(2):273-302.
Neural associative memory with optimal Bayesian learning. Neural Comput. 2011 Jun;23(6):1393-451. doi: 10.1162/NECO_a_00127.
Modelling studies on the computational function of fast temporal structure in cortical circuit activity. J Physiol Paris. 2000 Sep-Dec;94(5-6):473-88. doi: 10.1016/s0928-4257(00)01098-6.
This work was supported by grants from the Swedish Science Council (Vetenskapsrådet, VR2018-05360), Swedish e-Science Research Center (SeRC) and the EuroSPIN Erasmus Mundus doctoral programme. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.