PLoS Comput Biol. 2016 Oct 19;12(10):e1005137. doi: 10.1371/journal.pcbi.1005137. eCollection 2016 Oct.

Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP

Free PMC article

Yoonsik Shim et al. PLoS Comput Biol.

Abstract

We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.
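The mixture-of-experts style combination described above can be sketched numerically. A minimal sketch, assuming each classifier emits a class-probability vector and a gating network supplies mixing weights (the gate values below are fixed illustrative numbers; in the paper the gating is itself a spiking network whose weights are learned via ITDP):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical per-classifier class-probability outputs
# (3 ensemble members, 4 classes); rows sum to 1.
classifier_outputs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.60, 0.20, 0.10, 0.10],
])

# Gating scores -> mixing weights (illustrative values, not learned here)
gate = softmax(np.array([2.0, 0.1, 1.0]))

# Weighted mixture of expert outputs, as in the standard MoE architecture
combined = gate @ classifier_outputs
prediction = int(np.argmax(combined))
```

Because the gate weights and each classifier's output both sum to one, the combined vector remains a valid class distribution.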

Conflict of interest statement

The authors have declared that no competing interests exist.


Fig 1. The standard MoE architecture.
The outputs (classifications) from the classifier networks are fed into an output unit which combines them according to some simple rule. The gating network weights the individual classifier outputs before they enter the final output unit, and thus guides learning of the overall combined classification. The classifiers and gating networks receive the same input data. See text for further details.
Fig 2. Experimentally observed ITDP behaviour (left) (after [26]), and the simplifications (right) used in this paper.
The original ITDP behaviour is modelled either by a Gaussian function (for the spiking neural network) or by a pulse function (for the logical voter network).
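As a concrete sketch of these two simplified timing windows, one might write the following; the centre, width, and amplitude parameters are illustrative assumptions, not values taken from the paper:

```python
import math

def itdp_gaussian(dt_ms, mu=-20.0, sigma=10.0, a=1.0):
    """Gaussian simplification of the ITDP timing window
    (used for the spiking neural network model).
    dt_ms: timing difference between the two input pathways (ms).
    mu, sigma, a are hypothetical illustrative parameters."""
    return a * math.exp(-((dt_ms - mu) ** 2) / (2.0 * sigma ** 2))

def itdp_pulse(dt_ms, mu=-20.0, half_width=5.0, a=1.0):
    """Pulse (rectangular) simplification
    (used for the logical voter model)."""
    return a if abs(dt_ms - mu) <= half_width else 0.0
```

Both functions peak at the same timing difference; the pulse is simply a hard-thresholded version of the Gaussian, which makes the logical voter model analytically tractable.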
Fig 3. A voter and the voter ensemble network (NC = 4).
(Left) A voter and the predefined firing probabilities of each voter neuron for a set of virtual input samples X = {x1, x2, …, xM}. (Right) The voter ensemble network. The weight wkij denotes the connection from the ith neuron of the jth voter to the kth neuron of the final voter.
Fig 4. SEM-ITDP ensemble network architecture.
The STDP connections, which project from the selected input neurons to each WTA circuit, together with the WTA circuits constitute the SEM ensemble. The ITDP connections have the same connectivity as in the logical ITDP model. The ensemble, gating and final output networks all use the same SEM circuit model.
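A schematic, non-spiking sketch of the EM-flavoured WTA computation underlying such circuits: the winner is sampled from a softmax over weighted inputs (E-step-like), and only the winner's weights move toward the current input (M-step-like). All names and parameters here are illustrative assumptions; the paper's SEM circuits are spiking and learn via STDP.

```python
import numpy as np

rng = np.random.default_rng(1)

class SimpleWTA:
    """Toy winner-take-all circuit with an EM-like update rule
    (illustrative sketch, not the paper's spiking implementation)."""

    def __init__(self, n_in, n_out, lr=0.1):
        self.w = rng.normal(0.0, 0.01, size=(n_out, n_in))
        self.lr = lr

    def step(self, y):
        u = self.w @ y                       # 'membrane potentials'
        p = np.exp(u - u.max())
        p /= p.sum()                         # posterior over WTA neurons
        k = rng.choice(len(p), p=p)          # stochastic winner (one 'spike')
        self.w[k] += self.lr * (y - self.w[k])  # winner moves toward input
        return k

net = SimpleWTA(n_in=4, n_out=2)
k = net.step(np.array([1.0, 0.0, 0.0, 1.0]))
```

Repeating `step` over a stream of inputs drives each output unit's weight vector toward a cluster of the input distribution, which is the sense in which such circuits perform unsupervised expectation maximization.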
Fig 5. Spike trains from the SEM ensemble network with NE = 5 and random feature selection.
(Left) Plot shows the input neuron spikes from eight image presentations from different classes (digits) which are depicted in different colors (black: 0, red: 1, green: 2, blue: 4). (Right) Two graphs show the output spikes of ensemble, gating, and final WTA neurons before and after learning. The colors of the spikes represent which class is being presented as input. After learning the network outputs produce consistent firing patterns, each output spiking exclusively for a single class.
Fig 6. An example of the STDP weight maps of a SEM classifier after learning (A, B) and the time evolution of ITDP weights (C).
Each weight map shows the presynaptic weight values projecting to one of the four WTA neurons (each of which fires dominantly for one of the classes). The grey area shows pixels disabled by preprocessing, and each colored pixel represents the difference between the weights from the two input neurons for the corresponding pixel (white pixels represent unselected features). To make use of all features, a quarter of the pixels are evenly selected from the supersampled image so that every pixel of the original data is covered.
Fig 7. Examples of ensemble behaviours (NE = 9) for different gating network performances ((A) better than, (B) similar to, (C) worse than the ensemble average).
All the ensemble and the gating WTAs used random feature selection. The colors represent the NCEs of the final network (red), the gating network (blue), the ensemble networks (grey) and their average (black). Vertical lines indicate the time span of the total data presentation; input data are presented sequentially for multiple rounds in order to observe long-term convergence. The NCE value at time t is calculated by counting the class-dependent spikes within the finite past time window [Tp, t] (Tp < t). To prevent a sudden change in the NCE plots when the early system output (which is immature, resulting in high NCE values) drops out of the time window, Tp was changed dynamically to flush out those initial values faster: Tp = t(1−d/4D), where d = t when t < 2D and d = 2D otherwise, and D = 224 sec is the duration of one round of dataset presentation. See Methods for details of the NCE calculations.
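The caption's dynamic window-start rule can be written directly as code (D = 224 sec as stated; the function name is ours):

```python
D = 224.0  # duration of one round of dataset presentation (seconds)

def window_start(t, D=D):
    """Start Tp of the sliding NCE window [Tp, t]:
    Tp = t * (1 - d/(4D)), with d = t for t < 2D and d = 2D otherwise."""
    d = t if t < 2 * D else 2 * D
    return t * (1.0 - d / (4.0 * D))
```

For t ≥ 2D this reduces to Tp = t/2, i.e. the window always spans the most recent half of the run, while early on the window grows more slowly so the immature initial outputs are discarded sooner.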
Fig 8. Illustrative images for controlled feature assignment for SEM ensemble networks.
White regions indicate available pixels (the active region) as defined by preprocessing, and the Gaussian means for the normal Gaussian selection scheme are evenly placed inside these regions by a random placement procedure (see Methods for details of the actual Gaussian mean placement). The number of stretched Gaussian features used increases linearly with ensemble size (see Methods for details). The diameters of the red circles and ovals roughly represent the full width at a tenth of maximum (FWTM) along each principal direction (the length of an oval is drawn far shorter than it actually is for the sake of visualization; long ovals are used to ensure they form roughly uniform bars in the region of available pixels). In all cases, exactly 1/4 of the pixels in the available (white) region are stochastically selected (without replacement) for each ensemble network according to each distribution function.
Fig 9. Examples of STDP weight maps from different feature selection schemes when NE = 5.
The weight maps for the ensemble WTA neurons which represent the digit 1 after learning are shown.
Fig 10. All WTA performances vs. ensemble sizes for different feature selection schemes.
Results with similar gating network performances are shown, obtained by manually selecting the ‘best’ gating network performances at around NCE ≈ 0.26. All NCE values were taken at the end of simulations, which were run for two rounds of input presentations (t = 448 sec). Colors represent: ensemble networks (grey), gating network (blue), and the final output network (red).
Fig 11. Statistics of ensemble performances and diversities for different feature selection schemes and ensemble sizes.
Each point in graphs (A-D) is the average of 50 simulations, and the error bars represent standard deviations. Eesb in (C, D, E) denotes the average NCE of the ensemble members in each simulation; Div in (B, D, E) is the diversity. (E) Final network NCE vs. the difference between diversity and average ensemble NCE. The background dots (grey, orange, light blue) represent every individual simulation from all three feature selection schemes (random, normal Gaussian, stretched Gaussian respectively) and eight ensemble sizes (3×8×50 = 1200 runs), and the larger dots are the averages over each set of 50 repeated simulations (same colors as A-D).
Fig 12
(A, B, C) Training and test performances demonstrating generalization to unseen data (NE = 5). The testing phase starts at iteration 448 by freezing the weights and replacing the input samples with the test set, which was not shown to the system during the learning phase. (D) Test set error rates of the final output unit, (E) the average ensemble error rates, and (F) the training phase diversities (as in Fig 11) over different ensemble sizes, using the normal Gaussian selection scheme on the integer recognition problem. Each data point is the average of 50 runs, with error bars showing standard deviations. NCE calculations as in Fig 7.
Fig 13. Training performances of the expanded STDP/ITDP networks (using random feature selection on the MNIST handwritten digits classification task as in earlier experiments).
Colors: red, ITDP final WTA; green, STDP final WTA; blue, gating WTA; grey/black, ensemble WTAs and their average. (A, B) An example of the time courses of performances, and the final performances from 50 repeated trials using the unsupervised gating WTA; individual trials are sorted by gating WTA performance in ascending order. (C, D) Simulations using automatic selection of the gating WTA. The vertical lines with arrowheads in C indicate where the switching of gating WTA occurs (see text for further details).
Fig 14. Average performances of STDP and ITDP ensembles over 50 trials on the MNIST handwritten digits task using selected/supervised gating WTAs for different feature selection schemes and ensemble sizes (NE = 5, 9, 16, 25).
The training and test phases were run for three and two rounds of dataset presentation respectively. The error bars represent the standard deviations of the performances from corresponding repeated runs.
Fig 15. Training performances of ensemble networks using different datasets for each ensemble member (NE = 5).
Individual classifier performances are shown in grey, and the overall ensemble (output layer) performance is shown in red. Results are for various input feature selection schemes on the handwritten integers problem as in the previous section.
Fig 16. Examples of random Gaussian mean placements for different NE from the manually designed initial points (black points).
The red pixels represent the outer border of the active region of the image, and the yellow pixels represent a forbidden region which is 3 pixels thick. The jittered mean points were restricted to be placed inside the inner region (including the green pixels) which is surrounded by the inner border (green).

References
    1. Laubach M, Wessberg J, Nicolelis M. Cortical ensemble activity increasingly predicts behaviour outcomes during learning of a motor task. Nature. 2000;405:567–571. doi: 10.1038/35014604.
    2. Cohen D, Nicolelis M. Reduction of Single-Neuron Firing Uncertainty by Cortical Ensembles during Motor Skill Learning. Journal of Neuroscience. 2004;24(14):3574–3582. doi: 10.1523/JNEUROSCI.5361-03.2004.
    3. Li W, Howard J, Parrish T, Gottfried J. Aversive Learning Enhances Perceptual and Cortical Discrimination of Indiscriminable Odor Cues. Science. 2008;319:1842–1845. doi: 10.1126/science.1152837.
    4. O’Reilly RC. Biologically Based Computational Models of High-Level Cognition. Science. 2006;314:91–94. doi: 10.1126/science.1127242.
    5. O’Reilly RC. Modeling integration and dissociation in brain and cognitive development. In: Munakata Y, Johnson MH, editors. Processes of Change in Brain and Cognitive Development: Attention and Performance XXI. Oxford: Oxford University Press; 2006. p. 1–22.

Grant support

This work was funded by The European Union as part of EU ICT FET FP7 project INSIGHT: Darwinian Neurodynamics (grant agreement number 308943). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.