Front Neuroinform. 2015 Jul 31;9:19. doi: 10.3389/fninf.2015.00019. eCollection 2015.

ANNarchy: a code generation approach to neural simulations on parallel hardware

Julien Vitay et al.

Abstract

Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows the user to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The Python interface has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphics processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions.

Keywords: Python; code generation; neural simulator; parallel computing; rate-coded networks; spiking networks.

Figures

Figure 1
ANNarchy script reproducing the pulse-coupled spiking network described in Izhikevich (2003). A population of 1000 Izhikevich neurons is created and split into subsets of 800 excitatory and 200 inhibitory neurons. The different parameters of the Izhikevich neuron are then initialized through attributes of the two populations. a, b, c, and d are dimensionless parameters, noise is a multiplicative factor on the random variable Normal(0., 1.), drawn at each step from the standard normal distribution 𝒩(0, 1), v_thresh is the spiking threshold of the neurons, and tau is the time constant in milliseconds of the membrane conductances. The network is fully connected, with weight values initialized randomly using uniform distributions whose ranges depend on the pre-synaptic population. The source code for the network is then generated, compiled and simulated for 1000 ms.
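For orientation, a minimal sketch of such a script in ANNarchy's Python interface could look as follows. It follows the structure described in the caption using the built-in Izhikevich neuron model; the parameter values and the heterogeneous initializations used in the original figure are simplified here and should be treated as illustrative assumptions rather than the verbatim script of the figure.

```python
from ANNarchy import *

# 1000 Izhikevich neurons, split into excitatory and inhibitory subsets
pop = Population(geometry=1000, neuron=Izhikevich)
Exc, Inh = pop[:800], pop[800:]

# Illustrative parameter values (the figure uses heterogeneous initializations)
Exc.a, Exc.b, Exc.c, Exc.d, Exc.noise = 0.02, 0.2, -65.0, 8.0, 5.0
Inh.a, Inh.b, Inh.c, Inh.d, Inh.noise = 0.02, 0.25, -65.0, 2.0, 2.0

# Fully connected network; weight ranges depend on the pre-synaptic population
exc_proj = Projection(pre=Exc, post=pop, target='exc')
exc_proj.connect_all_to_all(weights=Uniform(0.0, 0.5))
inh_proj = Projection(pre=Inh, post=pop, target='inh')
inh_proj.connect_all_to_all(weights=Uniform(0.0, 1.0))

compile()                   # generate and compile the C++ code
m = Monitor(pop, 'spike')   # record the emitted spikes
simulate(1000.0)            # simulate for 1000 ms
```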
Figure 2
Examples of rate-coded neuron and synapse definitions. (A) Noisy leaky-integrator rate-coded neuron. It defines a global parameter tau for the time constant and a local one B for the baseline firing rate. The evolution of the firing rate r over time is ruled by an ODE integrating the weighted sum of excitatory inputs sum(exc) and the baseline. The random variable is defined by the Uniform(-1.0, 1.0) term, so that a value is drawn from the uniform range [−1, 1] at each time step and for each neuron. The initial value of r at t = 0 is set to 1.0 through the init flag and the minimal value of r is set to zero. (B) Rate-coded synapse implementing the IBCM learning rule. It defines a global parameter tau, which is used to compute the sliding temporal mean of the square of the post-synaptic firing rate in the variable theta. This variable has the flag postsynaptic, as it needs to be computed only once per post-synaptic neuron. The connection weights w are then updated according to the IBCM rule and limited to positive values through the min=0.0 flag.
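A sketch of how these two definitions could be written with ANNarchy's equation-oriented syntax is shown below. The numerical parameter values, the learning rate eta and the exact form of the IBCM update are assumptions inferred from the caption, not the verbatim code of the figure.

```python
from ANNarchy import Neuron, Synapse

# (A) Noisy leaky-integrator rate-coded neuron (sketch)
LeakyIntegrator = Neuron(
    parameters="""
        tau = 10.0 : population   # global time constant (ms)
        B = 0.0                   # local baseline firing rate
    """,
    equations="""
        tau * dr/dt + r = sum(exc) + B + Uniform(-1.0, 1.0) : init=1.0, min=0.0
    """
)

# (B) IBCM learning rule as a rate-coded synapse (sketch)
IBCM = Synapse(
    parameters="""
        eta = 0.01 : projection    # assumed learning rate
        tau = 2000.0 : projection  # time constant of the sliding average
    """,
    equations="""
        tau * dtheta/dt + theta = post.r * post.r : postsynaptic
        dw/dt = eta * post.r * (post.r - theta) * pre.r : min=0.0
    """
)
```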
Figure 3
Examples of spiking neuron and synapse definitions. (A) Izhikevich neuron. The parameters and equations fields follow the same principles as for rate-coded neurons. The variable I gathers the inputs to the neuron, namely the sum of the excitatory g_exc and inhibitory g_inh input currents and a constant current i_offset. The membrane potential v and the recovery variable u are updated according to the desired dynamics, with initial values specified with the init keyword. The spike field defines the condition for emitting a spike, here when the membrane potential v exceeds the threshold v_thresh. The reset field specifies the modifications happening after a spike is emitted: here the membrane potential is clamped to the value c and the recovery variable u is incremented by d. The refractory period is determined by the refractory field, here 2 ms. (B) Short-term plasticity (STP) synapse. For this synapse, the increment of the post-synaptic conductance g_target when a pre-synaptic spike arrives depends not only on the synaptic efficiency w, but also on the values of the variables x and u internal to the synapse. These are updated through two mechanisms: the equations field specifies their exponentially-decreasing dynamics, while the pre_spike field defines their increments when a pre-synaptic spike arrives at the synapse. However, the integration of the corresponding ODEs is event-driven through the use of the event-driven flag: when a pre- or post-synaptic spike occurs, the new value of these variables is directly computed using the analytical solution of the ODE. This can speed up the simulation if the number of spiking events is low. (C) Spike-timing dependent plasticity (STDP) synapse. For this synapse, the post-synaptic conductance is increased by w after a pre-synaptic spike is received, but the synaptic efficiency is adapted depending on two internal variables Apre and Apost. The pre_spike field states what should happen when a pre-synaptic spike arrives at the synapse, while the post_spike field describes the changes occurring when the post-synaptic neuron fires. The variables Apre and Apost are integrated in an event-driven manner. The clip() function is used to maintain w in the range [0, w_max]. (D) NMDA non-linear synapse. This synapse does not transmit information to the post-synaptic neuron in an event-driven manner. Rather, the synaptic variable g is summed at each time step by the post-synaptic neuron, as for rate-coded networks. This is specified by the psp field. When a pre-synaptic spike occurs, the variable x is increased by w, which in turn modifies the evolution of g through the coupled equations described in the equations field. These equations cannot be solved with the event-driven method, as their values must be available at each time step.
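As an illustration of the spike-related fields, a sketch of an STDP synapse along the lines of panel (C) is given below. The time constants, increments and maximal weight w_max are illustrative assumptions; only the overall structure (event-driven traces, pre_spike and post_spike updates, clip() keeping w in [0, w_max]) follows the caption.

```python
from ANNarchy import Synapse

# Sketch of a spike-timing dependent plasticity (STDP) synapse, cf. panel (C)
STDP = Synapse(
    parameters="""
        tau_pre = 20.0 : projection    # assumed time constant of the pre-synaptic trace
        tau_post = 20.0 : projection   # assumed time constant of the post-synaptic trace
        cApre = 0.01 : projection      # assumed potentiation increment
        cApost = -0.0105 : projection  # assumed depression increment
        w_max = 0.01 : projection      # assumed maximal weight
    """,
    equations="""
        tau_pre  * dApre/dt  = -Apre  : event-driven
        tau_post * dApost/dt = -Apost : event-driven
    """,
    pre_spike="""
        g_target += w
        Apre += cApre * w_max
        w = clip(w + Apost, 0.0, w_max)
    """,
    post_spike="""
        Apost += cApost * w_max
        w = clip(w + Apre, 0.0, w_max)
    """
)
```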
Figure 4
Example of a hybrid network in which a rate-coded input is encoded into a spiking population (PoissonPopulation) and decoded back into the rate-coded domain (DecodingProjection). The script for this plot is provided in the Supplementary Material. (A) Raster plot of the spiking population reacting to step-wise inputs for 1 s. Each step lasts 250 ms (0, 10, 50, and 100 Hz). (B) Firing rate of a single rate-coded neuron decoding the corresponding spiking neuron. The blue line shows the firing rate in the input population and the green line shows the decoded firing rate. It follows the original firing rate with some noise due to the stochastic nature of the spike trains and some delay due to the integration window. (C) Relative decoding error $\epsilon = \frac{1}{250} \int_{0}^{250} \frac{|r(t) - F|}{F} \, dt$, where F is the input firing rate, as a function of the number of spiking neurons used for decoding, for different input firing rates (10, 50, and 100 Hz). For small numbers of neurons, the decoding error is high, as individual spike trains are stochastic. When the number of neurons is increased (over 200), the decoding error is reduced. Decoding is relatively more precise at high frequencies than at low ones.
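A minimal sketch of such a hybrid script (the full version is in the paper's Supplementary Material) might look as follows. The decoder definition, the population sizes and the window argument of DecodingProjection are assumptions made for illustration.

```python
from ANNarchy import *

# Spiking population emitting Poisson spike trains at a controllable rate
pois = PoissonPopulation(geometry=1000, rates=0.0)

# Single rate-coded neuron reading back the decoded firing rate
decoder = Population(geometry=1, neuron=Neuron(equations="r = sum(exc)"))

# DecodingProjection turns incoming spike trains into a rate-coded input;
# 'window' (assumed here to be in ms) sets the integration window mentioned in (B)
proj = DecodingProjection(pre=pois, post=decoder, target='exc', window=10.0)
proj.connect_all_to_all(weights=1.0)

compile()

# Step-wise inputs as in panel (A): 0, 10, 50 and 100 Hz, 250 ms each
for rate in [0.0, 10.0, 50.0, 100.0]:
    pois.rates = rate
    simulate(250.0)
```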
Figure 5
Example of code generated for Equation (13) using different numerical methods: 1. Explicit Euler; 2. Implicit Euler; 3. Exponential Euler; 4. Midpoint (Runge-Kutta method of order 2). pop0 is a C++ structure holding the different attributes of the population: the vectors v and u for the two variables, the vector g_exc for the excitatory inputs and the double value tau for the time constant. All methods first compute the increments _v and _u before adding them to v and u, in order to make sure the update rules use the previous values of these variables. The number of elementary operations differs from one method to another, increasing the simulation runtime, but the numerical precision and stability of the more complex methods might be required in some cases.
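Equation (13) itself is not reproduced in this excerpt, so the following Python sketch only illustrates the general pattern (compute the increment from the previous values, then update) for an assumed linear ODE tau * dv/dt = g_exc - v, comparing the explicit Euler and midpoint schemes.

```python
import numpy as np

dt, tau = 0.1, 10.0                        # time step (ms) and time constant (ms)
v = np.zeros(1000)                         # one value per neuron, as in pop0.v
g_exc = np.random.uniform(0.0, 1.0, 1000)  # stand-in for the excitatory inputs

# 1. Explicit Euler: one update step, increment computed from the previous v
_v = dt * (g_exc - v) / tau
v = v + _v

# 4. Midpoint (RK2): one update step, derivative evaluated at a half-step estimate
k = 0.5 * dt * (g_exc - v) / tau
_v = dt * (g_exc - (v + k)) / tau
v = v + _v
```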
Figure 6
Code generated for a single population pop0 of 1000 identical neurons. (A) Neuron model used for code generation: a global parameter tau and a local variable r following a linear ODE and limited to positive values. (B) Code generated for the OpenMP framework. The code is pasted into the main C++ file ANNarchy.cpp and called at each step. It iterates over the 1000 neurons of the population and updates their firing rate according to the corresponding code snippet. It operates directly on the data contained in the structure pop0. A simple #pragma statement allows parallel processing over the available threads. (C) Code generated for the CUDA framework. The code is pasted into the specific ANNarchy.cu file. A copy of the vectors _sum_exc and r (prefixed by gpu) is sent to the device (GPU) through the call to cuPop0_step by the host (CPU). The code inside cuPop0_step is executed in parallel on the device for the 1000 neurons and updates the array corresponding to r. This copy of r is transferred back to the CPU at the end of the simulation block for analysis in Python. Note that the parser can be configured not to generate the struct prefixes, as for the OpenMP backend.
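The neuron model in panel (A), as described in the caption, would be expressed in the Python interface roughly as follows; this is a sketch and the exact strings in the figure may differ.

```python
from ANNarchy import Neuron, Population

# Global time constant tau and a local firing rate r following a linear ODE,
# clipped to positive values, as described for panel (A)
SimpleRate = Neuron(
    parameters="tau = 10.0 : population",
    equations="tau * dr/dt + r = sum(exc) : min=0.0"
)

pop0 = Population(geometry=1000, neuron=SimpleRate)
```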
Figure 7
Speedup ratio obtained by ANNarchy for a fully connected rate-coded network composed of two populations of 1000 (resp. 4000) neurons each. The speedup ratio is defined as the ratio between the execution time (measured for a simulation of 1 s) of the single-threaded implementation and the execution time measured when using T threads. The single-threaded implementation uses neither OpenMP nor CUDA primitives. For the OpenMP implementation, the number of threads is varied between 2 and 12. For the CUDA implementation, the default configuration of ANNarchy (32 threads for the neural variable updates, 192 threads for the weighted sums) is used. The CUDA implementation is run on a different machine for technical reasons, so the single-threaded baseline measured on this machine differs from the one used for OpenMP; only the scaling ratio matters here, not the absolute execution times. The black line denotes ideal linear scaling, the blue line the scaling of the network with 1000 neurons, and the green line the scaling for 4000 neurons. With OpenMP, the scaling for 1000 neurons is slightly sub-optimal, while the one for 4000 neurons saturates quickly at a ratio of 2.9. The situation is reversed with CUDA: the network with 1000 neurons only achieves a speedup ratio of 3.8, while the network with 4000 neurons achieves a ratio of 7.15.
Figure 8
Comparison of the simulation times of different simulators depending on the number of threads on a shared-memory system. The parallel performance of the simulators Brian (version 1.4.1), Brian 2 (version 2.0b3), NEST (with Python bindings, version 2.4.2), Auryn (version 0.4.1), and ANNarchy (version 4.4.0) is investigated up to 12 threads. Two versions of NEST are used: one using the Runge-Kutta-Fehlberg 4(5) method (denoted NEST-RK45), and a patched version using the explicit Euler method (NEST-Euler). The simulation times are normalized to show the real-time ratio: a normalized time of 1 means that simulating the network for 1 s takes exactly 1 s of computer time (simulations are run for 10 s). Both axes use a logarithmic scale. Brian only allows single-threaded simulations. Brian 2, NEST and ANNarchy use OpenMP, while Auryn uses MPI (Open MPI 1.4.3). Auryn only allows a number of processes which is a multiple of 2. The single-threaded version of ANNarchy compares well to the other neural simulators, but its scaling properties are not optimal compared to NEST.
