CoreNEURON: An Optimized Compute Engine for the NEURON Simulator

Pramod Kumbhar et al.

Front Neuroinform. 2019 Sep 19;13:63. doi: 10.3389/fninf.2019.00063. eCollection 2019.
Abstract

The NEURON simulator has been developed over the past three decades and is widely used by neuroscientists to model the electrical activity of neuronal networks. Large network simulation projects using NEURON have supercomputer allocations that individually measure in the millions of core hours. Supercomputer centers are transitioning to next-generation architectures, and the work accomplished per core hour for these simulations could improve by an order of magnitude if NEURON were able to better utilize the new hardware capabilities. To adapt NEURON to evolving computer architectures, the compute engine of the NEURON simulator has been extracted and optimized as a library called CoreNEURON. This paper presents the design, implementation, and optimizations of CoreNEURON. We describe how CoreNEURON can be used as a library with NEURON and then compare the performance of different network models on multiple architectures, including IBM BlueGene/Q, Intel Skylake, Intel MIC, and NVIDIA GPU. We show that CoreNEURON can simulate existing NEURON network models with 4-7x less memory usage and 2-7x less execution time while maintaining binary result compatibility with NEURON.
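As a point of reference, this is a minimal sketch of driving a simulation through CoreNEURON from NEURON's Python interface, assuming a NEURON build with CoreNEURON support (the coreneuron module attributes follow the current NEURON Python API and may differ between versions):

    from neuron import h, coreneuron
    h.load_file("stdrun.hoc")

    pc = h.ParallelContext()
    # ... build the network model here ...

    coreneuron.enable = True    # hand time-step integration over to CoreNEURON
    coreneuron.gpu = False      # set True for a GPU-enabled build

    h.finitialize(-65)          # initialize membrane potentials (mV)
    pc.set_maxstep(10)          # spike-exchange interval, bounded by mindelay
    pc.psolve(100.0)            # integrate to t = 100 ms via CoreNEURON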

Keywords: NEURON; neuronal networks; performance optimization; simulation; supercomputing.


Figures

Figure 1
Different execution workflows supported by the NEURON simulator with CoreNEURON: (A) the existing simulation workflow, where the HOC/Python interface is used to build a model that is then simulated by NEURON; (B) the new CoreNEURON-based workflow, where the in-memory model constructed by NEURON is transferred via direct memory access and then simulated by CoreNEURON; (C) the new CoreNEURON-based workflow where NEURON partitions a large network model into smaller chunks, iteratively instantiates each piece in memory, and copies that subset of model information to disk; CoreNEURON then loads the whole model into memory and simulates it.
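A corresponding sketch of the file-transfer workflow in (C), again assuming a CoreNEURON-enabled build: pc.nrnbbcore_write() is the ParallelContext call that dumps the instantiated model to disk in CoreNEURON's input format (renamed pc.nrncore_write() in newer NEURON versions), and the name of the standalone binary (special-core or nrniv-core) depends on how the mechanisms were built:

    from neuron import h
    h.load_file("stdrun.hoc")

    pc = h.ParallelContext()
    # ... instantiate one chunk of the network model here ...

    h.finitialize(-65)
    pc.nrnbbcore_write("coredat")   # write this chunk's model data to disk

    # The whole model is then loaded and simulated by the standalone engine,
    # e.g. (flag names per CoreNEURON's help output; check your build):
    #   mpirun -n 64 special-core --mpi --datpath coredat --tstop 100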
Figure 2
Simulation workflow with the checkpoint-restart feature: CoreNEURON loads the model from disk, simulates it, and dumps the in-memory state back to disk (SaveState step). CoreNEURON can later load the checkpoint data (RestoreState step) and continue the simulation, even on a different machine. The user has the flexibility to launch multiple simulations with different stimuli or random number streams (Stim or RNG) in order to explore network stability and robustness.
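A hedged sketch of driving this two-phase workflow from Python; the --checkpoint and --restore option names are taken from CoreNEURON's command-line help and should be verified against `nrniv-core --help` for your build:

    import subprocess

    # Phase 1: simulate to 500 ms and write checkpoint data (SaveState step).
    subprocess.run(["nrniv-core", "--datpath", "coredat",
                    "--tstop", "500", "--checkpoint", "chkpt"], check=True)

    # Phase 2: restore the saved state (RestoreState step) and continue to
    # 1000 ms, possibly on a different machine or with different Stim/RNG
    # settings per launch.
    subprocess.run(["nrniv-core", "--datpath", "coredat",
                    "--tstop", "1000", "--restore", "chkpt"], check=True)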
Figure 3
Dendritic structure and memory layout representation of a neuron: (A) a schematic representation of the dendritic structure of a neuron, with different mechanisms inserted into its compartments, is shown on the left. On the right: (B) shows how NEURON and CoreNEURON group mechanism instances of the same type; (C) shows how NEURON stores the properties of individual mechanisms in the AoS layout; (D) shows the new SoA layout used by CoreNEURON to store mechanism properties.
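To make the layout difference between (C) and (D) concrete, here is a small NumPy illustration; the state names m, h, and gna are placeholders for the properties of one mechanism type:

    import numpy as np

    n = 1024  # instances of one mechanism type

    # AoS (NEURON): all properties of an instance are stored together, so a
    # loop over one property strides across the other fields in memory.
    aos = np.zeros(n, dtype=[("m", "f8"), ("h", "f8"), ("gna", "f8")])

    # SoA (CoreNEURON): each property is a contiguous array, so the same loop
    # performs unit-stride loads that vectorize well on CPUs and GPUs.
    soa = {"m": np.zeros(n), "h": np.zeros(n), "gna": np.zeros(n)}

    aos["m"] += 0.1 * (1.0 - aos["m"])   # strided access per field
    soa["m"] += 0.1 * (1.0 - soa["m"])   # contiguous access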
Figure 4
Code generation workflow for CoreNEURON: the different phases of the source-to-source compiler that translates an input model description file (hh.mod) into C++ code (hh.cpp) are shown in the middle. Compiler hints such as ivdep and acc parallel loop are inserted to enable CPU vectorization and GPU parallelization.
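As a toy illustration of this translation step (not the actual MOD-file compiler), each phase can be thought of as emitting an annotated per-instance loop:

    # Hypothetical mini-emitter: produce a C++ state-update loop carrying the
    # vectorization (ivdep) or GPU (acc parallel loop) hint from Figure 4.
    def emit_state_loop(statement: str, target: str = "cpu") -> str:
        hint = "#pragma ivdep" if target == "cpu" else "#pragma acc parallel loop"
        return (f"{hint}\n"
                "for (int i = 0; i < cnt; ++i) {\n"
                f"    {statement}\n"
                "}\n")

    print(emit_state_loop("m[i] += dt * (minf[i] - m[i]) / mtau[i];"))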
Figure 5
Timeline showing the workflow of GPU-enabled CoreNEURON execution. The Model Building and Memory Setup phases are executed on the CPU by NEURON and CoreNEURON, respectively. The latter performs an in-place AoS-to-SoA memory transformation and a node permutation to optimize Gaussian elimination. The CoreNEURON in-memory model is then copied to GPU memory using OpenACC APIs. All time step integration phases, including threshold detection for event generation and event delivery to synapse models, take place on the GPU. At the end of each timestep (dt), the generated spike events are transferred to the CPU. Conversely, all the spike events to be delivered during a step are placed in a per-synapse-type buffer and transferred to the GPU at the beginning of each timestep. At the end of the mindelay interval, all spikes destined for other processes are transferred via MPI communication.
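The control flow of that timeline can be sketched as follows; every function name here is a hypothetical stand-in rather than a CoreNEURON API, and only the placement of host-device transfers and MPI exchange reflects the figure:

    def deliver_buffered_events_on_gpu(events): pass   # H2D: per-synapse-type buffers
    def integrate_timestep_on_gpu(): pass              # currents, solver, state update
    def copy_detected_spikes_to_cpu(): return []       # D2H: spikes generated this dt
    def mpi_exchange_spikes(spikes): return []         # inter-process spike exchange

    def simulate(tstop=100.0, dt=0.025, mindelay=2.5):
        t, local_spikes = 0.0, []
        while t < tstop:
            incoming = mpi_exchange_spikes(local_spikes)   # once per mindelay
            local_spikes = []
            for _ in range(int(round(mindelay / dt))):
                deliver_buffered_events_on_gpu(incoming)       # start of timestep
                integrate_timestep_on_gpu()                    # stays on the GPU
                local_spikes += copy_detected_spikes_to_cpu()  # end of timestep
                t += dt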
Figure 6
The top row shows three different morphological types: (A) their dendritic tree structures and (B) dendrograms showing the in-memory tree representation of these types in CoreNEURON. The bottom row shows different node ordering schemes that improve memory access locality on GPUs: (C) example topologies of three cells with the same number of compartments; (D) the Interleaved Layout, where one compartment from each of N cells forms an adjacent group of N compartments; for the i-th node, ni is the node index and par[i] is its parent index, and with three executor threads, square braces highlight parent indices that result in a contiguous memory load (CL) or a strided memory load (SL); (E) the Constant Depth Layout, where all nodes at the same depth from the root are adjacent; (F) a comparison of the two node ordering schemes for the Ring network model, showing the execution time of the whole simulation and of the Gaussian elimination step.
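A minimal sketch of the interleaving in (D): with N cells of equal compartment count indexed cell-major, the permutation below places the j-th compartment of every cell contiguously, so that N executor threads (one per cell) issue coalesced loads:

    def interleave(n_cells: int, n_comp: int) -> list[int]:
        # position k in the new layout holds original node (cell * n_comp + depth)
        return [cell * n_comp + depth
                for depth in range(n_comp)
                for cell in range(n_cells)]

    print(interleave(3, 4))
    # -> [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]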
Figure 7
Memory usage reduction and speedup with CoreNEURON: ratios of memory usage between NEURON and CoreNEURON for the different models in Table 1 are shown on the left (measured on the BB4 system). Speedups of CoreNEURON simulations relative to NEURON on various architectures (using a single node) for the same models are shown on the right.
Figure 8
Strong scaling of CoreNEURON on the BB4 system for two large-scale models listed in Table 1: the Cortex+Plasticity model with 219k neurons (left) and the Hippocampus CA1 model with 789k neurons (right).
