J Comput Neurosci. 2008 Dec;25(3):439-48. doi: 10.1007/s10827-008-0087-5. Epub 2008 Apr 1.

Fully implicit parallel simulation of single neurons


Michael L Hines et al. J Comput Neurosci. 2008 Dec.

Abstract

When a multi-compartment neuron is divided into subtrees such that no subtree has more than two connection points to other subtrees, the subtrees can be on different processors and the entire system remains amenable to direct Gaussian elimination with only a modest increase in complexity. Accuracy is the same as with standard Gaussian elimination on a single processor. It is often feasible to divide a 3-D reconstructed neuron model onto a dozen or so processors and experience almost linear speedup. We have also used the method for purposes of load balance in network simulations when some cells are so large that their individual computation time is much longer than the average processor computation time or when there are many more processors than cells. The method is available in the standard distribution of the NEURON simulation program.
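As a rough illustration of the splitting constraint described in the abstract, the sketch below (plain Python, not NEURON code; the toy tree, the piece labels, and the simplified notion of a connection point as a shared parent compartment are all assumptions made for illustration) checks that no piece of a partitioned compartment tree meets other pieces at more than two connection points.

    # Sketch: verify that a proposed partition of a compartment tree satisfies the
    # constraint that no subtree (piece) meets other pieces at more than two
    # connection points. Topology and piece labels are invented for illustration.
    from collections import defaultdict

    def connection_points_per_piece(parent, piece):
        """parent[i] is the parent compartment of i (root has parent -1);
        piece[i] is the subtree/piece that owns compartment i."""
        points = defaultdict(set)
        for child, par in enumerate(parent):
            if par == -1 or piece[child] == piece[par]:
                continue
            # The parent compartment is shared: a connection point for both pieces.
            points[piece[child]].add(par)
            points[piece[par]].add(par)
        return {p: len(nodes) for p, nodes in points.items()}

    # Toy tree: compartment 0 is the soma; branches hang off compartments 1 and 2.
    parent = [-1, 0, 0, 1, 1, 2, 2]
    piece  = [ 0, 0, 1, 2, 0, 1, 1]   # three pieces
    counts = connection_points_per_piece(parent, piece)
    assert all(n <= 2 for n in counts.values()), "partition not multisplit-compatible"
    print(counts)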


Figures

Figure 1
A: Kinetic scheme for the example neuron. B: The scheme is divided into 8 subtrees connected at 4 distinct compartments. C: When the subtrees are separated, each retains some terms from the originally shared compartment, and the subtrees are connected by zero-resistance virtual wires.
Figure 2
The sequence of steps that carries out Gaussian elimination of the partitioned neuron in Figure 1C. A: Starting structure prior to any elimination steps. B: After phase 1, single-connection-point subtrees are fully triangularized and two-connection-point subtrees are triangularized up to the backbone path. C: After phase 2, tridiagonal backbone paths are transformed so that all backbone compartments depend only on the end compartments. D: At the start of phase 3, a reduced tree is constructed; here it consists of 4 coupled equations. The patterns inside each circle indicate that the d and b matrix elements are the sums of the corresponding elements from the connecting subtrees. At the end of phase 3, the voltages are sent back to the subtrees. E: After phase 3, all subtrees are fully triangularized. F: After phase 4, all voltages along the backbone paths are known. In phase 5, not illustrated, the back-substitution is completed.
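To make phases 2-4 concrete for a single backbone, the toy NumPy sketch below (matrix values are invented, and the Schur-complement framing is a simplification of the direct elimination the method actually performs) eliminates the interior compartments of one tridiagonal backbone so that they depend only on the two end compartments, solves the resulting reduced system, and back-substitutes.

    # Toy sketch of the backbone reduction idea; values and framing are illustrative only.
    import numpy as np

    n = 6                                   # compartments along one backbone path
    rng = np.random.default_rng(0)

    # Symmetric, diagonally dominant tridiagonal system A x = r, standing in for the
    # voltage equations of a backbone between two split nodes.
    off = -rng.uniform(0.5, 1.0, n - 1)
    A = np.diag(2.5 + rng.uniform(0, 1, n)) + np.diag(off, 1) + np.diag(off, -1)
    r = rng.uniform(-1, 1, n)

    E = [0, n - 1]                          # the two end (split-node) compartments
    I = list(range(1, n - 1))               # interior backbone compartments

    # "Phase 2": eliminate the interior so it depends only on the end compartments.
    AII_inv_r  = np.linalg.solve(A[np.ix_(I, I)], r[I])
    AII_inv_IE = np.linalg.solve(A[np.ix_(I, I)], A[np.ix_(I, E)])

    # "Phase 3": reduced system coupling only the end compartments (in the full method
    # these terms are summed with the contributions of the other subtrees).
    A_red = A[np.ix_(E, E)] - A[np.ix_(E, I)] @ AII_inv_IE
    r_red = r[E] - A[np.ix_(E, I)] @ AII_inv_r
    x = np.empty(n)
    x[E] = np.linalg.solve(A_red, r_red)

    # "Phase 4": back-substitute to recover the interior backbone voltages.
    x[I] = AII_inv_r - AII_inv_IE @ x[E]

    assert np.allclose(x, np.linalg.solve(A, r))   # matches direct elimination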
Figure 3
Left panel: The Lazarewicz CA3 pyramidal cell model is divided into 15 pieces with 7 distinct connection points. Middle panel: Complexity of each piece, ordered from greatest to least. Right panel: The LPT algorithm assigns each piece to the processor with the least cumulative complexity.
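For reference, the LPT (longest processing time first) step in the right panel amounts to sorting the pieces by complexity and greedily placing each one on the currently least-loaded processor; a minimal sketch follows, with invented complexities and processor count.

    # Minimal LPT assignment sketch; complexities and nhost are made-up values.
    import heapq

    def lpt_assign(complexities, nhost):
        loads = [(0.0, host) for host in range(nhost)]   # min-heap of (load, host)
        heapq.heapify(loads)
        assignment = {}
        for piece, c in sorted(enumerate(complexities), key=lambda kv: kv[1], reverse=True):
            load, host = heapq.heappop(loads)            # least-loaded processor
            assignment[piece] = host
            heapq.heappush(loads, (load + c, host))
        return assignment, max(load for load, _ in loads)

    complexities = [420, 300, 250, 180, 120, 90, 60, 40]
    assignment, max_load = lpt_assign(complexities, nhost=3)
    print(assignment, max_load)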
Figure 4
Complexity histograms for the 10,000-cell model, both for whole cells and after the cells are split into 16,694 pieces suitable for load-balanced simulation on 8192 processors. Maximum cell complexity is 42,385 and total network complexity is 92,360,610. The largest cell was split into 8 pieces with a reduced-tree matrix rank of 5. LPT load imbalance is 5%. The maximum piece size during splitting was chosen as 0.8 * total complexity / nhost.
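A quick check of the numbers quoted in this caption, using the stated threshold of 0.8 * total complexity / nhost as a per-piece cap (that interpretation, and treating complexity as a single scalar per piece, are assumptions):

    # Back-of-the-envelope check of the splitting threshold from the caption.
    total_complexity = 92_360_610
    nhost = 8192
    max_piece_size = 0.8 * total_complexity / nhost   # about 9,020 complexity units per piece

    largest_cell = 42_385
    print(max_piece_size)                  # ~9019.6
    print(largest_cell / max_piece_size)   # ~4.7, so at least 5 pieces; the caption reports 8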
Figure 5
Performance as a function of the number of processors for three single-cell models. The x86_64 machine is the 4-processor Dell; the ia64 machine is the 32-processor SGI. For the latter, the last two points are for 24 and 30 processors. Filled circles are runtime in seconds. Open circles are average computation time. The dashed line is the “ideal” slope of −1 for performance scaling in this log2 vs. log2 plot, relative to the one-processor computation time. Open circles contain a vertical line, usually too short to be visible, that spans the minimum to maximum computation time across the processors that took part in each simulation. The dash mark for the 30-processor runs is the best runtime as the maximum piece size was varied.
Figure 6
Performance as a function of the number of processors for the 356-cell Traub model. The style is the same as that of Figure 4, except that the filled squares are the runtime with gap junctions included. At 256 processors, load balancing switches from whole-cell balance to the multisplit method.
Figure 7
Performance as a function of the number of processors for the 10,000-cell model. The style is the same as that of Figure 4. Filled squares are for whole-cell balancing.
