Neuromorphic spike-based large language model

Natl Sci Rev. 2025 Dec 4;13(4):nwaf551. doi: 10.1093/nsr/nwaf551. eCollection 2026 Feb.

Abstract

This work proposes a unified neuromorphic spike-based large-language-model (NSLLM) framework to simultaneously address the challenges of high energy consumption and low interpretability in LLMs. Our framework transforms LLMs into efficient NSLLMs by converting their behaviors into neural dynamics, such as spike trains, through rigorous mathematical modeling, complemented by advanced techniques including quantization and sparsification. This transformation also enables the analysis of information encoding processes using computational neuroscience tools, thereby offering a novel neuroscientific perspective that conceptualizes LLMs as neural populations to enhance their interpretability. Leveraging a hardware-algorithm co-design paradigm, an NSLLM can completely eliminate matrix multiplication (MatMul) while maintaining high performance. We designed a custom MatMul-free hardware core on the VCK190 field-programmable gate array to validate the 1.5-billion-parameter NSLLM, achieving a dynamic power consumption of only 13.849 W and an inference throughput of 161.8 tokens per second. Compared with the A800 GPU, this implementation improves energy efficiency, memory usage and inference throughput by 19.8×, 21.3× and 2.2×, respectively. This work provides a novel perspective within a unified framework to enhance both the energy efficiency and interpretability of LLMs, offering valuable insights for future neuromorphic chip designs tailored for large models.

Keywords: interdisciplinary neuroscience; neuromorphic computing; spike-based LLM; spiking linear attention.
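To illustrate the MatMul-free idea mentioned in the abstract, the following is a minimal sketch, not the paper's implementation: it assumes ternary weights in {-1, 0, +1}, a common way to replace a dense matrix multiply with pure additions and subtractions. With binary spike inputs, every surviving operation is a conditional add, which is what makes such layers attractive for low-power neuromorphic hardware.

```python
def ternary_matvec(W, x):
    """Compute y = W @ x using only additions and subtractions.

    W: list of rows, each a list of ternary weights in {-1, 0, +1}
       (hypothetical example weights, not from the paper).
    x: input vector, e.g. a binary spike pattern.
    """
    y = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # weight +1: add the input
            elif w == -1:
                acc -= xi      # weight -1: subtract the input
            # weight 0 contributes nothing (sparsity skips work)
        y.append(acc)
    return y

# Example with a binary spike input: no multiplications occur anywhere.
W = [[1, 0, -1],
     [-1, 1, 1]]
spikes = [1.0, 1.0, 0.0]
print(ternary_matvec(W, spikes))  # -> [1.0, 0.0]
```

The design choice here mirrors the abstract's co-design argument: once weights are quantized to ternary values and activations are spikes, the hardware core only needs accumulators, not multipliers, which is the main source of the reported energy savings.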