Energy efficient synaptic plasticity

Ho Ling Li et al. eLife. 2020 Feb 13;9:e50804. doi: 10.7554/eLife.50804.

Abstract

Many aspects of the brain's design can be understood as the result of evolutionary drive toward metabolic efficiency. In addition to the energetic costs of neural computation and transmission, experimental evidence indicates that synaptic plasticity is metabolically demanding as well. As synaptic plasticity is crucial for learning, we examine how these metabolic costs enter into learning. We find that when synaptic plasticity rules are naively implemented, training neural networks requires extremely large amounts of energy when storing many patterns. We propose that this is avoided by precisely balancing labile forms of synaptic plasticity with more stable forms. This algorithm, termed synaptic caching, boosts energy efficiency manifold and can be used with any plasticity rule, including back-propagation. Our results yield a novel interpretation of the multiple forms of synaptic plasticity observed experimentally, including synaptic tagging and capture phenomena. Furthermore, our results are relevant for energy-efficient neuromorphic designs.

Keywords: computational models; metabolism; neuroscience; synaptic consolidation; synaptic plasticity.

Plain language summary

The brain expends a lot of energy. While the organ accounts for only about 2% of a person's body weight, it is responsible for about 20% of our energy use at rest. Neurons use some of this energy to communicate with each other and to process information, but much of the energy is likely used to support learning. A study in fruit flies showed that insects that learned to associate two stimuli and then had their food supply cut off died 20% earlier than untrained flies. This is thought to be because learning used up the insects' energy reserves. If learning a single association requires so much energy, how does the brain manage to store vast amounts of data?

Li and van Rossum offer an explanation based on a computer model of neural networks. The advantage of using such a model is that it is possible to control and measure conditions more precisely than in the living brain. Analysing the model confirmed that learning many new associations requires large amounts of energy. This is particularly true if the memories must be stored with a high degree of accuracy, and if the neural network contains many stored memories already.

The reason that learning consumes so much energy is that forming long-term memories requires neurons to produce new proteins. Using the computer model, Li and van Rossum show that neural networks can overcome this limitation by storing memories initially in a transient form that does not require protein synthesis. Doing so reduces energy requirements by as much as 10-fold. Studies in living brains have shown that transient memories of this type do in fact exist. The current results hence offer a hypothesis as to how the brain can learn in a more energy-efficient way.

Energy consumption is thought to have placed constraints on brain evolution. It is also often a bottleneck in computers. By revealing how the brain encodes memories energy efficiently, the current findings could thus also inspire new engineering solutions.


Conflict of interest statement

HL: No competing interests declared. Mv: Reviewing editor, eLife.

Figures

Figure 1. Energy efficiency of perceptron learning.
(a) A perceptron cycles through the patterns and updates its synaptic weights until all patterns produce their correct target output. (b) During learning the synaptic weights follow approximately a random walk (red path) until they find the solution (grey region). The energy consumed by learning corresponds to the total length of the path (under the L1 norm). (c) The energy required to train the perceptron diverges when storing many patterns (red curve). The minimal energy required to reach the correct weight configuration is shown for comparison (green curve). (d) The inefficiency, defined as the ratio between the actual and minimal energy plotted in panel c, diverges as well (black curve). The overlapping blue curve corresponds to the theory, Equation 3 in the text.
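For readers who want to reproduce this energy accounting, a minimal Python sketch is shown below (not the authors' code; the pattern statistics, learning rate, and network size are illustrative assumptions). It trains a perceptron on random ±1 patterns, sums the L1 norm of every weight update as the energy, and compares this to the minimal energy of setting the weights directly to their final values:

```python
import numpy as np

rng = np.random.default_rng(0)

N, P = 100, 50                              # synapses and patterns (illustrative sizes)
X = rng.choice([-1.0, 1.0], size=(P, N))    # random input patterns
y = rng.choice([-1.0, 1.0], size=P)         # random target outputs
w = np.zeros(N)
eta = 0.1                                   # learning rate (assumed)

w_init = w.copy()
energy = 0.0                                # metabolic cost: L1 path length of the weights
for epoch in range(10_000):
    errors = 0
    for x, t in zip(X, y):
        if np.sign(w @ x) != t:             # misclassified pattern -> perceptron update
            dw = eta * t * x
            w += dw
            energy += np.sum(np.abs(dw))    # energy of this update (alpha = 1)
            errors += 1
    if errors == 0:                         # all patterns correct -> stop
        break

min_energy = np.sum(np.abs(w - w_init))     # cost of setting the weights directly
print(f"energy = {energy:.1f}, minimal = {min_energy:.1f}, "
      f"inefficiency = {energy / min_energy:.1f}")
```

With many patterns relative to the number of synapses, the measured inefficiency grows rapidly, qualitatively matching panels (c) and (d).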
Figure 1—figure supplement 1. Energy inefficiency as a function of the exponent α in the energy function.
The energy inefficiency of perceptron learning when the energy associated with synaptic updates is $M = \sum_{i,t} |w_i(t) - w_i(t-1)|^{\alpha}$ and the exponent α is varied (green curve). The case α = 1 is used throughout the main text. The inefficiency is the ratio between the energy needed to train the perceptron and the energy required to set the weights directly to their final value. When α = 0, the energy equals the number of updates made. When α = 1, the energy is the sum of the individual update amounts. When α > 1 it costs less energy to make many small weight updates than one large one. When α ≥ 2, this effect is so strong that even the random walk of the perceptron is less costly than directly setting the weights to their final value. We consider 0 ≤ α ≤ 1 to be the biologically relevant regime. Also shown is the inefficiency when only potentiation costs energy and depression comes at no cost, that is, $M = \sum_{i,t} [w_i(t) - w_i(t-1)]_+^{\alpha}$ (overlapping cyan curve). This variant has virtually identical (in)efficiency.
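The energy variants above can be written as a small helper; the following sketch (with a hypothetical function name and arguments) computes M for a recorded sequence of weight updates, including the potentiation-only variant:

```python
import numpy as np

def update_energy(dw_history, alpha=1.0, potentiation_only=False):
    """Energy M = sum over synapses i and updates t of |dw_i(t)|^alpha.

    dw_history: array of weight changes w_i(t) - w_i(t-1).
    potentiation_only=True charges only positive updates, M = sum [dw]_+^alpha.
    """
    dw = np.asarray(dw_history, dtype=float)
    dw = np.clip(dw, 0.0, None) if potentiation_only else np.abs(dw)
    dw = dw[dw > 0.0]                 # only actual updates carry a cost (matters for alpha = 0)
    return float(np.sum(dw ** alpha))
```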
Figure 2. Synaptic caching algorithm.
(a) Changes in the synaptic weights are initially stored in metabolically cheaper, transient, decaying weights. Two example weight traces are shown (blue and magenta). The total synaptic weight is the sum of the transient and persistent forms. Whenever any of the transient weights exceeds the consolidation threshold, the weights are made persistent and the transient values are reset (vertical dashed line). The energy consumed during learning consists of two terms: a maintenance cost, assumed proportional to the magnitude of the transient weight (shaded area in top traces), and a consolidation cost incurred at consolidation events. (b) The total energy is composed of the energy to occasionally consolidate and the energy to support transient plasticity. It is minimal for an intermediate consolidation threshold. (c) The amount of energy required for learning with synaptic caching, in the absence of decay of the transient weights (black curve). When there is no decay and no maintenance cost, the energy equals the minimal energy (green line) and the efficiency gain is maximal. As the maintenance cost increases, the optimal consolidation threshold decreases (lower panel) and the total energy required increases, until no efficiency is gained by synaptic caching at all.
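A schematic Python implementation of the rule in panel (a) might look as follows. It is a sketch under the assumptions stated in the caption (maintenance cost proportional to the transient weight magnitude, consolidation of all synapses once any transient weight crosses the threshold); the class and variable names are hypothetical:

```python
import numpy as np

class CachedSynapses:
    """Total weight = persistent + transient; updates are first stored in the transient part."""

    def __init__(self, n, theta=5.0, c=0.01, tau=np.inf):
        self.w_pers = np.zeros(n)       # persistent (consolidated) weights
        self.w_trans = np.zeros(n)      # transient (cached) weights
        self.theta = theta              # consolidation threshold
        self.c = c                      # maintenance cost per unit transient weight per step
        self.tau = tau                  # decay time constant of transient weights
        self.energy = 0.0               # accumulated metabolic cost

    @property
    def w(self):
        return self.w_pers + self.w_trans

    def step(self, dw):
        """Apply one plasticity update dw (array of per-synapse changes)."""
        self.w_trans *= 1.0 - 1.0 / self.tau                   # passive decay (assumed per-step)
        self.w_trans += dw                                      # cheap, labile update
        self.energy += self.c * np.sum(np.abs(self.w_trans))   # maintenance cost this step
        if np.any(np.abs(self.w_trans) >= self.theta):          # any synapse crosses threshold
            self.energy += np.sum(np.abs(self.w_trans))         # consolidation cost
            self.w_pers += self.w_trans                          # make changes persistent
            self.w_trans[:] = 0.0                                # reset transient values
```

Other consolidation triggers (per-synapse or based on total transient plasticity) are compared in Figure 4.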
Figure 2—figure supplement 1. Synaptic caching in a spiking neuron with a biologically plausible perceptron-like learning rule.
To demonstrate the generality of our results, independent of learning rule or implementation, we implement a spiking biophysical perceptron. D'Souza et al. (2010) proposed perceptron-like learning by combining spike-timing-dependent plasticity (STDP) with spike-frequency adaptation (SFA). In their model, a leaky integrate-and-fire neuron receives auditory input and delayed visual input. The neuron's objective is to balance its auditory response $A = \boldsymbol{w} \cdot \boldsymbol{x}$ against its visual response V by adjusting the weights $\boldsymbol{w}$ of its auditory synapses through STDP; the visual input acts as the supervisory signal. We use 100 auditory inputs and measure the energy required for the neuron to learn $\boldsymbol{w}$ so that each auditory input pattern becomes associated with a (binary) visual input. We repeatedly present patterns $\boldsymbol{x}^{(p)}$, each with two activated auditory inputs, until $\boldsymbol{w}$ stabilizes, as in D'Souza et al. Training is considered successful if the auditory responses of all input patterns associated with the same binary visual input fall within two standard deviations of the mean auditory response of those patterns, and are at least five standard deviations away from the mean auditory response of the other patterns. Synaptic caching is implemented as in the main text by splitting $\boldsymbol{w}$ into persistent and transient forms. We consider the optimal scenario in which the transient weights neither decay nor carry a maintenance cost. Also in this biophysical implementation of perceptron learning, synaptic caching (green curve) saves a substantial amount of energy compared to learning without caching (red curve), suggesting that synaptic caching works regardless of learning algorithm or biophysical implementation.
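The success criterion described above could be checked with a small helper like the one below (a sketch; the array names are illustrative, and we assume the five-standard-deviation margin is measured with the other class's standard deviation, which the caption leaves implicit):

```python
import numpy as np

def training_successful(responses, labels):
    """responses: auditory response A per pattern; labels: the binary visual input per pattern."""
    ok = True
    for cls in (0, 1):
        same = responses[labels == cls]            # patterns of this visual class
        other = responses[labels != cls]           # patterns of the other class
        ok &= np.all(np.abs(same - same.mean()) <= 2 * same.std())    # within 2 SD of own class
        ok &= np.all(np.abs(same - other.mean()) >= 5 * other.std())  # at least 5 SD from other class
    return bool(ok)
```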
Figure 3. Synaptic caching and decaying transient plasticity.
The amount of energy required, the optimal consolidation threshold, and the learning time as a function of the decay rate of transient plasticity for various values of the maintenance cost. Broadly, stronger decay will increase the energy required and hence reduce efficiency. With weak decay and small maintenance cost, the most energy-saving strategy is to accumulate as many changes in the transient forms as possible, thus increasing the learning time (darker curves). However, when maintenance cost is high, it is optimal to reduce the threshold and hence learning time. Dashed lines denote the results without synaptic caching.
Figure 3—figure supplement 1. The effects of consolidation threshold on energy cost and learning time.
(a) Parametric plot of learning time vs. energy as the consolidation threshold θ is varied. The threshold value runs up to 10 in steps of 0.5. For small maintenance costs, the threshold sets a trade-off between either a short learning time or a low energy (e.g. black curve). At higher maintenance costs, the most energy-efficient threshold also leads to a short learning time. Average over 100 runs; parameter: τ = 10³. (b) Similar to the perceptron results in panel a, the effects of the consolidation threshold on energy cost and learning time for training a multi-layer network vary depending on the maintenance cost c. Here, the threshold starts at 0.005 and increases in increments of 0.005. When c = 0 (black dots, each representing a unique consolidation threshold), there is a trade-off between a shorter learning time and a lower energy cost. When c = 0.001 (red dots), the result is similar to the perceptron result with c = 0.01, where optimizing learning time or energy cost leads to a similar threshold. Parameters: η = 0.1, τ = 10⁴, required accuracy = 0.93.
Figure 4. Comparison of variants of the synaptic caching algorithm.
(a) Schematic representation of variants to decide when consolidation occurs. From top to bottom: (1) Consolidation (indicated by the star) occurs whenever transient plasticity at a synapse crosses the consolidation threshold and only that synapse is consolidated. (2) Consolidation of all synapses occurs once transient plasticity at any synapse crosses the threshold. (3) Consolidation of all synapses occurs once the total transient plasticity across synapses crosses the threshold. (b) Energy required to teach the perceptron is comparable across algorithm variants. Consolidation thresholds were optimized for each algorithm and each maintenance cost of transient plasticity individually. In this simulation the transient plasticity did not decay.
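The three trigger rules in panel (a) can be expressed as simple predicates; the sketch below uses hypothetical function names and assumes w_trans holds the transient weights and theta the (per-rule optimized) consolidation threshold:

```python
import numpy as np

def per_synapse_trigger(w_trans, theta):
    """(1) Consolidate only those synapses whose transient weight crosses the threshold.

    Returns a boolean mask of the synapses to consolidate."""
    return np.abs(w_trans) >= theta

def any_synapse_trigger(w_trans, theta):
    """(2) Consolidate all synapses as soon as any single one crosses the threshold."""
    return bool(np.any(np.abs(w_trans) >= theta))

def total_trigger(w_trans, theta):
    """(3) Consolidate all synapses once the summed transient plasticity crosses the threshold."""
    return bool(np.sum(np.abs(w_trans)) >= theta)
```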
Figure 5. Energy cost to train a multilayer back-propagation network to classify digits from the MNIST data set.
(a) Energy rises with the accuracy of identifying the digits in held-out test data. Except for the largest learning rates, the energy is independent of the learning rate η. The inset shows some MNIST examples. (b) Comparison of the energy required to train the network with and without synaptic caching, and the minimal energy. As for the perceptron, and depending on the cost of transient plasticity, synaptic caching can reduce the energy needed manifold. (c) There is an optimal number of hidden units that minimizes metabolic cost. Both with and without synaptic caching, energy needs are high when the number of hidden units is barely sufficient or very large. Parameters for transient plasticity in (b) and (c): η = 0.1, τ = 1000, c = 0.001.
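To illustrate how the same energy measure extends from the perceptron to back-propagation, here is a compact sketch (not the authors' code; the architecture, activation, loss, and initialization are placeholder choices) that accumulates the L1 norm of every weight update while training a small one-hidden-layer network:

```python
import numpy as np

def train_with_energy(X, y, n_hidden=100, eta=0.1, epochs=10):
    """Train a tiny one-hidden-layer network and return (W1, W2, energy).

    X: (samples, 784) pixel inputs; y: (samples, 10) one-hot labels (e.g. MNIST).
    Energy is the summed |dW| over all updates, as for the perceptron.
    """
    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.01, (X.shape[1], n_hidden))
    W2 = rng.normal(0.0, 0.01, (n_hidden, y.shape[1]))
    energy = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = np.tanh(x @ W1)                         # hidden activity
            out = h @ W2                                # linear output
            err = out - t                               # squared-error gradient at the output
            dW2 = -eta * np.outer(h, err)               # output-layer update
            dW1 = -eta * np.outer(x, (err @ W2.T) * (1 - h**2))  # back-propagated hidden update
            W1 += dW1
            W2 += dW2
            energy += np.abs(dW1).sum() + np.abs(dW2).sum()      # metabolic cost of the updates
    return W1, W2, energy
```

Synaptic caching would wrap these per-parameter updates in transient and persistent components exactly as in the perceptron case.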
Figure 6. Maintenance and consolidation power.
Power (energy per epoch) of the perceptron vs epoch. Solid curves are from simulation, dashed curves are the theoretical predictions, Equations 6 and 7, with σ calculated by using the perceptron update rate p extracted from the simulation. Both powers are well described by the theory. Parameters: τ=500, c=0.01, θ=5.
