Long short-term memory
- PMID: 9377276
- DOI: 10.1162/neco.1997.9.8.1735
Abstract
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
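To make the abstract's architectural claims concrete, below is a minimal NumPy sketch of one forward step of a single memory cell in the spirit of the 1997 formulation: an additively updated cell state (the "constant error carousel" with an identity self-connection) whose write and read access is controlled by multiplicative input and output gates. Function names, weight shapes, and nonlinearities here are illustrative assumptions, not the paper's notation; note also that this sketch has no forget gate, which was introduced later (see "Learning to forget" under Similar articles).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, y_prev, s_prev, W_in, W_out, W_c):
    """One forward step of a 1997-style LSTM memory cell (illustrative sketch).

    The cell state s is the constant error carousel: it is updated additively
    (identity self-connection), so error flowing back through it neither
    vanishes nor explodes. Multiplicative gates learn when to write to it
    (input gate) and when to expose it to the rest of the network (output gate).
    """
    z = np.concatenate([x_t, y_prev])      # external input plus recurrent cell output
    g_in = sigmoid(W_in @ z)               # input gate: opens/closes write access
    g_out = sigmoid(W_out @ z)             # output gate: opens/closes read access
    c_in = np.tanh(W_c @ z)                # squashed candidate cell input
    s_t = s_prev + g_in * c_in             # constant error carousel: additive update
    y_t = g_out * np.tanh(s_t)             # gated cell output
    return y_t, s_t

# Illustrative usage with small random weights (hypothetical sizes).
rng = np.random.default_rng(0)
n_in, n_cell = 3, 4
W_in, W_out, W_c = (rng.standard_normal((n_cell, n_in + n_cell)) * 0.1 for _ in range(3))
y, s = np.zeros(n_cell), np.zeros(n_cell)
for t in range(5):
    y, s = lstm_cell_step(rng.standard_normal(n_in), y, s, W_in, W_out, W_c)
```

Each step touches every weight a constant number of times, which is why the per-step, per-weight cost stated in the abstract is O(1), and the additive state update is what keeps backpropagated error constant across long time lags.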
Similar articles
- Learning to forget: continual prediction with LSTM. Neural Comput. 2000 Oct;12(10):2451-71. doi: 10.1162/089976600300015015. PMID: 11032042
- Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005 Jun-Jul;18(5-6):602-10. doi: 10.1016/j.neunet.2005.06.042. PMID: 16112549
- Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets. Neural Netw. 2003 Mar;16(2):241-50. doi: 10.1016/S0893-6080(02)00219-8. PMID: 12628609
- A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. 2019 Jul;31(7):1235-1270. doi: 10.1162/neco_a_01199. Epub 2019 May 21. PMID: 31113301. Review.
- Working models of working memory. Curr Opin Neurobiol. 2014 Apr;25:20-4. doi: 10.1016/j.conb.2013.10.008. Epub 2013 Dec 4. PMID: 24709596. Review.
Cited by
- TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines. BMC Med Inform Decis Mak. 2024 Oct 24;24(1):310. doi: 10.1186/s12911-024-02717-7. PMID: 39444035. Free PMC article.
- Multi-Class Detection of Neurodegenerative Diseases from EEG Signals Using Lightweight LSTM Neural Networks. Sensors (Basel). 2024 Oct 19;24(20):6721. doi: 10.3390/s24206721. PMID: 39460201. Free PMC article.
- A Novel Hybrid Method to Predict PM2.5 Concentration Based on the SWT-QPSO-LSTM Hybrid Model. Comput Intell Neurosci. 2022 Aug 16;2022:7207477. doi: 10.1155/2022/7207477. eCollection 2022. PMID: 36017460. Free PMC article.
- CID-GCN: An Effective Graph Convolutional Networks for Chemical-Induced Disease Relation Extraction. Front Genet. 2021 Feb 10;12:624307. doi: 10.3389/fgene.2021.624307. eCollection 2021. PMID: 33643385. Free PMC article.
- Balancing Flexibility and Interference in Working Memory. Annu Rev Vis Sci. 2021 Sep 15;7:367-388. doi: 10.1146/annurev-vision-100419-104831. Epub 2021 Jun 3. PMID: 34081535. Free PMC article. Review.
