Finding Structure in Time
Citations
72,897 citations
Cites methods from "Finding Structure in Time"
...The approaches of Elman (1988), Fahlman (1991), Williams (1989), Schmidhuber (1992a), Pearlmutter (1989), and many of the related algorithms in Pearlmutter’s comprehensive overview (1995) suffer from the same problems as BPTT and RTRL (see sections 1 and 3)....
[...]
...In experimental comparisons with RTRL, BPTT, Recurrent Cascade-Correlation, Elman nets, and Neural Sequence Chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex long time lag tasks that have never been solved by previous recurrent network algorithms. It works with local, distributed, real-valued, and noisy pattern representations.

1 INTRODUCTION

Recurrent networks can in principle use their feedback connections to store representations of recent input events in form of activations ("short-term memory", as opposed to "long-term memory" embodied by slowly changing weights). This is potentially significant for many applications, including speech processing, non-Markovian control, and music composition (e.g., Mozer 1992). The most widely used algorithms for learning what to put in short-term memory, however, take too much time or don't work well at all, especially when minimal time lags between inputs and corresponding teacher signals are long. With conventional "Back-Propagation Through Time" (BPTT, e.g., Williams and Zipser 1992) or "Real-Time Recurrent Learning" (RTRL, e.g., Robinson and Fallside 1987), error signals "flowing backwards in time" tend to either (1) blow up or (2) vanish: the temporal evolution of the backpropagated error exponentially depends on the size of the weights. Case (1) may lead to oscillating weights, while in case (2) learning to bridge long time lags takes a prohibitive amount of time, or does not work at all. This paper presents "Long Short-Term Memory" (LSTM), a novel recurrent network architecture in conjunction with an appropriate gradient-based learning algorithm. The combination of both is designed to overcome these error back-flow problems. Unlike Schmidhuber's (1992b) chunking systems (which work well if input sequences contain local regularities that make them partly predictable), LSTM can learn to bridge time intervals in excess of 1000 steps even in noisy, ...
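The blow-up/vanish dichotomy the excerpt describes is easy to reproduce numerically. The sketch below is my own minimal illustration (not code from the paper): in a linear recurrent net h_t = W h_{t-1}, the error backpropagated through T steps is (W^T)^T applied to the final error, so its norm scales roughly like rho(W)^T, where rho(W) is the spectral radius; the weight scales and dimensions here are arbitrary choices.

```python
import numpy as np

# Minimal sketch (illustration only): backpropagate an error vector through
# T steps of a linear RNN and watch its norm. Small recurrent weights make
# the error vanish (case 2); large weights make it blow up (case 1).
rng = np.random.default_rng(0)
n, T = 20, 100

for scale in (0.8, 1.0, 1.2):
    W = scale * rng.standard_normal((n, n)) / np.sqrt(n)  # spectral radius ~ scale
    delta = np.ones(n)                                    # error at the last step
    for _ in range(T):                                    # backward through time
        delta = W.T @ delta
    rho = np.abs(np.linalg.eigvals(W)).max()
    print(f"scale {scale}: rho(W)={rho:.2f}, |delta| after {T} steps = "
          f"{np.linalg.norm(delta):.2e}")
```

With rho(W) below 1 the norm decays like rho^100 (effectively zero), and with rho(W) above 1 it grows by orders of magnitude, which is exactly the exponential dependence on weight size that motivates LSTM's constant-error carousel.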
[...]
20,077 citations
14,635 citations
Cites result from "Finding Structure in Time"
...In fact, some popular RNN algorithms restricted credit assignment to a single step backwards (Elman, 1988; Jordan, 1986), also in more recent studies (Jaeger, 2002; Maass et al....
[...]
...In fact, some popular RNN algorithms restricted credit assignment to a single step backwards (Elman, 1990; Jordan, 1986, 1997), also in more recent studies (Jaeger, 2001, 2004; Maass et al., 2002)....
[...]
7,119 citations
6,832 citations
References
17,604 citations
"Finding Structure in Time" refers methods in this paper
...If so, the output is compared with a teacher input and backpropagation of error (Rumelhart, Hinton, & Williams, 1986) is used to incrementally adjust connection strengths....
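As a concrete reading of that sentence, here is a minimal sketch (my own illustration, not Elman's network): a single linear layer's output is compared with a teacher vector, and the connection strengths are incrementally adjusted by one gradient step on the squared error. All names and sizes here are invented for the example.

```python
import numpy as np

# Minimal sketch of one incremental backpropagation update (illustration
# only): y = W x is compared with teacher t, and W is nudged down the
# gradient of the squared error E = 0.5 * ||y - t||^2, whose gradient
# with respect to W is outer(y - t, x).
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 5)) * 0.1   # connection strengths
x = rng.standard_normal(5)              # current input
t = rng.standard_normal(3)              # teacher signal
lr = 0.1                                # learning rate

for step in range(5):
    y = W @ x                           # forward pass
    err = y - t                         # dE/dy
    W -= lr * np.outer(err, x)          # incremental weight adjustment
    print(f"step {step}: error = {0.5 * err @ err:.4f}")
```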
[...]
15,313 citations
13,579 citations
12,586 citations
"Finding Structure in Time" refers background in this paper
...In addition, it has been argued that generalizations about word order cannot be accounted for solely in terms of linear order (Chomsky, 1957, 1965)....
[...]
...In addition, it has been argued that generalizations about word order cannot be accounted for solely in terms of linear order (Chomsky, 1957; Chomsky, 1965)....
[...]
12,225 citations