Neural Machine Translation by Jointly Learning to Align and Translate
Citations
5 citations
Cites methods from "Neural Machine Translation by Joint..."
...A good candidate for this sequence model is Long Short-Term Memory (LSTM) [18] given its recent success in difficult sequence modeling tasks [20, 21]....
[...]
5 citations
Cites methods from "Neural Machine Translation by Joint..."
...We use feed-forward attention (Bahdanau et al., 2014), which encapsulates a learnable layer....
[...]
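The "feed-forward attention" named in the excerpt is the additive scoring function of Bahdanau et al. (2014): a small learnable layer scores each encoder annotation against the decoder state, and a softmax turns the scores into a weighted sum. A minimal numpy sketch; the weight names (Wa, Ua, va) and shapes are illustrative assumptions, not the cited papers' exact parameterization:

```python
import numpy as np

def additive_attention(query, annotations, Wa, Ua, va):
    # Score each annotation h_j with a small feed-forward layer:
    # e_j = va . tanh(Wa @ s + Ua @ h_j)
    scores = np.tanh(annotations @ Ua.T + query @ Wa.T) @ va
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over source positions
    context = weights @ annotations       # expected annotation (context vector)
    return context, weights

rng = np.random.default_rng(0)
d, T = 4, 6
annotations = rng.normal(size=(T, d))    # encoder states h_1..h_T
query = rng.normal(size=d)               # previous decoder state s_{i-1}
context, weights = additive_attention(
    query, annotations,
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d))
```

The context vector is then fed to the decoder alongside its recurrent state; the softmax guarantees the attention weights form a distribution over source positions.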
5 citations
Cites background from "Neural Machine Translation by Joint..."
...Furthermore, the decoder component also improves performance by predicting the next five time steps with a single model, which outperforms ANN and LSTM baselines trained as five separate models for multi-step prediction....
[...]
...Recently, the LSTM network [21] has also been applied for predicting the occupancy states of the spectrum due to its advantage in modeling sequential data with a recurrent unit....
[...]
...Thus the encoder can recurrently update the hidden state given the input series with h_t = f_e(h_{t−1}, x̃_t), where f_e is the LSTM unit used in the encoder....
[...]
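The recurrence h_t = f_e(h_{t−1}, x̃_t) from the excerpt can be unrolled explicitly. A minimal numpy sketch of an LSTM-cell encoder, assuming a packed weight layout (input, forget, output, candidate gates stacked in one matrix); all names and dimensions are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x, W, U, b):
    # One application of f_e: gates computed from x_t and h_{t-1}.
    dh = h_prev.shape[0]
    z = W @ x + U @ h_prev + b                  # all four gate pre-activations
    i = sigmoid(z[0:dh])                        # input gate
    f = sigmoid(z[dh:2 * dh])                   # forget gate
    o = sigmoid(z[2 * dh:3 * dh])               # output gate
    g = np.tanh(z[3 * dh:])                     # candidate cell state
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def encode(series, W, U, b):
    # h_t = f_e(h_{t-1}, x_t), applied recurrently over the input series.
    dh = U.shape[1]
    h, c = np.zeros(dh), np.zeros(dh)
    for x in series:
        h, c = lstm_step(h, c, x, W, U, b)
    return h

rng = np.random.default_rng(1)
dx, dh = 2, 3
W = rng.normal(size=(4 * dh, dx))
U = rng.normal(size=(4 * dh, dh))
b = np.zeros(4 * dh)
h_final = encode(rng.normal(size=(5, dx)), W, U, b)
```

The final hidden state summarizes the whole series; because h = o * tanh(c) with o in (0, 1), every component of the hidden state stays inside (−1, 1).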
...The main contributions are summarized as follows: • An RSD algorithm via image processing is proposed to locate signals in the spectrogram using temporal- and frequency-domain features, which is robust in accurately detecting signals across frequency bands with varied SNRs. • We develop TF2AN for precise spectrum prediction, which models the complex temporal-frequency correlation of the radio spectrum with an attention-based Long Short-Term Memory (LSTM) network....
[...]
...Thus, the methods with an encoder-decoder structure (i.e., Seq2seq, Attention LSTM, DA-RNN, and TF2AN) usually achieve better performance than the other methods because they consider much longer temporal correlations....
[...]
References
72,897 citations
"Neural Machine Translation by Joint..." refers methods in this paper
...This gated unit is similar to a long short-term memory (LSTM) unit proposed earlier by Hochreiter and Schmidhuber (1997), sharing with it the ability to better model and learn long-term dependencies....
[...]
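The gated unit the excerpt compares to the LSTM is the GRU used in the paper's decoder. A minimal numpy sketch of one GRU step, showing the update and reset gates; the weight names are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate state
    # Interpolating between h_prev and the candidate lets the unit copy
    # its state unchanged, which eases learning long-term dependencies.
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(2)
dx, dh = 2, 3
# Wz, Uz, Wr, Ur, Wh, Uh: input projections are (dh, dx), recurrent (dh, dh).
Ws = [rng.normal(size=(dh, dx)) if k % 2 == 0 else rng.normal(size=(dh, dh))
      for k in range(6)]
h = np.zeros(dh)
for x in rng.normal(size=(4, dx)):
    h = gru_step(h, x, *Ws)
```

Compared with the LSTM it lacks a separate cell state and output gate, but the gating mechanism plays the same role of controlling how much history is kept.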
7,309 citations
"Neural Machine Translation by Joint..." refers background or methods in this paper
...Since Bengio et al. (2003) introduced a neural probabilistic language model which uses a neural network to model the conditional probability of a word given a fixed number of the preceding words, neural networks have widely been used in machine translation. However, the role of neural networks has been largely limited to simply providing a single feature to an existing statistical machine translation system or to re-rank a list of candidate translations provided by an existing system. For instance, Schwenk (2012) proposed using a feedforward neural network to compute the score of a pair of source and target phrases and to use the score as an additional feature in the phrase-based statistical machine translation system. More recently, Kalchbrenner and Blunsom (2013) and Devlin et al. (2014) reported the successful use of the neural networks as a sub-component of the existing translation system....
[...]
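The fixed-window model of Bengio et al. (2003) described above can be sketched in a few lines: embed the n−1 preceding words, pass the concatenation through one hidden layer, and softmax over the vocabulary. A minimal numpy sketch with illustrative layer sizes and names:

```python
import numpy as np

def ngram_nn_lm(context_ids, E, W, b, V, c):
    # Embed the n-1 preceding words and concatenate their vectors.
    x = np.concatenate([E[i] for i in context_ids])
    h = np.tanh(W @ x + b)                    # single hidden layer
    logits = V @ h + c                        # score every vocabulary word
    p = np.exp(logits - logits.max())
    return p / p.sum()                        # P(next word | context)

rng = np.random.default_rng(3)
vocab, emb, hid, n_ctx = 10, 4, 8, 2
E = rng.normal(size=(vocab, emb))             # word embedding table
W = rng.normal(size=(hid, n_ctx * emb))
V = rng.normal(size=(vocab, hid))
probs = ngram_nn_lm([3, 7], E, W, np.zeros(hid), V, np.zeros(vocab))
```

The fixed context length is exactly the limitation the excerpt points to: such a model conditions on only a fixed number of preceding words, which is why it was used as an extra feature inside statistical MT systems rather than as a standalone translator.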
...These paths allow gradients to flow backward easily without suffering too much from the vanishing effect (Hochreiter, 1991; Bengio et al., 1994; Pascanu et al., 2013a)....
[...]
7,290 citations
"Neural Machine Translation by Joint..." refers methods in this paper
...With this new approach the information can be spread throughout the sequence of annotations, from which the decoder can then selectively retrieve it....
[...]
...Hence, we propose to use a bidirectional RNN (BiRNN, Schuster and Paliwal, 1997), which has been successfully used recently in speech recognition (see, e.g., Graves et al., 2013)....
[...]
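The bidirectional encoder in the excerpt runs one RNN left-to-right and another right-to-left, then concatenates their states so each annotation h_j summarizes the words both before and after position j. A minimal numpy sketch using plain tanh cells in place of the paper's gated units; all weight names are illustrative:

```python
import numpy as np

def birnn_annotations(xs, Wf, Uf, Wb, Ub):
    dh = Uf.shape[0]
    fwd, h = [], np.zeros(dh)
    for x in xs:                               # left-to-right pass
        h = np.tanh(Wf @ x + Uf @ h)
        fwd.append(h)
    bwd, h = [], np.zeros(dh)
    for x in xs[::-1]:                         # right-to-left pass
        h = np.tanh(Wb @ x + Ub @ h)
        bwd.append(h)
    bwd.reverse()
    # Each annotation covers the context on both sides of position j.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(4)
dx, dh = 2, 3
xs = rng.normal(size=(5, dx))
ann = birnn_annotations(xs,
                        rng.normal(size=(dh, dx)), rng.normal(size=(dh, dh)),
                        rng.normal(size=(dh, dx)), rng.normal(size=(dh, dh)))
```

It is this sequence of two-sided annotations that the attention mechanism selectively retrieves when producing each target word.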
6,832 citations