Neural Machine Translation by Jointly Learning to Align and Translate
Citations
2,466 citations
Cites methods or results from "Neural Machine Translation by Joint..."
...(Bahdanau et al., 2014) first applied the attention mechanism to machine translation, improving the performance, especially for long sequences....
...Similar problems have also been reported in machine translation (Bahdanau et al., 2014)....
...[94] first applied the attention mechanism to machine translation, which improved the performance especially for long sequences....
2,452 citations
Cites methods from "Neural Machine Translation by Joint..."
...We use additive attention [2] to obtain the gating coefficient....
...For instance, additive soft attention is used in sentence-to-sentence translation [2, 29] and more recently applied to image classification [11, 32]....
...Attention Gates: AGs are commonly used in natural image analysis, knowledge graphs, and natural language processing (NLP) for image captioning [1], machine translation [2, 30], and classification [11, 31, 32] tasks....
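The additive attention named in these snippets scores each encoder annotation h_j against the previous decoder state s with e_j = v_a^T tanh(W_a s + U_a h_j), normalizes the scores with a softmax, and returns the weighted sum of the annotations as a context vector. Below is a minimal pure-Python sketch under those assumptions; the function and parameter names are illustrative, biases are omitted, and the toy weights stand in for learned parameters. It is not the code of any work listed here.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Softmax with the maximum subtracted for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev: Vector, annotations: List[Vector],
                       W_a: Matrix, U_a: Matrix, v_a: Vector) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoder step."""
    query = matvec(W_a, s_prev)  # W_a s, shared by every position j
    scores = []
    for h in annotations:
        key = matvec(U_a, h)  # U_a h_j
        hidden = [math.tanh(q + k) for q, k in zip(query, key)]
        scores.append(sum(v * u for v, u in zip(v_a, hidden)))  # v_a^T tanh(...)
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the annotations.
    dim = len(annotations[0])
    context = [sum(a * h[d] for a, h in zip(weights, annotations)) for d in range(dim)]
    return weights, context


# Toy usage: 2-d decoder state, three 2-d annotations, identity projections.
I2 = [[1.0, 0.0], [0.0, 1.0]]
weights, context = additive_attention([0.5, -0.2],
                                      [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
                                      I2, I2, [1.0, 1.0])
print(weights, context)
```

Because the score passes through a small feed-forward layer rather than a plain dot product, the query and annotation dimensions need not match; W_a and U_a project both into a shared attention space.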
2,353 citations
Cites background from "Neural Machine Translation by Joint..."
...On the other hand, the direct connections between long-distance word pairs baked in attention mechanisms might ease optimization and enable the learning of long-term dependency (Bahdanau et al., 2014; Vaswani et al., 2017)....
2,320 citations
Cites background from "Neural Machine Translation by Joint..."
...et al., 2014), speech recognition (Hinton et al., 2012; Sainath et al., 2013), statistical machine translation (Devlin et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2015), Atari and Go games (Mnih et al., 2015; Silver et al., 2016), and even abstract art (Mordvintsev et al., 2015)....
References
72,897 citations
"Neural Machine Translation by Joint..." refers methods in this paper
...This gated unit is similar to a long short-term memory (LSTM) unit proposed earlier by Hochreiter and Schmidhuber (1997), sharing with it the ability to better model and learn long-term dependencies....
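The gated unit mentioned here is what later became widely known as the gated recurrent unit (GRU). Below is a minimal sketch of one update step, assuming a reset gate r, an update gate z, a candidate state computed from the reset-scaled previous state, and the interpolation s = (1 - z) * s_prev + z * s_candidate. Biases are omitted, and the parameter names and tuple layout are illustrative choices rather than the paper's code.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gru_step(x: Vector, h_prev: Vector, params: Tuple[Matrix, ...]) -> Vector:
    """One gated-unit update; params = (W_r, U_r, W_z, U_z, W, U), no biases."""
    W_r, U_r, W_z, U_z, W, U = params
    # Reset gate: how much of the previous state feeds the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(W_r, x), matvec(U_r, h_prev))]
    # Update gate: how far to move from h_prev toward the candidate.
    z = [sigmoid(a + b) for a, b in zip(matvec(W_z, x), matvec(U_z, h_prev))]
    reset_state = [ri * hi for ri, hi in zip(r, h_prev)]
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, reset_state))]
    # Element-wise interpolation between the old state and the candidate.
    return [(1.0 - zi) * hp + zi * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]


# Toy usage: 1-d input and state with all-zero weights.
zeros = [[0.0]]
h = gru_step([1.0], [0.8], (zeros,) * 6)
print(h)
```

With all-zero weights both gates sit at 0.5 and the candidate is tanh(0) = 0, so the step simply halves the previous state. The additive path through (1 - z) * h_prev is what lets the unit carry information, and gradients, across many steps.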
7,309 citations
"Neural Machine Translation by Joint..." refers background or methods in this paper
...Since Bengio et al. (2003) introduced a neural probabilistic language model which uses a neural network to model the conditional probability of a word given a fixed number of the preceding words, neural networks have widely been used in machine translation. However, the role of neural networks has been largely limited to simply providing a single feature to an existing statistical machine translation system or to re-rank a list of candidate translations provided by an existing system. For instance, Schwenk (2012) proposed using a feedforward neural network to compute the score of a pair of source and target phrases and to use the score as an additional feature in the phrase-based statistical machine translation system. More recently, Kalchbrenner and Blunsom (2013) and Devlin et al. (2014) reported the successful use of the neural networks as a sub-component of the existing translation system....
...These paths allow gradients to flow backward easily without suffering too much from the vanishing effect (Hochreiter, 1991; Bengio et al., 1994; Pascanu et al., 2013a)....
7,290 citations
"Neural Machine Translation by Joint..." refers methods in this paper
...With this new approach the information can be spread throughout the sequence of annotations, which can be selectively retrieved by the decoder accordingly....
...Hence, we propose to use a bidirectional RNN (BiRNN, Schuster and Paliwal, 1997), which has been successfully used recently in speech recognition (see, e.g., Graves et al., 2013)....
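A bidirectional RNN reads the source sentence left to right and right to left, then concatenates the two hidden states at each position to form the annotation for word j. Below is a minimal sketch assuming a generic step function. The hypothetical scalar tanh cell stands in for the gated units the paper's encoder uses, and all names are illustrative.

```python
import math
from typing import Callable, List

Vector = List[float]
Step = Callable[[Vector, Vector], Vector]


def run_rnn(step: Step, inputs: List[Vector], h0: Vector) -> List[Vector]:
    """Apply a recurrent step left to right, keeping every hidden state."""
    states: List[Vector] = []
    h = h0
    for x in inputs:
        h = step(x, h)
        states.append(h)
    return states


def birnn_annotations(forward_step: Step, backward_step: Step,
                      inputs: List[Vector], h0_f: Vector, h0_b: Vector) -> List[Vector]:
    """Annotation j = forward state at j concatenated with backward state at j."""
    forward = run_rnn(forward_step, inputs, h0_f)
    # Read the sequence right to left, then flip so position j lines up with x_j.
    backward = run_rnn(backward_step, inputs[::-1], h0_b)[::-1]
    return [f + b for f, b in zip(forward, backward)]  # list concatenation


def tanh_cell(w_x: float, w_h: float) -> Step:
    """Hypothetical 1-unit tanh RNN cell with scalar weights, for illustration."""
    def step(x: Vector, h: Vector) -> Vector:
        return [math.tanh(w_x * x[0] + w_h * h[0])]
    return step


sentence = [[1.0], [0.0], [-1.0]]  # three toy 1-d "word embeddings"
annotations = birnn_annotations(tanh_cell(0.9, 0.5), tanh_cell(0.7, 0.3),
                                sentence, [0.0], [0.0])
print(annotations)  # one 2-d annotation per input position
```

Each annotation therefore summarizes both the words before and the words after position j, which is what the attention step later weighs when building each context vector.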