Neural Machine Translation by Jointly Learning to Align and Translate
Citations
12,299 citations
11,936 citations
Cites methods or results from "Neural Machine Translation by Joint..."
...We were initially convinced that the LSTM would fail on long sentences due to its limited memory, and other researchers reported poor performance on long sentences with a model similar to ours [5, 2, 26]....
[...]
...[2] also attempted direct translations with a neural network that used an attention mechanism to overcome the poor performance on long sentences experienced by Cho et al....
[...]
...This way of evaluating the BLEU score is consistent with [5] and [2], and reproduces the 33....
[...]
8,055 citations
Cites background, methods, or results from "Neural Machine Translation by Joint..."
...On the other hand, in (Bahdanau et al., 2015; Jean et al., 2015) and this work, s, in fact, implies a set of source hidden states which are consulted throughout the entire course of the translation process....
[...]
...Comparison to other work – Bahdanau et al. (2015) use context vectors, similar to our c_t, in building subsequent hidden states, which can also achieve the “coverage” effect....
[...]
...…goes through a deep-output and a maxout layer before making predictions. Lastly, Bahdanau et al. (2015) only experimented with one alignment function, the concat product; whereas we show later that the other alternatives are…...
[...]
...The former approach resembles the model of (Bahdanau et al., 2015) but is simpler architecturally....
[...]
...Bahdanau et al. (2015), on the other hand, use the concatenation of the forward and backward source hidden states in the bi-directional encoder and target hidden states in their non-stacking uni-directional decoder....
[...]
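The bidirectional-encoder detail quoted above can be made concrete. Below is a minimal NumPy sketch of how each source position gets represented by the concatenation of a forward and a backward hidden state; the plain tanh RNN cell and all shapes are illustrative assumptions of this sketch (Bahdanau et al. use gated, GRU-like units).

```python
# Sketch: bi-directional encoder annotations h_j = [h_fwd_j ; h_bwd_j].
# The tanh cell is a stand-in assumption, not the paper's gated unit.
import numpy as np

def rnn_states(X, Wx, Wh, reverse=False):
    """Run a plain tanh RNN over X (seq_len x d_in); return all hidden states."""
    order = reversed(range(len(X))) if reverse else range(len(X))
    h = np.zeros(Wh.shape[0])
    states = [None] * len(X)
    for j in order:
        h = np.tanh(Wx @ X[j] + Wh @ h)
        states[j] = h
    return np.stack(states)                          # (seq_len, d_hidden)

rng = np.random.default_rng(0)
seq_len, d_in, d_h = 5, 8, 16
X = rng.normal(size=(seq_len, d_in))                 # source word embeddings
Wx_f, Wh_f = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
Wx_b, Wh_b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))

h_fwd = rnn_states(X, Wx_f, Wh_f)                    # left-to-right pass
h_bwd = rnn_states(X, Wx_b, Wh_b, reverse=True)      # right-to-left pass
annotations = np.concatenate([h_fwd, h_bwd], axis=1) # (seq_len, 2 * d_h)
```

Each row of `annotations` summarizes the sentence around one source word from both directions, which is what the decoder's attention then consults.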
7,019 citations
Cites background or methods from "Neural Machine Translation by Joint..."
...Most competitive neural sequence transduction models have an encoder-decoder structure [5, 2, 29]....
[...]
...This mimics the typical encoder-decoder attention mechanisms in sequence-to-sequence models such as [31, 2, 8]....
[...]
...The two most commonly used attention functions are additive attention [2], and dot-product (multiplicative) attention....
[...]
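A short NumPy sketch can make the contrast in the excerpt above concrete: additive (Bahdanau-style) attention scores a query against each key with a small feed-forward network, while dot-product attention uses a single matrix-vector product. All shapes and the random weights here are illustrative assumptions, not values from either paper.

```python
# Sketch: additive vs. dot-product attention scoring over a set of keys.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_scores(q, K, W_q, W_k, v):
    # Additive attention: v^T tanh(W_q q + W_k k) for every key k.
    return np.array([v @ np.tanh(W_q @ q + W_k @ k) for k in K])

def dot_product_scores(q, K):
    # Dot-product (multiplicative) attention: one matrix-vector product.
    return K @ q

rng = np.random.default_rng(0)
d, n = 16, 6
q, K = rng.normal(size=d), rng.normal(size=(n, d))
W_q, W_k, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)

for scores in (additive_scores(q, K, W_q, W_k, v), dot_product_scores(q, K)):
    weights = softmax(scores)   # attention distribution over the keys
    context = weights @ K       # weighted average (keys double as values here)
```

The two variants differ only in how `scores` is computed; the softmax-and-average step that produces the context vector is the same.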
...Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences [2, 16]....
[...]
...Recurrent neural networks, long short-term memory [12] and gated recurrent [7] neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [29, 2, 5]....
[...]
6,953 citations
Cites background or methods from "Neural Machine Translation by Joint..."
...Self-attention is a variant of attention [Graves, 2013; Bahdanau et al., 2015] that processes a sequence by replacing each element by a weighted average of the rest of the sequence....
[...]
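The one-sentence definition quoted above translates almost directly into code. Here is a minimal single-head sketch under simplifying assumptions: no masking, no multiple heads, and each element attends over the whole projected sequence (including itself); the 1/sqrt(d) scaling and all shapes are illustrative choices.

```python
# Sketch: self-attention replaces each sequence element with an
# attention-weighted average of the (projected) sequence itself.
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # project the same sequence
    scores = Q @ K.T / np.sqrt(K.shape[1])           # pairwise compatibilities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ V                               # each row: weighted average

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
```

Unlike the encoder-decoder attention of Bahdanau et al. (2015), the queries, keys, and values here all come from the same sequence, which is what makes it "self"-attention.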
...Similar arguments have been made against using a unidirectional recurrent neural network encoder in sequence-to-sequence models [Bahdanau et al., 2015]....
[...]
References
162 citations
"Neural Machine Translation by Joint..." refers methods in this paper
...(see, e.g., Graves, 2012; Boulanger-Lewandowski et al., 2013). Sutskever et al. (2014) used this approach to generate translations from their neural machine translation model....
[...]
...Following the procedure described in Cho et al. (2014a), we reduce the size of the combined corpus to have 348M words using the data selection method by Axelrod et al....
[...]
...Once a model is trained, we use a beam search to find a translation that approximately maximizes the conditional probability (see, e.g., Graves, 2012; Boulanger-Lewandowski et al., 2013)....
[...]
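The beam-search decoding mentioned in the excerpt above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: `log_prob_next(prefix)`, which returns a trained model's next-token log-probabilities as a dict, is an assumption of this sketch, as are the beam width and the `</s>` end-of-sentence token.

```python
# Sketch: beam search keeps the `beam_width` highest-scoring partial
# translations, extends each by every possible next token, and re-prunes,
# approximately maximizing the conditional probability of the output.
import math

def beam_search(log_prob_next, vocab, beam_width=4, max_len=20, eos="</s>"):
    beams = [((), 0.0)]                              # (prefix, log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:         # finished hypothesis
                candidates.append((prefix, score))
                continue
            logp = log_prob_next(prefix)             # dict: token -> log p
            for tok in vocab:
                candidates.append((prefix + (tok,), score + logp[tok]))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(p and p[-1] == eos for p, _ in beams):
            break
    return beams[0]                                  # best approximate argmax

# Toy usage with a uniform next-token model (illustrative only).
vocab = ["a", "b", "</s>"]
uniform = lambda prefix: {t: math.log(1 / len(vocab)) for t in vocab}
print(beam_search(uniform, vocab, beam_width=2, max_len=5))
```

Greedy decoding is the special case `beam_width=1`; widening the beam trades computation for a better approximation of the most probable translation.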
157 citations
"Neural Machine Translation by Joint..." refers methods in this paper
...sub-component of the existing translation system. Traditionally, a neural network trained as a target-side language model has been used to rescore or rerank a list of candidate translations (see, e.g., Schwenk et al., 2006). Although the above approaches were shown to improve the translation performance over the state-of-the-art machine translation systems, we are more interested in a more ambitious objective of designing...
[...]
129 citations
"Neural Machine Translation by Joint..." refers background or methods in this paper
...Sutskever et al. (2014) reported that the neural machine translation based on RNNs with long short-term memory (LSTM) units achieves close to the state-of-the-art performance of the conventional phrase-based machine translation system on an English-to-French translation task....
[...]
...For instance, Schwenk (2012) proposed using a feedforward neural network to compute the score of a pair of source and target phrases and to use the score as an additional feature in the phrase-based statistical machine translation system....
[...]
71 citations
"Neural Machine Translation by Joint..." refers methods in this paper
...We conjectured that the use of a fixed-length context vector is problematic for translating long sentences, based on a recent empirical study reported by Cho et al. (2014b) and Pouget-Abadie et al. (2014)....
[...]