Neural Machine Translation by Jointly Learning to Align and Translate
Citations
3,776 citations
Cites background from "Neural Machine Translation by Jointly Learning to Align and Translate"
...Recent advances in machine learning and deep neural networks enabled researchers to solve multiple important practical problems like image, video, text classification and others (Krizhevsky et al., 2012; Hinton et al., 2012; Bahdanau et al., 2015). However, machine learning models are often vulnerable to adversarial manipulation of their input intended to cause incorrect classification (Dalvi et al., 2004). In particular, neural networks and ma...
[...]
3,095 citations
Cites background or methods from "Neural Machine Translation by Jointly Learning to Align and Translate"
...To enable the controller to predict such connections, we use a set-selection type attention (Neelakantan et al., 2015) which was built upon the attention mechanism (Bahdanau et al., 2015; Vinyals et al., 2015)....
[...]
...…last few years have seen much success of deep neural networks in many challenging applications, such as speech recognition (Hinton et al., 2012), image recognition (LeCun et al., 1998; Krizhevsky et al., 2012) and machine translation (Sutskever et al., 2014; Bahdanau et al., 2015; Wu et al., 2016)....
[...]
...We use a parameter-server scheme where we have a parameter server of S shards, that store the shared parameters for K controller replicas....
[...]
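The sharded parameter-server scheme quoted above is easy to picture with a toy sketch. The snippet below is a minimal single-process illustration; the names (ParameterShard, ParameterServer) and all sizes are hypothetical, showing only the read/update pattern of K replicas against S shards, not the cited paper's distributed implementation:

```python
import numpy as np

class ParameterShard:
    """One of S shards, each holding a slice of the shared parameters."""
    def __init__(self, size):
        self.params = np.zeros(size)

    def apply_gradient(self, grad, lr=0.01):
        # Each replica pushes its gradient independently (asynchronous SGD).
        self.params -= lr * grad

class ParameterServer:
    """Toy sharded parameter server: one parameter vector split over S shards."""
    def __init__(self, total_size, num_shards):
        chunk = total_size // num_shards
        self.shards = [ParameterShard(chunk) for _ in range(num_shards)]

    def pull(self):
        # A replica fetches the full parameter vector from all shards.
        return np.concatenate([s.params for s in self.shards])

    def push(self, grad):
        # A replica scatters its gradient back to the matching shards.
        for shard, g in zip(self.shards, np.split(grad, len(self.shards))):
            shard.apply_gradient(g)

# S = 4 shards serving K = 3 controller replicas:
server = ParameterServer(total_size=16, num_shards=4)
for replica in range(3):
    params = server.pull()         # read the shared parameters
    grad = np.random.randn(16)     # stand-in for a locally computed gradient
    server.push(grad)              # write the update back to the shards
```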
2,951 citations
Cites background from "Neural Machine Translation by Jointly Learning to Align and Translate"
...The fixed width hidden vector forms a bottleneck for this information flow that we propose to circumvent using an attention mechanism inspired by recent results in translation and image recognition [6, 7]....
[...]
...These models draw on recent developments for incorporating attention mechanisms into recurrent neural network architectures [6, 7, 8, 4]....
[...]
2,938 citations
Cites background or methods from "Neural Machine Translation by Jointly Learning to Align and Translate"
...The pointer network (Vinyals et al., 2015) is a sequence-to-sequence model that uses the soft attention distribution of Bahdanau et al. (2015) to produce an output sequence consisting of elements from the input sequence....
[...]
...The attention distribution $a^t$ is calculated as in Bahdanau et al. (2015):
$$e_i^t = v^\top \tanh(W_h h_i + W_s s_t + b_{\text{attn}}) \tag{1}$$
$$a^t = \operatorname{softmax}(e^t) \tag{2}$$
where $v$, $W_h$, $W_s$ and $b_{\text{attn}}$ are learnable parameters....
[...]
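To make equations (1) and (2) concrete, here is a minimal NumPy sketch of this additive attention; all dimensions and the randomly initialized parameters are assumptions for illustration, not values from the cited work:

```python
import numpy as np

def softmax(x):
    x = x - x.max()               # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()

def attention(h, s_t, W_h, W_s, v, b_attn):
    """Additive (Bahdanau-style) attention, following eqs. (1)-(2):
    e_i = v^T tanh(W_h h_i + W_s s_t + b_attn),  a = softmax(e)."""
    scores = np.array([v @ np.tanh(W_h @ h_i + W_s @ s_t + b_attn) for h_i in h])
    return softmax(scores)

# Toy sizes: 5 encoder states of dim 8, decoder state of dim 8,
# attention space of dim 16 (all illustrative).
rng = np.random.default_rng(0)
h = rng.standard_normal((5, 8))       # encoder hidden states h_i
s_t = rng.standard_normal(8)          # decoder state s_t
W_h = rng.standard_normal((16, 8))
W_s = rng.standard_normal((16, 8))
v = rng.standard_normal(16)
b_attn = rng.standard_normal(16)

a_t = attention(h, s_t, W_h, W_s, v, b_attn)
print(a_t, a_t.sum())                 # a valid distribution over the 5 inputs
```

In a pointer network, a distribution computed this way is used directly as the output distribution over input positions, which is what lets the output sequence consist of elements of the input.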
References
72,897 citations
"Neural Machine Translation by Joint..." refers methods in this paper
...This gated unit is similar to a long short-term memory (LSTM) unit proposed earlier by Hochreiter and Schmidhuber (1997), sharing with it the ability to better model and learn long-term dependencies....
[...]
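As a rough illustration of the gated unit being described, here is a minimal GRU-style cell in NumPy; the gate equations follow the common GRU formulation, and the shapes and initialization are illustrative assumptions, not the cited paper's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, W, U, b):
    """One step of a GRU-style gated unit. W, U, b each hold parameters
    for the update gate (z), reset gate (r), and candidate state (c)."""
    z = sigmoid(W['z'] @ x + U['z'] @ h_prev + b['z'])   # update gate
    r = sigmoid(W['r'] @ x + U['r'] @ h_prev + b['r'])   # reset gate
    c = np.tanh(W['c'] @ x + U['c'] @ (r * h_prev) + b['c'])
    # z interpolates between the old state and the candidate; this near-
    # additive path is what helps gradients survive over long dependencies.
    return (1 - z) * h_prev + z * c

# Illustrative sizes: input dim 4, hidden dim 3.
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 4)) for k in 'zrc'}
U = {k: rng.standard_normal((3, 3)) for k in 'zrc'}
b = {k: np.zeros(3) for k in 'zrc'}
h = gru_cell(rng.standard_normal(4), np.zeros(3), W, U, b)
```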
7,309 citations
"Neural Machine Translation by Joint..." refers background or methods in this paper
...Since Bengio et al. (2003) introduced a neural probabilistic language model which uses a neural network to model the conditional probability of a word given a fixed number of the preceding words, neural networks have widely been used in machine translation. However, the role of neural networks has been largely limited to simply providing a single feature to an existing statistical machine translation system or to re-rank a list of candidate translations provided by an existing system. For instance, Schwenk (2012) proposed using a feedforward neural network to compute the score of a pair of source and target phrases and to use the score as an additional feature in the phrase-based statistical machine translation system. More recently, Kalchbrenner and Blunsom (2013) and Devlin et al. (2014) reported the successful use of the neural networks as a sub-component of the existing translation system....
[...]
...These paths allow gradients to flow backward easily without suffering too much from the vanishing effect (Hochreiter, 1991; Bengio et al., 1994; Pascanu et al., 2013a)....
[...]
7,290 citations
"Neural Machine Translation by Joint..." refers methods in this paper
...With this new approach the information can be spread throughout the sequence of annotations, which can be selectively retrieved by the decoder accordingly....
[...]
...Hence, we propose to use a bidirectional RNN (BiRNN, Schuster and Paliwal, 1997), which has been successfully used recently in speech recognition (see, e.g., Graves et al., 2013)....
[...]
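A bidirectional RNN of this kind runs one RNN left-to-right and another right-to-left over the input and concatenates the two states at each position, so each annotation summarizes both the preceding and the following words. The simple tanh recurrence and all sizes in this sketch are assumptions for illustration:

```python
import numpy as np

def rnn_pass(xs, W, U, b):
    """Simple tanh RNN over a sequence; returns the hidden state at each step."""
    h = np.zeros(U.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        states.append(h)
    return states

def birnn_annotations(xs, fwd, bwd):
    """Annotation for word i = [forward state i ; backward state i]."""
    h_fwd = rnn_pass(xs, *fwd)
    h_bwd = rnn_pass(xs[::-1], *bwd)[::-1]   # run right-to-left, then realign
    return [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]

# Illustrative sizes: 6 input vectors of dim 4, hidden dim 3 per direction.
rng = np.random.default_rng(0)
make = lambda: (rng.standard_normal((3, 4)), rng.standard_normal((3, 3)), np.zeros(3))
xs = list(rng.standard_normal((6, 4)))
annotations = birnn_annotations(xs, make(), make())  # six vectors of dim 6
```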