Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Citations
Cites methods from "Google's Neural Machine Translation..."
...The specifics are: • We use WordPiece embeddings (Wu et al., 2016) with a 30,000-token vocabulary....
Cites background or methods from "Google's Neural Machine Translation..."
...For English-French, we used the significantly larger WMT 2014 English-French dataset, consisting of 36M sentences, and split tokens into a 32,000 word-piece vocabulary [31]....
...This mimics the typical encoder-decoder attention mechanisms in sequence-to-sequence models such as [31, 2, 8]....
...In terms of computational complexity, self-attention layers are faster than recurrent layers when the sequence length n is smaller than the representation dimensionality d, which is most often the case with the sentence representations used by state-of-the-art machine translation models, such as word-piece [31] and byte-pair [25] representations....
...We set the maximum output length during inference to input length + 50, but terminate early when possible [31]....
...Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures [31, 21, 13]....
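The complexity claim in the excerpt above can be made concrete. Per layer, self-attention costs on the order of n²·d operations (every position attends to every other position over d-dimensional representations), while a recurrent layer costs on the order of n·d² (n sequential steps, each a d×d transformation), so self-attention wins whenever n < d. A minimal sketch of these asymptotic counts, ignoring constant factors:

```python
def self_attention_ops(n: int, d: int) -> int:
    # Per-layer self-attention: each of n positions attends to all
    # n positions over d-dimensional representations -> O(n^2 * d).
    return n * n * d

def recurrent_ops(n: int, d: int) -> int:
    # Per-layer recurrence: n sequential steps, each applying a
    # d x d transformation to the hidden state -> O(n * d^2).
    return n * d * d

# Typical MT setting: a 50-token word-piece sentence, d = 512.
# Since n < d, self-attention is the cheaper layer here.
print(self_attention_ops(50, 512) < recurrent_ops(50, 512))
```

With n = 50 and d = 512 this gives 1.28M versus 13.1M operations per layer; the comparison flips only for sequences longer than the model dimension.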
Cites background or methods from "Google's Neural Machine Translation..."
...Specifically, we use a beam width of 4 and a length penalty of α = 0.6 [Wu et al., 2016] for the WMT translation and CNN/DM summarization tasks....
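The length penalty referenced in the excerpt above is the one defined in the GNMT paper (Wu et al., 2016): beam hypotheses are ranked by log P(Y|X) divided by lp(Y) = (5 + |Y|)^α / (5 + 1)^α, so that longer outputs are not unfairly punished for accumulating negative log-probabilities. A minimal sketch (the paper's additional coverage penalty is omitted here):

```python
def length_penalty(length: int, alpha: float = 0.6) -> float:
    # GNMT length penalty: lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha.
    # At length 1 this equals 1.0, i.e. no normalization.
    return ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)

def normalized_score(log_prob: float, length: int,
                     alpha: float = 0.6) -> float:
    # Beam-search ranking score: s(Y, X) = log P(Y|X) / lp(Y).
    return log_prob / length_penalty(length, alpha)
```

With α = 0 this degenerates to ranking by raw log-probability; α = 0.6 is the value the excerpt's authors report reusing for WMT translation and CNN/DM summarization.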
References
"Google's Neural Machine Translation..." refers background in this paper
...Often the source-side information is approximately left-to-right, similar to the target side, but depending on the language pair the information for a particular output word can be distributed and even split up across certain regions of the input side....
"Google's Neural Machine Translation..." refers result in this paper
...Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google’s phrase-based production system....