Open Access · Proceedings Article
Tensor2Tensor for Neural Machine Translation
Ashish Vaswani, Samy Bengio, Eugene Brevdo, François Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit
Vol. 1, pp. 193-199
TL;DR
Tensor2Tensor, as described in this paper, is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.
Abstract
Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.
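As a rough illustration of how the library is typically driven, the sketch below launches the t2t-trainer command line from the Tensor2Tensor quick start; the paths and step counts are placeholder assumptions, and flag names may differ between library versions.

```python
# Illustrative sketch only: launches the documented t2t-trainer CLI for a
# WMT English-German Transformer run. Paths and step counts are placeholder
# assumptions; flag names follow the Tensor2Tensor quick start but may
# differ between library versions.
import subprocess

DATA_DIR = "/tmp/t2t_data"      # assumed scratch locations
OUTPUT_DIR = "/tmp/t2t_train"

subprocess.run(
    [
        "t2t-trainer",
        "--generate_data",                      # download and preprocess the dataset
        f"--data_dir={DATA_DIR}",
        f"--output_dir={OUTPUT_DIR}",
        "--problem=translate_ende_wmt32k",      # WMT English-German with a 32k subword vocab
        "--model=transformer",                  # the reference Transformer implementation
        "--hparams_set=transformer_base_single_gpu",
        "--train_steps=1000",                   # toy value for a smoke test
        "--eval_steps=100",
    ],
    check=True,
)
```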
Citations
Proceedings Article · DOI
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli
TL;DR: Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
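As a rough illustration (not this paper's own example), fairseq's README documents loading pretrained translation models through torch.hub; the model name, tokenizer, and BPE arguments below are assumptions that may change between releases.

```python
# Illustrative sketch: loading a pretrained fairseq translation model through
# torch.hub, as shown in the fairseq README. The model name, tokenizer and
# BPE arguments are assumptions that may differ between fairseq releases.
import torch

en2de = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt16.en-de",   # pretrained English-German Transformer
    tokenizer="moses",
    bpe="subword_nmt",
)
en2de.eval()  # disable dropout for inference

print(en2de.translate("Machine translation is fun!"))
```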
Posted Content
BERTScore: Evaluating Text Generation with BERT
TL;DR: This work proposes BERTScore, an automatic evaluation metric for text generation that correlates better with human judgments and provides stronger model selection performance than existing metrics.
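To make the idea concrete, here is a minimal sketch of the greedy-matching score behind BERTScore, with random vectors standing in for BERT contextual embeddings; the real metric also supports idf weighting and baseline rescaling, which are omitted.

```python
# Minimal sketch of the greedy-matching idea behind BERTScore, using random
# vectors in place of real BERT contextual embeddings. The actual metric also
# supports idf weighting and baseline rescaling, which are omitted here.
import numpy as np

def greedy_match_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """cand_emb: (m, d) candidate token embeddings; ref_emb: (n, d) reference."""
    # Cosine similarity matrix between every candidate and reference token.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T                      # shape (m, n)

    precision = sim.max(axis=1).mean()      # each candidate token's best match
    recall = sim.max(axis=0).mean()         # each reference token's best match
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)
print(greedy_match_f1(rng.normal(size=(5, 8)), rng.normal(size=(6, 8))))
```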
Posted Content
Towards a Human-like Open-Domain Chatbot
Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le
TL;DR: Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations, is presented, and a human evaluation metric called Sensibleness and Specificity Average (SSA) is proposed, which captures key elements of a human-like multi-turn conversation.
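The SSA computation itself is simple to state; a minimal sketch follows, assuming each response carries binary human labels for sensibleness and specificity (the label values are made up for illustration).

```python
# Minimal sketch of the Sensibleness and Specificity Average (SSA) described
# in the Meena paper: each chatbot response receives binary human labels for
# "sensible" and "specific", and SSA averages the two per-label rates.
# The label data below is made up for illustration.
labels = [
    {"sensible": 1, "specific": 1},
    {"sensible": 1, "specific": 0},
    {"sensible": 0, "specific": 0},
    {"sensible": 1, "specific": 1},
]

sensibleness = sum(l["sensible"] for l in labels) / len(labels)
specificity = sum(l["specific"] for l in labels) / len(labels)
ssa = (sensibleness + specificity) / 2
print(f"sensibleness={sensibleness:.2f} specificity={specificity:.2f} SSA={ssa:.2f}")
```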
Proceedings Article · DOI
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Yalta, Ryuichi Yamamoto
TL;DR: Transformer, as discussed in this paper, is an emergent sequence-to-sequence model that achieves state-of-the-art performance in neural machine translation and other natural language processing applications; this study compares it against RNNs on speech applications such as automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS).
References
Journal Article · DOI
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
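Since the TL;DR compresses the mechanism heavily, a minimal sketch of one LSTM cell step follows (the now-standard variant with a forget gate, which was added in later work); the weights and dimensions are placeholder assumptions.

```python
# Minimal sketch of a single LSTM cell step in NumPy, illustrating the gated
# "constant error carousel": the cell state c is updated additively, gated by
# the forget and input gates. Weights here are random placeholders.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4h, d) input weights, U: (4h, h) recurrent weights, b: (4h,) bias."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)        # input gate, forget gate, output gate, candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)   # additive cell-state update (the "carousel")
    h = sigmoid(o) * np.tanh(c)                          # gated hidden output
    return h, c

d, hdim = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hdim, d))
U = rng.normal(size=(4 * hdim, hdim))
b = np.zeros(4 * hdim)
h, c = lstm_step(rng.normal(size=d), np.zeros(hdim), np.zeros(hdim), W, U, b)
print(h, c)
```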
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
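A minimal sketch of the additive, "soft-search" attention score this paper introduces follows, with random weights standing in for parameters that the real model learns jointly with an RNN encoder-decoder.

```python
# Minimal sketch of Bahdanau-style additive attention: a decoder state
# "soft-searches" over encoder states by scoring each one with a small
# feed-forward network and taking a softmax-weighted average.
# Weights are random stand-ins for learned parameters.
import numpy as np

def additive_attention(s, H, Wa, Ua, va):
    """s: decoder state (h,), H: encoder states (T, h)."""
    scores = np.tanh(s @ Wa.T + H @ Ua.T) @ va     # e_t = v^T tanh(Wa s + Ua h_t)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over source positions
    return weights @ H, weights                    # context vector and alignment

hdim, T, adim = 4, 6, 5
rng = np.random.default_rng(0)
Wa = rng.normal(size=(adim, hdim))
Ua = rng.normal(size=(adim, hdim))
va = rng.normal(size=adim)
context, align = additive_attention(rng.normal(size=hdim), rng.normal(size=(T, hdim)), Wa, Ua, va)
print(align)                                       # attention weights over the source positions
```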
Proceedings Article
Sequence to Sequence Learning with Neural Networks
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Posted Content
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it also generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
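For reference, a minimal sketch of the scaled dot-product attention at the Transformer's core follows: a single head on toy matrices, without masking, multi-head projections, or the rest of the architecture.

```python
# Minimal sketch of scaled dot-product attention, the building block of the
# Transformer: softmax(Q K^T / sqrt(d_k)) V for a single head on toy inputs.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # similarity of queries to keys
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```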
Proceedings Article
Convolutional Sequence to Sequence Learning
TL;DR: The authors introduce an architecture based entirely on convolutional neural networks, in which computations over all elements can be fully parallelized during training, and optimization is easier because the number of nonlinearities is fixed and independent of the input length.