Open Access Proceedings Article

Tensor2Tensor for Neural Machine Translation

Abstract
Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.
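
As a rough illustration of how the library is typically driven, the sketch below wraps the t2t-datagen and t2t-trainer command-line entry points (as documented in the project README) in a small Python script. The scratch paths, step count, and the particular registered problem and hyperparameter-set names are illustrative choices, not prescriptions from the paper.

```python
# Minimal sketch: drive the Tensor2Tensor CLI from Python for an En-De
# Transformer run. Requires `pip install tensor2tensor`; paths and step
# counts below are illustrative assumptions.
import subprocess

DATA_DIR = "/tmp/t2t_data"    # assumed scratch locations
TMP_DIR = "/tmp/t2t_tmp"
TRAIN_DIR = "/tmp/t2t_train"

# Download and preprocess the WMT En-De data for the registered problem.
subprocess.run([
    "t2t-datagen",
    f"--data_dir={DATA_DIR}",
    f"--tmp_dir={TMP_DIR}",
    "--problem=translate_ende_wmt32k",
], check=True)

# Train the reference Transformer with a commonly used hyperparameter set.
subprocess.run([
    "t2t-trainer",
    f"--data_dir={DATA_DIR}",
    f"--output_dir={TRAIN_DIR}",
    "--problem=translate_ende_wmt32k",
    "--model=transformer",
    "--hparams_set=transformer_base_single_gpu",
    "--train_steps=100000",
], check=True)
```
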


Citations
Posted Content

fairseq: A Fast, Extensible Toolkit for Sequence Modeling.

TL;DR: fairseq, as discussed by the authors, is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks, and supports distributed training across multiple GPUs and machines.
Proceedings Article (DOI)

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

TL;DR: Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
Posted Content

BERTScore: Evaluating Text Generation with BERT

TL;DR: This work proposes BERTScore, an automatic evaluation metric for text generation that correlates better with human judgments and provides stronger model selection performance than existing metrics.
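
The core of the metric is a greedy soft alignment between contextual token embeddings. A minimal sketch, assuming the BERT embeddings for the candidate and reference tokens have already been computed (random vectors stand in for them here), might look like this:

```python
# Greedy-matching similarity at the heart of BERTScore: precision/recall as
# mean best-match cosine similarity, combined into an F1 score.
import numpy as np

def bertscore_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """cand_emb: (m, d) candidate token embeddings, ref_emb: (n, d) reference ones."""
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T                    # (m, n) cosine similarities
    precision = sim.max(axis=1).mean()    # best reference match per candidate token
    recall = sim.max(axis=0).mean()       # best candidate match per reference token
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)
print(bertscore_f1(rng.normal(size=(5, 768)), rng.normal(size=(7, 768))))
```
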
Posted Content

Towards a Human-like Open-Domain Chatbot

TL;DR: Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations, is presented, along with a proposed human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation.
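
As a rough sketch of the aggregation behind SSA, assuming per-response binary sensibleness and specificity labels from human raters (the labels below are made up), the metric is simply the mean of the two per-label averages:

```python
# Toy SSA computation over hypothetical rater labels.
sensible = [1, 1, 0, 1, 1, 0, 1, 1]   # does the response make sense in context?
specific = [1, 0, 0, 1, 0, 0, 1, 1]   # is it specific rather than generic?

sensibleness = sum(sensible) / len(sensible)
specificity = sum(specific) / len(specific)
ssa = (sensibleness + specificity) / 2
print(f"sensibleness={sensibleness:.2f} specificity={specificity:.2f} SSA={ssa:.2f}")
```
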
Proceedings Article (DOI)

A Comparative Study on Transformer vs RNN in Speech Applications

TL;DR: The Transformer is an emerging sequence-to-sequence model that achieves state-of-the-art performance in neural machine translation and other natural language processing tasks; this paper compares it against RNNs across speech applications, including automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS).
References
Journal Article (DOI)

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called Long Short-Term Memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
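
A minimal numpy sketch of a single LSTM step, in the now-common formulation with a forget gate (a later addition to the original architecture), shows the additive cell-state update that keeps error flow roughly constant across time steps; shapes and random weights are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """x: (d_in,), h/c: (d_hid,), W: (4*d_hid, d_in), U: (4*d_hid, d_hid), b: (4*d_hid,)."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    c_new = f * c + i * np.tanh(g)                 # additive "carousel" update of the cell state
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
d_in, d_hid = 8, 16
W = rng.normal(size=(4 * d_hid, d_in))
U = rng.normal(size=(4 * d_hid, d_hid))
b = np.zeros(4 * d_hid)
h, c = np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
```
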
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
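
A minimal sketch of this soft-search idea, with an additive scoring function in the spirit of the paper and illustrative dimensions, computes a context vector as a softmax-weighted sum of encoder states:

```python
import numpy as np

def soft_attention(dec_state, enc_states, Wa, Ua, va):
    """dec_state: (d,), enc_states: (T, d) -> context vector (d,) and weights (T,)."""
    scores = np.tanh(enc_states @ Ua.T + dec_state @ Wa.T) @ va   # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                       # softmax over source positions
    context = weights @ enc_states                                 # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
d, T = 16, 6
context, w = soft_attention(rng.normal(size=d), rng.normal(size=(T, d)),
                            rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                            rng.normal(size=d))
print(w.round(3), context.shape)
```
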
Proceedings Article

Sequence to Sequence Learning with Neural Networks

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
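
A toy sketch of the fixed-vector encoder-decoder pattern follows; a single tanh RNN cell stands in here for the paper's deep LSTMs, and the greedy decoding loop is an illustrative simplification:

```python
import numpy as np

rng = np.random.default_rng(0)
d_emb, d_hid, vocab = 8, 16, 20
Wxh = rng.normal(size=(d_hid, d_emb))
Whh = rng.normal(size=(d_hid, d_hid))
Who = rng.normal(size=(vocab, d_hid))
E = rng.normal(size=(vocab, d_emb))     # token embeddings

def step(h, x):
    return np.tanh(Wxh @ x + Whh @ h)

# Encode: the final hidden state is the fixed-length summary of the source.
source = [3, 7, 1, 12]
h = np.zeros(d_hid)
for tok in source:
    h = step(h, E[tok])

# Decode greedily from that single vector.
target, tok = [], 0                      # 0 acts as a start-of-sequence token
for _ in range(5):
    h = step(h, E[tok])
    tok = int(np.argmax(Who @ h))
    target.append(tok)
print(target)
```
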
Posted Content

Attention Is All You Need

TL;DR: A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
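
At the heart of the architecture is scaled dot-product attention; a minimal single-head numpy sketch, without masking or learned projections, looks like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (Tq, d), K/V: (Tk, d) -> (Tq, d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                          # a toy sequence attending to itself
print(scaled_dot_product_attention(x, x, x).shape)    # (5, 16)
```
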
Proceedings Article

Convolutional Sequence to Sequence Learning

TL;DR: The authors introduced an architecture based entirely on convolutional neural networks, in which computations over all elements can be fully parallelized during training, and optimization is easier since the number of nonlinearities is fixed and independent of the input length.
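
A minimal numpy sketch of the kind of building block involved: a width-3 convolution with a gated linear unit applied to every position at once, so no position waits on the previous one; padding, dimensions, and random weights are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_glu_layer(X, W, b):
    """X: (T, d), W: (3, d, 2*d), b: (2*d,) -> (T, d)."""
    T, d = X.shape
    Xp = np.vstack([np.zeros((1, d)), X, np.zeros((1, d))])    # pad one position each side
    # Every output position sees a width-3 window; nothing depends on earlier outputs.
    windows = np.stack([Xp[t:t + 3] for t in range(T)])        # (T, 3, d)
    z = np.einsum("twd,wde->te", windows, W) + b               # (T, 2*d)
    a, g = np.split(z, 2, axis=-1)
    return a * sigmoid(g)                                      # gated linear unit

rng = np.random.default_rng(0)
out = conv_glu_layer(rng.normal(size=(6, 8)), rng.normal(size=(3, 8, 16)), np.zeros(16))
print(out.shape)   # (6, 8)
```
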