Open Access · Proceedings Article
Tensor2Tensor for Neural Machine Translation
Ashish Vaswani, Samy Bengio, Eugene Brevdo, François Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit
Vol. 1, pp. 193-199
TL;DR
Tensor2Tensor, as described in this paper, is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.
Abstract
Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.
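As a rough illustration of how the library is typically driven, the sketch below launches the t2t-trainer command line from the Tensor2Tensor quick start; the paths and step counts are placeholder assumptions, and flag names may differ between library versions.

```python
# Illustrative sketch only: launches the documented t2t-trainer CLI for a
# WMT English-German Transformer run. Paths and step counts are placeholder
# assumptions; flag names follow the Tensor2Tensor quick start but may
# differ between library versions.
import subprocess

DATA_DIR = "/tmp/t2t_data"      # assumed scratch locations
OUTPUT_DIR = "/tmp/t2t_train"

subprocess.run(
    [
        "t2t-trainer",
        "--generate_data",                      # download and preprocess the dataset
        f"--data_dir={DATA_DIR}",
        f"--output_dir={OUTPUT_DIR}",
        "--problem=translate_ende_wmt32k",      # WMT English-German with a 32k subword vocab
        "--model=transformer",                  # the reference Transformer implementation
        "--hparams_set=transformer_base_single_gpu",
        "--train_steps=1000",                   # toy value for a smoke test
        "--eval_steps=100",
    ],
    check=True,
)
```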
Citations
Proceedings Article · DOI
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli
TL;DR: Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
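As a rough illustration (not this paper's own example), fairseq's README documents loading pretrained translation models through torch.hub; the model name, tokenizer, and BPE arguments below are assumptions that may change between releases.

```python
# Illustrative sketch: loading a pretrained fairseq translation model through
# torch.hub, as shown in the fairseq README. The model name, tokenizer and
# BPE arguments are assumptions that may differ between fairseq releases.
import torch

en2de = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt16.en-de",   # pretrained English-German Transformer
    tokenizer="moses",
    bpe="subword_nmt",
)
en2de.eval()  # disable dropout for inference

print(en2de.translate("Machine translation is fun!"))
```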
Posted Content
BERTScore: Evaluating Text Generation with BERT
TL;DR: This work proposes BERTScore, an automatic evaluation metric for text generation that correlates better with human judgments and provides stronger model selection performance than existing metrics.
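To make the idea concrete, here is a minimal sketch of the greedy-matching score behind BERTScore, with random vectors standing in for BERT contextual embeddings; the real metric also supports idf weighting and baseline rescaling, which are omitted.

```python
# Minimal sketch of the greedy-matching idea behind BERTScore, using random
# vectors in place of real BERT contextual embeddings. The actual metric also
# supports idf weighting and baseline rescaling, which are omitted here.
import numpy as np

def greedy_match_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """cand_emb: (m, d) candidate token embeddings; ref_emb: (n, d) reference."""
    # Cosine similarity matrix between every candidate and reference token.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T                      # shape (m, n)

    precision = sim.max(axis=1).mean()      # each candidate token's best match
    recall = sim.max(axis=0).mean()         # each reference token's best match
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)
print(greedy_match_f1(rng.normal(size=(5, 8)), rng.normal(size=(6, 8))))
```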
Posted Content
Towards a Human-like Open-Domain Chatbot
Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le
TL;DR: Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations, is presented, and a human evaluation metric called Sensibleness and Specificity Average (SSA) is proposed, which captures key elements of a human-like multi-turn conversation.
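The SSA computation itself is simple to state; a minimal sketch follows, assuming each response carries binary human labels for sensibleness and specificity (the label values are made up for illustration).

```python
# Minimal sketch of the Sensibleness and Specificity Average (SSA) described
# in the Meena paper: each chatbot response receives binary human labels for
# "sensible" and "specific", and SSA averages the two per-label rates.
# The label data below is made up for illustration.
labels = [
    {"sensible": 1, "specific": 1},
    {"sensible": 1, "specific": 0},
    {"sensible": 0, "specific": 0},
    {"sensible": 1, "specific": 1},
]

sensibleness = sum(l["sensible"] for l in labels) / len(labels)
specificity = sum(l["specific"] for l in labels) / len(labels)
ssa = (sensibleness + specificity) / 2
print(f"sensibleness={sensibleness:.2f} specificity={specificity:.2f} SSA={ssa:.2f}")
```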
Proceedings Article · DOI
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Yalta, Ryuichi Yamamoto
TL;DR: Transformer, as discussed in this paper, is an emergent sequence-to-sequence model that achieves state-of-the-art performance in neural machine translation and other natural language processing applications; this study compares it against RNNs on speech applications such as automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS).
References
Journal Article · DOI
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
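Since the TL;DR compresses the mechanism heavily, a minimal sketch of one LSTM cell step follows (the now-standard variant with a forget gate, which was added in later work); the weights and dimensions are placeholder assumptions.

```python
# Minimal sketch of a single LSTM cell step in NumPy, illustrating the gated
# "constant error carousel": the cell state c is updated additively, gated by
# the forget and input gates. Weights here are random placeholders.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4h, d) input weights, U: (4h, h) recurrent weights, b: (4h,) bias."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)        # input gate, forget gate, output gate, candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)   # additive cell-state update (the "carousel")
    h = sigmoid(o) * np.tanh(c)                          # gated hidden output
    return h, c

d, hdim = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hdim, d))
U = rng.normal(size=(4 * hdim, hdim))
b = np.zeros(4 * hdim)
h, c = lstm_step(rng.normal(size=d), np.zeros(hdim), np.zeros(hdim), W, U, b)
print(h, c)
```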
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
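A minimal sketch of the additive, "soft-search" attention score this paper introduces follows, with random weights standing in for parameters that the real model learns jointly with an RNN encoder-decoder.

```python
# Minimal sketch of Bahdanau-style additive attention: a decoder state
# "soft-searches" over encoder states by scoring each one with a small
# feed-forward network and taking a softmax-weighted average.
# Weights are random stand-ins for learned parameters.
import numpy as np

def additive_attention(s, H, Wa, Ua, va):
    """s: decoder state (h,), H: encoder states (T, h)."""
    scores = np.tanh(s @ Wa.T + H @ Ua.T) @ va     # e_t = v^T tanh(Wa s + Ua h_t)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over source positions
    return weights @ H, weights                    # context vector and alignment

hdim, T, adim = 4, 6, 5
rng = np.random.default_rng(0)
Wa = rng.normal(size=(adim, hdim))
Ua = rng.normal(size=(adim, hdim))
va = rng.normal(size=adim)
context, align = additive_attention(rng.normal(size=hdim), rng.normal(size=(T, hdim)), Wa, Ua, va)
print(align)                                       # attention weights over the source positions
```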
Proceedings Article
Sequence to Sequence Learning with Neural Networks
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Posted Content
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it also generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
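For reference, a minimal sketch of the scaled dot-product attention at the Transformer's core follows: a single head on toy matrices, without masking, multi-head projections, or the rest of the architecture.

```python
# Minimal sketch of scaled dot-product attention, the building block of the
# Transformer: softmax(Q K^T / sqrt(d_k)) V for a single head on toy inputs.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # similarity of queries to keys
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```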
Proceedings Article
Convolutional Sequence to Sequence Learning
TL;DR: The authors introduce an architecture based entirely on convolutional neural networks, in which computations over all elements can be fully parallelized during training, and optimization is easier because the number of nonlinearities is fixed and independent of the input length.