Open Access Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
Vol. 30, pp. 5998-6008
TLDR
This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
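The abstract describes an architecture built entirely on attention. As a rough illustration of the core operation involved (the scaled dot-product attention from the full paper, not spelled out in this excerpt), here is a minimal pure-Python sketch; the function names and toy shapes are illustrative assumptions, not the authors' code.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries: list of d_k-dim vectors; keys and values: aligned lists of vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy example (shapes are assumptions): 2 queries over 3 key/value pairs, d_k = 4.
Q = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
K = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = attention(Q, K, V)
print(len(out), len(out[0]))  # 2 2
```

Because the attention weights form a convex combination, each output row here is a mixture of the value vectors; the paper's Transformer applies this operation in parallel across many heads, which is what removes the sequential bottleneck of recurrent models.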
Citations
Posted Content
ACFNet: Attentional Class Feature Network for Semantic Segmentation
TL;DR: This paper presents the concept of the class center, which extracts the global context from a categorical perspective, and proposes a novel module, named the Attentional Class Feature (ACF) module, to calculate and adaptively combine different class centers according to each pixel.
Journal ArticleDOI
Machine Learning for industrial applications: A comprehensive literature review
TL;DR: This paper reviews industrial applications of ML techniques, aiming to clarify the real potential, as well as the possible flaws, of ML algorithms applied to operations management; the comprehensive review is organized to help orient practitioners in this field.
Proceedings ArticleDOI
Learning from Dialogue after Deployment: Feed Yourself, Chatbot!
TL;DR: On the PersonaChat chit-chat dataset with over 131k training examples, it is found that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
Book ChapterDOI
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
TL;DR: The proposed XML model uses a late fusion design with a novel Convolutional Start-End detector (ConvSE), surpassing baselines by a large margin and with better efficiency, providing a strong starting point for future work.
Proceedings ArticleDOI
N-ary Relation Extraction using Graph-State LSTM
TL;DR: This work proposes a graph-state LSTM model, which uses a parallel state to model each word, recurrently enriching state values via message passing, and speeds up computation by allowing more parallelization.