Open Access · Proceedings Article

Attention is All you Need

TLDR
This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
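The mechanism the abstract rests on is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch of that formula (the shapes and toy data below are chosen purely for illustration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # scaled pairwise similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

# Toy usage: 3 queries attending over 4 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 16))
out = scaled_dot_product_attention(Q, K, V)         # shape (3, 16)
```

The 1/√d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with very small gradients.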



Citations
Posted Content

ACFNet: Attentional Class Feature Network for Semantic Segmentation

TL;DR: This paper presents the concept of a class center, which extracts global context from a categorical perspective, and proposes a novel Attentional Class Feature (ACF) module that calculates different class centers and adaptively combines them for each pixel.
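A rough sketch of the class-center idea as described in this TL;DR, not ACFNet's exact ACF module: class centers as probability-weighted averages of pixel features, followed by a per-pixel soft combination of the centers. All function names and the affinity choice here are illustrative assumptions.

```python
import numpy as np

def class_centers(features, probs, eps=1e-6):
    """Class centers as probability-weighted averages of pixel features.

    features: (N, C) per-pixel features; probs: (N, K) coarse class probabilities.
    Returns (K, C), one center per class.
    """
    weights = probs / (probs.sum(axis=0, keepdims=True) + eps)  # normalize per class
    return weights.T @ features

def combine_centers_per_pixel(features, centers):
    """Each pixel adaptively combines the class centers, weighted by a softmax
    over pixel-center affinities (an illustrative choice of affinity)."""
    scores = features @ centers.T                    # (N, K) pixel-center affinity
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # softmax over classes
    return attn @ centers                            # (N, C) class-aware features

# Toy usage: 6 pixels, 8-dim features, 3 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
probs = rng.dirichlet(np.ones(3), size=6)            # rows sum to 1
centers = class_centers(feats, probs)                # (3, 8)
refined = combine_centers_per_pixel(feats, centers)  # (6, 8)
```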
Journal Article

Machine Learning for industrial applications: A comprehensive literature review

TL;DR: This paper reviews industrial applications of ML techniques, aiming to clarify the real potential, as well as the possible flaws, of ML algorithms applied to operations management; the comprehensive review is organized to help practitioners orient themselves in this field.
Proceedings Article

Learning from Dialogue after Deployment: Feed Yourself, Chatbot!

TL;DR: On the PersonaChat chit-chat dataset with over 131k training examples, it is found that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
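A plausible shape for the self-feeding loop described above, where every name and the satisfaction threshold are assumptions rather than the paper's actual API: when the user appears satisfied, their reply becomes a new training example; when not, the bot asks for explicit feedback and learns from that instead.

```python
def self_feeding_turn(context, user_reply, estimate_satisfaction, ask_user,
                      dialogue_buffer, feedback_buffer, threshold=0.5):
    """One deployment turn of a self-feeding loop (illustrative sketch;
    the threshold and callables are assumptions, not the paper's method)."""
    if estimate_satisfaction(context, user_reply) >= threshold:
        # Satisfied user: treat the reply as a new gold response for the context.
        dialogue_buffer.append((context, user_reply))
    else:
        # Dissatisfied user: ask what should have been said, and store that.
        feedback = ask_user("Oops! What should I have said instead?")
        feedback_buffer.append((context, feedback))

# Toy usage with stand-in callables.
dialogue, feedback = [], []
self_feeding_turn("Hi there!", "Nice to meet you too!",
                  estimate_satisfaction=lambda c, r: 0.9,   # pretend classifier
                  ask_user=lambda prompt: "(human feedback)",
                  dialogue_buffer=dialogue, feedback_buffer=feedback)
```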
Book Chapter

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

TL;DR: The proposed XML model uses a late fusion design with a novel Convolutional Start-End detector (ConvSE), surpassing baselines by a large margin while being more efficient, and providing a strong starting point for future work.
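One way to picture a convolutional start-end detector: slide small 1D filters over per-clip query-similarity scores so that rising edges suggest moment starts and falling edges suggest ends. The fixed edge filters below are illustrative stand-ins for the learned filters the TL;DR refers to.

```python
import numpy as np

def convse_scores(similarity):
    """Start/end scores from 1D filters over query-clip similarities.

    similarity: (T,) one query-relevance score per video clip.
    """
    start_filter = np.array([-1.0, 1.0])   # fires on score increases
    end_filter = np.array([1.0, -1.0])     # fires on score decreases
    # np.convolve flips its kernel, so reverse to get cross-correlation.
    start = np.convolve(similarity, start_filter[::-1], mode="same")
    end = np.convolve(similarity, end_filter[::-1], mode="same")
    return start, end

# Toy usage: a relevant segment spanning clips 2-4.
sim = np.array([0.1, 0.1, 0.9, 0.9, 0.9, 0.1])
start, end = convse_scores(sim)   # start peaks near clip 2, end near clip 5
```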
Proceedings Article

N-ary Relation Extraction using Graph-State LSTM

TL;DR: This work proposes a graph-state LSTM model, which uses a parallel state to model each word, recurrently enriching state values via message passing, and speeds up computation by allowing more parallelization.
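A simplified sketch of the graph-state idea: one state per word, all states refreshed in parallel by message passing from graph neighbors. The gated update below is an illustrative stand-in for the paper's full LSTM cell, and all names and weights are toy assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_state_step(states, adjacency, W_msg, W_upd):
    """One parallel message-passing step: every word state is updated at once
    from the sum of its neighbors' states (simplified gated update).

    states: (N, D) one state per word; adjacency: (N, N) 0/1 edge matrix.
    """
    messages = adjacency @ states @ W_msg            # aggregate neighbor states
    gate = sigmoid(states @ W_upd)                   # per-unit update gate
    return gate * np.tanh(messages) + (1.0 - gate) * states

# Toy usage: 5 words on a chain-shaped graph, 3 recurrent enrichment steps.
N, D = 5, 16
rng = np.random.default_rng(1)
states = rng.normal(size=(N, D))
adjacency = np.eye(N, k=1) + np.eye(N, k=-1)         # neighbors in a chain
W_msg = rng.normal(size=(D, D)) * 0.1
W_upd = rng.normal(size=(D, D)) * 0.1
for _ in range(3):
    states = graph_state_step(states, adjacency, W_msg, W_upd)
```

Because every word's state is updated simultaneously, the number of sequential steps is set by how far messages need to travel in the graph rather than by sentence length, which is the parallelization the TL;DR mentions.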