Open Access · Proceedings Article

Attention is All you Need

TLDR
This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
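The mechanism the abstract rests on is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch of that formula (the shapes and toy data below are chosen purely for illustration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # scaled pairwise similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

# Toy usage: 3 queries attending over 4 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 16))
out = scaled_dot_product_attention(Q, K, V)         # shape (3, 16)
```

The 1/√d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with very small gradients.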



Citations
Posted Content

ACFNet: Attentional Class Feature Network for Semantic Segmentation

TL;DR: This paper presents the concept of a class center, which extracts global context from a categorical perspective, and proposes a novel Attentional Class Feature (ACF) module that calculates different class centers and adaptively combines them for each pixel.
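A rough sketch of the class-center idea as described in this TL;DR, not ACFNet's exact ACF module: class centers as probability-weighted averages of pixel features, followed by a per-pixel soft combination of the centers. All function names and the affinity choice here are illustrative assumptions.

```python
import numpy as np

def class_centers(features, probs, eps=1e-6):
    """Class centers as probability-weighted averages of pixel features.

    features: (N, C) per-pixel features; probs: (N, K) coarse class probabilities.
    Returns (K, C), one center per class.
    """
    weights = probs / (probs.sum(axis=0, keepdims=True) + eps)  # normalize per class
    return weights.T @ features

def combine_centers_per_pixel(features, centers):
    """Each pixel adaptively combines the class centers, weighted by a softmax
    over pixel-center affinities (an illustrative choice of affinity)."""
    scores = features @ centers.T                    # (N, K) pixel-center affinity
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # softmax over classes
    return attn @ centers                            # (N, C) class-aware features

# Toy usage: 6 pixels, 8-dim features, 3 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
probs = rng.dirichlet(np.ones(3), size=6)            # rows sum to 1
centers = class_centers(feats, probs)                # (3, 8)
refined = combine_centers_per_pixel(feats, centers)  # (6, 8)
```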
Journal Article

Machine Learning for industrial applications: A comprehensive literature review

TL;DR: This paper reviews industrial applications of ML techniques, aiming to clarify the real potential, as well as the possible flaws, of ML algorithms applied to operations management; the comprehensive review is organized to help practitioners orient themselves in this field.
Proceedings Article

Learning from Dialogue after Deployment: Feed Yourself, Chatbot!

TL;DR: On the PersonaChat chit-chat dataset with over 131k training examples, it is found that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
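A plausible shape for the self-feeding loop described above, where every name and the satisfaction threshold are assumptions rather than the paper's actual API: when the user appears satisfied, their reply becomes a new training example; when not, the bot asks for explicit feedback and learns from that instead.

```python
def self_feeding_turn(context, user_reply, estimate_satisfaction, ask_user,
                      dialogue_buffer, feedback_buffer, threshold=0.5):
    """One deployment turn of a self-feeding loop (illustrative sketch;
    the threshold and callables are assumptions, not the paper's method)."""
    if estimate_satisfaction(context, user_reply) >= threshold:
        # Satisfied user: treat the reply as a new gold response for the context.
        dialogue_buffer.append((context, user_reply))
    else:
        # Dissatisfied user: ask what should have been said, and store that.
        feedback = ask_user("Oops! What should I have said instead?")
        feedback_buffer.append((context, feedback))

# Toy usage with stand-in callables.
dialogue, feedback = [], []
self_feeding_turn("Hi there!", "Nice to meet you too!",
                  estimate_satisfaction=lambda c, r: 0.9,   # pretend classifier
                  ask_user=lambda prompt: "(human feedback)",
                  dialogue_buffer=dialogue, feedback_buffer=feedback)
```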
Book Chapter

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

TL;DR: The proposed XML model uses a late fusion design with a novel Convolutional Start-End detector (ConvSE), surpassing baselines by a large margin while being more efficient, and providing a strong starting point for future work.
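One way to picture a convolutional start-end detector: slide small 1D filters over per-clip query-similarity scores so that rising edges suggest moment starts and falling edges suggest ends. The fixed edge filters below are illustrative stand-ins for the learned filters the TL;DR refers to.

```python
import numpy as np

def convse_scores(similarity):
    """Start/end scores from 1D filters over query-clip similarities.

    similarity: (T,) one query-relevance score per video clip.
    """
    start_filter = np.array([-1.0, 1.0])   # fires on score increases
    end_filter = np.array([1.0, -1.0])     # fires on score decreases
    # np.convolve flips its kernel, so reverse to get cross-correlation.
    start = np.convolve(similarity, start_filter[::-1], mode="same")
    end = np.convolve(similarity, end_filter[::-1], mode="same")
    return start, end

# Toy usage: a relevant segment spanning clips 2-4.
sim = np.array([0.1, 0.1, 0.9, 0.9, 0.9, 0.1])
start, end = convse_scores(sim)   # start peaks near clip 2, end near clip 5
```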
Proceedings Article

N-ary Relation Extraction using Graph-State LSTM

TL;DR: This work proposes a graph-state LSTM model, which uses a parallel state to model each word, recurrently enriching state values via message passing, and speeds up computation by allowing more parallelization.
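A simplified sketch of the graph-state idea: one state per word, all states refreshed in parallel by message passing from graph neighbors. The gated update below is an illustrative stand-in for the paper's full LSTM cell, and all names and weights are toy assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_state_step(states, adjacency, W_msg, W_upd):
    """One parallel message-passing step: every word state is updated at once
    from the sum of its neighbors' states (simplified gated update).

    states: (N, D) one state per word; adjacency: (N, N) 0/1 edge matrix.
    """
    messages = adjacency @ states @ W_msg            # aggregate neighbor states
    gate = sigmoid(states @ W_upd)                   # per-unit update gate
    return gate * np.tanh(messages) + (1.0 - gate) * states

# Toy usage: 5 words on a chain-shaped graph, 3 recurrent enrichment steps.
N, D = 5, 16
rng = np.random.default_rng(1)
states = rng.normal(size=(N, D))
adjacency = np.eye(N, k=1) + np.eye(N, k=-1)         # neighbors in a chain
W_msg = rng.normal(size=(D, D)) * 0.1
W_upd = rng.normal(size=(D, D)) * 0.1
for _ in range(3):
    states = graph_state_step(states, adjacency, W_msg, W_upd)
```

Because every word's state is updated simultaneously, the number of sequential steps is set by how far messages need to travel in the graph rather than by sentence length, which is the parallelization the TL;DR mentions.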