Open Access Proceedings Article

Attention is All you Need

TLDR
This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single-model state-of-the-art by 0.7 BLEU, achieving a BLEU score of 41.1.
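The attention mechanism at the core of the architecture described above can be sketched as scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, where queries Q are matched against keys K to produce a weighted sum of values V. This is a minimal illustrative sketch (the function name and toy matrix shapes are our own, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D arrays of
    shape (n_queries, d_k), (n_keys, d_k), (n_keys, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                            # weighted sum of values

# Toy example: 3 queries/keys/values of dimension 4.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

In the full Transformer this operation is applied in parallel across several learned projections (multi-head attention); the sketch shows only the single-head core.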



Citations
Posted Content

Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

TL;DR: A novel Region Attention Network (RAN) adaptively captures the importance of facial regions for occlusion- and pose-variant FER, aided by a region-biased loss that encourages high attention weights for the most important regions.
Proceedings Article DOI

A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss

TL;DR: This paper proposes a unified model combining the strengths of extractive and abstractive summarization, which achieves state-of-the-art ROUGE scores while producing the most informative and readable summaries on the CNN/Daily Mail dataset.
Journal Article DOI

Accurate Screening of COVID-19 Using Attention-Based Deep 3D Multiple Instance Learning

TL;DR: This paper proposes attention-based deep 3D multiple instance learning (AD3D-MIL), in which a patient-level label is assigned to a 3D chest CT viewed as a bag of instances; the method can semantically generate deep 3D instances following the possible infection area.
Proceedings Article DOI

Object-Driven Text-To-Image Synthesis via Adversarial Training

TL;DR: A thorough comparison between the classic grid attention and the new object-driven attention is provided through analyzing their mechanisms and visualizing their attention layers, showing insights of how the proposed model generates complex scenes in high quality.
Posted Content

PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling

TL;DR: A novel end-to-end network for robust point cloud processing, named PointASNL, achieves state-of-the-art robust performance on classification and segmentation tasks across all datasets, and significantly outperforms previous methods on the real-world outdoor SemanticKITTI dataset with considerable noise.