Open Access Proceedings Article

Attention is All you Need

TLDR
This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single-model state-of-the-art by 0.7 BLEU, achieving a BLEU score of 41.1.
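The attention mechanism at the core of the architecture described above can be sketched as scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, where queries Q are matched against keys K to produce a weighted sum of values V. This is a minimal illustrative sketch (the function name and toy matrix shapes are our own, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D arrays of
    shape (n_queries, d_k), (n_keys, d_k), (n_keys, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                            # weighted sum of values

# Toy example: 3 queries/keys/values of dimension 4.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

In the full Transformer this operation is applied in parallel across several learned projections (multi-head attention); the sketch shows only the single-head core.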



Citations
Posted Content

Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

TL;DR: A novel Region Attention Network (RAN) adaptively captures the importance of facial regions for occlusion- and pose-variant FER, aided by a region-biased loss that encourages high attention weights for the most important regions.
Proceedings Article DOI

A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss

TL;DR: This paper proposes a unified model combining the strengths of extractive and abstractive summarization, which achieves state-of-the-art ROUGE scores while producing the most informative and readable summaries on the CNN/Daily Mail dataset.
Journal Article DOI

Accurate Screening of COVID-19 Using Attention-Based Deep 3D Multiple Instance Learning

TL;DR: This paper proposes attention-based deep 3D multiple instance learning (AD3D-MIL), in which a patient-level label is assigned to a 3D chest CT viewed as a bag of instances; the method can semantically generate deep 3D instances following the possible infection area.
Proceedings Article DOI

Object-Driven Text-To-Image Synthesis via Adversarial Training

TL;DR: A thorough comparison between the classic grid attention and the new object-driven attention is provided through analyzing their mechanisms and visualizing their attention layers, showing insights of how the proposed model generates complex scenes in high quality.
Posted Content

PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling

TL;DR: A novel end-to-end network for robust point cloud processing, named PointASNL, achieves state-of-the-art robust performance on classification and segmentation tasks across all datasets, and significantly outperforms previous methods on the real-world outdoor SemanticKITTI dataset with considerable noise.