Open Access Proceedings Article

Attention is All you Need

TL;DR
This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
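The attention mechanism at the core of the proposed architecture is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch for illustration only; the function name, shapes, and toy data are our own and not taken from the paper's implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries; K: (n_k, d_k) keys; V: (n_k, d_v) values.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) similarity logits
    # Row-wise softmax, shifted for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # (n_q, d_v) attention-weighted values

# Toy usage (hypothetical sizes): 3 queries attending over 4 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 16))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 16)
```

Because every query attends to every key in a single matrix product, the computation parallelizes across sequence positions, which is the source of the training-time advantage over recurrent models claimed in the abstract.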



Citations
Journal Article (DOI)

Self-attention for raw optical Satellite Time Series Classification

TL;DR: This work compares recent deep learning models for crop type classification on raw and preprocessed Sentinel-2 data and qualitatively shows how self-attention scores focus selectively on a few classification-relevant observations.
Journal Article (DOI)

Artificial Neural Networks for Neuroscientists: A Primer.

TL;DR: This pedagogical Primer introduces artificial neural networks and demonstrates how they have been fruitfully deployed to study neuroscientific questions, and details how to customize the analysis, structure, and learning of ANNs to better address a wide range of challenges in brain research.
Proceedings Article (DOI)

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

TL;DR: A novel denoising training method that speeds up DETR (DEtection TRansformer) training and offers a deeper understanding of the slow convergence of DETR-like methods with a ResNet-50 backbone.
Posted Content

Understanding Knowledge Distillation in Non-autoregressive Machine Translation

TL;DR: Knowledge distillation is found to reduce the complexity of datasets and help NAT models capture the variations in the output data; a strong correlation is observed between the capacity of an NAT model and the optimal complexity of the distilled data for the best translation quality.
Posted Content (DOI)

A data-driven drug repositioning framework discovered a potential therapeutic agent targeting COVID-19

TL;DR: In silico screening followed by wet-lab validation indicated that a poly-ADP-ribose polymerase 1 (PARP1) inhibitor, CVL218, currently in a Phase I clinical trial, may be repurposed to treat COVID-19; several possible mechanisms are proposed to explain the antiviral activity of PARP1 inhibitors against SARS-CoV-2.