Open Access · Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
Vol. 30, pp. 5998-6008
TLDR
This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single-model state-of-the-art by 0.7 BLEU, achieving a BLEU score of 41.1.
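The attention mechanism the abstract refers to is the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. As a rough illustration (not the authors' code, which also adds multi-head projections, masking, and batching), a minimal NumPy sketch of the single-head operation might look like:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) similarity scores
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted sum of value rows

# Toy example: 3 queries attending over 4 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

The 1/√d_k scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients.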
Citations
Journal Article · DOI
Self-attention for raw optical Satellite Time Series Classification
Marc Rußwurm, Marco Körner +1 more
TL;DR: This work compares recent deep learning models on crop type classification on raw and preprocessed Sentinel 2 data and qualitatively shows how self-attention scores focus selectively on few classification-relevant observations.
Journal Article · DOI
Artificial Neural Networks for Neuroscientists: A Primer.
TL;DR: This pedagogical Primer introduces artificial neural networks and demonstrates how they have been fruitfully deployed to study neuroscientific questions, and details how to customize the analysis, structure, and learning of ANNs to better address a wide range of challenges in brain research.
Proceedings Article · DOI
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
TL;DR: A novel denoising training method to speed up DETR (DEtection TRansformer) training, offering a deepened understanding of the slow convergence issue of DETR-like methods with a ResNet-50 backbone.
Posted Content
Understanding Knowledge Distillation in Non-autoregressive Machine Translation
TL;DR: It is found that knowledge distillation can reduce the complexity of data sets and help non-autoregressive translation (NAT) models capture the variations in the output data, and a strong correlation is observed between the capacity of an NAT model and the optimal complexity of the distilled data for the best translation quality.
Posted Content · DOI
A data-driven drug repositioning framework discovered a potential therapeutic agent targeting COVID-19
Yiyue Ge, Tingzhong Tian, Huang S, Fangping Wan, J Li, Shuya Li, Hui Yang, Lixiang Hong, Nian Wu, Enming Yuan, Lili Cheng, Yipin Lei, Hantao Shu, Xiaolong Feng, Ziyuan Jiang, Ying Chi, Xiling Guo, Lunbiao Cui, Liang Xiao, Zeng Li, Chunhao Yang, Zehong Miao, Haidong Tang, Ligong Chen, Hainian Zeng, Dan Zhao, Fengcai Zhu, Xiaokun Shen, Jianyang Zeng +28 more
TL;DR: The in silico screening followed by wet-lab validation indicated that a poly-ADP-ribose polymerase 1 (PARP1) inhibitor, CVL218, currently in Phase I clinical trial, may be repurposed to treat COVID-19 and proposed several possible mechanisms to explain the antiviral activities of PARP1 inhibitors against SARS-CoV-2.