Open Access Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- Vol. 30, pp 5998-6008
TLDR
This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single-model state of the art by 0.7 BLEU, achieving a BLEU score of 41.1.
Citations
Posted Content
LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
TL;DR: New pretrained contextualized representations of words and entities based on the bidirectional transformer, and an entity-aware self-attention mechanism that considers the types of tokens (words or entities) when computing attention scores are proposed.
Proceedings Article
Entangled Transformer for Image Captioning
TL;DR: A Transformer-based sequence modeling framework built only with attention layers and feedforward layers that enables the Transformer to exploit semantic and visual information simultaneously and achieves state-of-the-art performance on the MSCOCO image captioning dataset.
Proceedings Article
MLPerf inference benchmark
Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian M. Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Satish Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem C. Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, Yuchen Zhou
TL;DR: This paper presents the benchmarking method for evaluating ML inference systems, MLPerf Inference, and prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures.
Posted Content
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
TL;DR: Inspired by recent advances in deep metric learning (DML), this work carefully designs a self-supervised objective for learning universal sentence embeddings that does not require labelled training data and closes the performance gap between unsupervised and supervised pretraining for universal sentence encoders.
Posted Content
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation.
TL;DR: This paper factorizes 2D self-attention into two 1D self-attentions, a novel building block that one could stack to form axial-attention models for image classification and dense prediction, and achieves state-of-the-art results on Mapillary Vistas and Cityscapes.