Open Access Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
Advances in Neural Information Processing Systems, Vol. 30, pp. 5998-6008
TL;DR
This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model, with 165 million parameters, achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
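To make the core operation concrete, here is a minimal NumPy sketch of the scaled dot-product attention the paper is built on, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The function name and toy shapes are illustrative, not the authors' reference code; the sketch is single-head, unbatched, and unmasked.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as defined in the paper.

    Q: (n_q, d_k) queries; K: (n_k, d_k) keys; V: (n_k, d_v) values.
    Single-head, unbatched, unmasked -- a toy reduction of the full model.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) compatibility logits
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax over the keys
    return weights @ V                              # (n_q, d_v) weighted values

# Toy usage: 3 query positions attending over 5 key/value positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 16))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 16)
```

The full model stacks this operation into multi-head attention with learned projections; the scaling by sqrt(d_k) keeps the logits from saturating the softmax as d_k grows.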
Citations
Posted Content
Calibrating Deep Neural Networks using Focal Loss
Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip H. S. Torr, Puneet K. Dokania
TL;DR: This work provides a thorough analysis of the factors causing miscalibration of deep neural networks, and a principled approach to automatically selecting the hyperparameter involved in the loss function.
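The loss function in question is the focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the probability assigned to the true class. A minimal NumPy sketch of that loss follows; the function name is ours, and the paper's actual contribution, a principled scheme for choosing gamma, is not reproduced here.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0):
    """Multi-class focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t).

    probs: (n, c) predicted class probabilities; targets: (n,) integer labels.
    gamma=2.0 is a common default; the paper proposes a principled,
    sample-dependent way to choose gamma, which this sketch omits.
    """
    p_t = probs[np.arange(len(targets)), targets]  # probability of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```

Relative to cross-entropy (the gamma = 0 case), the (1 - p_t)^gamma factor down-weights already-confident examples, which is what connects the loss to calibration.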
Proceedings Article
STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework
Mingbo Ma, Liang Huang, Hao Xiong, Renjie Zheng, Kaibo Liu, Baigong Zheng, Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, Hua Wu, Haifeng Wang
TL;DR: This paper proposes a prefix-to-prefix framework for simultaneous translation that implicitly learns to anticipate in a single translation model, achieving low latency and reasonable quality (compared to full-sentence translation) on four translation directions.
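STACL instantiates prefix-to-prefix decoding with a "wait-k" policy: the decoder starts after k source words have been read and then stays k words behind the reader, so target word t is predicted from the first min(t + k - 1, |x|) source words. Here is a minimal sketch of that read/write schedule; the function name is ours, not the paper's code.

```python
def wait_k_schedule(source_len, target_len, k=3):
    """Wait-k prefix-to-prefix schedule: before emitting target word t
    (1-indexed), the model has read min(t + k - 1, source_len) source
    words. Returns (source_prefix_len, target_index) pairs."""
    return [(min(t + k - 1, source_len), t) for t in range(1, target_len + 1)]

# With k=3: emit the first target word after reading 3 source words,
# then alternate reading and writing one word at a time.
print(wait_k_schedule(source_len=6, target_len=5, k=3))
# [(3, 1), (4, 2), (5, 3), (6, 4), (6, 5)]
```

The fixed lag k is what makes the latency controllable: smaller k means earlier output but forces the model to anticipate more of the unseen source.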
Posted Content
MLPerf Training Benchmark
Peter Mattson, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, David A. Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debojyoti Dutta, Udit Gupta, Kim Hazelwood, Andrew Hock, Xinyuan Huang, Atsushi Ike, Bill Jia, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Guokai Ma, Deepak Narayanan, Tayo Oguntebi, Gennady Pekhimenko, Lillian Pentecost, Vijay Janapa Reddi, Taylor Robie, Tom St. John, Tsuguchika Tabaru, Carole-Jean Wu, Lingjie Xu, Masafumi Yamazaki, Cliff Young, Matei Zaharia
TL;DR: MLPerf, as discussed by the authors, is an ML training benchmark that overcomes three benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution; training is stochastic, so time to solution exhibits high variance; and software and hardware systems are so diverse that fair comparison is difficult.
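To make the variance point concrete: one common way to aggregate stochastic time-to-solution measurements is to run the benchmark several times and drop the extremes before averaging. The sketch below is illustrative of that idea only; the run counts and the exact rule are assumptions here, not MLPerf's normative scoring spec.

```python
def olympic_average(times_to_solution):
    """Aggregate per-run times-to-solution by dropping the fastest and
    slowest run and averaging the rest, dampening stochastic variance.
    (Illustrative aggregation; consult the MLPerf rules for the official one.)"""
    if len(times_to_solution) < 3:
        raise ValueError("need at least 3 runs to drop both extremes")
    trimmed = sorted(times_to_solution)[1:-1]
    return sum(trimmed) / len(trimmed)

# Five runs of one benchmark, minutes to reach the target quality:
print(olympic_average([112.0, 98.5, 104.2, 131.7, 101.9]))  # ~106.03
```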
Posted Content
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
TL;DR: ParaDnn is introduced, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected, convolutional (CNN), and recurrent (RNN) neural networks, and the rapid performance improvements that specialized software stacks provide for the TPU and GPU platforms are quantified.
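As a toy illustration of what "parameterized benchmark suite" means here: sweep a small hyperparameter grid and emit one fully connected model specification per grid point. The names and parameter ranges below are invented for illustration and are not ParaDnn's actual interface.

```python
from itertools import product

def fc_model_specs(depths=(2, 4, 8), widths=(256, 1024, 4096)):
    """Yield hidden-layer-size lists for fully connected models over a
    small hyperparameter grid -- a toy stand-in for a parameterized
    suite like ParaDnn (the real interface and ranges differ)."""
    for depth, width in product(depths, widths):
        yield [width] * depth  # one benchmark model per (depth, width) point

for spec in fc_model_specs():
    print(spec)  # e.g. [256, 256], ..., [4096]*8 -- nine models in total
```

Sweeping model shape systematically, rather than benchmarking a handful of fixed networks, is what lets the suite expose where each platform's performance advantages begin and end.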
Posted Content
Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things
Jing Zhang, Dacheng Tao
TL;DR: It is shown how AI can empower the IoT to make it faster, smarter, greener, and safer, and some promising applications of AIoT that are likely to profoundly reshape the world are summarized.