Open Access Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
Vol. 30, pp. 5998-6008
TLDR
This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
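The core operation behind the attention mechanism the abstract describes can be illustrated with scaled dot-product attention. The following is a minimal NumPy sketch, not the paper's implementation; all names and shapes here are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled by sqrt(d_k)
    # to keep the softmax gradients well-behaved.
    scores = Q @ K.T / np.sqrt(d_k)        # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    # Each output is a weighted average of the value vectors.
    return weights @ V                     # (n_queries, d_v)

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 5))
out = scaled_dot_product_attention(Q, K, V)
```

Because every output position attends to all inputs in a single matrix product, the computation parallelizes across positions, which is the source of the training-speed advantage the abstract claims over recurrent models.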
Citations
Proceedings Article
From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding
Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro J. Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters +8 more
TL;DR: This paper formulates audio-to-semantic understanding as a sequence-to-sequence problem, and proposes and compares various encoder-decoder based approaches that optimize both modules jointly, in an end-to-end manner.
Journal Article
Deep Entity Matching with Pre-Trained Language Models
TL;DR: This paper proposes Ditto, a novel entity matching system built on pre-trained Transformer-based language models, which fine-tunes such models by casting entity matching (EM) as a sequence-pair classification problem, leveraging them with a simple architecture.
Proceedings Article
Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots.
TL;DR: The side effect of using too many context utterances is analyzed, and a multi-hop selector network (MSN) is proposed to alleviate the problem; results show that MSN outperforms several state-of-the-art methods on three public multi-turn dialogue datasets.
Proceedings Article
MaskGIT: Masked Generative Image Transformer
TL;DR: The proposed MaskGIT is a novel image synthesis paradigm using a bidirectional transformer decoder that significantly outperforms the state-of-the-art transformer model on the ImageNet dataset, and accelerates autoregressive decoding by up to 48x.
Posted Content
Support-set bottlenecks for video-text representation learning
Mandela Patrick, Po-Yao Huang, Yuki M. Asano, Florian Metze, Alexander G. Hauptmann, João F. Henriques, Andrea Vedaldi +6 more
TL;DR: This paper proposes a novel method that leverages a generative model to naturally push related samples together, and results in representations that explicitly encode semantics shared between samples, unlike noise contrastive learning.