Open Access Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- Vol. 30, pp 5998-6008
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous state-of-the-art single model by 0.7 BLEU, achieving a BLEU score of 41.1.
Citations
Posted Content
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
TL;DR: This article proposed Glow-TTS, a flow-based generative model for parallel text-to-speech (TTS) that does not require any external aligner and achieves an order-of-magnitude speedup over the autoregressive model, Tacotron 2, at synthesis with comparable speech quality.
Proceedings Article
Deep Learning for Depression Detection of Twitter Users
TL;DR: The most effective deep neural network architecture, among a few selected architectures that have been used successfully in natural language processing tasks, is identified and used to detect users with signs of mental illness given limited unstructured text data extracted from the Twitter social media platform.
Posted Content
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu
TL;DR: This work combines recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech, utilizing the unlabeled audio of the Libri-Light dataset.
Proceedings Article
Music Gesture for Visual Sound Separation
TL;DR: This work proposes "Music Gesture," a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music. The approach adopts a context-aware graph network to integrate visual semantic context with body dynamics and applies an audio-visual fusion model to associate body movements with the corresponding audio signals.
Proceedings Article
Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer
TL;DR: This paper proposed a cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost re-ID performance.