Open Access Proceedings Article

Attention is All you Need

TLDR
This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.


Citations
Posted Content

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

TL;DR: This article proposed Glow-TTS, a flow-based generative model for parallel text-to-speech (TTS) that does not require any external aligner and achieves an order-of-magnitude speedup over the autoregressive model, Tacotron 2, at synthesis with comparable speech quality.
Proceedings Article

Deep Learning for Depression Detection of Twitter Users

TL;DR: The most effective deep neural network architecture among a few selected architectures that were successfully used in natural language processing tasks is identified and used to detect users with signs of mental illness, given limited unstructured text data extracted from the Twitter social media platform.
Posted Content

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

TL;DR: This work employs a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech, utilizing the unlabeled audio of the Libri-Light dataset.
Proceedings Article

Music Gesture for Visual Sound Separation

TL;DR: This work proposes "Music Gesture," a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music, which adopts a context-aware graph network to integrate visual semantic context with body dynamics and applies an audio-visual fusion model to associate body movements with the corresponding audio signals.
Proceedings Article

Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer

TL;DR: This paper proposes a cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost the reID performance.