Open Access, Proceedings Article

Multilingual End-to-End Speech Translation

Abstract
In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they have been applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our code and the database are publicly available at https://github.com/espnet/espnet to encourage further research in this emergent multilingual ST topic.

