Multilingual End-to-End Speech Translation

doi:10.1109/ASRU46091.2019.9003832

Open Access, Proceedings Article

- pp 570-577

TLDR

In this paper, the authors propose a multilingual end-to-end speech translation (ST) model in which speech utterances in source languages are directly translated into the desired target languages with a universal sequence-to-sequence architecture.

Abstract:

In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they are applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our code and the database are publicly available at https://github.com/espnet/espnet to encourage further research in this emergent multilingual ST topic.
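
The abstract does not spell out how a single universal model is told which target language to produce. A common approach in multilingual sequence-to-sequence models is to prepend a target-language token to the decoder output, and the sketch below illustrates that idea for pooled one-to-many data. It is an assumption for illustration, not a description of the authors' implementation: the `<2xx>` tag format, the `STExample` type, the file paths, and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class STExample:
    """One speech translation training pair (hypothetical structure)."""
    audio_path: str    # placeholder path to a source-language utterance
    target_lang: str   # target language code, e.g. "de" or "fr"
    target_text: str   # reference translation in the target language


def lang_tag(lang: str) -> str:
    """Build the target-language token (the tag format is an assumption)."""
    return f"<2{lang}>"


def build_decoder_target(example: STExample) -> List[str]:
    """Prepend the language token so one decoder can serve every target language."""
    return [lang_tag(example.target_lang)] + example.target_text.split()


def pool_corpora(*corpora: Iterable[STExample]) -> List[STExample]:
    """Concatenate bilingual corpora into one multilingual training set."""
    pooled: List[STExample] = []
    for corpus in corpora:
        pooled.extend(corpus)
    return pooled


# One-to-many example: the same source utterance paired with two target languages.
en_de = [STExample("utt1.wav", "de", "guten Morgen")]
en_fr = [STExample("utt1.wav", "fr", "bonjour")]
train_set = pool_corpora(en_de, en_fr)
targets = [build_decoder_target(ex) for ex in train_set]
```

With this layout, switching the output language at inference time would only require changing the first decoder token, which is one reason a single shared architecture can cover both the one-to-many and many-to-many scenarios.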

Citations

Proceedings Article

ESPnet-ST: All-in-One Speech Translation Toolkit

Hirofumi Inaguma, +6 more

TL;DR: ESPnet-ST is a new project inside the end-to-end speech processing toolkit ESPnet, which integrates or newly implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation.

Proceedings Article

Speech Translation and the End-to-End Promise: Taking Stock of Where We Are.

Matthias Sperber, +1 more

TL;DR: A unifying categorization and nomenclature that covers both traditional and recent approaches and that may help researchers by highlighting both trade-offs and open research questions is provided.

Posted Content

CoVoST 2 and Massively Multilingual Speech-to-Text Translation

Changhan Wang, +2 more

- 20 Jul 2020 -

arXiv: Computation and Language

TL;DR: CoVoST 2 is released, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages, which represents the largest open dataset available to date in terms of total volume and language coverage.

Posted Content

Curriculum Pre-training for End-to-End Speech Translation

Chengyi Wang, +4 more

- 21 Apr 2020 -

arXiv: Computation and Language

TL;DR: This work proposes a curriculum pre-training method that includes an elementary course for transcription learning and two advanced courses for understanding the utterance and mapping words in two languages and shows that this method leads to significant improvements on En-De and En-Fr speech translation benchmarks.

Proceedings Article

CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus

Changhan Wang, +3 more

TL;DR: This work introduces CoVoST, a multilingual speech-to-text translation corpus from 11 languages into English, diversified with over 11,000 speakers and over 60 accents, and describes the dataset creation methodology and provides empirical evidence of the quality of the data.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

Proceedings Article

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

Related Papers (5)

Attention is All you Need

Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

MuST-C: a Multilingual Speech Translation Corpus

Bleu: a Method for Automatic Evaluation of Machine Translation