Multilingual End-to-End Speech Translation
Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
- pp 570-577
TLDR
The authors propose a multilingual end-to-end speech translation (ST) model in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture.
Abstract:
In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they are applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our code and the database are publicly available to encourage further research in this emergent multilingual ST topic (available at https://github.com/espnet/espnet).
Citations
Proceedings Article
ESPnet-ST: All-in-One Speech Translation Toolkit
Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Yalta, Tomoki Hayashi, Shinji Watanabe
TL;DR: ESPnet-ST is a new project inside the end-to-end speech processing toolkit ESPnet, which integrates or newly implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation.
Proceedings Article
Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
Matthias Sperber, Matthias Paulik
TL;DR: A unifying categorization and nomenclature that covers both traditional and recent approaches and that may help researchers by highlighting both trade-offs and open research questions is provided.
Posted Content
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang, Anne Wu, Juan Pino
TL;DR: CoVoST 2 is released, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages, which represents the largest open dataset available to date in terms of total volume and language coverage.
Posted Content
Curriculum Pre-training for End-to-End Speech Translation
TL;DR: This work proposes a curriculum pre-training method that includes an elementary course for transcription learning and two advanced courses for understanding the utterance and mapping words in two languages and shows that this method leads to significant improvements on En-De and En-Fr speech translation benchmarks.
Proceedings Article
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
TL;DR: This work introduces CoVoST, a multilingual speech-to-text translation corpus from 11 languages into English, diversified with over 11,000 speakers and over 60 accents, and describes the dataset creation methodology and provides empirical evidence of the quality of the data.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings Article
Bleu: a Method for Automatic Evaluation of Machine Translation
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.