Sequence-Based Multi-Lingual Low Resource Speech Recognition
Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black +3 more
pp. 4909-4913
TLDR
The authors show that end-to-end multi-lingual training of sequence models is effective on context-independent models trained with Connectionist Temporal Classification (CTC) loss, and that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data.

Abstract:
Techniques for multi-lingual and cross-lingual speech recognition can help in low-resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context-independent models trained using Connectionist Temporal Classification (CTC) loss. We show that our model improves performance on Babel languages by over 6% absolute in terms of word/phoneme error rate when compared to mono-lingual systems built in the same setting for these languages. We also show that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data. We show that training on multiple languages is important for very low-resource cross-lingual target scenarios, but not for multi-lingual testing scenarios. Here, it appears beneficial to include large, well-prepared datasets.
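The abstract above centers on training with the CTC loss. As background, here is a minimal sketch of the CTC forward (alpha) recursion that underlies that loss: the probability that a sequence of frame-wise label distributions emits a target label sequence, summed over all alignments. This is a generic illustration, not the authors' implementation; the function name and the toy numbers are hypothetical.

```python
def ctc_forward(probs, target, blank=0):
    """Probability that frame-wise distributions `probs` (T x V, rows sum to 1)
    emit the label sequence `target` (no blanks) under CTC collapsing rules."""
    # Interleave blanks around the labels: [a, b] -> [_, a, _, b, _]
    ext = [blank]
    for c in target:
        ext.extend([c, blank])
    S = len(ext)
    # alpha[s]: total probability of all alignment prefixes ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same symbol
            if s > 0:
                a += alpha[s - 1]             # advance by one position
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]             # skip the blank between distinct labels
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # Valid alignments end on the last label or the trailing blank
    return alpha[-1] + alpha[-2]

# Toy example (hypothetical numbers): 4 frames, vocabulary {blank, 1, 2}
probs = [[0.6, 0.3, 0.1],
         [0.1, 0.8, 0.1],
         [0.2, 0.3, 0.5],
         [0.7, 0.2, 0.1]]
p = ctc_forward(probs, [1, 2])
```

Training minimizes the negative log of this quantity; in practice frameworks compute it in log space for numerical stability (e.g. `torch.nn.CTCLoss` in PyTorch).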
Citations
Proceedings Article
Unsupervised Pretraining Transfers Well Across Languages
TL;DR: This article uses contrastive predictive coding (CPC) to pretrain ASR systems on unlabeled data and investigates whether such unsupervised pretraining transfers well across languages.
Proceedings Article
Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling
Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori +8 more
TL;DR: Experimental results show that the transfer learning approach from the multilingual model shows substantial gains over monolingual models across all 4 BABEL languages.
Proceedings Article
Meta Learning for End-To-End Low-Resource Speech Recognition
TL;DR: This paper applies meta learning to low-resource automatic speech recognition (ASR): it formulates ASR for different languages as different tasks and meta-learns initialization parameters from many pretraining languages, achieving fast adaptation on an unseen target language via the recently proposed model-agnostic meta-learning (MAML) algorithm.
Proceedings Article
Hierarchical Multitask Learning With CTC
Ramon Sanabria, Florian Metze +1 more
TL;DR: This paper shows how Hierarchical Multitask Learning can encourage the formation of useful intermediate representations by performing Connectionist Temporal Classification at different levels of the network with targets of different granularity.
Proceedings Article
Universal Phone Recognition with a Multilingual Allophone System
Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze +10 more
TL;DR: This paper proposes a joint model of both language-independent phone and language-dependent phoneme distributions, which improves phoneme error rate in low-resource multilingual ASR experiments over 11 languages, including Inuktitut and Tusom.