Sequence-Based Multi-Lingual Low Resource Speech Recognition
Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black +3 more
pp. 4909-4913
TLDR
The authors show that end-to-end multi-lingual training of sequence models is effective on context-independent models trained with Connectionist Temporal Classification (CTC) loss, and that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data.

Abstract:
Techniques for multi-lingual and cross-lingual speech recognition can help in low-resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context-independent models trained using Connectionist Temporal Classification (CTC) loss. We show that our model improves performance on Babel languages by over 6% absolute in terms of word/phoneme error rate when compared to mono-lingual systems built in the same setting for these languages. We also show that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data. We show that training on multiple languages is important for very low-resource cross-lingual target scenarios, but not for multi-lingual testing scenarios. Here, it appears beneficial to include large, well-prepared datasets.
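The abstract above centers on training with the CTC loss. As background, here is a minimal sketch of the CTC forward (alpha) recursion that underlies that loss: the probability that a sequence of frame-wise label distributions emits a target label sequence, summed over all alignments. This is a generic illustration, not the authors' implementation; the function name and the toy numbers are hypothetical.

```python
def ctc_forward(probs, target, blank=0):
    """Probability that frame-wise distributions `probs` (T x V, rows sum to 1)
    emit the label sequence `target` (no blanks) under CTC collapsing rules."""
    # Interleave blanks around the labels: [a, b] -> [_, a, _, b, _]
    ext = [blank]
    for c in target:
        ext.extend([c, blank])
    S = len(ext)
    # alpha[s]: total probability of all alignment prefixes ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same symbol
            if s > 0:
                a += alpha[s - 1]             # advance by one position
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]             # skip the blank between distinct labels
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # Valid alignments end on the last label or the trailing blank
    return alpha[-1] + alpha[-2]

# Toy example (hypothetical numbers): 4 frames, vocabulary {blank, 1, 2}
probs = [[0.6, 0.3, 0.1],
         [0.1, 0.8, 0.1],
         [0.2, 0.3, 0.5],
         [0.7, 0.2, 0.1]]
p = ctc_forward(probs, [1, 2])
```

Training minimizes the negative log of this quantity; in practice frameworks compute it in log space for numerical stability (e.g. `torch.nn.CTCLoss` in PyTorch).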
Citations
Proceedings Article
Unsupervised Pretraining Transfers Well Across Languages
TL;DR: This article uses contrastive predictive coding (CPC) to pretrain ASR systems on unlabeled data and investigates whether such unsupervised pretraining transfers well across languages.
Proceedings Article
Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling
Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori +8 more
TL;DR: Experimental results show that the transfer learning approach from the multilingual model shows substantial gains over monolingual models across all 4 BABEL languages.
Proceedings Article
Meta Learning for End-To-End Low-Resource Speech Recognition
TL;DR: This paper applies meta learning to low-resource automatic speech recognition (ASR): it formulates ASR for different languages as different tasks and meta-learns initialization parameters from many pretraining languages, achieving fast adaptation on an unseen target language via the recently proposed model-agnostic meta-learning (MAML) algorithm.
Proceedings Article
Hierarchical Multitask Learning With CTC
Ramon Sanabria, Florian Metze +1 more
TL;DR: This paper shows how Hierarchical Multitask Learning can encourage the formation of useful intermediate representations by performing Connectionist Temporal Classification at different levels of the network with targets of different granularity.
Proceedings Article
Universal Phone Recognition with a Multilingual Allophone System
Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze +10 more
TL;DR: This paper proposes a joint model of both language-independent phone and language-dependent phoneme distributions, which improves phoneme error rate in low-resource multilingual ASR experiments over 11 languages, including Inuktitut and Tusom.