Proceedings ArticleDOI
Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages
Vishwas M. Shetty, Metilda Sagaya Mary N. J, Srinivasan Umesh, +2 more
pp. 8279-8283
TL;DR: The proposed approach with retraining gave 6%-11% relative improvements in character error rates over the monolingual baseline, and the language embedding learned from the proposed approach, when added to the acoustic feature vector, gave the best result.
Abstract: The recent success of the Transformer-based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers to low-resource Indian languages in a multilingual framework. We explore various methods to incorporate language information into a multilingual Transformer, i.e., (i) at the decoder and (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors. Language information can be provided to the acoustic vectors either as a one-hot vector or through a learned language embedding. From our experiments, we observed that providing language identity always improved performance. The language embedding learned from our proposed approach, when added to the acoustic feature vector, gave the best result. The proposed approach with retraining gave 6%-11% relative improvements in character error rates over the monolingual baseline.
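The best-performing variant described in the abstract, adding a learned language embedding to each acoustic feature vector before the encoder, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the dimensions, names, and random initialization are assumptions, and in practice the embedding table would be learned jointly with the Transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: 3 languages, 80-dim acoustic features (e.g. filterbanks).
num_languages, feat_dim = 3, 80

# One embedding vector per language; randomly initialized here,
# but trained end-to-end with the ASR model in the actual approach.
lang_embedding = rng.standard_normal((num_languages, feat_dim)) * 0.01

def add_language_embedding(features: np.ndarray, lang_id: int) -> np.ndarray:
    """Add the language's embedding to every acoustic frame.

    features: (T, feat_dim) array of acoustic feature vectors.
    Returns the language-conditioned features, same shape.
    """
    return features + lang_embedding[lang_id]

# Usage: condition 100 frames of features on language id 1.
frames = rng.standard_normal((100, feat_dim))
conditioned = add_language_embedding(frames, lang_id=1)
print(conditioned.shape)  # (100, 80)
```

The one-hot alternative mentioned in the abstract would instead concatenate a fixed indicator vector to each frame; the additive embedding keeps the feature dimension unchanged while letting the model learn how languages relate.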
Citations
Proceedings ArticleDOI
Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages
TL;DR: In this article, the authors explore the benefits of representing similar target subword units (e.g., Byte Pair Encoded (BPE) units) through a Common Label Set (CLS).
Journal ArticleDOI
Improving Automatic Speech Recognition Performance for Low-Resource Languages With Self-Supervised Models
Jing Zhao, Wei-Qiang Zhang, +1 more
TL;DR: This paper exploits and analyzes a series of wav2vec pre-trained models for speech recognition in 15 low-resource languages in the OpenASR21 Challenge, and investigates data utilization, multilingual learning, and the use of a phoneme-level recognition task in fine-tuning.
Posted Content
Transfer Learning Approaches for Streaming End-to-End Speech Recognition System
TL;DR: This paper presents a comparative study of four different TL methods for the RNN-T framework, showing a 17% relative word error rate reduction with different TL methods over a randomly initialized RNN-T model and demonstrating the efficacy of TL for languages with small amounts of training data.
Proceedings ArticleDOI
Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems
Proceedings Article
A Survey of Multilingual Models for Automatic Speech Recognition
Hemant Yadav, Soundarya Sitaram, +1 more
TL;DR: This paper surveys the state of the art in multilingual ASR models built with cross-lingual transfer in mind, presents best practices for building multilingual models drawn from research across diverse languages and techniques, and provides recommendations for future work.
References
Posted Content
Transformers with convolutional context for ASR
TL;DR: This paper proposes replacing the sinusoidal positional embedding for transformers with convolutionally learned input representations that provide subsequent transformer blocks with relative positional information needed for discovering long-range relationships between local concepts.
Proceedings ArticleDOI
Language independent end-to-end architecture for joint language identification and speech recognition
TL;DR: This paper presents a model that can recognize speech in 10 different languages, by directly performing grapheme (character/chunked-character) based speech recognition, based on the hybrid attention/connectionist temporal classification (CTC) architecture.
Proceedings ArticleDOI
Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling
Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori, +8 more
TL;DR: Experimental results show that the transfer learning approach from the multilingual model shows substantial gains over monolingual models across all 4 BABEL languages.
Posted Content
Multilingual Speech Recognition With A Single End-To-End Model
Shubham Toshniwal, Tara N. Sainath, Ron Weiss, Bo Li, Pedro J. Moreno, Eugene Weinstein, Kanishka Rao, +6 more
TL;DR: This paper presents a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts, and finds that this model, which was not explicitly given any information about language identity, improved recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually.
Proceedings ArticleDOI
The Speechtransformer for Large-scale Mandarin Chinese Speech Recognition
TL;DR: This paper focuses on a large-scale Mandarin Chinese speech recognition task and proposes three optimization strategies to further improve the performance and efficiency of the SpeechTransformer, including a much lower frame rate.