Proceedings ArticleDOI

Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages

TL;DR: The proposed approach with retraining gave 6%–11% relative improvements in character error rates over the monolingual baseline, and the language embedding learned from the proposed approach, when added to the acoustic feature vector, gave the best result.
Abstract
The recent success of the Transformer based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers to low resource Indian languages in a multilingual framework. We explore various methods to incorporate language information into a multilingual Transformer, namely (i) at the decoder and (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors. Language information can be provided to the acoustic vectors either as a one-hot vector or as a learned language embedding. From our experiments, we observed that providing language identity always improved performance. The language embedding learned from our proposed approach, when added to the acoustic feature vector, gave the best result. The proposed approach with retraining gave 6%–11% relative improvements in character error rates over the monolingual baseline.
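The two ways of injecting language identity into the acoustic input described above (a one-hot language vector concatenated to each frame, or a learned language embedding added to each frame) can be sketched as follows. The function name, dimensions, and embedding table here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def add_language_info(feats, lang_id, num_langs, lang_table=None):
    """Condition acoustic frames on language identity (illustrative sketch).

    feats: (T, D) array of acoustic frames.
    If lang_table is None, concatenate a one-hot language vector to every
    frame, widening the features to (T, D + num_langs). Otherwise treat
    lang_table as a learned (num_langs, D) embedding matrix and add the
    row lang_table[lang_id] to every frame.
    """
    if lang_table is None:
        onehot = np.zeros(num_langs, dtype=feats.dtype)
        onehot[lang_id] = 1.0
        tiled = np.tile(onehot, (feats.shape[0], 1))  # repeat over time
        return np.concatenate([feats, tiled], axis=-1)
    # learned embedding: broadcast add over the time axis
    return feats + lang_table[lang_id]
```

In the embedding variant, `lang_table` would be trained jointly with the rest of the network; at inference the appropriate row is selected by the utterance's language identity.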


Citations
Proceedings ArticleDOI

Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages

TL;DR: In this article, the authors explore the benefits of representing similar target subword units (e.g., Byte-Pair-Encoded (BPE) units) through a Common Label Set (CLS).
Journal ArticleDOI

Improving Automatic Speech Recognition Performance for Low-Resource Languages With Self-Supervised Models

TL;DR: This paper exploits and analyzes a series of wav2vec pre-trained models for speech recognition in 15 low-resource languages in the OpenASR21 Challenge, and investigates data utilization, multilingual learning, and the use of a phoneme-level recognition task in fine-tuning.
Posted Content

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

TL;DR: This paper presents a comparative study of four different TL methods for the RNN-T framework, showing a 17% relative word error rate reduction with different TL methods over a randomly initialized RNN-T model and demonstrating the efficacy of TL for languages with small amounts of training data.
Proceedings Article

A Survey of Multilingual Models for Automatic Speech Recognition

TL;DR: This paper surveys the state of the art in multilingual ASR models built with cross-lingual transfer in mind, presents best practices for building multilingual models drawn from research across diverse languages and techniques, and provides recommendations for future work.
References
Posted Content

Transformers with convolutional context for ASR

TL;DR: This paper proposes replacing the sinusoidal positional embedding for transformers with convolutionally learned input representations that provide subsequent transformer blocks with relative positional information needed for discovering long-range relationships between local concepts.
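As a rough illustration of the idea summarized above, a depthwise 1-D convolution over the time axis can stand in for sinusoidal positional embeddings, giving each frame relative-position information from its neighbours. The function and kernel below are hypothetical sketches, not the cited paper's architecture:

```python
import numpy as np

def conv_positional_context(feats, kernel):
    """Sketch of convolutional input context: apply the same 1-D kernel
    to each feature dimension over time ('same' padding), so every frame
    mixes in its local neighbourhood instead of an absolute positional code.

    feats: (T, D) array of frames; kernel: (K,) filter taps.
    """
    T, D = feats.shape
    out = np.empty_like(feats)
    for d in range(D):
        out[:, d] = np.convolve(feats[:, d], kernel, mode="same")
    return out
```

In the cited work the convolutional front-end is learned jointly with the Transformer; here a fixed kernel merely demonstrates how local convolution injects relative-position information.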
Proceedings ArticleDOI

Language independent end-to-end architecture for joint language identification and speech recognition

TL;DR: This paper presents a model that can recognize speech in 10 different languages by directly performing grapheme-based (character/chunked-character) speech recognition, built on the hybrid attention/connectionist temporal classification (CTC) architecture.
Posted Content

Multilingual Speech Recognition With A Single End-To-End Model

TL;DR: This paper presents a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts, and finds that this model, which was not explicitly given any information about language identity, improved recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually.
Proceedings ArticleDOI

The Speechtransformer for Large-scale Mandarin Chinese Speech Recognition

TL;DR: This paper focuses on a large-scale Mandarin Chinese speech recognition task and proposes three optimization strategies to further improve the performance and efficiency of the SpeechTransformer, including a much lower frame rate.