Proceedings ArticleDOI
Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages
Vishwas M. Shetty, Metilda Sagaya Mary N. J, Srinivasan Umesh +2 more
- pp 8279-8283
TL;DR: The proposed approach with retraining gave 6%-11% relative improvements in character error rates over the monolingual baseline, and the language embedding learned from the proposed approach, when added to the acoustic feature vector, gave the best result.
Abstract:
The recent success of the Transformer based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers on low resource Indian languages in a multilingual framework. We explore various methods to incorporate language information into a multilingual Transformer, i.e., (i) at the decoder, (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors. Language information to the acoustic vectors can be given in the form of one hot vector or by learning a language embedding. From our experiments, we observed that providing language identity always improved performance. The language embedding learned from our proposed approach, when added to the acoustic feature vector, gave the best result. The proposed approach with retraining gave 6% - 11% relative improvements in character error rates over the monolingual baseline.
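The two encoder-side conditioning variants described in the abstract can be sketched in a few lines. The NumPy toy below is an illustrative assumption, not the paper's implementation: dimensions, names, and the randomly initialized embedding table are made up. It shows (1) concatenating a one-hot language vector to each acoustic frame and (2) adding a learned language embedding to it.

```python
import numpy as np

rng = np.random.default_rng(0)

num_langs = 4    # hypothetical number of languages in the multilingual pool
feat_dim = 80    # per-frame acoustic feature dimension (e.g. filterbanks)
frames = 120     # frames in one utterance

feats = rng.standard_normal((frames, feat_dim))

# Variant 1: concatenate a one-hot language vector to every frame.
def with_one_hot(features, lang_id, num_langs):
    one_hot = np.zeros((features.shape[0], num_langs))
    one_hot[:, lang_id] = 1.0
    return np.concatenate([features, one_hot], axis=1)

# Variant 2: add a learned language embedding to every frame (here
# randomly initialized; in training it would be updated jointly with
# the Transformer).
lang_table = rng.standard_normal((num_langs, feat_dim)) * 0.01

def with_embedding(features, lang_id, table):
    return features + table[lang_id]

x1 = with_one_hot(feats, lang_id=2, num_langs=num_langs)   # (frames, 84)
x2 = with_embedding(feats, lang_id=2, table=lang_table)    # (frames, 80)
```

Note the design difference: the one-hot variant grows the input dimension by the number of languages, while the embedding variant keeps the input dimension fixed and lets the model learn how languages relate.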
Citations
Proceedings ArticleDOI
Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages
TL;DR: In this article, the authors explore the benefits of representing similar target subword units (e.g., Byte Pair Encoded (BPE) units) through a Common Label Set (CLS).
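The Common Label Set idea can be illustrated with a toy mapping (the characters and labels below are a made-up two-script example, not the paper's actual mapping): acoustically similar units from different Indic scripts collapse onto one shared label, so multilingual training pools over a single target inventory.

```python
# Toy common label set: Devanagari and Telugu characters representing
# the same phoneme map onto one shared label.
common_label_set = {
    "क": "ka", "క": "ka",   # /ka/ in Devanagari and Telugu
    "म": "ma", "మ": "ma",   # /ma/ in Devanagari and Telugu
}

def to_common_labels(text):
    """Map each character to its shared label; pass unknowns through."""
    return [common_label_set.get(ch, ch) for ch in text]

hindi = to_common_labels("कम")    # Devanagari input
telugu = to_common_labels("కమ")   # Telugu input
# Both scripts now yield the identical target sequence ['ka', 'ma'],
# so utterances from both languages train the same output labels.
```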
Journal ArticleDOI
Improving Automatic Speech Recognition Performance for Low-Resource Languages With Self-Supervised Models
Jing Zhao,Wei-Qiang Zhang +1 more
TL;DR: This paper exploits and analyzes a series of wav2vec pre-trained models for speech recognition in 15 low-resource languages in the OpenASR21 Challenge, and investigates data utilization, multilingual learning, and the use of a phoneme-level recognition task in fine-tuning.
Posted Content
Transfer Learning Approaches for Streaming End-to-End Speech Recognition System
TL;DR: This paper presents a comparative study of four different TL methods for the RNN-T framework, showing 17% relative word error rate reduction with different TL methods over a randomly initialized RNN-T model and demonstrating the efficacy of TL for languages with small amounts of training data.
Proceedings ArticleDOI
Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems
Proceedings Article
A Survey of Multilingual Models for Automatic Speech Recognition
Hemant Yadav, Soundarya Sitaram +1 more
TL;DR: The state of the art in multilingual ASR models built with cross-lingual transfer in mind is surveyed, best practices for building multilingual models are distilled from research across diverse languages and techniques, and recommendations for future work are provided.
References
Proceedings ArticleDOI
Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers
TL;DR: It is shown that hidden layers shared across languages can be transferred to improve recognition accuracy for new languages, with relative error reductions ranging from 6% to 28% against DNNs trained without the transferred hidden layers.
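The shared-hidden-layer setup can be sketched as a network whose hidden layer is reused across languages while each language keeps its own output layer. The NumPy fragment below is a minimal illustration; layer sizes, language codes, and senone counts are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
in_dim, hid_dim = 40, 64

# Hidden layer shared by all languages (trained multilingually, then
# transferable to a new language as initialization).
W_shared = rng.standard_normal((in_dim, hid_dim)) * 0.1

# Language-specific output layers (one set of softmax weights per language).
out_dims = {"hi": 300, "ta": 280}   # hypothetical senone counts
W_out = {lang: rng.standard_normal((hid_dim, n)) * 0.1
         for lang, n in out_dims.items()}

def forward(x, lang):
    h = np.tanh(x @ W_shared)       # shared multilingual representation
    logits = h @ W_out[lang]        # language-specific head
    return logits

x = rng.standard_normal((5, in_dim))
y_hi = forward(x, "hi")
y_ta = forward(x, "ta")
```

Transferring to a new language would amount to keeping `W_shared` and training only a fresh output head (plus optional fine-tuning of the shared layers).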
Proceedings ArticleDOI
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Yalta, Ryuichi Yamamoto +12 more
TL;DR: Transformer is an emergent sequence-to-sequence model that achieves state-of-the-art performance in neural machine translation and other natural language processing applications; this paper compares it against RNN across speech applications, including automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS), finding Transformer superior in 13 of 15 ASR benchmarks.
Proceedings ArticleDOI
Multilingual training of deep neural networks
TL;DR: This work investigates multilingual modeling in the context of a DNN-hidden Markov model (HMM) hybrid, where the DNN outputs are used as the HMM state likelihoods, and proposes that training the hidden layers on multiple languages makes them more suitable for cross-lingual transfer.
Proceedings ArticleDOI
Multilingual Speech Recognition with a Single End-to-End Model
Shubham Toshniwal, Tara N. Sainath, Ron Weiss, Bo Li, Pedro J. Moreno, Eugene Weinstein, Kanishka Rao +6 more
TL;DR: This model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually; augmenting the inputs with language identity improves performance by an additional 7% relative and eliminates confusion between different languages.