Proceedings ArticleDOI

Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages

TL;DR: The proposed approach with retraining gave 6%-11% relative improvements in character error rates over the monolingual baseline, and the language embedding learned from the proposed approach, when added to the acoustic feature vector, gave the best result.
Abstract
The recent success of the Transformer-based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers to low-resource Indian languages in a multilingual framework. We explore various methods to incorporate language information into a multilingual Transformer, namely (i) at the decoder and (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors. Language information can be given to the acoustic vectors either as a one-hot vector or by learning a language embedding. From our experiments, we observed that providing language identity always improved performance. The language embedding learned from our proposed approach, when added to the acoustic feature vector, gave the best result. The proposed approach with retraining gave 6%-11% relative improvements in character error rates over the monolingual baseline.
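The idea of conditioning the encoder on language identity by adding a learned embedding to every acoustic frame can be sketched as follows. This is a minimal illustration, not the authors' implementation; the module name, dimensions, and the choice of adding (rather than concatenating) the embedding are assumptions consistent with the abstract's description.

```python
import torch
import torch.nn as nn

class LanguageConditionedFrontend(nn.Module):
    """Hypothetical sketch: add a learned per-language embedding to every
    acoustic frame before it enters the Transformer encoder."""

    def __init__(self, num_langs: int, feat_dim: int):
        super().__init__()
        # One trainable vector per language, same size as an acoustic frame,
        # so it can be summed with the features directly.
        self.lang_embed = nn.Embedding(num_langs, feat_dim)

    def forward(self, feats: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); lang_id: (batch,)
        emb = self.lang_embed(lang_id).unsqueeze(1)  # (batch, 1, feat_dim)
        # Broadcast the language vector across the time axis.
        return feats + emb

# Example: 2 utterances, 100 frames of 80-dim filterbank features, 4 languages
frontend = LanguageConditionedFrontend(num_langs=4, feat_dim=80)
feats = torch.randn(2, 100, 80)
lang_id = torch.tensor([0, 3])
out = frontend(feats, lang_id)
```

A one-hot alternative would append a fixed indicator vector to each frame instead; the learned embedding differs in that it is trained jointly with the recognizer, which is what the abstract credits for the best result.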


Citations
Proceedings ArticleDOI

Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages

TL;DR: In this article, the authors explore the benefits of representing similar target subword units (e.g., Byte Pair Encoded (BPE) units) through a Common Label Set (CLS).
Journal ArticleDOI

Improving Automatic Speech Recognition Performance for Low-Resource Languages With Self-Supervised Models

TL;DR: This paper exploits and analyzes a series of wav2vec pre-trained models for speech recognition in 15 low-resource languages in the OpenASR21 Challenge, and investigates data utilization, multilingual learning, and the use of a phoneme-level recognition task in fine-tuning.
Posted Content

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

TL;DR: This paper presents a comparative study of four different TL methods for the RNN-T framework, showing 17% relative word error rate reduction with different TL methods over a randomly initialized RNN-T model and demonstrating the efficacy of TL for languages with small amounts of training data.
Proceedings Article

A Survey of Multilingual Models for Automatic Speech Recognition

TL;DR: The state of the art in multilingual ASR models that are built with cross-lingual transfer in mind are surveyed, best practices for building multilingual models from research across diverse languages and techniques are presented and recommendations for future work are provided.
References
Proceedings ArticleDOI

Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers

TL;DR: It is shown that the learned hidden layers sharing across languages can be transferred to improve recognition accuracy of new languages, with relative error reductions ranging from 6% to 28% against DNNs trained without exploiting the transferred hidden layers.
Proceedings ArticleDOI

A Comparative Study on Transformer vs RNN in Speech Applications

TL;DR: An emergent sequence-to-sequence model called Transformer achieves state-of-the-art performance in neural machine translation and other natural language processing applications, including automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS); the study finds a surprising superiority of Transformer in 13/15 ASR benchmarks in comparison with RNN.
Proceedings ArticleDOI

Multilingual training of deep neural networks

TL;DR: This work investigates multilingual modeling in the context of a DNN - hidden Markov model (HMM) hybrid, where the DNN outputs are used as the HMM state likelihoods and proposes that training the hidden layers on multiple languages makes them more suitable for cross-lingual transfer.
Proceedings ArticleDOI

Multilingual Speech Recognition with a Single End-to-End Model

TL;DR: This model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative over analogous sequence-to-sequence models trained on each language individually; providing language identity improves performance by a further 7% relative and eliminates confusion between different languages.