Proceedings ArticleDOI

Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages

TL;DR: The proposed approach with retraining gave 6%–11% relative improvements in character error rates over the monolingual baseline, and the language embedding learned from the proposed approach, when added to the acoustic feature vector, gave the best result.
Abstract
The recent success of the Transformer based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers to low resource Indian languages in a multilingual framework. We explore various methods to incorporate language information into a multilingual Transformer, namely (i) at the decoder and (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors. Language information can be provided to the acoustic vectors either as a one-hot vector or as a learned language embedding. From our experiments, we observed that providing language identity always improved performance. The language embedding learned from our proposed approach, when added to the acoustic feature vector, gave the best result. The proposed approach with retraining gave 6%–11% relative improvements in character error rates over the monolingual baseline.
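The two ways of injecting language identity into the acoustic input described above (a one-hot language vector concatenated to each frame, or a learned language embedding added to each frame) can be sketched as follows. The function name, dimensions, and embedding table here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def add_language_info(feats, lang_id, num_langs, lang_table=None):
    """Condition acoustic frames on language identity (illustrative sketch).

    feats: (T, D) array of acoustic frames.
    If lang_table is None, concatenate a one-hot language vector to every
    frame, widening the features to (T, D + num_langs). Otherwise treat
    lang_table as a learned (num_langs, D) embedding matrix and add the
    row lang_table[lang_id] to every frame.
    """
    if lang_table is None:
        onehot = np.zeros(num_langs, dtype=feats.dtype)
        onehot[lang_id] = 1.0
        tiled = np.tile(onehot, (feats.shape[0], 1))  # repeat over time
        return np.concatenate([feats, tiled], axis=-1)
    # learned embedding: broadcast add over the time axis
    return feats + lang_table[lang_id]
```

In the embedding variant, `lang_table` would be trained jointly with the rest of the network; at inference the appropriate row is selected by the utterance's language identity.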


Citations
Proceedings ArticleDOI

Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages

TL;DR: In this article, the authors explore the benefits of representing similar target subword units (e.g., Byte-Pair-Encoded (BPE) units) through a Common Label Set (CLS).
Journal ArticleDOI

Improving Automatic Speech Recognition Performance for Low-Resource Languages With Self-Supervised Models

TL;DR: This paper exploits and analyzes a series of wav2vec pre-trained models for speech recognition in 15 low-resource languages in the OpenASR21 Challenge, and investigates data utilization, multilingual learning, and the use of a phoneme-level recognition task in fine-tuning.
Posted Content

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

TL;DR: This paper presents a comparative study of four different TL methods for the RNN-T framework, showing a 17% relative word error rate reduction with different TL methods over a randomly initialized RNN-T model and demonstrating the efficacy of TL for languages with small amounts of training data.
Proceedings Article

A Survey of Multilingual Models for Automatic Speech Recognition

TL;DR: This paper surveys the state of the art in multilingual ASR models built with cross-lingual transfer in mind, presents best practices for building multilingual models drawn from research across diverse languages and techniques, and provides recommendations for future work.
References
Posted Content

Transformers with convolutional context for ASR

TL;DR: This paper proposes replacing the sinusoidal positional embedding for transformers with convolutionally learned input representations that provide subsequent transformer blocks with relative positional information needed for discovering long-range relationships between local concepts.
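As a rough illustration of the idea summarized above, a depthwise 1-D convolution over the time axis can stand in for sinusoidal positional embeddings, giving each frame relative-position information from its neighbours. The function and kernel below are hypothetical sketches, not the cited paper's architecture:

```python
import numpy as np

def conv_positional_context(feats, kernel):
    """Sketch of convolutional input context: apply the same 1-D kernel
    to each feature dimension over time ('same' padding), so every frame
    mixes in its local neighbourhood instead of an absolute positional code.

    feats: (T, D) array of frames; kernel: (K,) filter taps.
    """
    T, D = feats.shape
    out = np.empty_like(feats)
    for d in range(D):
        out[:, d] = np.convolve(feats[:, d], kernel, mode="same")
    return out
```

In the cited work the convolutional front-end is learned jointly with the Transformer; here a fixed kernel merely demonstrates how local convolution injects relative-position information.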
Proceedings ArticleDOI

Language independent end-to-end architecture for joint language identification and speech recognition

TL;DR: This paper presents a model that can recognize speech in 10 different languages by directly performing grapheme-based (character/chunked-character) speech recognition, built on the hybrid attention/connectionist temporal classification (CTC) architecture.
Posted Content

Multilingual Speech Recognition With A Single End-To-End Model

TL;DR: This paper presents a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts, and finds that this model, which was not explicitly given any information about language identity, improved recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually.
Proceedings ArticleDOI

The Speechtransformer for Large-scale Mandarin Chinese Speech Recognition

TL;DR: This paper focuses on a large-scale Mandarin Chinese speech recognition task and proposes three optimization strategies to further improve the performance and efficiency of the SpeechTransformer, including a much lower frame rate.