Proceedings Article

Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems

About
This article was published in the Conference of the International Speech Communication Association on 2021-08-30. It has received 12 citations to date. The article focuses on the topic: Softmax function.


Citations
Proceedings Article

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

TL;DR: This paper proposes to modify the structure of the cascaded-encoder-based recurrent neural network transducer (RNN-T) model by integrating a per-frame language identifier (LID) predictor, and shows that the proposed method can achieve accurate streaming LID prediction with little extra test-time cost.
Posted Content

Towards Building ASR Systems for the Next Billion Users.

TL;DR: This article used 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains including education, news, technology, and finance to build ASR systems for low resource languages from the Indian subcontinent.
Journal Article

Towards Building ASR Systems for the Next Billion Users

TL;DR: This paper used 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains including education, news, technology, and finance to build ASR systems for low resource languages from the Indian subcontinent.
Proceedings Article

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

TL;DR: In this article, the authors explore large-scale multilingual ASR models on 70 languages and inspect two architectures: (1) a shared embedding and output model and (2) a multiple embeddings and output model.
Proceedings Article

Global RNN Transducer Models For Multi-dialect Speech Recognition

TL;DR: A novel modeling technique for constructing accurate, multi-dialect, speech recognition systems with a single unified model, based on recurrent neural network transducers (RNN-T), which does not incur any extra computational costs at decoding time.
References
Proceedings Article

Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers

TL;DR: It is shown that learned hidden layers shared across languages can be transferred to improve recognition accuracy on new languages, with relative error reductions ranging from 6% to 28% against DNNs trained without exploiting the transferred hidden layers.
Proceedings Article

Streaming End-to-end Speech Recognition for Mobile Devices

TL;DR: This work describes its efforts at building an E2E speech recognizer using a recurrent neural network transducer and finds that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy.
Journal Article

Automatic speech recognition for under-resourced languages: A survey

TL;DR: This paper presents a survey of automatic speech recognition (ASR) for under-resourced languages, together with a literature review of recent contributions.
Journal Article

Language-independent and language-adaptive acoustic modeling for speech recognition

TL;DR: Different methods for multilingual acoustic model combination and a polyphone decision tree specialization procedure are introduced for estimating acoustic models for a new target language using speech data from varied source languages, but only limited data from the target language.