Proceedings ArticleDOI
Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems
About:
This article was published in the Conference of the International Speech Communication Association on 2021-08-30. It has received 12 citations to date. The article focuses on the topic: Softmax function.
Citations
Proceedings ArticleDOI
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
Cheng Zhang,Bo Li,Tara N. Sainath,Trevor Strohman,Sepand Mavandadi,Shuo-Yiin Chang,Parisa Haghani +6 more
TL;DR: This paper proposes to modify the structure of the cascaded-encoder-based recurrent neural network transducer (RNN-T) model by integrating a per-frame language identifier (LID) predictor, and shows that the proposed method can achieve accurate streaming LID prediction with little extra test-time cost.
Posted Content
Towards Building ASR Systems for the Next Billion Users.
Tahir Javed,Sumanth Doddapaneni,Abhigyan Raman,Kaushal Santosh Bhogale,G. Ramesh,Anoop Kunchukuttan,Pratyush Kumar,Mitesh M. Khapra +7 more
TL;DR: This article used 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains including education, news, technology, and finance to build ASR systems for low resource languages from the Indian subcontinent.
Journal ArticleDOI
Towards Building ASR Systems for the Next Billion Users
TL;DR: This paper used 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains including education, news, technology, and finance to build ASR systems for low resource languages from the Indian subcontinent.
Proceedings ArticleDOI
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra,Nayan Singhal,David K. Zhang,Ozlem Kalinli,Abdelrahman Mohamed,Duc Le,Michael L. Seltzer +6 more
TL;DR: In this article, the authors explore large-scale multilingual ASR models on 70 languages and inspect two architectures: (1) a shared embedding and output model and (2) a multiple embeddings and output model.
Proceedings ArticleDOI
Global RNN Transducer Models For Multi-dialect Speech Recognition
TL;DR: A novel modeling technique for constructing accurate, multi-dialect, speech recognition systems with a single unified model, based on recurrent neural network transducers (RNN-T), which does not incur any extra computational costs at decoding time.
References
Proceedings Article
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke,Sam Gross,Francisco Massa,Adam Lerer,James Bradbury,Gregory Chanan,Trevor Killeen,Zeming Lin,Natalia Gimelshein,Luca Antiga,Alban Desmaison,Andreas Kopf,Edward Z. Yang,Zachary DeVito,Martin Raison,Alykhan Tejani,Sasank Chilamkurthy,Benoit Steiner,Lu Fang,Junjie Bai,Soumith Chintala +20 more
TL;DR: This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
Proceedings ArticleDOI
Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers
TL;DR: It is shown that the learned hidden layers sharing across languages can be transferred to improve recognition accuracy of new languages, with relative error reductions ranging from 6% to 28% against DNNs trained without exploiting the transferred hidden layers.
Proceedings ArticleDOI
Streaming End-to-end Speech Recognition for Mobile Devices
Yanzhang He,Tara N. Sainath,Rohit Prabhavalkar,Ian McGraw,Raziel Alvarez,Ding Zhao,David Rybach,Anjuli Kannan,Yonghui Wu,Ruoming Pang,Qiao Liang,Deepti Bhatia,Yuan Shangguan,Bo Li,Golan Pundak,Khe Chai Sim,Tom Bagby,Shuo-Yiin Chang,Kanishka Rao,Alexander H. Gruenstein +19 more
TL;DR: This work describes its efforts at building an E2E speech recognizer using a recurrent neural network transducer and finds that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy.
Journal ArticleDOI
Automatic speech recognition for under-resourced languages: A survey
TL;DR: This paper presents a survey that focuses on automatic speech recognition (ASR) for under-resourced languages, together with a literature review of recent contributions.
Journal ArticleDOI
Language-independent and language-adaptive acoustic modeling for speech recognition
TL;DR: Different methods for multilingual acoustic model combination and a polyphone decision tree specialization procedure are introduced for estimating acoustic models for a new target language using speech data from varied source languages, but only limited data from the target language.