Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems

doi:10.21437/INTERSPEECH.2021-1298

Proceedings Article

About:

This article was published in the Conference of the International Speech Communication Association (INTERSPEECH) on 2021-08-30. It has received 12 citations to date. Its main topic is the softmax function.

Citations

Proceedings Article

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

Cheng Zhang and 6 others

TL;DR: This paper proposes modifying the structure of the cascaded-encoder-based recurrent neural network transducer (RNN-T) model by integrating a per-frame language identifier (LID) predictor, and shows that the proposed method achieves accurate streaming LID prediction with little extra test-time cost.

Posted Content

Towards Building ASR Systems for the Next Billion Users.

Tahir Javed and 7 others

06 Nov 2021

arXiv: Computation and Language

TL;DR: This article used 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains including education, news, technology, and finance to build ASR systems for low resource languages from the Indian subcontinent.

Journal Article

Towards Building ASR Systems for the Next Billion Users

28 Jun 2022

Proceedings of the ... AAAI Conference o...

TL;DR: This paper used 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains including education, news, technology, and finance to build ASR systems for low resource languages from the Indian subcontinent.

Proceedings Article

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

Andros Tjandra and 6 others

TL;DR: In this article, the authors explore large-scale multilingual ASR models on 70 languages and inspect two architectures: (1) a shared embedding and output model and (2) a multiple embeddings and output model.

Proceedings Article

Global RNN Transducer Models For Multi-dialect Speech Recognition

Takashi Fukuda and 4 others

TL;DR: This paper proposes a novel modeling technique, based on recurrent neural network transducers (RNN-T), for constructing accurate multi-dialect speech recognition systems with a single unified model that incurs no extra computational cost at decoding time.

References

Proceedings Article

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke and 20 others

TL;DR: This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

Proceedings Article

Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers

Jui-Ting Huang and 4 others

TL;DR: It is shown that hidden layers learned and shared across languages can be transferred to improve the recognition accuracy of new languages, with relative error reductions ranging from 6% to 28% against DNNs trained without exploiting the transferred hidden layers.

Journal Article

Automatic speech recognition for under-resourced languages: A survey

Laurent Besacier and 3 others

01 Jan 2014

Speech Communication

TL;DR: This paper presents a survey of automatic speech recognition (ASR) for under-resourced languages, together with a literature review of recent contributions.

Journal Article

Language-independent and language-adaptive acoustic modeling for speech recognition

Tanja Schultz and 3 others

01 Aug 2001

Speech Communication

TL;DR: Different methods for multilingual acoustic model combination and a polyphone decision tree specialization procedure are introduced for estimating acoustic models for a new target language using speech data from varied source languages, but only limited data from the target language.

Streaming End-to-end Speech Recognition for Mobile Devices

Related Papers (5)

Integrated Streaming Service Architecture: A Streaming Framework Compatible with Global Multimedia Databases

Streaming Multi-Context Systems

Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition

Adding a New Dimension to HTTP Adaptive Streaming Through Multiple-Source Capabilities

Wireless bandwidth management for multiple video clients through network-assisted DASH