Yanzhang He

Proceedings Article

Streaming End-to-end Speech Recognition for Mobile Devices

TL;DR: This work describes its efforts at building an E2E speech recognizer using a recurrent neural network transducer and finds that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy.

Posted Content

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Jonathan Shen, +90 more

- 21 Feb 2019 -

arXiv: Learning

TL;DR: This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase its capabilities.

Proceedings Article

A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency

Tara N. Sainath, +27 more

TL;DR: In this article, a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer were developed.

Proceedings Article

Two-Pass End-to-End Speech Recognition

Tara N. Sainath, +11 more

TL;DR: In this paper, two-pass automatic speech recognition (ASR) models are used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data.

Proceedings Article

Towards Fast and Accurate Streaming End-To-End ASR

Bo Li, +6 more

TL;DR: This work proposes to reduce the E2E model's latency by extending the RNN-T endpointer (RNN-T EP) model with additional early and late penalties and achieves 8.0% relative word error rate (WER) reduction and 130ms 90-percentile latency reduction over [2] on a Voice Search test set.
