
Dong Yu

Researcher at Tencent

Publications - 389
Citations - 45733

Dong Yu is an academic researcher from Tencent. The author has contributed to research topics including artificial neural networks and word error rate. The author has an h-index of 72 and has co-authored 339 publications receiving 39,098 citations. Previous affiliations of Dong Yu include Peking University and Microsoft.

Papers
Patent

Multi-speaker speech separation

TL;DR: In this patent, a multiple-output-layer RNN processes an acoustic signal containing speech from multiple speakers so that each output layer traces an individual speaker's speech; the output layer for each speaker can have the same dimensions, and each output unit can be normalized across all output layers.
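A minimal sketch of the multiple-output-layer idea, assuming a PyTorch LSTM and a two-speaker, mask-based setup; the class name, layer sizes, and the softmax normalization across speaker layers are illustrative choices, not details taken from the patent.

```python
# Minimal sketch, assuming PyTorch; layer sizes and the two-speaker
# setting are illustrative, not taken from the patent.
import torch
import torch.nn as nn

class MultiOutputLayerRNN(nn.Module):
    def __init__(self, feat_dim=257, hidden=600, num_speakers=2):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        # One output layer per speaker, all with the same dimensions.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, feat_dim) for _ in range(num_speakers)]
        )

    def forward(self, mixture):               # mixture: (batch, time, feat_dim)
        h, _ = self.rnn(mixture)
        logits = torch.stack([head(h) for head in self.heads], dim=1)
        # Normalize each output unit across all output layers, so the
        # per-speaker mask values for a time-frequency unit sum to one.
        masks = torch.softmax(logits, dim=1)  # (batch, speakers, time, feat_dim)
        return masks * mixture.unsqueeze(1)   # per-speaker estimates
```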
Proceedings ArticleDOI

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.

TL;DR: This paper encodes the residual error into a style embedding via a neural network-based error encoder, which enables rapid adaptation to the desired style with only a single adaptation utterance.
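A minimal sketch of the residual-error-embedding idea, assuming PyTorch; `ErrorEncoder`, `adapt_style`, and the black-box `acoustic_model` interface are hypothetical stand-ins, and all dimensions are illustrative rather than taken from the paper.

```python
# Minimal sketch, assuming PyTorch; the acoustic model is a hypothetical
# black box and all dimensions are illustrative.
import torch
import torch.nn as nn

class ErrorEncoder(nn.Module):
    """Encodes the residual between a reference utterance and the
    average-style prediction into a fixed-size style embedding."""
    def __init__(self, feat_dim=80, hidden=128, emb_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, residual):          # residual: (batch, time, feat_dim)
        _, h_last = self.rnn(residual)    # h_last: (1, batch, hidden)
        return self.proj(h_last[-1])      # style embedding: (batch, emb_dim)

def adapt_style(acoustic_model, error_encoder, text, reference_mel):
    """Single-utterance adaptation: the residual of the average-style
    prediction against the reference yields the style embedding."""
    with torch.no_grad():
        average_pred = acoustic_model(text)   # hypothetical average-voice TTS
        residual = reference_mel - average_pred
        return error_encoder(residual)        # reuse at synthesis time
```

At synthesis time, the returned embedding would condition the acoustic model, so a single reference utterance is enough to shift its speaking style.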
Proceedings ArticleDOI

Unsupervised learning from users' error correction in speech dictation.

TL;DR: An enhanced two-pass pronunciation learning algorithm is introduced that utilizes the output of both an n-gram phoneme recognizer and a Letter-to-Sound component to adapt automatic speech recognition systems used in dictation through unsupervised learning from users' error corrections.
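A minimal sketch of a two-pass pronunciation-learning loop in Python; `phoneme_recognizer.decode` and `letter_to_sound.candidates` are hypothetical interfaces standing in for the paper's n-gram phoneme recognizer and Letter-to-Sound component.

```python
# Minimal sketch; `phoneme_recognizer` and `letter_to_sound` are
# hypothetical stand-ins for the paper's n-gram phoneme recognizer
# and Letter-to-Sound component.
def learn_pronunciation(audio_segment, corrected_word,
                        phoneme_recognizer, letter_to_sound, top_k=5):
    # Pass 1: free phoneme decoding constrained only by the n-gram
    # phoneme language model.
    recognized = phoneme_recognizer.decode(audio_segment)

    # Pass 2: rescore Letter-to-Sound candidates for the word the user
    # typed as a correction, keeping the one closest to the decoded string.
    candidates = letter_to_sound.candidates(corrected_word, n=top_k)
    best = min(candidates, key=lambda pron: edit_distance(pron, recognized))
    return corrected_word, best   # new lexicon entry for future recognition

def edit_distance(a, b):
    """Plain Levenshtein distance over phoneme sequences."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (pa != pb))
    return dp[-1]
```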
PatentDOI

Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction

TL;DR: In this article, a structured generative model of speech coarticulation and reduction is described with a novel two-stage implementation. In the first stage, the dynamics of formants, or vocal tract resonances (VTRs), are generated using prior information about resonance targets in the phone sequence.
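A minimal sketch of the first stage only, in NumPy: per-frame resonance targets are smoothed with a symmetric (bi-directional) FIR filter to model coarticulation and reduction. The exponential filter weights, filter span, and target values are illustrative assumptions; the second stage, mapping VTR trajectories to acoustic observations, is omitted.

```python
# Minimal sketch of the first stage only, in NumPy; the symmetric
# exponential filter and the target values are illustrative assumptions.
import numpy as np

def vtr_trajectory(targets, gamma=0.6, span=7):
    """Bi-directional target filtering: smooth per-frame VTR/formant targets
    with a symmetric FIR filter to model coarticulation and reduction."""
    taps = np.array([gamma ** abs(k) for k in range(-span, span + 1)])
    taps /= taps.sum()                              # unit-gain filter
    padded = np.pad(targets, (span, span), mode="edge")
    return np.convolve(padded, taps, mode="valid")  # same length as `targets`

# Example: per-frame F1 targets (Hz) for a three-phone sequence.
targets = np.concatenate([np.full(20, 500.0),
                          np.full(20, 1200.0),
                          np.full(20, 700.0)])
f1_track = vtr_trajectory(targets)                  # smoothed formant trajectory
```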
Proceedings ArticleDOI

Joint separation and denoising of noisy multi-talker speech using recurrent neural networks and permutation invariant training

TL;DR: In this paper, the authors used utterance-level Permutation Invariant Training (uPIT) to perform speaker-independent multi-talker speech separation and denoising simultaneously.
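A minimal sketch of an utterance-level PIT (uPIT) loss, assuming PyTorch; the tensor shapes and the MSE criterion are illustrative assumptions.

```python
# Minimal sketch of an utterance-level PIT loss, assuming PyTorch;
# shapes and the MSE criterion are illustrative.
from itertools import permutations

import torch
import torch.nn.functional as F

def upit_loss(estimates, targets):
    """estimates, targets: (batch, num_speakers, time, feat_dim).
    The output-to-speaker assignment is chosen once per utterance, as the
    permutation with the lowest total error over the whole utterance."""
    num_spk = estimates.shape[1]
    per_perm = []
    for perm in permutations(range(num_spk)):
        permuted = estimates[:, list(perm)]       # reorder the outputs
        err = F.mse_loss(permuted, targets, reduction="none")
        per_perm.append(err.mean(dim=(1, 2, 3)))  # per-utterance error
    # Train on the best assignment for each utterance in the batch.
    return torch.stack(per_perm, dim=1).min(dim=1).values.mean()
```

The key point is that the output-to-speaker assignment is resolved once per utterance rather than per frame, which is what makes the training permutation invariant at the utterance level.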