scispace - formally typeset

Dong Yu

Researcher at Tencent

Publications -  389
Citations -  45733

Dong Yu is an academic researcher from Tencent. The author has contributed to research in topics: Artificial neural network & Word error rate. The author has an h-index of 72, co-authored 339 publications receiving 39098 citations. Previous affiliations of Dong Yu include Peking University & Microsoft.

Papers
Journal ArticleDOI

Using continuous features in the maximum entropy model

TL;DR: Proposes a spline-based extension of the MaxEnt model with non-linear continuous weighting functions, and shows that the resulting optimization problem can be converted into a standard log-linear model in a higher-dimensional space.
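The conversion can be illustrated with a piecewise-linear (hat-function) spline basis: expanding a continuous feature into basis values lets a per-dimension weight vector act as a non-linear weighting function, so a standard log-linear model over the expanded features realizes the spline. A minimal sketch; the function and knot layout below are illustrative, not taken from the paper:

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Expand a scalar feature x into hat-function (linear spline) basis values.

    A log-linear/MaxEnt model with one weight per basis dimension then
    computes w . phi(x), a piecewise-linear function of x passing through
    the weights w at the knot positions.
    """
    phi = np.zeros(len(knots))
    if x <= knots[0]:
        phi[0] = 1.0            # clamp below the first knot
    elif x >= knots[-1]:
        phi[-1] = 1.0           # clamp above the last knot
    else:
        i = np.searchsorted(knots, x) - 1      # bracketing interval [knots[i], knots[i+1])
        t = (x - knots[i]) / (knots[i + 1] - knots[i])
        phi[i] = 1.0 - t                       # linear interpolation weights
        phi[i + 1] = t
    return phi

# Midway between knots 0 and 1, the expansion splits evenly:
phi = linear_spline_basis(0.5, np.array([0.0, 1.0, 2.0]))  # -> [0.5, 0.5, 0.0]
# With weights w = [0, 2, 4], the effective weighting at x=0.5 is w . phi = 1.0
```

Because phi(x) is fixed given the knots, training the weights remains a convex log-linear estimation problem, only in a higher-dimensional feature space.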

Deep Learning for Signal and Information Processing

Li Deng, +1 more
TL;DR: A survey of deep learning research in the literature through March 2013, covering practical aspects of the field's rapid development during the interim year, including the use of multiple layers of nonlinear transformations to derive features from sensory signals.
Posted Content

Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks

TL;DR: Proposes utterance-level permutation invariant training (uPIT) to separate multi-talker mixed speech without any prior knowledge of signal duration, number of speakers, speaker identity, or gender.
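The core idea of uPIT is to fix one speaker-to-output assignment for the entire utterance: the training loss is the minimum, over all permutations of the outputs, of the utterance-level reconstruction error. A minimal sketch, assuming magnitude-spectrogram targets and an MSE criterion (the function name and shapes are illustrative):

```python
from itertools import permutations
import numpy as np

def upit_mse(estimates, targets):
    """Utterance-level permutation invariant MSE.

    estimates, targets: arrays of shape (num_speakers, time, freq).
    One permutation of the estimated sources is chosen for the whole
    utterance: the one minimizing the mean-squared error against the
    reference sources. Cost is O(num_speakers!) permutations, which is
    cheap for the typical 2-3 speaker case.
    """
    n = estimates.shape[0]
    best = float("inf")
    for perm in permutations(range(n)):
        err = float(np.mean((estimates[list(perm)] - targets) ** 2))
        best = min(best, err)
    return best

# If the network outputs the right sources in the wrong order, uPIT
# still assigns zero loss, whereas a fixed-order MSE would not:
targets = np.stack([np.ones((4, 3)), np.zeros((4, 3))])
swapped = targets[::-1].copy()
upit_mse(swapped, targets)              # -> 0.0 (permutation fixes the order)
float(np.mean((swapped - targets)**2))  # -> 1.0 (fixed-order MSE is penalized)
```

Evaluating the error over the whole utterance, rather than per frame, is what prevents the assignment from flipping between speakers mid-utterance.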
Proceedings ArticleDOI

Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset

TL;DR: Experiments suggest that the proposed AVSR system outperforms the audio-only baseline LF-MMI DNN system by up to 29.98% absolute word error rate (WER) reduction, and produces recognition performance comparable to a more complex pipelined system.
Journal ArticleDOI

Audio-Visual Speech Separation and Dereverberation With a Two-Stage Multimodal Network

TL;DR: This study addresses joint speech separation and dereverberation, which aims to separate target speech from background noise, interfering speech, and room reverberation, and proposes a novel multimodal network that exploits both audio and visual signals.