Dong Yu

Researcher at Tencent

Publications -  389
Citations -  45733

Dong Yu is an academic researcher from Tencent. The author has contributed to research on topics including artificial neural networks and word error rate. The author has an h-index of 72 and has co-authored 339 publications receiving 39098 citations. Previous affiliations of Dong Yu include Peking University and Microsoft.

Papers
Posted Content

Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures

TL;DR: In this article, a deep extractor network is proposed, which creates an extractor point for the target speaker in a canonical high-dimensional embedding space and pulls together the time-frequency bins corresponding to the target speaker.
Posted Content

Time Domain Audio Visual Speech Separation

TL;DR: Experiments on simulated mixtures show that the proposed time-domain audio-visual architecture for target speaker extraction from monaural mixtures can bring 3 dB+ and 4 dB+ Si-SNR improvements on two- and three-speaker cases respectively, compared to audio-only TasNet and frequency-domain audio-visual networks.
Book Chapter

Deep Neural Networks

Dong Yu, +1 more
TL;DR: The architecture of DNNs is depicted, the popular activation functions and training criteria are described, the famous backpropagation algorithm for learning DNN model parameters is illustrated, and practical tricks that make the training process robust are introduced.
Journal Article

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Songxiang Liu, +2 more
- 28 Jan 2022 - 
TL;DR: DiffGAN-TTS is a novel denoising diffusion probabilistic model (DDPM)-based text-to-speech (TTS) model that achieves high-fidelity and efficient speech synthesis; an active shallow diffusion mechanism is also presented to further speed up inference.
Patent

Quantitative model for formant dynamics and contextually assimilated reduction in fluent speech

TL;DR: In this article, a method of identifying a sequence of formant trajectory values is provided in which the target values and the duration for each segment target for the formant are applied to a finite impulse response filter.