Yan Deng

Researcher at Microsoft

Publications -  22
Citations -  199

Yan Deng is an academic researcher at Microsoft. The author has contributed to research on the topics Computer science and Prosody. The author has an h-index of 5 and has co-authored 19 publications receiving 142 citations. Previous affiliations of Yan Deng include Tsinghua University.

Papers
Proceedings Article

Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

TL;DR: Experimental results show that the proposed stepwise monotonic attention method achieves significant robustness improvements in out-of-domain scenarios for phoneme-based models, without any regression on the in-domain naturalness test.
Posted Content

Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice

TL;DR: The multi-speaker latent space is investigated to improve neural TTS in two ways: adapting the system to new speakers with only several minutes of speech, and enhancing a premium voice by using data from other speakers for richer contextual coverage and better generalization.
Journal Article

Time–Frequency Cepstral Features and Heteroscedastic Linear Discriminant Analysis for Language Recognition

TL;DR: Experiments on NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA results in lower equal error rate (EER) and minimum average cost (Cavg) than either TFC or SDC approaches.
Proceedings Article

Speech BERT Embedding for Improving Prosody in Neural TTS

TL;DR: A speech BERT model is proposed to extract the prosody information embedded in speech segments, with the aim of improving the prosody of synthesized speech in neural text-to-speech (TTS).
Proceedings Article

Automatic language identification using support vector machines and phonetic N-gram

TL;DR: A new normalization method is proposed for language identification using support vector machines (SVM) and phonetic n-grams; it yields a relative reduction in equal error rate (EER) compared with the traditional normalization method.