
Yu Zhang

Researcher at Google

Publications -  17
Citations -  3282

Yu Zhang is an academic researcher from Google. The author has contributed to research in topics: Autoencoder & Word error rate. The author has an h-index of 12 and co-authored 17 publications receiving 1789 citations. Previous affiliations of Yu Zhang include Massachusetts Institute of Technology.

Papers
Proceedings ArticleDOI

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

TL;DR: Describes Tacotron 2, a neural network architecture for speech synthesis directly from text. It is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder that synthesizes time-domain waveforms from those spectrograms.
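The two-stage pipeline lends itself to a compact sketch. Below is a minimal, illustrative PyTorch version: the class names, layer sizes, and the plain-LSTM decoder (standing in for the paper's attention-based autoregressive decoder) are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FeaturePredictionNet(nn.Module):
    """Stage 1: maps character IDs to a mel-scale spectrogram."""
    def __init__(self, vocab=64, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, 512)
        self.encoder = nn.LSTM(512, 256, batch_first=True, bidirectional=True)
        # The real decoder is autoregressive with location-sensitive attention;
        # a plain LSTM stands in here to keep the sketch short.
        self.decoder = nn.LSTM(512, 512, batch_first=True)
        self.to_mel = nn.Linear(512, n_mels)

    def forward(self, chars):
        enc, _ = self.encoder(self.embed(chars))
        dec, _ = self.decoder(enc)
        return self.to_mel(dec)              # (batch, frames, n_mels)

class Vocoder(nn.Module):
    """Stage 2: stand-in for the modified WaveNet vocoder."""
    def __init__(self, n_mels=80, samples_per_frame=256):
        super().__init__()
        # WaveNet's stack of dilated convolutions is replaced by one
        # linear upsampling layer purely for illustration.
        self.net = nn.Linear(n_mels, samples_per_frame)

    def forward(self, mels):
        return self.net(mels).flatten(1)     # (batch, samples)

chars = torch.randint(0, 64, (1, 20))        # toy character sequence
mels = FeaturePredictionNet()(chars)         # (1, 20, 80)
audio = Vocoder()(mels)                      # (1, 5120)
```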
Proceedings Article

WaveGrad: Estimating Gradients for Waveform Generation

TL;DR: WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality.
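The speed/quality trade comes from the iterative sampler: each refinement step nudges a noise signal along an estimated gradient of the data log-density. The toy sketch below shows that loop under heavy simplification; the `refine` function, the Langevin-style update, and the linear noise schedule are assumptions, not the paper's exact sampler (which also conditions on a mel spectrogram).

```python
import torch

def refine(score_fn, length, n_steps):
    """Start from Gaussian noise and apply Langevin-style refinement;
    fewer steps is faster but yields coarser samples."""
    y = torch.randn(1, length)
    for sigma in torch.linspace(1.0, 0.01, n_steps):  # coarser schedule when n_steps is small
        eps = 0.5 * sigma ** 2
        y = y + eps * score_fn(y, sigma) + (2 * eps).sqrt() * torch.randn_like(y)
    return y

# Stand-in score function; the real model is a learned neural network.
score_fn = lambda y, sigma: -y / (sigma ** 2 + 1.0)
fast = refine(score_fn, 16000, n_steps=6)    # quick, lower-fidelity sample
fine = refine(score_fn, 16000, n_steps=50)   # slower, higher-fidelity sample
```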
Proceedings Article

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

TL;DR: Proposes a factorized hierarchical variational autoencoder that learns disentangled and interpretable representations from sequential data without supervision, formulated explicitly as a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors on different sets of latent variables.
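The structural idea, two sets of latent variables under different priors, fits in a few lines. This sketch only illustrates that prior structure; the names `z_seq`/`z_seg` and all dimensions are assumptions, and the encoder/inference network is omitted entirely.

```python
import torch
import torch.distributions as D

latent_dim = 16
mu_seq = torch.zeros(latent_dim)                # per-sequence prior mean (learned
                                                # in the actual model; stubbed here)
p_seq = D.Normal(mu_seq, 0.5)                   # sequence-dependent prior
p_seg = D.Normal(torch.zeros(latent_dim), 1.0)  # sequence-independent prior

z_seq = p_seq.rsample()   # sequence-level factors (e.g. speaker identity)
z_seg = p_seg.rsample()   # segment-level factors (e.g. phonetic content)
decoder_input = torch.cat([z_seq, z_seg])       # a decoder would reconstruct the
                                                # observed segment from both latents
```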
Proceedings ArticleDOI

Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization

TL;DR: Experimental results demonstrate that the proposed method can disentangle speaker and noise attributes even if they are correlated in the training data, and can be used to consistently synthesize clean speech for all speakers.
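Adversarial factorization of this kind is commonly implemented with a gradient reversal layer: an adversary tries to predict the nuisance attribute (here, the noise condition) from the speaker embedding, while reversed gradients push the embedding to discard that information. The sketch below is a generic version of that pattern, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) the gradient on the
    backward pass, training the upstream embedding to confuse the adversary."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

speaker_emb = torch.randn(4, 64, requires_grad=True)  # embedding to be "purified"
noise_clf = torch.nn.Linear(64, 2)                    # adversary predicts noise condition
logits = noise_clf(GradReverse.apply(speaker_emb, 1.0))
loss = F.cross_entropy(logits, torch.tensor([0, 1, 0, 1]))
loss.backward()  # noise_clf gets normal gradients; speaker_emb gets reversed ones
```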
Proceedings ArticleDOI

Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis

TL;DR: Proposes a semi-supervised training framework to improve the data efficiency of Tacotron, allowing it to utilize textual and acoustic knowledge contained in large, publicly available text and speech corpora.
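One way to picture the recipe is a two-phase loop: warm-start the acoustic decoder on untranscribed speech, then fine-tune on a small paired set with text conditioning. The toy below follows that shape under heavy assumptions; the next-frame pre-training objective, the pooled text conditioning, and all sizes are illustrative inventions (the paper conditions through the full Tacotron attention mechanism).

```python
import torch
import torch.nn as nn

dec = nn.GRU(80, 128, batch_first=True)      # acoustic decoder
to_mel = nn.Linear(128, 80)
text_enc = nn.Embedding(64, 128)             # text encoder (stub)
params = list(dec.parameters()) + list(to_mel.parameters()) + list(text_enc.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# Phase 1: pre-train the decoder on untranscribed speech (next-frame prediction).
mel = torch.randn(8, 50, 80)                 # speech-only corpus (toy batch)
h, _ = dec(mel[:, :-1])
loss = nn.functional.l1_loss(to_mel(h), mel[:, 1:])
opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: fine-tune on a small paired set; a pooled text embedding used as
# the initial decoder state stands in for the real attention conditioning.
text = torch.randint(0, 64, (8, 30))
h0 = text_enc(text).mean(1).unsqueeze(0)     # (1, batch, hidden)
h, _ = dec(mel[:, :-1], h0)
loss = nn.functional.l1_loss(to_mel(h), mel[:, 1:])
opt.zero_grad(); loss.backward(); opt.step()
```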