Yu Zhang
Researcher at Google
Publications - 17
Citations - 3282
Yu Zhang is an academic researcher at Google whose work focuses on topics including autoencoders and word error rate. He has an h-index of 12 and has co-authored 17 publications receiving 1,789 citations. His previous affiliations include the Massachusetts Institute of Technology.
Papers
Proceedings ArticleDOI
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen, Ruoming Pang, Ron Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu +12 more
TL;DR: Describes Tacotron 2, a neural network architecture for speech synthesis directly from text. It is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder that synthesizes time-domain waveforms from those spectrograms.
Proceedings Article
WaveGrad: Estimating Gradients for Waveform Generation
TL;DR: WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality.
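The speed-for-quality trade-off mentioned above can be illustrated with a toy iterative-refinement loop: each step moves the estimate closer to a target, so more steps give a better result at higher cost. The update rule, step count, and target here are hypothetical stand-ins, not WaveGrad's actual score-based sampler.

```python
import numpy as np

def refine(noisy: np.ndarray, n_steps: int) -> np.ndarray:
    """Toy refinement: repeatedly move the estimate toward the target.
    More steps -> lower residual error, at proportionally higher cost."""
    target = np.zeros_like(noisy)        # hypothetical clean signal
    x = noisy.copy()
    for _ in range(n_steps):
        x = x + 0.5 * (target - x)       # illustrative refinement update
    return x

rng = np.random.default_rng(0)
noisy = rng.standard_normal(16)
err6 = np.abs(refine(noisy, 6)).mean()    # few steps: fast, coarser
err50 = np.abs(refine(noisy, 50)).mean()  # many steps: slow, finer
print(err50 < err6)   # prints True
```

This mirrors the paper's claim only in shape: adjusting the number of refinement steps at inference time is the knob that trades latency against sample quality.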
Proceedings Article
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
TL;DR: Proposes a factorized hierarchical variational autoencoder that learns disentangled and interpretable representations from sequential data without supervision, by formulating the model explicitly as a factorized hierarchical graphical model that imposes sequence-dependent and sequence-independent priors on different sets of latent variables.
Proceedings ArticleDOI
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization
TL;DR: Experimental results demonstrate that the proposed method can disentangle speaker and noise attributes even if they are correlated in the training data, and can be used to consistently synthesize clean speech for all speakers.
Proceedings ArticleDOI
Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis
TL;DR: Proposes a semi-supervised training framework to improve the data efficiency of Tacotron and allow it to utilize textual and acoustic knowledge contained in large, publicly available text and speech corpora.