Zhiying Huang
Researcher at University of Science and Technology of China
Publications - 5
Citations - 40
Zhiying Huang is an academic researcher from the University of Science and Technology of China. The author has contributed to research in topics: Speaker recognition & Engineering. The author has an h-index of 1, co-authored 3 publications receiving 24 citations.
Papers
Proceedings Article
Speaker adaptation of RNN-BLSTM for speech recognition based on speaker code
TL;DR: This paper studies how to conduct effective speaker code based speaker adaptation on RNN-BLSTM and demonstrates that the speaker code based adaptation method is also a valid adaptation method for RNN-BLSTM.
Proceedings Article
ProsoSpeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech
TL;DR: ProsoSpeech is proposed, which enhances prosody using quantized latent vectors pre-trained on large-scale unpaired and low-quality text and speech data, and can generate expressive speech conditioned on the predicted latent prosody vector (LPV).
Journal Article
PolyVoice: Language Models for Speech to Speech Translation
Qianqian Dong, Zhiying Huang, Chen Xu, Kexin Wang, Xuxin Cheng, Tom Ko, Qiao Tian, Tang Li, Fengpeng Yue, Ye Bai, Xi Chen, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang
TL;DR: PolyVoice is a language model-based framework for speech-to-speech translation (S2ST) that consists of two language models: a translation language model and a speech synthesis language model.
Proceedings Article
Unsupervised speaker adaptation of BLSTM-RNN for LVCSR based on speaker code
TL;DR: Speaker code based adaptation, combined with a singular value decomposition (SVD) method and an error normalization method that balances the back-propagation errors derived from different layers for speaker codes, shows better recognition performance than i-vector based speaker adaptation of the same dimension.
Proceedings Article
Rapid speaker adaptation based on D-code extracted from BLSTM-RNN in LVCSR
TL;DR: This paper proposes an alternative d-code extraction method to replace the speaker code (SC), based on modeling speaker information with a BLSTM-RNN, which makes one-pass decoding possible. A speaker clustering approach is also introduced to decrease the target number of the speaker-BLSTM, which accelerates training and improves ASR performance at the same time.