
Frank K. Soong

Researcher at Microsoft

Publications: 284
Citations: 6712

Frank K. Soong is an academic researcher at Microsoft. His research topics include hidden Markov models and speech synthesis. He has an h-index of 36 and has co-authored 275 publications receiving 6042 citations. His previous affiliations include The Chinese University of Hong Kong.

Papers
Proceedings Article

TTS Synthesis with Bidirectional LSTM based Recurrent Neural Networks

TL;DR: Recurrent Neural Networks (RNNs) with Bidirectional Long Short Term Memory (BLSTM) cells are adopted to capture the correlation or co-occurrence information between any two instants in a speech utterance for parametric TTS synthesis.
Book

Automatic Speech and Speaker Recognition: Advanced Topics

TL;DR: Automatic Speech and Speaker Recognition: Advanced Topics collects in a single volume a number of topics in speech and speaker recognition that are of fundamental importance but not yet covered in detail in existing textbooks.
Proceedings Article

On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis

TL;DR: Experimental results show that a DNN can outperform the conventional HMM, which is first trained with maximum likelihood (ML) and then refined with minimum generation error (MGE) training. Both objective and subjective measures indicate that the DNN synthesizes better speech than the HMM-based baseline.
Patent

Voice persona service for embedding text-to-speech features into software programs

TL;DR: The patent describes a voice persona service through which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store.
Journal Article

Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers

TL;DR: Experimental results on an isolated English word corpus recorded by non-native (L2) English learners show that the proposed goodness of pronunciation (GOP) measure improves GOP-based mispronunciation detection: precision and recall improve by 7.4% compared with the conventional GOP estimated from a GMM-HMM.