Frank K. Soong
Researcher at Microsoft
Publications - 284
Citations - 6712
Frank K. Soong is an academic researcher at Microsoft. The author has contributed to research on topics including hidden Markov models and speech synthesis. The author has an h-index of 36 and has co-authored 275 publications receiving 6042 citations. Previous affiliations of Frank K. Soong include The Chinese University of Hong Kong.
Papers
Proceedings Article
TTS Synthesis with Bidirectional LSTM based Recurrent Neural Networks
TL;DR: Recurrent Neural Networks (RNNs) with Bidirectional Long Short Term Memory (BLSTM) cells are adopted to capture the correlation or co-occurrence information between any two instants in a speech utterance for parametric TTS synthesis.
Book
Automatic Speech and Speaker Recognition: Advanced Topics
TL;DR: Automatic Speech and Speaker Recognition: Advanced Topics groups together in a single volume a number of important topics on speech and speaker recognition, topics which are of fundamental importance, but not yet covered in detail in existing textbooks.
Proceedings Article
On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis
TL;DR: Experimental results show that a DNN can outperform the conventional HMM, which is first trained in ML and then refined by MGE; both objective and subjective measures indicate that the DNN synthesizes speech better than the HMM-based baseline.
Patent
Voice persona service for embedding text-to-speech features into software programs
TL;DR: Describes a voice persona service by which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store.
Journal Article
Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers
TL;DR: Experimental results on an isolated English word corpus recorded by non-native (L2) English learners show that the proposed GOP measure improves GOP-based mispronunciation detection: precision and recall improve by 7.4% compared with the conventional GOP estimated from a GMM-HMM.