Open Access Proceedings Article

A Generative Modeling Framework for Structured Hidden Speech Dynamics

Li Deng, +1 more
TLDR
A structured speech model is outlined, equipped with long-contextual-span capabilities that are missing in the HMM approach, and the pros and cons of the structured generative modeling approach in comparison with the structured discriminative classification approach are discussed.
Abstract
We outline a structured speech model, as a special and perhaps extreme form of probabilistic generative modeling. The model is equipped with long-contextual-span capabilities that are missing in the HMM approach. Compact (and physically meaningful) parameterization of the model is made possible by the continuity constraint in the hidden vocal tract resonance (VTR) domain. The target-directed VTR dynamics jointly characterize coarticulation and incomplete articulation (reduction). Preliminary evaluation results are presented on the standard TIMIT phonetic recognition task, showing the best result in this task reported in the literature without using many heterogeneous classifier combinations. The pros and cons of our structured generative modeling approach, in comparison with the structured discriminative classification approach, are discussed.
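The idea of target-directed dynamics can be illustrated with a minimal sketch. It assumes a simple first-order recursion, z[t] = rate * z[t-1] + (1 - rate) * target, which is a common textbook simplification and not necessarily the formulation used in the paper. The function name, the `rate` parameter, and the formant-like values in Hz are all hypothetical, chosen for illustration only.

```python
def target_directed_trajectory(targets, durations, rate=0.8, start=500.0):
    """Simulate a hidden trajectory that moves toward one target per segment.

    Assumed update (illustrative only): z[t] = rate * z[t-1] + (1 - rate) * target
    targets   -- one resonance target (Hz) per segment
    durations -- number of frames spent in each segment
    rate      -- smoothing constant in (0, 1); larger means slower movement
    start     -- initial value of the hidden trajectory (Hz)
    """
    z = start
    trajectory = []
    for target, n_frames in zip(targets, durations):
        for _ in range(n_frames):
            z = rate * z + (1.0 - rate) * target
            trajectory.append(z)
    return trajectory


# A long segment lets the trajectory approach its target.
long_segment = target_directed_trajectory([700.0], [30])

# A short segment ends well short of the target (reduction / undershoot).
short_segment = target_directed_trajectory([700.0], [3])

# The same target reached from different preceding segments yields
# different values (context dependence, i.e. coarticulation).
after_low = target_directed_trajectory([300.0, 700.0], [30, 3])
after_high = target_directed_trajectory([900.0, 700.0], [30, 3])
```

Under these assumptions, one continuity-constrained recursion produces both effects named in the abstract: the short segment undershoots its target, and the value reached depends on the preceding segment, without any context-specific parameters.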


Citations
Journal Article

Discriminative learning in sequential pattern recognition

TL;DR: The main goal of this article is to provide an underlying foundation for MMI, MCE, and MPE/MWE at the objective function level to facilitate the development of new parameter optimization techniques and to incorporate other pattern recognition concepts, e.g., discriminative margins [66], into the current discriminative learning paradigm.
Book Chapter

Phoneme Recognition on the TIMIT Database

TL;DR: Speech recognition based on phones is attractive because it is inherently free from vocabulary limitations. The performance of large-vocabulary ASR systems depends on the quality of the phone recognizer, so research teams continue to develop phone recognizers and to improve their performance as much as possible.
Book

Discriminative learning for speech recognition

TL;DR: This book introduces the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition and includes technical details on the derivation of the parameter optimization formulas for exponential-family distributions, discrete hidden Markov models (HMMs), and continuous-density HMMs in discriminative learning.
Posted Content

Phoneme recognition in TIMIT with BLSTM-CTC

TL;DR: The performance of a recurrent neural network is compared with the best results published so far on phoneme recognition in the TIMIT database and a single recurrent network is applied to the same task.
Patent

Minimum classification error training with growth transformation optimization

TL;DR: In this paper, the hidden Markov model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function using the list of N-best word sequences obtained by decoding the training data with the current-iteration HMM parameters.
References
Book

Fundamentals of speech recognition

TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
Journal Article

From HMM's to segment models: a unified view of stochastic modeling for speech recognition

TL;DR: A general stochastic model is described that encompasses most of the models proposed in the literature for speech recognition, pointing out similarities in terms of correlation and parameter tying assumptions, and drawing analogies between segment models and HMMs.
Proceedings Article

Hidden conditional random fields for phone classification.

TL;DR: This paper presents results on the TIMIT phone classification task and shows that HCRFs outperform comparable ML and CML/MMI trained HMMs and can handle complex features without any change in training procedure.
Journal Article

Structured language modeling

TL;DR: An attempt at using the syntactic structure in natural language for improved language models for speech recognition using an original probabilistic parameterization of a shift-reduce parser.
Journal Article

A probabilistic framework for segment-based speech recognition

TL;DR: This work examines a maximum a posteriori decoding strategy for feature-based recognizers and develops a normalization criterion useful for a segment-based speech recognizer.