scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Discriminative learning in sequential pattern recognition

26 Sep 2008-IEEE Signal Processing Magazine (IEEE)-Vol. 25, Iss: 5, pp 14-36
TL;DR: The main goal of this article is to provide an underlying foundation for MMI, MCE, and MPE/MWE at the objective function level to facilitate the development of new parameter optimization techniques and to incorporate other pattern recognition concepts, e.g., discriminative margins [66], into the current discrim inative learning paradigm.
Abstract: In this article, we studied the objective functions of MMI, MCE, and MPE/MWE for discriminative learning in sequential pattern recognition. We presented an approach that unifies the objective functions of MMI, MCE, and MPE/MWE in a common rational-function form of (25). The exact structure of the rational-function form for each discriminative criterion was derived and studied. While the rational-function form of MMI has been known in the past, we provided the theoretical proof that the similar rational-function form exists for the objective functions of MCE and MPE/MWE. Moreover, we showed that the rational function forms for objective functions of MMI, MCE, and MPE/MWE differ in the constant weighting factors CDT (s1 . . . sR) and these weighting factors depend only on the labeled sequence s1 . . . sR, and are independent of the parameter set - to be optimized. The derived rational-function form for MMI, MCE, and MPE/MWE allows the GT/EBW-based parameter optimization framework to be applied directly in discriminative learning. In the past, lack of the appropriate rational-function form was a difficulty for MCE and MPE/MWE, because without this form, the GT/EBW-based parameter optimization framework cannot be directly applied. Based on the unified rational-function form, in a tutorial style, we derived the GT/EBW-based parameter optimization formulas for both discrete HMMs and CDHMMs in discriminative learning using MMI, MCE, and MPE/MWE criteria. The unifying review provided in this article has been based upon a large number of earlier contributions that have been cited and discussed throughout the article. Here we provide a brief summary of such background work. Extension to large-scale speech recognition tasks was accomplished in the work of [59] and [60]. The dissertation of [47] further improved the MMI criterion to that of MPE/MWE. In a parallel vein, the work of [20] provided an alternative approach to that of [41], with an attempt to more rigorously provide a CDHMM model re-estimation formula that gives positive growth of the MMI objective function. A crucial error of this attempt was corrected in [2] for establishing an existence proof of such positive growth. The main goal of this article is to provide an underlying foundation for MMI, MCE, and MPE/MWE at the objective function level to facilitate the development of new parameter optimization techniques and to incorporate other pattern recognition concepts, e.g., discriminative margins [66], into the current discriminative learning paradigm.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

9,091 citations

Journal ArticleDOI
TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Abstract: We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum-likelihood (ML) criteria, respectively.

3,120 citations

Book
Li Deng1, Dong Yu1
12 Jun 2014
TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

2,817 citations


Cites background or methods from "Discriminative learning in sequenti..."

  • ...Here, it is useful to point out a connection between the above pretraining/fine-tuning strategy associated with hybrid deep networks and the highly popular minimum phone error (MPE) training technique for the HMM (see [147, 290] for an overview)....

    [...]

  • ...Many of the discriminative techniques for supervised learning in signal and information processing are shallow architectures such as HMMs [52, 127, 147, 186, 188, 290, 394, 418] and conditional random fields (CRFs) [151, 155, 281, 400, 429, 446]....

    [...]

  • ...Ideally, D should contain all possible documents, as in the maximum mutual information training for speech recognition where all possible negative candidates may be considered [147]....

    [...]

  • ...The framework described in [144] enables end-to-end performance optimization in the overall deep architecture using the unified learning framework initially published in [147]....

    [...]

Journal Article
TL;DR: This paper provides an overview of this progress and repres nts the shared views of four research groups who have had recent successes in using deep neural networks for a coustic modeling in speech recognition.
Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

2,527 citations

Journal ArticleDOI
TL;DR: It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Abstract: Recently, the hybrid deep neural network (DNN)- hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The performance improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure such as local connectivity, weight sharing, and pooling in CNNs exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important to deal with speaker and environment variations. Experimental results show that CNNs reduce the error rate by 6%-10% compared with DNNs on the TIMIT phone recognition and the voice search large vocabulary speech recognition tasks.

1,948 citations

References
More filters
01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations

Proceedings Article
28 Jun 2001
TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Abstract: We present conditional random fields , a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

13,190 citations

Book
01 Jan 1993
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the very labor-intensive and therefore time-heavy and therefore expensive and expensive process of manually modeling speech.
Abstract: 1. Fundamentals of Speech Recognition. 2. The Speech Signal: Production, Perception, and Acoustic-Phonetic Characterization. 3. Signal Processing and Analysis Methods for Speech Recognition. 4. Pattern Comparison Techniques. 5. Speech Recognition System Design and Implementation Issues. 6. Theory and Implementation of Hidden Markov Models. 7. Speech Recognition Based on Connected Word Models. 8. Large Vocabulary Continuous Speech Recognition. 9. Task-Oriented Applications of Automatic Speech Recognition.

8,442 citations


"Discriminative learning in sequenti..." refers methods in this paper

  • ...This posterior probability can be computed using an efficient forward-backward algorithm [50]....

    [...]

  • ...…comes from the fact that sentence tokens in the training set are independent of each other. γ i,r,s r (t ) is the occupation probability of state i at time t , given the label sequence s r and observation sequence X r , which can be obtained through an efficient forward-backward algorithm [50]....

    [...]

Book
01 Feb 1996
TL;DR: In this paper, Schnabel proposed a modular system of algorithms for unconstrained minimization and nonlinear equations, based on Newton's method for solving one equation in one unknown convergence of sequences of real numbers.
Abstract: Preface 1. Introduction. Problems to be considered Characteristics of 'real-world' problems Finite-precision arithmetic and measurement of error Exercises 2. Nonlinear Problems in One Variable. What is not possible Newton's method for solving one equation in one unknown Convergence of sequences of real numbers Convergence of Newton's method Globally convergent methods for solving one equation in one uknown Methods when derivatives are unavailable Minimization of a function of one variable Exercises 3. Numerical Linear Algebra Background. Vector and matrix norms and orthogonality Solving systems of linear equations-matrix factorizations Errors in solving linear systems Updating matrix factorizations Eigenvalues and positive definiteness Linear least squares Exercises 4. Multivariable Calculus Background Derivatives and multivariable models Multivariable finite-difference derivatives Necessary and sufficient conditions for unconstrained minimization Exercises 5. Newton's Method for Nonlinear Equations and Unconstrained Minimization. Newton's method for systems of nonlinear equations Local convergence of Newton's method The Kantorovich and contractive mapping theorems Finite-difference derivative methods for systems of nonlinear equations Newton's method for unconstrained minimization Finite difference derivative methods for unconstrained minimization Exercises 6. Globally Convergent Modifications of Newton's Method. The quasi-Newton framework Descent directions Line searches The model-trust region approach Global methods for systems of nonlinear equations Exercises 7. Stopping, Scaling, and Testing. Scaling Stopping criteria Testing Exercises 8. Secant Methods for Systems of Nonlinear Equations. Broyden's method Local convergence analysis of Broyden's method Implementation of quasi-Newton algorithms using Broyden's update Other secant updates for nonlinear equations Exercises 9. Secant Methods for Unconstrained Minimization. The symmetric secant update of Powell Symmetric positive definite secant updates Local convergence of positive definite secant methods Implementation of quasi-Newton algorithms using the positive definite secant update Another convergence result for the positive definite secant method Other secant updates for unconstrained minimization Exercises 10. Nonlinear Least Squares. The nonlinear least-squares problem Gauss-Newton-type methods Full Newton-type methods Other considerations in solving nonlinear least-squares problems Exercises 11. Methods for Problems with Special Structure. The sparse finite-difference Newton method Sparse secant methods Deriving least-change secant updates Analyzing least-change secant methods Exercises Appendix A. A Modular System of Algorithms for Unconstrained Minimization and Nonlinear Equations (by Robert Schnabel) Appendix B. Test Problems (by Robert Schnabel) References Author Index Subject Index.

6,831 citations