Author

Wu Chou

Bio: Wu Chou is an academic researcher from Chinese Academy of Sciences. The author has contributed to research in topics: Linear regression & Pattern recognition (psychology). The author has an h-index of 6, co-authored 7 publications receiving 279 citations. Previous affiliations of Wu Chou include Avaya & Indian Institute of Technology Madras.

Papers
Journal ArticleDOI
TL;DR: The main goal of this article is to provide an underlying foundation for MMI, MCE, and MPE/MWE at the objective function level to facilitate the development of new parameter optimization techniques and to incorporate other pattern recognition concepts, e.g., discriminative margins [66], into the current discriminative learning paradigm.
Abstract: In this article, we studied the objective functions of MMI, MCE, and MPE/MWE for discriminative learning in sequential pattern recognition. We presented an approach that unifies the objective functions of MMI, MCE, and MPE/MWE in a common rational-function form of (25). The exact structure of the rational-function form for each discriminative criterion was derived and studied. While the rational-function form of MMI has been known in the past, we provided the theoretical proof that a similar rational-function form exists for the objective functions of MCE and MPE/MWE. Moreover, we showed that the rational-function forms for the objective functions of MMI, MCE, and MPE/MWE differ only in the constant weighting factors C_DT(s_1 ... s_R), and that these weighting factors depend only on the labeled sequence s_1 ... s_R and are independent of the parameter set to be optimized. The derived rational-function form for MMI, MCE, and MPE/MWE allows the GT/EBW-based parameter optimization framework to be applied directly in discriminative learning. In the past, the lack of an appropriate rational-function form was a difficulty for MCE and MPE/MWE, because without this form the GT/EBW-based parameter optimization framework cannot be directly applied. Based on the unified rational-function form, in a tutorial style, we derived the GT/EBW-based parameter optimization formulas for both discrete HMMs and CDHMMs in discriminative learning using the MMI, MCE, and MPE/MWE criteria. The unifying review provided in this article has been based upon a large number of earlier contributions that have been cited and discussed throughout the article. Here we provide a brief summary of such background work. Extension to large-scale speech recognition tasks was accomplished in the work of [59] and [60]. The dissertation of [47] further improved the MMI criterion to that of MPE/MWE.
In a parallel vein, the work of [20] provided an alternative approach to that of [41], attempting to more rigorously derive a CDHMM model re-estimation formula that gives positive growth of the MMI objective function. A crucial error in this attempt was corrected in [2], establishing an existence proof of such positive growth. The main goal of this article is to provide an underlying foundation for MMI, MCE, and MPE/MWE at the objective function level to facilitate the development of new parameter optimization techniques and to incorporate other pattern recognition concepts, e.g., discriminative margins [66], into the current discriminative learning paradigm.
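The GT/EBW machinery referenced above can be illustrated in miniature: for a rational objective F = N/D over a discrete probability vector, the growth-transform update re-weights each probability by its partial derivative plus a sufficiently large constant, which guarantees that F does not decrease. The toy linear objective and the constant C below are hypothetical, chosen only to exhibit the monotone-growth property, and are not taken from the article:

```python
import numpy as np

def growth_transform_step(p, grad_F, C):
    """One growth-transform (GT/EBW-style) re-estimation step for a
    discrete probability vector p. With C large enough, the update
    does not decrease a rational objective F = N/D."""
    w = p * (grad_F + C)
    return w / w.sum()

# Toy rational objective F(p) = (A @ p) / (B @ p)  (hypothetical).
A = np.array([2.0, 1.0, 3.0])
B = np.array([1.0, 2.0, 1.0])

def F(p):
    return (A @ p) / (B @ p)

def grad_F(p):
    n, d = A @ p, B @ p
    return A / d - n * B / d**2   # quotient rule

p = np.array([1/3, 1/3, 1/3])
for _ in range(20):
    p_new = growth_transform_step(p, grad_F(p), C=10.0)
    assert F(p_new) >= F(p) - 1e-12   # monotone growth of the objective
    p = p_new
```

Over the iterations, probability mass flows toward the coordinate with the largest ratio A_i/B_i, exactly the multiplicative re-weighting behavior the EBW formulas exhibit for discrete HMM parameters.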

203 citations

Proceedings ArticleDOI
06 Jul 2003
TL;DR: Experimental results indicate that the proposed MCELR model adaptation can lead to significant speech recognition performance improvement and its performance advantage over the MLLR based approach is observed even when the amount of adaptation data is sparse.
Abstract: In this paper, a concatenated "super" string model based minimum classification error (MCE) model adaptation approach is described. We show that the error rate minimization in the proposed approach can be formulated as maximizing a special ratio of two positive functions. The proposed string model is used to derive growth-transform-based error rate minimization for MCE linear regression (MCELR). It provides an effective solution for applying the MCE approach to acoustic model adaptation with sparse data. The proposed MCELR approach is studied and compared with maximum likelihood linear regression (MLLR) based model adaptation. Experiments on large vocabulary speech recognition tasks are performed. Experimental results indicate that the proposed MCELR model adaptation can lead to significant speech recognition performance improvement, and its performance advantage over the MLLR-based approach is observed even when the amount of adaptation data is sparse.

40 citations

Proceedings ArticleDOI
Xiaodong He1, Li Deng1, Wu Chou2
01 Oct 2006
TL;DR: This work extends the original EBW algorithm and derives a novel method for MCE-based model parameter estimation; the method has a solid theoretical basis and stable convergence, and is well suited for the large-scale batch-mode training process essential in large-scale speech recognition and other pattern recognition applications.
Abstract: In recent years, various discriminative learning techniques for HMMs have consistently yielded significant benefits in speech recognition. In this paper, we present a novel optimization technique using the Minimum Classification Error (MCE) criterion to optimize the HMM parameters. Unlike Maximum Mutual Information training, where an Extended Baum-Welch (EBW) algorithm exists to optimize its objective function, for MCE training the original EBW algorithm cannot be directly applied. In this work, we extend the original EBW algorithm and derive a novel method for MCE-based model parameter estimation. Compared with conventional gradient descent methods for MCE learning, the proposed method has a solid theoretical basis and stable convergence, and it is well suited for the large-scale batch-mode training process essential in large-scale speech recognition and other pattern recognition applications. Evaluation experiments, including model training and speech recognition, are reported on both a small vocabulary task (TI-Digits) and a large vocabulary task (WSJ), where the effectiveness of the proposed method is demonstrated. We expect new future applications and success of this novel learning method in general pattern recognition and multimedia processing, in addition to the speech and audio processing applications we present in this paper.
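As background, the MCE criterion referenced in this abstract is conventionally built from a misclassification measure (the correct-class discriminant compared against a soft maximum over competing classes) passed through a sigmoid, which turns the non-differentiable 0/1 error count into a smooth objective. A minimal sketch, with hypothetical scores and smoothing constants eta and gamma:

```python
import numpy as np

def mce_loss(g_correct, g_competitors, eta=2.0, gamma=1.0):
    """Smoothed MCE loss for one token (illustrative sketch only).
    d > 0 indicates misclassification; the sigmoid makes the
    0/1 error count differentiable so it can be optimized."""
    # Soft maximum over competing discriminants (eta -> inf gives the hard max)
    comp = np.log(np.mean(np.exp(eta * np.asarray(g_competitors)))) / eta
    d = -g_correct + comp                      # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))    # sigmoid smoothing

# Correct class scores well above the competitors -> loss near 0
assert mce_loss(5.0, [1.0, 0.5]) < 0.05
# Correct class scores below a competitor -> loss near 1
assert mce_loss(0.0, [4.0, 3.0]) > 0.9
```

It is this sigmoid-of-a-ratio structure that the paper rewrites as a ratio of two positive functions, so that EBW-style growth transforms become applicable.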

23 citations

Proceedings Article
01 Jan 2003
TL;DR: It is shown that the error rate minimization in the proposed approach can be formulated into maximizing a special ratio of two positive functions, and from that a general growth transform algorithm is derived for MCE based model adaptation.
Abstract: In this paper, a framework of minimum classification error (MCE) model adaptation for continuous density HMMs is proposed, based on the "super" string model approach. We show that the error rate minimization in the proposed approach can be formulated as maximizing a special ratio of two positive functions, and from that a general growth transform algorithm is derived for MCE-based model adaptation. This algorithm departs from the generalized probability descent (GPD) algorithm, and it is well suited for model adaptation with a small amount of training data. The proposed approach is applied to linear regression based variance adaptation, and the closed-form solution for variance adaptation using MCE linear regression (MCELR) is derived. The MCELR approach is evaluated on large vocabulary speech recognition tasks. The relative performance gain is more than doubled on the standard (WSJ Spoke 3) database, compared to maximum likelihood linear regression (MLLR) based variance adaptation for the same amount of adaptation data.

8 citations

Proceedings Article
01 Jan 2003
TL;DR: Experimental results indicate that significant performance gain over the MLLR based variance adaptation can be obtained based on the proposed approach, which provides a consistent Bayesian theoretical framework to incorporate prior knowledge in linear regression based variance adaptation.
Abstract: In this paper, the theoretical framework of maximum a posteriori linear regression (MAPLR) based variance adaptation for continuous density HMMs is described. In our approach, a class of informative prior distributions for MAPLR based variance adaptation is identified, from which the closed-form solution of MAPLR based variance adaptation is obtained under its EM formulation. Effects of the proposed prior distribution in MAPLR based variance adaptation are characterized and compared with conventional maximum likelihood linear regression (MLLR) based variance adaptation. These findings provide a consistent Bayesian theoretical framework for incorporating prior knowledge in linear regression based variance adaptation. Experiments on large vocabulary speech recognition tasks were performed. The experimental results indicate that significant performance gain over MLLR based variance adaptation can be obtained with the proposed approach.

8 citations


Cited by
Journal ArticleDOI
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
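The alternative evaluation this abstract describes, a feed-forward network that maps a spliced window of acoustic frames to posterior probabilities over HMM states, can be sketched minimally as follows. The layer sizes and random weights are hypothetical placeholders, not those of any system surveyed in the article:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: an 11-frame window of 40 coefficients in,
# posteriors over 100 HMM states out, one hidden layer.
n_frames, n_coeffs, n_hidden, n_states = 11, 40, 256, 100

W1 = rng.standard_normal((n_frames * n_coeffs, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_states)) * 0.01
b2 = np.zeros(n_states)

def dnn_posteriors(frame_window):
    x = frame_window.reshape(-1)        # splice the frames into one vector
    h = np.maximum(0.0, x @ W1 + b1)    # hidden layer (ReLU)
    return softmax(h @ W2 + b2)         # posteriors over HMM states

post = dnn_posteriors(rng.standard_normal((n_frames, n_coeffs)))
assert post.shape == (n_states,) and abs(post.sum() - 1.0) < 1e-9
```

In the hybrid setup, these posteriors are typically divided by the state priors to give scaled likelihoods before standard HMM decoding.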

9,091 citations

Journal ArticleDOI
TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Abstract: We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum-likelihood (ML) criteria, respectively.

3,120 citations

Book
Li Deng1, Dong Yu1
12 Jun 2014
TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

2,817 citations

Journal Article
TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

2,527 citations

Journal ArticleDOI
TL;DR: It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Abstract: Recently, the hybrid deep neural network (DNN)- hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The performance improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure such as local connectivity, weight sharing, and pooling in CNNs exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important to deal with speaker and environment variations. Experimental results show that CNNs reduce the error rate by 6%-10% compared with DNNs on the TIMIT phone recognition and the voice search large vocabulary speech recognition tasks.
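The "some degree of invariance to small shifts of speech features along the frequency axis" that the abstract attributes to convolution plus pooling can be demonstrated with a tiny numpy sketch; the filter, pooling size, and spike input below are hypothetical:

```python
import numpy as np

def conv1d_freq(x, w):
    """Valid 1-D convolution of a feature vector x along the frequency axis."""
    k = len(w)
    return np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])

def maxpool(x, size):
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

w = np.array([1.0, 2.0, 1.0])          # hypothetical frequency filter
x = np.zeros(16); x[5] = 1.0           # a spectral peak at bin 5
x_shift = np.zeros(16); x_shift[6] = 1.0   # same peak shifted by one bin

a = maxpool(conv1d_freq(x, w), size=4)
b = maxpool(conv1d_freq(x_shift, w), size=4)

# The strongest pooled response survives the shift, in the same pooling region
assert a.max() == b.max()
assert np.argmax(a) == np.argmax(b)
```

The pooled outputs are not identical (the invariance is only partial), but the dominant activation stays in place, which is the robustness to speaker and environment variation the paper exploits.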

1,948 citations