Open Access Proceedings Article
Heterogeneous measurements and multiple classifiers for speech recognition.
TLDR
In context-independent classification and context-dependent recognition on the TIMIT core test set using 39 classes, the system achieved error rates that are the lowest the authors have seen reported on these tasks.
Abstract:
This paper addresses the problem of acoustic-phonetic modeling. First, heterogeneous acoustic measurements are chosen in order to maximize the acoustic-phonetic information extracted from the speech signal in preprocessing. Second, classifier systems are presented for successfully utilizing high-dimensional acoustic measurement spaces. The techniques used for achieving these two goals can be broadly categorized as hierarchical, committee-based, or a hybrid of these two. This paper presents committee-based and hybrid approaches. In context-independent classification and context-dependent recognition on the TIMIT core test set using 39 classes, the system achieved error rates of 18.3% and 24.4%, respectively. These error rates are the lowest we have seen reported on these tasks. In addition, experiments with a telephone-based weather information word recognition task led to word error rate reductions of 10–16%.
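The abstract does not say how the committee members' outputs are combined, so the following is only a minimal sketch of two generic committee schemes, not the authors' method. It assumes each classifier (hypothetically, one per acoustic measurement set) returns a posterior distribution over the same phonetic classes. The function names and the example numbers are illustrative assumptions.

```python
import math
from collections import Counter


def majority_vote(posteriors):
    """Each committee member votes for its top class; ties go to the class voted first."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    return Counter(votes).most_common(1)[0][0]


def product_rule(posteriors, floor=1e-10):
    """Sum log-posteriors across members (an independence assumption), then pick the best class."""
    n_classes = len(posteriors[0])
    scores = [
        sum(math.log(max(p[c], floor)) for p in posteriors)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=scores.__getitem__)


# Hypothetical posteriors over three classes from three classifiers,
# each imagined as trained on a different measurement set.
committee = [
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.05, 0.90, 0.05],
]

print(majority_vote(committee))  # index of the class with the most top-1 votes
print(product_rule(committee))   # index of the class with the highest summed log-posterior
```

The two rules can disagree: voting only counts each member's top choice, while the product rule lets one confident member outweigh two lukewarm ones.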
Citations
Journal Article
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdelrahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Journal Article
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdelrahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury
TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Journal Article
Acoustic Modeling Using Deep Belief Networks
TL;DR: It is shown that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters.
Journal Article
JUPITER: a telephone-based conversational interface for weather information
Victor W. Zue, Stephanie Seneff, James Glass, Joseph Polifroni, Christine Pao, Timothy J. Hazen, Lee Hetherington
TL;DR: The purpose of this paper is to describe the development effort of JUPITER in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting.
Journal Article
Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error
TL;DR: This article reports significant gains in recognition performance and model compactness as a result of discriminative training based on MCE training applied to HMMs, in the context of three challenging large-vocabulary speech recognition tasks.
References
Journal Article
Speaker-independent phone recognition using hidden Markov models
Kai-Fu Lee, H.-W. Hon
TL;DR: The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data; the reported results can be used as benchmarks to evaluate future systems.
Journal Article
Speech recognition by machines and humans
TL;DR: Comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech.
Proceedings Article
A probabilistic framework for feature-based speech recognition
TL;DR: This paper examines a maximum a-posteriori decoding strategy for feature-based recognizers and develops a normalization criterion that is useful for a segment-based Viterbi or A* search.
Proceedings Article
High performance speaker-independent phone recognition using CDHMM.
Lori Lamel, Jean-Luc Gauvain
TL;DR: It is shown that it is worthwhile to perform phone recognition experiments as opposed to only focusing attention on word recognition results, and high phone accuracies on three corpora: WSJ0, BREF and TIMIT are reported.
Proceedings Article
Improved phone recognition using Bayesian triphone models
Ji Ming, Francis Jack Smith
TL;DR: A new statistical framework, derived from the Bayesian principle, is introduced to build a triphone model from less context-dependent models, and the potential power of this new framework is explored, both algorithmically and experimentally, by an implementation with hidden Markov modeling techniques.