Open Access Proceedings Article

Heterogeneous measurements and multiple classifiers for speech recognition.

TLDR
In context-independent classification and context-dependent recognition on the TIMIT core test set using 39 classes, the system achieved error rates that are the lowest the authors have seen reported on these tasks.
Abstract
This paper addresses the problem of acoustic-phonetic modeling. First, heterogeneous acoustic measurements are chosen in order to maximize the acoustic-phonetic information extracted from the speech signal in preprocessing. Second, classifier systems are presented for successfully utilizing high-dimensional acoustic measurement spaces. The techniques used for achieving these two goals can be broadly categorized as hierarchical, committee-based, or a hybrid of these two. This paper presents committee-based and hybrid approaches. In context-independent classification and context-dependent recognition on the TIMIT core test set using 39 classes, the system achieved error rates of 18.3% and 24.4%, respectively. These error rates are the lowest we have seen reported on these tasks. In addition, experiments with a telephone-based weather information word recognition task led to word error rate reductions of 10–16%.
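
The abstract does not say how the committee members are combined, so the following is only a minimal sketch of two generic committee rules, not the authors' method: majority voting, and multiplying per-classifier posteriors under an independence assumption. The function names, the phone labels, and the probabilities are all hypothetical and chosen purely for illustration; each dictionary stands for the posterior produced by one classifier trained on a different acoustic measurement set.

```python
import math
from collections import Counter


def vote(predictions):
    """Majority vote over class labels; ties go to the earliest label."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def combine_posteriors(posteriors, floor=1e-12):
    """Sum log posteriors across classifiers, then renormalize.

    Assumes (hypothetically) that the classifiers are conditionally
    independent and that every dict covers the same set of labels.
    """
    labels = list(posteriors[0])
    scores = {
        c: sum(math.log(max(p[c], floor)) for p in posteriors)
        for c in labels
    }
    # Log-sum-exp normalization for numerical stability.
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {c: math.exp(s - log_z) for c, s in scores.items()}


def classify(posteriors):
    """Return the label with the highest combined posterior."""
    combined = combine_posteriors(posteriors)
    return max(combined, key=combined.get)


# Hypothetical posteriors from three classifiers, one per measurement set.
committee = [
    {"iy": 0.6, "ih": 0.3, "eh": 0.1},
    {"iy": 0.3, "ih": 0.5, "eh": 0.2},
    {"iy": 0.5, "ih": 0.4, "eh": 0.1},
]

top_labels = [max(p, key=p.get) for p in committee]
print(vote(top_labels))     # label chosen by majority vote
print(classify(committee))  # label chosen by the combined posterior
```

Voting needs only each member's top choice, while the posterior product lets a confident member outweigh two uncertain ones; which behaves better depends on how well calibrated the individual classifiers are.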


Citations
Journal ArticleDOI

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition

TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Journal ArticleDOI

Acoustic Modeling Using Deep Belief Networks

TL;DR: It is shown that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters.
Journal ArticleDOI

JUPITER: a telephone-based conversational interface for weather information

TL;DR: The purpose of this paper is to describe the development effort of JUPITER in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting.
Journal ArticleDOI

Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error

TL;DR: This article reports significant gains in recognition performance and model compactness as a result of discriminative training based on MCE training applied to HMMs, in the context of three challenging large-vocabulary speech recognition tasks.
References
Journal ArticleDOI

Speaker-independent phone recognition using hidden Markov models

TL;DR: The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data; the reported results can be used as benchmarks to evaluate future systems.
Journal ArticleDOI

Speech recognition by machines and humans

TL;DR: Comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech.
Proceedings ArticleDOI

A probabilistic framework for feature-based speech recognition

TL;DR: This paper examines a maximum a-posteriori decoding strategy for feature-based recognizers and develops a normalization criterion that is useful for a segment-based Viterbi or A* search.
Proceedings Article

High performance speaker-independent phone recognition using CDHMM.

TL;DR: It is shown that it is worthwhile to perform phone recognition experiments as opposed to only focusing attention on word recognition results, and high phone accuracies on three corpora: WSJ0, BREF and TIMIT are reported.
Proceedings ArticleDOI

Improved phone recognition using Bayesian triphone models

TL;DR: A new statistical framework, derived from the Bayesian principle, is introduced to perform a triphone model from less context-dependent models, and the potential power of this new framework is explored, both algorithmically and experimentally, by an implementation with hidden Markov modeling techniques.