Journal ArticleDOI

A review of large-vocabulary continuous-speech recognition

Steve Young
- 01 Sep 1996
- Vol. 13, Iss. 5, pp. 45
TLDR
The principles and architecture of current LVR systems are discussed and the key issues affecting their future deployment are identified; to illustrate the various points raised, the Cambridge University HTK system is described.
Abstract
Considerable progress has been made in speech-recognition technology over the last few years, and nowhere has this progress been more evident than in the area of large-vocabulary recognition (LVR). Current laboratory systems are capable of transcribing continuous speech from any speaker with average word-error rates between 5% and 10%. If speaker adaptation is allowed, then after 2 or 3 minutes of speech, the error rate will drop well below 5% for most speakers. Previously, LVR systems were limited to dictation applications because they were speaker dependent and required words to be spoken with a short pause between them. However, the capability to recognize natural continuous-speech input from any speaker opens up many more applications. As a result, LVR technology appears to be on the brink of widespread deployment across a range of information technology (IT) systems. This article discusses the principles and architecture of current LVR systems and identifies the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system is described. This system is a modern design that gives state-of-the-art performance, and it is typical of the current generation of recognition systems.
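The word-error rates quoted in the abstract are conventionally computed as the word-level Levenshtein edit distance (substitutions + insertions + deletions) between the recognizer output and a reference transcript, divided by the number of reference words. A minimal sketch of that standard metric (the function name `word_error_rate` is illustrative, not from the article):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,                # substitution or match
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, one deleted word in a six-word reference gives a WER of about 16.7%, so the "5% to 10%" systems described above make roughly one word error per ten to twenty words.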


Citations
Journal ArticleDOI

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Journal ArticleDOI

Bidirectional recurrent neural networks

TL;DR: It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
Journal ArticleDOI

Survey on speech emotion recognition: Features, classification schemes, and databases

TL;DR: A survey of speech emotion classification addressing three important aspects of the design of a speech emotion recognition system: the choice of suitable features for speech representation, the design of the classification scheme, and the proper preparation of an emotional speech database for evaluating system performance.
Journal ArticleDOI

Speech recognition by machines and humans

TL;DR: Comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech.
Journal ArticleDOI

Automatic Sign Language Analysis: A Survey and the Future beyond Lexical Meaning

TL;DR: Data acquisition, feature extraction, and classification methods employed for the analysis of sign language gestures are examined, and overall progress toward a true test of sign recognition systems (dealing with natural signing by native signers) is discussed.
References
Journal ArticleDOI

Maximum likelihood clustering of Gaussians for speech recognition

TL;DR: The authors point out possible applications of model clustering, and then use the approach to determine classes of shared covariances for context modeling in speech recognition, achieving an order-of-magnitude reduction in the number of covariance parameters.
Proceedings ArticleDOI

Language modeling with sentence-level mixtures

TL;DR: This paper introduces a simple mixture language model that attempts to capture long distance constraints in a sentence or paragraph using an m-component mixture of trigram models.
Journal ArticleDOI

The importance of cepstral parameter correlations in speech recognition

TL;DR: It is demonstrated that explicit modeling of correlations between spectral parameters in speech recognition improves speech models both in terms of their descriptive power (higher likelihoods) and classification accuracy.