Journal ArticleDOI

A review of large-vocabulary continuous-speech recognition

Steve Young
- 01 Sep 1996
- Vol. 13, Iss. 5, pp. 45
TLDR
The principles and architecture of current LVR systems are discussed and the key issues affecting their future deployment are identified; to illustrate the various points raised, the Cambridge University HTK system is described.
Abstract
Considerable progress has been made in speech-recognition technology over the last few years, and nowhere has this progress been more evident than in the area of large-vocabulary recognition (LVR). Current laboratory systems are capable of transcribing continuous speech from any speaker with average word-error rates between 5% and 10%. If speaker adaptation is allowed, then after 2 or 3 minutes of speech, the error rate will drop well below 5% for most speakers. Previously, LVR systems were limited to dictation applications because they were speaker dependent and required words to be spoken with a short pause between them. However, the capability to recognize natural continuous-speech input from any speaker opens up many more applications. As a result, LVR technology appears to be on the brink of widespread deployment across a range of information technology (IT) systems. This article discusses the principles and architecture of current LVR systems and identifies the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system is described. This system is a modern design that gives state-of-the-art performance, and it is typical of the current generation of recognition systems.
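The word-error rates quoted in the abstract are conventionally computed as the word-level Levenshtein edit distance (substitutions + insertions + deletions) between the recognizer output and a reference transcript, divided by the number of reference words. A minimal sketch of that standard metric (the function name `word_error_rate` is illustrative, not from the article):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,                # substitution or match
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, one deleted word in a six-word reference gives a WER of about 16.7%, so the "5% to 10%" systems described above make roughly one word error per ten to twenty words.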


Citations
Journal ArticleDOI

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Journal ArticleDOI

Bidirectional recurrent neural networks

TL;DR: It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
Journal ArticleDOI

Survey on speech emotion recognition: Features, classification schemes, and databases

TL;DR: A survey of speech emotion classification addressing three important aspects of the design of a speech emotion recognition system: the choice of suitable features for speech representation, the design of the classification scheme, and the proper preparation of an emotional speech database for evaluating system performance.
Journal ArticleDOI

Speech recognition by machines and humans

TL;DR: Comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech.
Journal ArticleDOI

Automatic Sign Language Analysis: A Survey and the Future beyond Lexical Meaning

TL;DR: Data acquisition, feature extraction, and classification methods employed for the analysis of sign language gestures are examined, and overall progress toward a true test of sign recognition systems (dealing with natural signing by native signers) is discussed.
References
Journal ArticleDOI

Maximum likelihood clustering of Gaussians for speech recognition

TL;DR: The authors point out possible applications of model clustering, and then use the approach to determine classes of shared covariances for context modeling in speech recognition, achieving an order-of-magnitude reduction in the number of covariance parameters.
Proceedings ArticleDOI

Language modeling with sentence-level mixtures

TL;DR: This paper introduces a simple mixture language model that attempts to capture long distance constraints in a sentence or paragraph using an m-component mixture of trigram models.
Journal ArticleDOI

The importance of cepstral parameter correlations in speech recognition

TL;DR: It is demonstrated that explicit modeling of correlations between spectral parameters in speech recognition improves speech models both in terms of their descriptive power (higher likelihoods) and classification accuracy.