Open Access · Proceedings Article

Speech recognition from GSM codec parameters.

Abstract
Speech coding affects speech recognition performance, with recognition accuracy deteriorating as the coded bit rate decreases. Virtually all systems that recognize coded speech reconstruct the speech waveform from the coded parameters, and then perform recognition (after possible noise and/or channel compensation) using conventional techniques. In this paper we compare the recognition accuracy of coded speech obtained by reconstructing the speech waveform with the speech recognition accuracy obtained when using cepstral features derived from the coding parameters. We focus our efforts on speech that has been coded using the 13-kbps full-rate GSM codec, a Regular Pulse Excited Long Term Prediction (RPE-LTP) codec. The GSM codec develops separate representations for the linear prediction (LPC) filter and the residual signal components of the coded speech. We measure the effects of quantization and coding on the accuracy with which these parameters are represented, and present two different methods for recombining them for speech recognition purposes. We observe that by selectively combining the cepstral streams representing the LPC parameters and the residual signal it is possible to obtain recognition accuracy directly from the coded parameters that equals or exceeds the recognition accuracy obtained from the reconstructed waveforms.
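
As an illustration of how cepstral features can be derived from LPC parameters, the sketch below applies the standard LPC-to-cepstrum recursion for an all-pole filter. It is not taken from the paper. The function name `lpc_to_cepstrum`, the sign convention H(z) = G / (1 - sum_k a_k z^-k), and the premise that predictor coefficients are already available are assumptions made for this example; the full-rate GSM codec transmits log-area ratios, which would first have to be converted to predictor coefficients.

```python
import math


def lpc_to_cepstrum(a, n_ceps, gain=1.0):
    """Cepstrum of the all-pole filter H(z) = gain / (1 - sum_k a[k-1] z^-k).

    Hypothetical helper for illustration only; `a` holds the predictor
    coefficients a_1..a_p under the sign convention stated above.
    Returns [c_0, c_1, ..., c_n_ceps].
    """
    p = len(a)
    c = [math.log(gain)]  # c_0 is the log of the filter gain
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n does not exceed the LPC order.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_{n-k}.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c.append(acc)
    return c


if __name__ == "__main__":
    # Single-pole check: for a_1 = 0.5 the cepstrum is 0.5**n / n.
    print(lpc_to_cepstrum([0.5], 4))
```

For a single pole at 0.5 the recursion reproduces the closed form c_n = 0.5^n / n, which gives a quick sanity check on the indexing. How the paper computes and combines the LPC and residual cepstral streams is not specified in the abstract, so this covers only the generic LPC step.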


Citations
Journal Article

Electronic mobile guides: a survey

TL;DR: This research paper attempts to categorize mobile tourist guides using a detailed set of evaluation criteria in order to extract design principles which can be used by application designers and developers.
Journal Article

A bitstream-based front-end for wireless speech recognition on IS-136 communications system

TL;DR: The proposed bitstream-based front-end gives superior word and string accuracies over a recognizer constructed from decoded speech signals and its performance is comparable to that of a wireline recognition system that uses the cepstrum as a feature set.
Journal Article

Graceful degradation of speech recognition performance over packet-erasure networks

TL;DR: This paper explores packet loss recovery for automatic speech recognition (ASR) in spoken dialog systems, assuming an architecture in which a lightweight client communicates with a remote ASR server, and shows that the approach provides robust ASR performance which degrades gracefully as packet loss rates increase.

Speech recognition in mobile environments

TL;DR: It is shown in this work that by selectively constructing a cepstral feature vector from the GSM codec parameters it is possible to reduce the effect of coding on recognition, and weighted acoustic modeling is introduced as an alternative to the method based on average distortion information.
Journal Article

Automatic speech recognition over error-prone wireless networks

TL;DR: The frame error rate is used to adjust the discrimination threshold with the goal of optimising out-of-vocabulary detection, and the paper concludes with a discussion of the applicability of different techniques based on the channel characteristics and the system requirements.