On the effects of varying filter bank parameters on isolated word recognition

TLDR
Results of performance evaluation of several types of filter bank analyzers in a speaker trained isolated word recognition test using dialed-up telephone line recordings indicate that the best performance is obtained by both a 15-channel uniform filter bank and a 13-channel nonuniform filter bank.
Abstract
The vast majority of commercially available isolated word recognizers use a filter bank analysis as the front end processing for recognition. It is not well understood how the parameters of different filter banks (e.g., number of filters, types of filters, filter spacing) affect recognizer performance. In this paper we present results of a performance evaluation of several types of filter bank analyzers in a speaker-trained isolated word recognition test using dialed-up telephone line recordings. We have studied both DFT (discrete Fourier transform) and direct form implementations of the filter banks. We have also considered uniform and nonuniform filter spacings. The results indicate that the best performance (highest word accuracy) is obtained by both a 15-channel uniform filter bank and a 13-channel nonuniform filter bank (with channels spaced along a critical band scale). The performance of a 7-channel critical band filter bank is almost as good as that of the two best filter banks. The best filter bank recognizers performed, on average, several percent worse than a conventional eighth-order linear predictive coding (LPC) word recognizer. We discuss why some filter banks performed better than others, and why the LPC-based system did the best.
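
To make the distinction between uniform and critical band channel spacing concrete, the following is a minimal sketch of a DFT-based filter bank front end. It is not the paper's implementation: the 8000 Hz sampling rate, the 300 to 3300 Hz telephone band, the Hamming window, the rectangular summation of DFT bins into channels, and the use of Traunmüller's closed-form Bark approximation for the critical band scale are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import cmath
import math

def hz_to_bark(f):
    # Traunmueller's approximation of the critical band (Bark) scale (assumed here).
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    # Algebraic inverse of hz_to_bark.
    return 1960.0 * (z + 0.53) / (26.28 - z)

def uniform_edges(n_channels, f_lo, f_hi):
    # Channel edges equally spaced in Hz.
    step = (f_hi - f_lo) / n_channels
    return [f_lo + i * step for i in range(n_channels + 1)]

def critical_band_edges(n_channels, f_lo, f_hi):
    # Channel edges equally spaced on the Bark scale, mapped back to Hz.
    z_lo, z_hi = hz_to_bark(f_lo), hz_to_bark(f_hi)
    step = (z_hi - z_lo) / n_channels
    return [bark_to_hz(z_lo + i * step) for i in range(n_channels + 1)]

def power_spectrum(frame):
    # Naive DFT of a real frame; returns power for bins 0..N/2.
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec

def channel_log_energies(frame, edges, sample_rate):
    # Sum DFT bin power inside each [lo, hi) band and convert to dB.
    n = len(frame)
    spec = power_spectrum(frame)
    energies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        total = sum(p for k, p in enumerate(spec)
                    if lo <= k * sample_rate / n < hi)
        energies.append(10.0 * math.log10(total + 1e-12))
    return energies

# Illustrative values only (assumptions, not taken from the paper).
SAMPLE_RATE = 8000.0
N = 256
F_LO, F_HI = 300.0, 3300.0

# Hamming-windowed 1 kHz test tone.
frame = [(0.54 - 0.46 * math.cos(2 * math.pi * t / (N - 1)))
         * math.sin(2 * math.pi * 1000.0 * t / SAMPLE_RATE)
         for t in range(N)]

uni = uniform_edges(15, F_LO, F_HI)
crit = critical_band_edges(13, F_LO, F_HI)
e_uni = channel_log_energies(frame, uni, SAMPLE_RATE)
e_crit = channel_log_energies(frame, crit, SAMPLE_RATE)
```

Each frame yields one feature vector per filter bank (`e_uni` with 15 values, `e_crit` with 13). Under these assumptions the uniform channels are all 200 Hz wide, while the critical band channels widen steadily with frequency, so the nonuniform bank spends more of its channels on the low end of the telephone band.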
