Journal ArticleDOI
On the effects of varying filter bank parameters on isolated word recognition
TLDR
Results of performance evaluation of several types of filter bank analyzers in a speaker trained isolated word recognition test using dialed-up telephone line recordings indicate that the best performance is obtained by both a 15-channel uniform filter bank and a 13-channel nonuniform filter bank.Abstract:
The vast majority of commercially available isolated word recognizers use a filter bank analysis as the front end processing for recognition. It is not well understood how the parameters of different filter banks (e.g., number of filters, types of filters, filter spacing, etc.) affect recognizer performance. In this paper we present results of performance evaluation of several types of filter bank analyzers in a speaker trained isolated word recognition test using dialed-up telephone line recordings. We have studied both DFT (discrete Fourier transform) and direct form implementations of the filter banks. We have also considered uniform and nonuniform filter spacings. The results indicate that the best performance (highest word accuracy) is obtained by both a 15-channel uniform filter bank and a 13-channel nonuniform filter bank (with channels spacing along a critical band scale). The performance of a 7-channel critical band filter bank is almost as good as that of the two best filter banks. In comparison to a conventional linear predictive coding (LPC) word recognizer, the performance of the best filter bank recognizers was, on average, several percent worse than that of an eighth-order LPC-based recognizer. A discussion as to why some filter banks performed better than others, and why the LPC-based system did the best, is given in this paper.read more
Citations
More filters
Journal ArticleDOI
Hidden Markov models for speech recognition
TL;DR: The role of statistical methods in this powerful technology as applied to speech recognition is addressed and a range of theoretical and practical issues that are as yet unsolved in terms of their importance and their effect on performance for different system implementations are discussed.
Journal ArticleDOI
Signal modeling techniques in speech recognition
TL;DR: A tutorial on signal processing in state-of-the-art speech recognition systems is presented, reviewing those techniques most commonly used, and three important trends that have developed in the last five years in speech recognition are examined.
Journal ArticleDOI
Speech recognition in noisy environments: a survey
TL;DR: The survey indicates that the essential points in noisy speech recognition consist of incorporating time and frequency correlations, giving more importance to high SNR portions of speech in decision making, exploiting task-specific a priori knowledge both of speech and of noise, using class-dependent processing, and including auditory models in speech processing.
Journal ArticleDOI
Extraction of visual features for lipreading
TL;DR: Three methods for parameterizing lip image sequences for recognition using hidden Markov models are compared and two are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape or shape and appearance, respectively.
Journal ArticleDOI
On the use of bandpass liftering in speech recognition
TL;DR: This paper has found that a bandpass "liftering" process reduces the variability of the statistical components of LPC-based spectral measurements and hence it is desirable to use such a liftering process in a speech recognizer.
References
More filters
Book
Theory and application of digital signal processing
TL;DR: Feyman and Wing as discussed by the authors introduced the simplicity of the invariant imbedding method to tackle various problems of interest to engineers, physicists, applied mathematicians, and numerical analysts.
Journal ArticleDOI
Minimum prediction residual principle applied to speech recognition
TL;DR: A computer system is described in which isolated words, spoken by a designated talker, are recognized through calculation of a minimum prediction residual through optimally registering the reference LPC onto the input autocorrelation coefficients using the dynamic programming algorithm.
Book
Speech Analysis, Synthesis and Perception
TL;DR: A second edition was begun in 1970, the aim was to retain the original format, but to expand the content, especially in the areas of digital communications and com puter techniques for speech signal processing.
Journal ArticleDOI
Distance measures for speech processing
TL;DR: The likelihood ratio, cepstral measure, and cosh measure are easily evaluated recursively from linear prediction filter coefficients, and each has a meaningful and interrelated frequency domain interpretation.
Related Papers (5)
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
S. Davis,Paul Mermelstein +1 more