
Showing papers in "IEEE Transactions on Acoustics, Speech, and Signal Processing" in 1980


Journal Article
TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.
Abstract: Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words, therefore the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations. For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear prediction cepstrum, a linear prediction spectrum, or a set of reflection coefficients), word templates were generated using an efficient dynamic warping method, and test data were time registered with the templates. A set of ten mel-frequency cepstrum coefficients computed every 6.4 ms resulted in the best performance, namely 96.5 percent and 95.0 percent recognition with each of two speakers. The superior performance of the mel-frequency cepstrum coefficients may be attributed to the fact that they better represent the perceptually relevant aspects of the short-term speech spectrum.

4,822 citations


Journal Article
TL;DR: In this paper, a spectral decomposition of a frame of noisy speech is used to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise.
Abstract: One way of enhancing speech in an additive acoustic noise environment is to perform a spectral decomposition of a frame of noisy speech and to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise. Using a two-state model for the speech event (speech absent or speech present) and using the maximum likelihood estimator of the magnitude of the speech spectrum results in a new class of suppression curves which permits a tradeoff of noise suppression against speech distortion. The algorithm has been implemented in real time in the time domain, exploiting the structure of the channel vocoder. Extensive testing has shown that the noise can be made imperceptible by proper choice of the suppression factor.

854 citations


Journal Article
TL;DR: The vector quantizing approach is shown to be a mathematically and computationally tractable method which builds upon knowledge obtained in linear prediction analysis studies and is introduced in a nonrigorous form.
Abstract: With rare exception, all presently available narrow-band speech coding systems implement scalar quantization (independent quantization) of the transmission parameters (such as reflection coefficients or transformed reflection coefficients in LPC systems). This paper presents a new approach called vector quantization. For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with 15 to 20 fewer bits/frame than that required for the optimized scalar quantizing approaches presently in use. The vector quantizing approach is shown to be a mathematically and computationally tractable method which builds upon knowledge obtained in linear prediction analysis studies. This paper introduces the theory in a nonrigorous form, along with practical results to date and an extensive list of research topics for this new area of speech coding.

754 citations
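
As a rough illustration of the vector quantization idea in the abstract above, the sketch below encodes a whole parameter vector as a single codebook index instead of quantizing each parameter independently. The codebook values, the function names `nearest_codeword` and `bits_per_vector`, and the squared-error distortion are assumptions made for illustration only; the paper works with LPC parameters and its own distortion measure.

```python
import math


def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector`.

    Squared error is used here as a stand-in distortion (an assumption).
    """
    def sq_error(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_error(vector, codebook[i]))


def bits_per_vector(codebook):
    """Bits needed to transmit one codebook index."""
    return math.ceil(math.log2(len(codebook)))


# Hypothetical 2-dimensional codebook with four entries.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
index = nearest_codeword((0.9, 0.2), codebook)
```

Only the index is transmitted, so the rate per frame depends on the codebook size rather than on the number of parameters in the vector.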


Journal Article
TL;DR: The results suggest a new approach to dynamic time warping for isolated words in which both the reference and test patterns are linearly warped to a fixed length, and then a simplified dynamic time warping algorithm is used to handle the nonlinear component of the time alignment.
Abstract: The technique of dynamic programming for the time registration of a reference and a test pattern has found widespread use in the area of isolated word recognition. Recently, a number of variations on the basic time warping algorithm have been proposed by Sakoe and Chiba, and Rabiner, Rosenberg, and Levinson. These algorithms all assume that the test input is the time pattern of a feature vector from an isolated word whose endpoints are known (at least approximately). The major differences in the methods are the global path constraints (i.e., the region of possible warping paths), the local continuity constraints on the path, and the distance weighting and normalization used to give the overall minimum distance. The purpose of this investigation is to study the effects of such variations on the performance of different dynamic time warping algorithms for a realistic speech database. The performance measures that were used include: speed of operation, memory requirements, and recognition accuracy. The results show that both axis orientation and relative length of the reference and the test patterns are important factors in recognition accuracy. Our results suggest a new approach to dynamic time warping for isolated words in which both the reference and test patterns are linearly warped to a fixed length, and then a simplified dynamic time warping algorithm is used to handle the nonlinear component of the time alignment. Results with this new algorithm show performance comparable to or better than that of all other dynamic time warping algorithms that were studied.

618 citations
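
The two-stage idea in the abstract above (linear warping to a fixed length, then dynamic time warping) can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: it assumes scalar features, an absolute-difference local distance, simple symmetric step constraints, and no global path constraint or distance normalization. The names `linear_warp` and `dtw_distance` are hypothetical.

```python
def linear_warp(seq, length):
    """Resample `seq` to `length` points by linear interpolation."""
    if length == 1 or len(seq) == 1:
        return [float(seq[0])] * length
    scale = (len(seq) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * scale
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append((1 - frac) * seq[lo] + frac * seq[hi])
    return out


def dtw_distance(ref, test):
    """Accumulated distance of the best warping path between two sequences."""
    n, m = len(ref), len(test)
    inf = float("inf")
    # acc[i][j] = best accumulated distance aligning ref[:i] with test[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - test[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Warp both patterns to a common length before the nonlinear alignment.
ref = linear_warp([1, 2, 3], 5)
test = linear_warp([1, 2, 2, 3], 5)
score = dtw_distance(ref, test)
```

Warping both patterns to the same length first means the dynamic programming stage only has to absorb the nonlinear part of the timing difference, which is what allows a simplified path region in the approach the abstract describes.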


Journal Article
TL;DR: In this article, the authors develop a representation for discrete-time signals and systems based on short-time Fourier analysis and show that a class of linear-filtering problems can be represented as the product of the time-varying frequency response of the filter and the short-time Fourier transform of the input signal.
Abstract: This paper develops a representation for discrete-time signals and systems based on short-time Fourier analysis. The short-time Fourier transform and the time-varying frequency response are reviewed as representations for signals and linear time-varying systems. The problems of representing a signal by its short-time Fourier transform and synthesizing a signal from its transform are considered. A new synthesis equation is introduced that is sufficiently general to describe apparently different synthesis methods reported in the literature. It is shown that a class of linear-filtering problems can be represented as the product of the time-varying frequency response of the filter multiplied by the short-time Fourier transform of the input signal. The representation of a signal by samples of its short-time Fourier transform is applied to the linear filtering problem. This representation is of practical significance because there exists a computationally efficient algorithm for implementing such systems. Finally, the methods of fast convolution are considered as special cases of this representation.

600 citations


Journal Article
TL;DR: In this article, a set of conditions is developed under which a sequence is uniquely specified by the phase or samples of the phase of its Fourier transform. These conditions are distinctly different from the minimum or maximum phase conditions and are applicable to both one-dimensional and multidimensional sequences.
Abstract: In this paper, we develop a set of conditions under which a sequence is uniquely specified by the phase or samples of the phase of its Fourier transform, and a similar set of conditions under which a sequence is uniquely specified by the magnitude of its Fourier transform. These conditions are distinctly different from the minimum or maximum phase conditions, and are applicable to both one-dimensional and multidimensional sequences. Under the specified conditions, we also develop several algorithms which may be used to reconstruct a sequence from its phase or magnitude.

439 citations


Journal Article
L. Marple
TL;DR: A new recursive algorithm for autoregressive (AR) spectral estimation is introduced, based on the least squares solution for the AR parameters using forward and backward linear prediction, with computational complexity comparable to that of the popular Burg algorithm.
Abstract: A new recursive algorithm for autoregressive (AR) spectral estimation is introduced, based on the least squares solution for the AR parameters using forward and backward linear prediction. The algorithm has computational complexity proportional to the process order squared, comparable to that of the popular Burg algorithm. The computational efficiency is obtained by exploiting the structure of the least squares normal matrix equation, which may be decomposed into products of Toeplitz matrices. AR spectra generated by the new algorithm have improved performance over AR spectra generated by the Burg algorithm. These improvements include less bias in the frequency estimate of spectral components, reduced variance in frequency estimates over an ensemble of spectra, and absence of observed spectral line splitting.

434 citations


Journal Article
TL;DR: It is argued that the Itakura-Saito and related distortions are well-suited computationally, mathematically, and intuitively for such applications.
Abstract: Several properties, interrelations, and interpretations are developed for various speech spectral distortion measures. The principal results are 1) the development of notions of relative strength and equivalence of the various distortion measures both in a mathematical sense corresponding to subjective equivalence and in a coding sense when used in minimum distortion or nearest neighbor speech processing systems; 2) the demonstration that the Itakura-Saito and related distortion measures possess a property similar to the triangle inequality when used in nearest neighbor systems such as quantization and cluster analysis; and 3) that the Itakura-Saito and normalized model distortion measures yield efficient computation algorithms for generalized centroids or minimum distortion points of groups or clusters of speech frames, an important computation in both classical cluster analysis techniques and in algorithms for optimal quantizer design. We also argue that the Itakura-Saito and related distortions are well-suited computationally, mathematically, and intuitively for such applications.

409 citations


Journal Article
TL;DR: An analysis of this technique is extended to the case when a linear filter appears in the auxiliary signal path and a general solution to this problem is obtained.
Abstract: A technique known as a "multiple correlation cancellation loop" and also as the "LMS algorithm" is widely used in adaptive arrays for radar, sonar, and communications, as well as in many other signal processing applications. In this paper an analysis of this technique is extended to the case when a linear filter appears in the auxiliary signal path. A general solution to this problem is obtained and several examples for narrow-band and broad-band signals are presented.

395 citations
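
For orientation, the baseline LMS update that this abstract takes as its starting point can be sketched as below. This is only the plain algorithm in a system-identification setting; it does not include the linear filter in the auxiliary path that the paper analyzes. The function name `lms_identify`, the step size `mu`, the test system `h`, and the seeded random input are all assumptions for illustration.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights so the filtered input x tracks the desired signal d."""
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(x)):
        # Most recent input samples, newest first.
        x_vec = [x[n - k] for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, x_vec))
        e = d[n] - y
        # Step the weights along the instantaneous gradient estimate.
        w = [wk + mu * e * xk for wk, xk in zip(w, x_vec)]
    return w


# Hypothetical noise-free example: recover a 2-tap system from its output.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
h = [0.5, -0.3]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
w = lms_identify(x, d, num_taps=2, mu=0.05)
```

The step size here is kept well below the stability limit set by the input power; in this noise-free setting the weights settle close to `h`.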


Journal Article
H. Teager
TL;DR: In this paper, a compact array of hot wire anemometers, moved laterally within a vertical cross section at the rear of the mouth, was used to measure intraoral air velocity during sustained phonation of the vowel "I".
Abstract: Reproducible intraoral air velocity measurements were made with a compact array of hot wire anemometers moved laterally within a vertical cross section at the rear of the mouth during sustained phonation of the vowel "I". The results indicate "separated" flow patterns at variance with laminar flow of vocal tract vowel models.

353 citations


Journal Article
E. Ferrara
TL;DR: In this paper, a frequency domain implementation of the LMS adaptive transversal filter is proposed, which requires less computation than the conventional LMS filter when the filter length equals or exceeds 64 sample points.
Abstract: A frequency domain implementation of the LMS adaptive transversal filter is proposed. This fast LMS (FLMS) adaptive filter requires less computation than the conventional LMS adaptive filter when the filter length equals or exceeds 64 sample points.

Journal Article
John Makhoul
TL;DR: The discrete cosine transform (DCT) of an N-point real signal is derived by taking the discrete Fourier transform (DFT) of a 2N-point even extension of the signal; the same result is obtained from an N-point DFT of a reordered signal, and the method is extended to two dimensions, with a saving of 1/4 over the traditional method that uses the DFT.
Abstract: The discrete cosine transform (DCT) of an N-point real signal is derived by taking the discrete Fourier transform (DFT) of a 2N-point even extension of the signal. It is shown that the same result may be obtained using only an N-point DFT of a reordered version of the original signal, with a resulting saving of 1/2. If the fast Fourier transform (FFT) is used to compute the DFT, the result is a fast cosine transform (FCT) that can be computed using on the order of $N \log_{2} N$ real multiplications. The method is then extended to two dimensions, with a saving of 1/4 over the traditional method that uses the DFT.
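
The reordering trick described in this abstract can be checked numerically with a short sketch. It assumes an unnormalized DCT-II, C[k] = sum over n of x[n] cos(pi (2n+1) k / (2N)); scaling conventions vary between references. A plain O(N^2) DFT is used to keep the example dependency-free, where an FFT would be used in practice. The function names are hypothetical.

```python
import cmath
import math


def dft(v):
    """Direct N-point DFT (an FFT would replace this in practice)."""
    n = len(v)
    return [sum(v[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
            for k in range(n)]


def dct_direct(x):
    """Unnormalized DCT-II computed straight from the definition."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
            for k in range(n)]


def dct_via_dft(x):
    """Same DCT from one N-point DFT of a reordered copy of x."""
    n = len(x)
    # Even-indexed samples in order, then odd-indexed samples reversed.
    v = list(x[0::2]) + list(x[1::2])[::-1]
    spectrum = dft(v)
    # A per-bin phase rotation followed by the real part gives the DCT.
    return [(cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
            for k in range(n)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
fast = dct_via_dft(x)
slow = dct_direct(x)
```

The two results agree to rounding error, which reflects the abstract's point that the 2N-point even extension is not needed.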

Journal Article
R. Crochiere
TL;DR: A new structure and a simplified interpretation of short-time Fourier synthesis using synthesis windows is presented and it is shown how this structure can be used for analysis/synthesis applications which require different analysis and synthesis rates, such as time compression or expansion.
Abstract: In this correspondence we present a new structure and a simplified interpretation of short-time Fourier synthesis using synthesis windows. We show that this approach can be interpreted as a modification of the overlap-add method where we inverse Fourier transform and window by the synthesis window prior to overlap-adding. This simplified interpretation results in a more efficient structure for short-time synthesis when a synthesis window is desired. In addition, we show how this structure can be used for analysis/synthesis applications which require different analysis and synthesis rates, such as time compression or expansion.

Journal Article
TL;DR: In this article, closed-form expressions for main-lobe width, modified main-lobe width, and relative sidelobe amplitude are given for the $I_{0}$-sinh window function, which facilitate exploring the tradeoff between record length, spectral resolution, and leakage in digital spectrum analysis.
Abstract: Closed form expressions for main-lobe width, modified main-lobe width, and relative sidelobe amplitude are given for the $I_{0}$-sinh window function. These formulas facilitate exploring the tradeoff between record length, spectral resolution, and leakage in digital spectrum analysis. An especially simple empirical approximation relating main-lobe width and relative sidelobe amplitude is given.

Journal Article
TL;DR: A fast real-time algorithm is presented for median filtering of signals and images that determines the kth bit of the median by inspecting the k most significant bits of the samples.
Abstract: A fast real-time algorithm is presented for median filtering of signals and images. The algorithm determines the kth bit of the median by inspecting the k most significant bits of the samples. The total number of full-word comparison steps is equal to the wordlength of the samples. Speed and hardware complexity of the algorithm are compared with two other fast methods for median filtering.
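
A software analogue of the bit-by-bit idea in this abstract might look like the sketch below. It is a radix-style selection over unsigned integers, offered as an assumption about how the principle can be expressed in code rather than as the paper's hardware-oriented algorithm. It performs one pass per bit, so the number of passes equals the wordlength. The names `bitwise_median` and `median_filter` are hypothetical, and an odd window size is assumed.

```python
def bitwise_median(samples, wordlength):
    """Median of unsigned integers, resolved one bit at a time from the MSB."""
    candidates = list(samples)
    rank = len(candidates) // 2  # position of the median in sorted order
    median = 0
    for bit in range(wordlength - 1, -1, -1):
        zeros = [s for s in candidates if not (s >> bit) & 1]
        if rank < len(zeros):
            # The median has a 0 in this bit; keep only matching samples.
            candidates = zeros
        else:
            # The median has a 1 in this bit; skip past the smaller samples.
            rank -= len(zeros)
            candidates = [s for s in candidates if (s >> bit) & 1]
            median |= 1 << bit
    return median


def median_filter(signal, window, wordlength):
    """Sliding median over full windows only (edges are not padded)."""
    half = window // 2
    return [bitwise_median(signal[i - half:i + half + 1], wordlength)
            for i in range(half, len(signal) - half)]


filtered = median_filter([1, 9, 2, 8, 3], window=3, wordlength=4)
```

After bit k has been processed, the surviving candidates all share the median's k most significant bits, which mirrors the property stated in the abstract.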

Journal Article
TL;DR: A class of adaptive algorithms designed for use with IIR digital filters is introduced; they offer a much reduced computational load for basically the same performance and have their basis in the theory of hyperstability, which yields HARF, a hyperstable adaptive recursive filtering algorithm with provable convergence properties.
Abstract: The concept of adaptation in digital filtering has proven to be a powerful and versatile means of signal processing in applications where precise a priori filter design is impractical. Adaptive filters have traditionally been implemented with FIR structures, making their analysis fairly straightforward but leading to high computation cost in many cases of practical interest (e.g., sinusoid enhancement). This paper introduces a class of adaptive algorithms designed for use with IIR digital filters which offer a much reduced computational load for basically the same performance. These algorithms have their basis in the theory of hyperstability, a concept historically associated with the analysis of closed-loop nonlinear time-varying control systems. Exploiting this theory yields HARF, a hyperstable adaptive recursive filtering algorithm which has provable convergence properties. A simplified version of the algorithm, called SHARF, is then developed which retains provable convergence at low convergence rates and is well suited to real-time applications. In this paper both HARF and SHARF are described and some background into the meaning and utility of hyperstability is given. In addition, computer simulations are presented for two practical applications of IIR adaptive filters: noise and multi-path cancellation.

Journal Article
J. Cadzow
TL;DR: In this article, a method for generating an ARMA model spectral estimate of a wide-sense stationary time series from a finite set of observations is presented, which is based upon a set of error equations which are dependent on the model's parameters.
Abstract: In this paper a method for generating an ARMA model spectral estimate of a wide-sense stationary time series from a finite set of observations is presented. The method is based upon a set of error equations which are dependent on the ARMA model's parameters. Minimization of a quadratic functional of these error equations with respect to the ARMA model's parameters produces the desired spectral estimate. In examples treated to date, this ARMA spectral estimator has provided significantly better performance when compared to such standard procedures as the maximum entropy and Box-Jenkins methods. The computational requirements of this new method basically entail the solving of a system of p linear equations in the autoregressive coefficients where p denotes the order of the ARMA model. Since an ARMA model will typically be of lower order than its autoregressive model counterpart for a specified fidelity of match, the new ARMA procedure is generally more efficient computationally than the maximum entropy method. With this in mind, this ARMA method offers the promise of being a primary tool in many spectral estimation applications.

Journal Article
TL;DR: The application of a general-purpose integer-programming computer program to the design of optimal finite wordlength FIR digital filters is described and an analysis of the approach based on the results of more than 50 design cases is presented.
Abstract: The application of a general-purpose integer-programming computer program to the design of optimal finite wordlength FIR digital filters is described. Examples of two optimal low-pass FIR finite wordlength filters are given and the results are compared with the results obtained by rounding the infinite wordlength coefficients. An analysis of the approach based on the results of more than 50 design cases is presented and the problem of optimal wordlength choice is discussed.

Journal Article
Steven Kay
TL;DR: In this paper, a noise compensation technique was proposed to correct the estimated reflection coefficients for the effect of white noise, assuming the noise variance is known or can be estimated, and simulation results indicate that a significant decrease in the degrading effects of noise may be realized using the noise compensation method.
Abstract: The autoregressive spectral estimator possesses excellent resolution properties for time series which satisfy the "all-pole" assumption. When noise is added to the time series under analysis, the resolution of the spectral estimator decreases rapidly as the signal-to-noise ratio decreases. The usual approach to this problem is to model the resulting time series by the more appropriate autoregressive-moving average process and to use standard time series analysis techniques to identify the autoregressive parameters. This standard technique, however, does not result in a positive-definite autocorrelation matrix. As a result, it is shown that the resulting spectral estimator may exhibit a large increase in variance. An alternative approach, termed the noise compensation technique, is proposed. It attempts to correct the estimated reflection coefficients for the effect of white noise, assuming the noise variance is known or can be estimated. Simulation results indicate that a significant decrease in the degrading effects of noise may be realized using the noise compensation technique.

Journal Article
TL;DR: Two approaches to adaptive noise cancellation are compared; both are shown to reduce ambient noise power by at least 20 dB with minimal speech distortion and thus to be potentially powerful as noise suppression preprocessors for voice communication in severe noise environments.
Abstract: Acoustic noise with energy greater than or equal to that of the speech can be suppressed by adaptively filtering a separately recorded correlated version of the noise signal and subtracting it from the speech waveform. It is shown that for this application of adaptive noise cancellation, large filter lengths are required to account for a highly reverberant recording environment and that there is a direct relation between filter misadjustment and induced echo in the output speech. The second reference noise signal is adaptively filtered using the least mean squares (LMS) and the lattice gradient algorithms. These two approaches are compared in terms of degree of noise power reduction, algorithm convergence time, and degree of speech enhancement. Both methods were shown to reduce ambient noise power by at least 20 dB with minimal speech distortion and thus to be potentially powerful as noise suppression preprocessors for voice communication in severe noise environments.

Journal Article
TL;DR: In this paper, the authors present new discrete Fourier transform methods which are recursive, expressible in state variable form, and involve real number computations, which are especially useful for running Fourier transformation and for general and multirate sampling.
Abstract: This paper presents new discrete Fourier transform methods which are recursive, expressible in state variable form, and which involve real number computations. The algorithms are especially useful for running Fourier transformation and for general and multirate sampling. Numerical examples are given which illustrate the ability of these spectral observers to operate at sampling rates other than the Nyquist rate, to perform one-step-per-sample updating, and to converge to the spectrum in the presence of severe numerical truncation error.

Journal Article
TL;DR: In this paper, it was shown that the least squares estimation of the filter coefficients is equivalent to estimating the Roth processor, and that the parameter estimation approach is expected to have a smaller variance since it avoids the need for spectra estimation.
Abstract: Present techniques that estimate the difference in arrival time between two signals corrupted by noise, received at two separate sensors, are based on the determination of the peak of the generalized cross correlation between the signals. To achieve good resolution and stability in the estimates, the input sequences are first weighted. Invariably, the weights are dependent on input spectra which are generally unknown and hence have to be estimated. By approximating the time shift as a finite impulse response filter, estimation of time delay becomes one of determination of the filter coefficients. With this formulation, a host of techniques in the well-developed area of parameter estimation is available to the time-delay estimation problem, with the possibility of reduced computation time as compared with present methods. In particular, it is shown that the least squares estimation of the filter coefficients is equivalent to estimating the Roth processor. However, the parameter estimation approach is expected to have a smaller variance since it avoids the need for spectra estimation. Indeed, experimental results from two examples show that the Roth processor, found by least squares parameter estimation, has a smaller variance than the approximate maximum likelihood estimator of Hannan-Thomson where spectral estimation is required. A detector that uses the sum of the estimated parameters as a test statistic is also given, together with its receiver operating characteristics.

Journal Article
TL;DR: In this paper, it is shown that the class of 2D minimum mean-square linear prediction error filters with continuous support have the minimum-phase property and the correlation-matching property, and that they can be solved by means of a 2D Levinson algorithm.
Abstract: In this paper, a number of results in one-dimensional (1-D) linear prediction theory are extended to the two-dimensional (2-D) case. It is shown that the class of 2-D minimum mean-square linear prediction error filters with continuous support have the minimum-phase property and the correlation-matching property, and that they can be solved by means of a 2-D Levinson algorithm. A significant practical result to emerge from this theory is a reflection coefficient representation for 2-D minimum-phase filters. This representation provides a domain in which to construct 2-D filters, such that the minimum-phase condition is automatically satisfied.

Journal Article
TL;DR: In this article, the ability of a modified covariance method "maximum entropy" spectral estimator to estimate the frequencies of several sinusoids in additive white Gaussian noise is studied.
Abstract: The ability of a modified covariance method "maximum entropy" spectral estimator to estimate the frequencies of several sinusoids in additive white Gaussian noise is studied. Analytical expressions for the variance of the spectral estimate peak positions at high signal-to-noise ratios are derived. The calculated variance is compared to the Cramer-Rao lower bound and to the results of similar variance calculations for the more familiar covariance method. It is shown that performance approaching the Cramer-Rao bound can be obtained. Simulations demonstrate substantial agreement with the analytical results over a wide range of signal-to-noise ratios.

Journal Article
Henri J. Nussbaumer
TL;DR: It is shown that this method for computing one-dimensional convolutions by polynomial transforms is computationally efficient, even for large convolutions, and can be implemented with FFT-type algorithms, while avoiding the use of trigonometric functions and complex arithmetic.
Abstract: We have recently introduced new transforms, called polynomial transforms, which are defined in rings of polynomials and give efficient algorithms for the computation of multidimensional DFT's and convolutions. In this paper we present a method for computing one-dimensional convolutions by polynomial transforms. We show that this method is computationally efficient, even for large convolutions, and can be implemented with FFT-type algorithms, while avoiding the use of trigonometric functions and complex arithmetic. We then extend this technique to complex convolutions and to multidimensional convolutions.

Journal Article
TL;DR: In this article, the authors derive autocorrelations of a chaos arising from a simple nonlinear deterministic difference equation, which are identical with those of a stochastic first-order autoregressive process.
Abstract: We derive autocorrelations of a chaos arising from a simple nonlinear deterministic difference equation. The resulting autocorrelations are identical with those of a stochastic first-order autoregressive process.

Journal Article
TL;DR: In this paper, redundant residue number system properties can be used for error detection and correction in recursive digital filters, with a special emphasis on overflow detection, error correction, and gradual system degradation in the presence of recurring errors.
Abstract: In spite of rapid advances during the last few years in the design and realization of digital filters, very little attention has been given to the problems of error detection and correction in digital filters. This paper describes how redundant residue number system properties can be used for this purpose. The theory is presented with special emphasis on overflow detection, error correction, and gradual system degradation in the presence of recurring errors. A filter simulation program is described and simulation results are presented to illustrate the principles in recursive digital filters.

Journal Article
TL;DR: The purpose of this correspondence is to introduce an adaptive algorithm for recursive filters, which are implemented via a lattice structure, so that stability can be achieved during the adaptation process.
Abstract: The purpose of this correspondence is to introduce an adaptive algorithm for recursive filters, which are implemented via a lattice structure. The motivation for doing so is that stability can be achieved during the adaptation process. For convenience, the corresponding algorithm is referred to as an "adaptive lattice algorithm" for recursive filters. Results pertaining to using this algorithm in a system-identification experiment are also included.

Journal Article
TL;DR: In this paper, the transient behavior of the LMS adaptive filter was studied when configured as an adaptive line enhancer operating in the presence of a fixed or variable complex frequency sine-wave signal buried in white noise.
Abstract: The transient behavior of the LMS adaptive filter is studied when configured as an adaptive line enhancer operating in the presence of a fixed or variable complex frequency sine-wave signal buried in white noise. For a fixed frequency signal, the mean weights are shown to respond to signal more rapidly than to noise alone. For a chirped signal, a fixed parameter matrix first-order difference equation is derived for the mean weights and a closed-form steady-state solution obtained. The transient response is obtained as a function of the eigenvectors and eigenvalues of the input covariance matrix. Sufficient conditions for the stability of the transient response are derived and an upper bound on the eigenvalues obtained. Finally, the mean-square error is evaluated when responding to a chirped signal. A gain coefficient of the LMS algorithm is determined which minimizes the mean-square error for chirped signals as a function of chirp rate and signal and noise powers.

Journal Article
TL;DR: A connected digit recognizer is proposed in which a set of isolated word templates is used as reference patterns and an unconstrained dynamic time warping algorithm is used to literally "spot" the digits in the string.
Abstract: A connected digit recognizer is proposed in which a set of isolated word templates is used as reference patterns and an unconstrained dynamic time warping (DTW) algorithm is used to literally "spot" the digits in the string. Segmentation boundaries between digits are obtained as the termination point of the dynamic path from the previous time warp. A region around the boundary is searched for the optimum starting point for the succeeding digit. At each stage the recognizer keeps track of a set of candidate digit strings for each test string. The string with the smallest accumulated distance is used as the preliminary string estimate. To help improve the recognition accuracy, two "post-correction" techniques were applied to the entire set of hypothesized digit strings. One technique creates a reference string by concatenating reference contours of the digits of the string, and comparing this to the test string using a constrained dynamic time warping algorithm. The second technique performs a similar comparison using voiced-unvoiced-silence contours instead of the measured features. Small but consistent improvements in recognition accuracy have been obtained using these techniques for both speaker-trained and speaker-independent systems with digit strings recorded over dialed-up telephone lines. For variable length digit strings of 2 to 5 digits (where the recognizer was not told the length of the string), word error rates of about 2-3 percent and string error rates on the order of 8 percent were obtained for both speaker-dependent and speaker-independent systems.