Showing papers in "IEEE Transactions on Acoustics, Speech, and Signal Processing in 1978"


Journal Article
H. Sakoe, S. Chiba
TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.
Abstract: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition. First, a general principle of time-normalization is given using time-warping function. Then, two time-normalized distance definitions, called symmetric and asymmetric forms, are derived from the principle. These two forms are compared with each other through theoretical discussions and experimental studies. The symmetric form algorithm superiority is established. A new technique, called slope constraint, is successfully introduced, in which the warping function slope is restricted so as to improve discrimination between words in different categories. The effective slope constraint characteristic is qualitatively analyzed, and the optimum slope constraint condition is determined through experiments. The optimized algorithm is then extensively subjected to experimental comparison with various DP-algorithms, previously applied to spoken word recognition by different research groups. The experiment shows that the present algorithm gives no more than about two-thirds errors, even compared to the best conventional algorithm.

5,906 citations
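
As a rough illustration of the symmetric DP-matching idea in the abstract above, the following Python sketch computes a time-normalized distance between two scalar feature sequences. It uses the commonly cited symmetric weighting (1 for horizontal and vertical steps, 2 for diagonal steps, normalized by the sum of the two lengths). The function name `dtw_symmetric`, the absolute-difference local distance, and the window parameter `r` are assumptions made for illustration, and the slope constraint studied in the paper is not implemented here.

```python
import math


def dtw_symmetric(a, b, r=None):
    """Time-normalized DP distance between two non-empty lists of floats.

    r is a hypothetical adjustment-window half-width: only cells with
    |i - j| <= r are visited. With r=None the whole grid is searched.
    """
    I, J = len(a), len(b)
    if I == 0 or J == 0:
        raise ValueError("sequences must be non-empty")
    if r is None:
        r = max(I, J)
    # The window must be wide enough for the path to reach the end point.
    r = max(r, abs(I - J))

    g = [[math.inf] * J for _ in range(I)]
    for i in range(I):
        for j in range(max(0, i - r), min(J, i + r + 1)):
            d = abs(a[i] - b[j])  # local distance (an assumption)
            if i == 0 and j == 0:
                g[i][j] = 2.0 * d
                continue
            best = math.inf
            if i > 0:
                best = min(best, g[i - 1][j] + d)          # vertical step
            if j > 0:
                best = min(best, g[i][j - 1] + d)          # horizontal step
            if i > 0 and j > 0:
                best = min(best, g[i - 1][j - 1] + 2.0 * d)  # diagonal step
            g[i][j] = best
    # Normalize by the total weight of any admissible path, I + J.
    return g[I - 1][J - 1] / (I + J)
```

Because the weights are symmetric in the two time axes, swapping the arguments leaves the distance unchanged, which is the property the "symmetric form" name refers to.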


Journal Article
Hsieh Hou, H. Andrews
TL;DR: Applications to image and signal processing include interpolation, smoothing, filtering, enlargement, and reduction, and experimental results are presented for illustrative purposes in two-dimensional image format.
Abstract: This paper presents the use of B-splines as a tool in various digital signal processing applications. The theory of B-splines is briefly reviewed, followed by discussions on B-spline interpolation and B-spline filtering. Computer implementation using both an efficient software viewpoint and a hardware method are discussed. Finally, experimental results are presented for illustrative purposes in two-dimensional image format. Applications to image and signal processing include interpolation, smoothing, filtering, enlargement, and reduction.

1,293 citations
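
To make the B-spline building block mentioned in the abstract above concrete, here is a small Python sketch of the centered cubic B-spline kernel and of a curve formed as a weighted sum of shifted kernels. The function names are hypothetical. Note that using samples directly as coefficients, as done here, gives a smoothing approximation; true interpolation would first require solving for the coefficients, which this sketch does not attempt.

```python
def cubic_bspline(x):
    """Centered cubic B-spline kernel, nonzero only for |x| < 2."""
    ax = abs(x)
    if ax < 1.0:
        return 2.0 / 3.0 - ax * ax + 0.5 * ax ** 3
    if ax < 2.0:
        return (2.0 - ax) ** 3 / 6.0
    return 0.0


def bspline_curve(coeffs, t):
    """Evaluate sum_k coeffs[k] * cubic_bspline(t - k) at position t."""
    return sum(c * cubic_bspline(t - k) for k, c in enumerate(coeffs))
```

Evaluating `bspline_curve` on a finer or coarser grid of `t` values is one simple way to sketch the enlargement and reduction applications the abstract lists.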


Journal Article
TL;DR: This paper considers the estimation of speech parameters in an all-pole model when the speech has been degraded by additive background noise and develops a procedure based on maximum a posteriori (MAP) estimation techniques which is related to linear prediction analysis of speech.
Abstract: This paper considers the estimation of speech parameters in an all-pole model when the speech has been degraded by additive background noise. The procedure, based on maximum a posteriori (MAP) estimation techniques, is first developed in the absence of noise and related to linear prediction analysis of speech. The modification in the presence of background noise is shown to be nonlinear. Two suboptimal procedures are suggested which have linear iterative implementations. A preliminary illustration and discussion based both on a synthetic example and real speech data are given.

590 citations


Journal Article
TL;DR: It is shown that, based on a set of assumptions about the distributions of the distances, the warping algorithm that minimizes the overall probability of making a word error is the modified time warping algorithm with unconstrained endpoints.
Abstract: The technique of dynamic time warping for time registration of a reference and test utterance has found widespread use in the areas of speaker verification and discrete word recognition. As originally proposed, the algorithm placed strong constraints on the possible set of dynamic paths; namely, it was assumed that the initial and final frames of both the test and reference utterances were in exact time synchrony. Because of inherent practical difficulties with satisfying the assumptions under which the above constraints are valid, we have considered some modifications to the dynamic time warping algorithm. In particular, an algorithm in which an uncertainty exists in the registration both for initial and final frames was studied. Another modification constrains the dynamic path to follow (within a given range) the path which is locally optimum at each frame. This modification tends to work well when the location of the final frame of the test utterance is significantly in error due to breath noise, etc. To test the different time warping algorithms a set of ten isolated words spoken by 100 speakers was used. Probability density functions of the distances from each of the 100 versions of a word to a reference version of the word were estimated for each of three dynamic warping algorithms. From these data, it is shown that, based on a set of assumptions about the distributions of the distances, the warping algorithm that minimizes the overall probability of making a word error is the modified time warping algorithm with unconstrained endpoints. A discussion of this key result along with some ideas on where the other modifications would be most useful is included.

349 citations


Journal Article
TL;DR: The modified moving window method (MMWM) (Kodera et al. [10]) is the one which gives the most significant result whatever the value of the filtering bandwidth, and the ability of each method to resolve multicomponent signals is discussed.
Abstract: We compare four different methods for analyzing time-varying signals, the frequency and amplitude of which are both varying (small BT signals, where B is the bandwidth and T the duration). All four methods give results which depend on the frequency bandwidth of the analyzing filter. But the modified moving window method (MMWM) (Kodera et al. [10]) is the one which gives the most significant result whatever the value of the filtering bandwidth. This is demonstrated both by a mathematical treatment and by a numerical simulation. In general, there are two characteristic curves in the frequency-time plane: one which gives the instantaneous frequency as a function of time and the other which gives the group delay time as a function of frequency. For signals with small BT values, these two curves are distinct and the different analyzing methods approach one or the other curve, or neither of them, depending on the bandwidth of the filtering window. The ability of each method to resolve multicomponent signals is discussed. The influence of the scaling factors used for the visualization of the signal in a two-dimensional plane is also studied.

311 citations


Journal Article
TL;DR: In this paper, the steady-state behavior of the adaptive line enhancer (ALE) is analyzed for a stationary input consisting of multiple sinusoids in white noise, and it is shown that the expected values of the ALE weights in steady state can be written as a sum of sinusoids and that the amplitude of each sinusoid is coupled to that of all other sinusoids by coefficients that approach zero as the number of ALE weights becomes large.
Abstract: The steady-state behavior of the adaptive line enhancer (ALE), a new implementation of adaptive filtering that has application in detecting and tracking narrow-band signals in broad-band noise, is analyzed for a stationary input consisting of multiple sinusoids in white noise. It is shown that the steady-state performance of an L-weight ALE for this case can be modeled by the L × L Wiener-Hopf matrix equation and that this matrix equation can be transformed into a set of 2N coupled linear equations, where N is the number of sinusoids. It is also shown that the expected values of the ALE weights in steady state can be written as a sum of sinusoids and that the amplitude of each sinusoid is coupled to that of all other sinusoids by coefficients that approach zero as the number of ALE weights becomes large. The analytical results are compared to experimental results obtained with a hardware implementation of the ALE of variable length (up to 256 weights) and show good agreement. Theoretical expressions for linear predictive spectral estimates are also derived for multiple sinusoids in white noise. Comparisons are made between the magnitude of the discrete Fourier transform of the ALE weights and the linear predictive spectral estimate for two sinusoids in white noise.

223 citations
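
The adaptive line enhancer structure analyzed in the abstract above can be sketched as a delayed-input linear predictor whose weights are adapted sample by sample. The Python sketch below assumes an LMS weight update, which the abstract does not spell out; the function names, the default of 32 weights, the unit delay, and the step size `mu` are all illustrative assumptions rather than values from the paper.

```python
import math
import random


def demo_signal(n, freq, noise_std, seed=0):
    """Hypothetical test input: unit sinusoid plus white Gaussian noise."""
    rng = random.Random(seed)
    clean = [math.sin(2.0 * math.pi * freq * k) for k in range(n)]
    noisy = [c + rng.gauss(0.0, noise_std) for c in clean]
    return noisy, clean


def adaptive_line_enhancer(x, num_weights=32, delay=1, mu=0.002):
    """Return (enhanced output y, final weights w) for input list x."""
    w = [0.0] * num_weights
    y = [0.0] * len(x)
    for n in range(len(x)):
        # Tap vector built from the delayed input x[n-delay], x[n-delay-1], ...
        taps = [x[n - delay - k] if n - delay - k >= 0 else 0.0
                for k in range(num_weights)]
        # Filter output is the prediction of x[n]; this is the enhanced signal.
        y[n] = sum(wk * tk for wk, tk in zip(w, taps))
        e = x[n] - y[n]
        # LMS update (assumed): move weights along the instantaneous gradient.
        for k in range(num_weights):
            w[k] += 2.0 * mu * e * taps[k]
    return y, w
```

The delay decorrelates the broadband noise between the reference and the tap vector while the narrow-band component stays predictable, so the predictor output retains mostly the sinusoid.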


Journal Article
TL;DR: Preliminary tests indicate that the least mean-square adaptive filtering approach for removing the deleterious effects of additive noise on the speech signal improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment.
Abstract: A least mean-square (LMS) adaptive filtering approach has been formulated for removing the deleterious effects of additive noise on the speech signal. Unlike the classical LMS adaptive filtering scheme, the proposed method is designed to cancel out the clean speech signal. This method takes advantage of the quasi-periodic nature of the speech signal to form an estimate of the clean speech signal at time t from the value of the signal at time t minus the estimated pitch period. For additive white noise distortion, preliminary tests indicate that the method improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment. The method has also been shown to partially remove the perceived granularity of CVSD coded speech signals and to lead to an improvement in the linear prediction analysis/synthesis of noisy speech.

207 citations


Journal Article
TL;DR: A sufficient condition is given for a two's complement state variable realization of any order to be free of overflow oscillation, and a simple characterization of the condition is given for second-order filters.
Abstract: Most of the literature dealing with overflow oscillation in fixed-point arithmetic digital filters has considered the direct form exclusively. It is possible to eliminate overflow oscillations, regardless of pole locations, by considering more general forms. A sufficient condition is given for a two's complement state variable realization of any order to be free of overflow oscillation. A simple characterization of the condition is given for second-order filters. Among those second-order forms which meet the condition are normal forms, and forms which minimize output roundoff noise.

205 citations


Journal Article
John Makhoul
TL;DR: In this paper, a class of minimum- or maximum-phase all-zero lattice digital filters, based on the two-multiplier lattice of Itakura and Saito, is developed.
Abstract: A class of minimum- or maximum-phase all-zero lattice digital filters, based on the two-multiplier lattice of Itakura and Saito, is developed. Different lattice forms with different numbers of multipliers are derived, including two one-multiplier forms. Many of the properties of these lattice filters are given, including the important orthogonalization and decoupling properties of successive stages in optimal inverse filtering of signals. These properties lead to important applications in the areas of adaptive linear prediction and adaptive Wiener filtering. As a specific example, the design of a new fast start-up equalizer is presented.

181 citations


Journal Article
TL;DR: In this paper, an intelligibility test was performed to evaluate an adaptive comb filtering method proposed by Frazier [2] for enhancement of degraded speech due to additive white noise, and it was shown that independent of S/N ratio the adaptive comb filter scheme does not increase speech intelligibility.
Abstract: An intelligibility test was performed to evaluate an adaptive comb filtering method proposed by Frazier [2] for enhancement of degraded speech due to additive white noise. Results indicate that independent of S/N ratio the adaptive comb filtering scheme does not increase speech intelligibility.

132 citations


Journal Article
TL;DR: An intelligibility test was performed to evaluate a correlation subtraction method for enhancement of degraded speech due to additive white noise, and results indicate that such a scheme does not significantly increase speech intelligibility at the S/N ratios where the intelligibility scores of unprocessed speech range between 20 and 70 percent.
Abstract: An intelligibility test was performed to evaluate a correlation subtraction method for enhancement of degraded speech due to additive white noise. Results indicate that such a scheme does not significantly increase speech intelligibility at the S/N ratios where the intelligibility scores of unprocessed speech range between 20 and 70 percent.

Journal Article
TL;DR: In this paper, the authors examined the use of two spatially separated receivers to determine the presence of a distant signal source and its relative bearing and proposed a detection threshold that depends only on the probability of false alarm and not on the ambient noise level.
Abstract: This paper examines the use of two spatially separated receivers to determine the presence of a distant signal source and its relative bearing. Ideally, the phase shift between the receivers' output is proportional to the frequency with the time delay between outputs equal to the proportionality constant. Because of noise, the plot of phase against frequency is scattered along a straight line whose slope is the time delay. A least squares estimator of the slope turns out to be equivalent to the maximum likelihood estimator developed by Hamon and Hannan [1]. Since the goodness of fit of the least squares line is a function of the coherence between the receivers' output, the sum of the squared errors is used as a test statistic in detection. The proposed detector has a detection threshold that depends only on the probability of false alarm and not on the ambient noise level. It can also be simply extended to an array of receivers.
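
The phase-slope idea in the abstract above can be sketched in a few lines of Python: fit a straight line through the origin to phase versus angular frequency, take the slope as the delay estimate, and keep the sum of squared errors as a candidate detection statistic. The function name is hypothetical, the fit is unweighted, and the phases are assumed to be already unwrapped; none of these choices is taken from the paper.

```python
import math


def delay_from_phase(freqs_hz, phases_rad):
    """Least-squares fit of phase = 2*pi*f*tau through the origin.

    Returns (tau, sse): the delay estimate in seconds and the sum of
    squared residuals of the fitted line.
    """
    if len(freqs_hz) != len(phases_rad) or not freqs_hz:
        raise ValueError("need equal-length, non-empty inputs")
    omegas = [2.0 * math.pi * f for f in freqs_hz]
    den = sum(w * w for w in omegas)
    if den == 0.0:
        raise ValueError("at least one nonzero frequency is required")
    tau = sum(w * p for w, p in zip(omegas, phases_rad)) / den
    sse = sum((p - w * tau) ** 2 for w, p in zip(omegas, phases_rad))
    return tau, sse
```

A small `sse` indicates that the measured phases lie close to a straight line, which is the situation the abstract associates with high coherence between the two receiver outputs.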

Journal Article
TL;DR: In this article, the authors show how discrete Fourier transformation can be implemented as a filter bank in a way which reduces the number of filter coefficients, leading to new forms of FFT's, among which is a cos/sin FFT for a real signal which only employs real coefficients.
Abstract: The paper shows how discrete Fourier transformation can be implemented as a filter bank in a way which reduces the number of filter coefficients. A particular implementation of such a filter bank is directly related to the normal complex FFT algorithm. The principle developed further leads to types of DFT filter banks which utilize a minimum of complex coefficients. These implementations lead to new forms of FFT's, among which is a cos/sin FFT for a real signal which only employs real coefficients. The new FFT algorithms use only half as many real multiplications as does the classical FFT.

Journal Article
TL;DR: In this article, a comparison of the well-known and novel windows in terms of their frequency domain properties is given, and it is concluded that Kaiser, modified Kaiser, Tukey, and three-coefficient window families appear to be the best of the known windows of 6, 12, and 18 dB/oct decay rates.
Abstract: Some novel windows are introduced. A comparison of these and the well-known windows in terms of their frequency domain properties is given. It is concluded that Kaiser, modified Kaiser, Tukey, and three-coefficient window families appear to be the best of the known windows of 6, 12, and 18 dB/oct decay rates.

Journal Article
TL;DR: In this paper, the spectral smoothing technique (SST), using a lag window, is introduced into the autocorrelation method of linear predictive analysis of speech, and its effectiveness in reducing estimation errors is assessed through experimental comparison with the usual autocorrelation method.
Abstract: In linear predictive analysis of speech, voice periodicity influences formant frequency and bandwidth estimation accuracy. One of the most serious errors in estimating formant parameters is bandwidth underestimation that causes a quality difference between synthetic and natural speech. In this paper, the spectral smoothing technique (SST), using a lag window, is introduced in an autocorrelation method of the linear predictive analysis. In order to assess the effectiveness of the SST to reduce estimation errors, experimental comparisons of the usual autocorrelation method and the SST are presented. Spectral sensitivity analysis is also presented to evaluate the SST from the viewpoint of parameter quantization properties. SST features are summarized as follows: 1) Bandwidth underestimation elimination. 2) Spectral sensitivity reduction of PARCOR coefficients. 3) Simplicity in hardware implementation.
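
The general mechanism described in the abstract above, multiplying the autocorrelation sequence by a lag window before solving for the predictor, can be sketched as follows in Python. The Gaussian window shape, its parameter `f0_hz`, and all function names are assumptions for illustration; the abstract does not state which lag window the authors use.

```python
import math


def autocorrelation(x, order):
    """Biased short-time autocorrelation r[0..order] of a list of samples."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(order + 1)]


def gaussian_lag_window(order, f0_hz, fs_hz):
    """Hypothetical Gaussian lag window; w[0] is 1 and w decays with lag."""
    return [math.exp(-0.5 * (2.0 * math.pi * f0_hz * k / fs_hz) ** 2)
            for k in range(order + 1)]


def levinson_durbin(r):
    """Solve the autocorrelation normal equations.

    Returns (a, ks, err): inverse-filter coefficients with a[0] == 1,
    the reflection coefficients, and the final prediction error.
    """
    order = len(r) - 1
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, ks, err


def smoothed_lpc(x, order, f0_hz, fs_hz):
    """Autocorrelation-method LPC with a lag window applied first."""
    r = autocorrelation(x, order)
    w = gaussian_lag_window(order, f0_hz, fs_hz)
    return levinson_durbin([rk * wk for rk, wk in zip(r, w)])
```

Tapering the autocorrelation in the lag domain corresponds to smoothing the power spectrum in frequency, which is why it tends to widen the bandwidths of the estimated resonances.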

Journal Article
A. Jain
TL;DR: The algorithms of this paper are nonrecursive (as compared to the Levinson-Trench algorithms), and afford parallel processor architectures and others such as transversal filters where the computation time becomes proportional to N rather than to N log N.
Abstract: Banded Toeplitz matrices of large size occur in many practical problems [1]-[6]. Here the problem of inversion as well as the problem of solving simultaneous equations of the type Hx = y, when H is a large banded Toeplitz matrix, are considered. It is shown via certain circular decompositions of H that such equations may be exactly solved in O(N log₂ N) rather than in O(N²) computations as in Levinson-Trench algorithms. Furthermore, the algorithms of this paper are nonrecursive (as compared to the Levinson-Trench algorithms), and afford parallel processor architectures and others such as transversal filters [17] where the computation time becomes proportional to N rather than to N log N. Finally, a principle of matrix decomposition for fast inversion of matrices is introduced as a generalization of the philosophy of this paper.

Journal Article
TL;DR: In this paper, a general stability preserving mapping theorem is presented which allows most recursive filters of a particular type to be mapped into any other type of recursive filter, and a number of practical stability tests are developed including one which requires the testing of several one-dimensional polynomial root distributions with respect to the unit circle.
Abstract: Two-dimensional recursive filters are defined from a different point of view. A general stability preserving mapping theorem is presented which allows most recursive filters of a particular type to be mapped into any other type of recursive filter. In particular, any type of filter can be mapped into a first-quadrant filter. This mapping is used to prove a number of general stability theorems. Among these is a theorem which relates the stability of any digital filter to its two-dimensional phase function. Furthermore, other stability theorems which are valid for any type of recursive filter are presented. Finally, a number of practical stability tests are developed including one which requires the testing of only several one-dimensional polynomial root distributions with respect to the unit circle.

Journal Article
H. Martinez, T. Parks
TL;DR: In this article, a class of infinite impulse response (IIR) digital filters with optimum magnitude in the Chebyshev sense, arbitrary attenuation in the passband and stopband, all zeros on the unit circle, and different order numerator and denominator is discussed.
Abstract: A class of infinite impulse response (IIR) digital filters with optimum magnitude in the Chebyshev sense, arbitrary attenuation in the passband and stopband, all zeros on the unit circle, and different order numerator and denominator is discussed. Several properties of low-pass filters of this type are described, such as the effect of an extra ripple in the passband and the minimum attainable passband ripple for a given order. An algorithm for the design of these filters is presented, which given the order of the filter, passband edge, stopband edge and passband ripple minimizes the stopband ripple. Alternatively, the stopband ripple can be fixed and the passband ripple minimized. This is done by working with the numerator and denominator separately. This algorithm is fast compared to other existing design procedures. Several examples are presented and compared with the classical elliptic filters. Filters are described which meet the same tolerance scheme as an elliptic filter with fewer multiplications.

Journal Article
TL;DR: Locally adaptive image processing methods are constructed by sectioning the image and applying a modified MAP restoration algorithm, and are shown to be effective in processing nonstationary images.
Abstract: Locally adaptive image processing methods are constructed by sectioning the image and applying a modified MAP restoration algorithm. These local algorithms are shown to be effective in processing nonstationary images. The algorithms can work in both signal-independent and signal-dependent noise. The gains achieved by local and signal-dependent processing are analyzed.

Journal Article
TL;DR: Previous work on sectional methods in image processing is extended to the processing of degradations produced by space-variant point spread functions.
Abstract: Previous work on sectional methods in image processing is extended to the processing of degradations produced by space-variant point spread functions.

Journal Article
Lawrence R. Rabiner
TL;DR: A method of combining word patterns from a number of speakers is proposed in which a clustering type of analysis is used to determine which patterns are merged to create a word template.
Abstract: The three aspects of a statistical approach to a pattern recognition problem are the selection of features, choice of a measure of similarity, and a method for creating the reference templates (patterns) used in the statistical tests. This paper discusses a philosophy for creating reference templates for a speaker independent, isolated word recognition system. Although there remain many unanswered questions both about how to select appropriate features for recognition, and how to measure similarity between sets of features, such issues are not discussed here. Instead we concentrate on methods for creating the reference templates. In particular, a method of combining word patterns from a number of speakers is proposed in which a clustering type of analysis is used to determine which patterns are merged to create a word template. The creation of multiple templates, based on this method, is discussed and is shown to be of substantial value for as few as eight speakers in the training set. To test the ideas proposed here, a 54 word vocabulary word recognition system was implemented. All input words were recorded off a standard telephone line. The features used were the LPC coefficients of an 8-pole analysis, and the simple Itakura distance measure was used to measure similarity between patterns. With word templates obtained as described above, recognition accuracies of 85 percent were obtained in a forced choice recognition test on the 54 word vocabulary using eight new speakers. The correct word was within the top five choices 98 percent of the time. Using a strategy in which all the training words were used to create the templates, the recognition accuracy fell to 77 percent, and the correct word was within the top five choices only 89 percent of the time.

Journal Article
TL;DR: A construction is given to obtain first-order equation representations of a multidimensional filter, whose dimension is of the order of the degree of the transfer function.
Abstract: A construction is given to obtain first-order equation representations of a multidimensional filter, whose dimension is of the order of the degree of the transfer function.

Journal Article
TL;DR: In this article, a real-time harmonic pitch detection algorithm was developed on the Lincoln Digital Voice Terminal (LDVT), which was designed to be fast and to perform well when the input speech is degraded (i.e., telephone quality) or corrupted with acoustically coupled noise.
Abstract: A real-time harmonic pitch detection algorithm has been developed on the Lincoln Digital Voice Terminal (LDVT). The algorithm was designed to be fast and to perform well when the input speech is degraded (i.e., telephone quality) or corrupted with acoustically coupled noise. The algorithm determines the fundamental frequency from the spacing between harmonics in a selected portion of the spectrum. The algorithm was incorporated into a real-time linear prediction vocoder and compared favorably in informal listening tests with the Gold-Rabiner time-domain detector under a variety of adverse conditions.

Journal Article
Steven Kay
TL;DR: With maximum entropy power spectral estimation, the frequency estimate of a sinusoid in white noise is very sensitive to the initial sinusoidal phase; this dependence can be significantly reduced by replacing the real data by its analytic form, reducing the sampling rate by two, and employing the power spectral estimate for complex data.
Abstract: Using maximum entropy power spectral estimation, the estimate of the frequency of a sinusoid in white noise has been shown to be very sensitive to the initial sinusoidal phase. This phase dependence can be significantly reduced by replacing the real data by its analytic form, reducing the sampling rate by two, and employing the power spectral estimate for complex data.

Journal Article
TL;DR: The performance of three well-known system identification methods based on an FIR (finite impulse response) model of the system is investigated, and quantitative results in terms of an accuracy measure of system identification are presented.
Abstract: System identification, that is, the modeling and identification of a system from knowledge of its input and output signals, is a subject that is of considerable importance in many areas of signal and data processing. Because of the diversity of applications, a number of different methods for system identification with different advantages and disadvantages have been described and used in the literature. In this paper we investigate the performance of three well-known system identification methods based on an FIR (finite impulse response) model of the system. The methods will be referred to in this paper as the least squares analysis (LSA) method, the least mean squares adaptation algorithm (LMS), and the short-time spectral analysis (SSA) procedure. Our particular interest in this paper concerns the performance of these algorithms in the presence of high noise levels and in situations where the input signal may be band-limited. Both white and nonwhite random noise signals as well as speech signals are used as test signals to measure the performance of each of the system identification techniques as a function of the signal-to-noise ratio of the system's output. Quantitative results in terms of an accuracy measure of system identification are presented and a simple analytical model is used to explain the measured results.

Journal Article
TL;DR: In this paper, a spectral transformation from the one-dimensional discrete domain into the two-dimensional discrete domain is proposed, which retains most of the advantages of the original technique while permitting design entirely in the discrete domain, yielding filters with better stability characteristics, and facilitating frequency response optimization via nonlinear programming.
Abstract: The design of two-dimensional (2-D) circularly-symmetric low-pass digital filters by cascading several rotated filters (a rotated filter is defined to be one produced by rotating a one-dimensional (1-D) continuous filter into a two-dimensional continuous filter which is in turn bilinearly transformed into a two-dimensional digital filter) is a well-known and useful technique. An alternate approach which is an extension of the above technique is presented. This new method is based on a spectral transformation from the one-dimensional discrete domain into the two-dimensional discrete domain. This approach retains most of the advantages of the original technique while permitting design entirely in the discrete domain, yielding filters with better stability characteristics, and facilitating frequency response optimization via nonlinear programming.

Journal Article
TL;DR: A radix-3 FFT which has no multiplications in the three-point DFT's is introduced and the application to fast convolution of real sequences is discussed.
Abstract: A radix-3 FFT which has no multiplications in the three-point DFT's is introduced. It uses arithmetic with numbers of the form a + bμ, where μ is a complex cube root of unity. The application to fast convolution of real sequences is discussed.
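
The arithmetic behind the abstract above can be illustrated with a short Python sketch. Numbers of the form a + bμ are stored as pairs `(a, b)`; since μ² = -1 - μ for a complex cube root of unity, multiplying by μ needs only a negation and a subtraction, so a three-point DFT uses additions alone. The helper names and the choice μ = exp(-2πi/3) (which matches the usual forward-DFT sign convention) are assumptions for illustration, and this covers only the three-point butterfly, not the full radix-3 FFT.

```python
import cmath

MU = cmath.exp(-2j * cmath.pi / 3)  # assumed choice of the cube root of unity


def mul_mu(z):
    """Multiply a + b*mu by mu without any multiplications.

    (a + b*mu) * mu = a*mu + b*mu**2 = -b + (a - b)*mu
    """
    a, b = z
    return (-b, a - b)


def add3(p, q, s):
    """Componentwise sum of three (a, b) pairs."""
    return (p[0] + q[0] + s[0], p[1] + q[1] + s[1])


def dft3(x0, x1, x2):
    """Three-point DFT of inputs given as (a, b) pairs, additions only."""
    X0 = add3(x0, x1, x2)
    X1 = add3(x0, mul_mu(x1), mul_mu(mul_mu(x2)))
    X2 = add3(x0, mul_mu(mul_mu(x1)), mul_mu(x2))
    return X0, X1, X2


def to_complex(z):
    """Convert an (a, b) pair to an ordinary complex number a + b*MU."""
    return z[0] + z[1] * MU
```

Real inputs are represented as `(x, 0)`, and conversion back to ordinary complex numbers is only needed at the very end, which is where the saving in multiplications comes from.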

Journal Article
TL;DR: The autocorrelation and covariance methods of linear prediction are formulated in terms of an inverse digital filter in cascade form, so that pole locations in the system model can be readily estimated and constrained, and iterative solution of the corresponding nonlinear normal equations is described.
Abstract: The autocorrelation and covariance methods of linear prediction are formulated in terms of an inverse digital filter in cascade form, rather than the traditional direct form, to allow pole locations in the system model to be readily estimated and constrained. Iterative solution of the corresponding nonlinear normal equations is described. Applications to speech analysis and the compensation of biomedical signals are briefly discussed.

Journal Article
TL;DR: The proposed algorithm computes DFT coefficients via the Walsh transform (WT) and is superior to the fast Fourier transform (FFT) approach in applications where L, the number of Fourier coefficients desired, is relatively small compared with N, the number of data points.
Abstract: This paper presents a new computational algorithm for the discrete Fourier transform (DFT). In an algorithm proposed here, DFT coefficients are computed via the Walsh transform (WT). The number of multiplications required by the new algorithm is approximately NL/6, where N is the number of data points and L is the number of Fourier coefficients desired. As such, it is superior to the fast Fourier transform (FFT) approach in applications where L is relatively small compared with N. It is also useful in cases where the Walsh and Fourier coefficients are both desired.

Journal Article
Charles C. Tappert, Subhro Das
TL;DR: Empirical results indicate that one method yields 50 to 60 percent storage reduction and a factor of 4 to 6 in computational savings relative to conventional dynamic programming procedures without degradation in recognition accuracy.
Abstract: Recently, dynamic programming has been found useful for performing nonlinear time warping in speech recognition. Although considerably faster than exhaustive search procedures, the dynamic programming procedure nevertheless requires substantial computation. Also, considerable storage is normally required for reference prototypes necessary in the matching process. This paper is concerned with methods for reducing this storage and computation. Empirical results indicate that one method yields 50 to 60 percent storage reduction and a factor of 4 to 6 in computational savings relative to conventional dynamic programming procedures without degradation in recognition accuracy.