
Showing papers in "IEEE Transactions on Acoustics, Speech, and Signal Processing in 1986"


Journal ArticleDOI
TL;DR: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves, which forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding.
Abstract: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs including: two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds. Finally, the analysis/synthesis system forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding [8], [9].

1,659 citations
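
The cubic phase interpolation described in this abstract can be sketched as follows. This is a minimal illustration, assuming the commonly quoted form of the interpolator: a cubic whose phase and frequency match the measurements at both frame boundaries, with the 2π unwrapping integer `M` chosen to make the track as smooth as possible. The function name, argument names, and units are hypothetical and do not come from the paper.

```python
import math

def cubic_phase_track(theta0, omega0, theta1, omega1, T):
    """Return (theta, omega, M) for one frame of length T samples.

    theta0, theta1: measured phases in radians, known only modulo 2*pi.
    omega0, omega1: measured frequencies in radians per sample.
    """
    # Real-valued unwrapping count giving the smoothest track, rounded to an integer.
    x = ((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2.0) / (2.0 * math.pi)
    M = round(x)
    # Phase still to be absorbed after the linear term and the 2*pi*M unwrap.
    D = theta1 + 2.0 * math.pi * M - theta0 - omega0 * T
    dw = omega1 - omega0
    alpha = 3.0 * D / T**2 - dw / T
    beta = -2.0 * D / T**3 + dw / T**2

    def theta(t):
        return theta0 + omega0 * t + alpha * t**2 + beta * t**3

    def omega(t):
        return omega0 + 2.0 * alpha * t + 3.0 * beta * t**2

    return theta, omega, M
```

One sine wave of the synthesis would then be `A(t) * math.cos(theta(t))` for `t` in the frame, with the amplitude `A(t)` interpolated separately.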


Journal ArticleDOI
TL;DR: A simple yet efficient extension of this concept to the source coding of images by specifying the constraints for a set of two-dimensional quadrature mirror filters for a particular frequency-domain partition and showing that these constraints are satisfied by a separable combination of one-dimensional QMF's.
Abstract: Subband coding has become quite popular for the source encoding of speech. This paper presents a simple yet efficient extension of this concept to the source coding of images. We specify the constraints for a set of two-dimensional quadrature mirror filters (QMF's) for a particular frequency-domain partition, and show that these constraints are satisfied by a separable combination of one-dimensional QMF's. Bits are then optimally allocated among the subbands to minimize the mean-squared error for DPCM coding of the subbands. Also, an adaptive technique is developed to allocate the bits within each subband by means of a local variance mask. Optimum quantization is employed with quantizers matched to the Laplacian distribution. Subband coded images are presented along with their signal-to-noise ratios (SNR's). The SNR performance of the subband coder is compared to that of the adaptive discrete cosine transform (DCT), vector quantization, and differential vector quantization for bit rates of 0.67, 1.0, and 2.0 bits per pixel for 256 × 256 monochrome images. The adaptive subband coder has the best SNR performance.

1,181 citations
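
To illustrate the separable construction mentioned in this abstract, the sketch below splits an image into four subbands by filtering rows and then columns with a one-dimensional two-band pair. It assumes the 2-tap Haar pair as the simplest possible stand-in for the paper's QMFs, and it omits bit allocation and quantization entirely. All function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar low-pass is [S, S], high-pass is [S, -S]

def split_1d(v):
    """Two-band analysis with decimation by 2 (len(v) must be even)."""
    low = [S * (v[i] + v[i + 1]) for i in range(0, len(v), 2)]
    high = [S * (v[i] - v[i + 1]) for i in range(0, len(v), 2)]
    return low, high

def merge_1d(low, high):
    """Inverse of split_1d."""
    out = []
    for l, h in zip(low, high):
        out += [S * (l + h), S * (l - h)]
    return out

def transpose(m):
    return [list(r) for r in zip(*m)]

def split_columns(m):
    pairs = [split_1d(c) for c in transpose(m)]
    return transpose([p[0] for p in pairs]), transpose([p[1] for p in pairs])

def merge_columns(lo, hi):
    return transpose([merge_1d(a, b) for a, b in zip(transpose(lo), transpose(hi))])

def split_2d(img):
    """Filter rows, then columns, giving the LL, LH, HL, HH subbands."""
    rows = [split_1d(r) for r in img]
    L = [r[0] for r in rows]
    H = [r[1] for r in rows]
    LL, LH = split_columns(L)
    HL, HH = split_columns(H)
    return LL, LH, HL, HH

def merge_2d(LL, LH, HL, HH):
    """Inverse of split_2d."""
    L = merge_columns(LL, LH)
    H = merge_columns(HL, HH)
    return [merge_1d(a, b) for a, b in zip(L, H)]
```

Each subband has one quarter of the original samples, so the total sample count is unchanged before coding.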


Journal ArticleDOI
TL;DR: The application of a subspace invariance approach (ESPRIT) to the estimation of parameters (frequencies and powers) of cisoids in noise is described, which has several advantages including improved resolution over Pisarenko's technique for harmonic retrieval.
Abstract: The application of a subspace invariance approach (ESPRIT) to the estimation of parameters (frequencies and powers) of cisoids in noise is described. ESPRIT exploits an underlying rotational invariance of signal subspaces spanned by two temporally displaced data sets. The new approach has several advantages including improved resolution over Pisarenko's technique for harmonic retrieval.

1,040 citations


Journal ArticleDOI
TL;DR: This paper proposes a new isolated word recognition technique based on a combination of instantaneous and dynamic features of the speech spectrum that is shown to be highly effective in speaker-independent speech recognition.
Abstract: This paper proposes a new isolated word recognition technique based on a combination of instantaneous and dynamic features of the speech spectrum. This technique is shown to be highly effective in speaker-independent speech recognition. Spoken utterances are represented by time sequences of cepstrum coefficients and energy. Regression coefficients for these time functions are extracted for every frame over an approximately 50 ms period. Time functions of regression coefficients extracted for cepstrum and energy are combined with time functions of the original cepstrum coefficients, and used with a staggered array DP matching algorithm to compare multiple templates and input speech. Speaker-independent isolated word recognition experiments using a vocabulary of 100 Japanese city names indicate that a recognition error rate of 2.4 percent can be obtained with this method. Using only the original cepstrum coefficients the error rate is 6.2 percent.

812 citations
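
The regression coefficients mentioned in this abstract are least-squares slopes of each feature trajectory over a short window. A minimal sketch follows, assuming a symmetric window of `2*K + 1` frames and repetition of the first and last frames at the edges. Both choices, and the function names, are illustrative assumptions; the abstract only states that the window spans roughly 50 ms.

```python
def regression_coefficients(frames, K=2):
    """First-order regression (slope) of every feature over 2*K + 1 frames.

    frames: list of feature vectors (lists), one per analysis frame.
    K: half-width of the window in frames (hypothetical default).
    """
    n = len(frames)
    denom = sum(k * k for k in range(-K, K + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(-K, K + 1):
                idx = min(max(t + k, 0), n - 1)  # repeat edge frames (assumption)
                num += k * frames[idx][d]
            row.append(num / denom)
        out.append(row)
    return out

def combine(frames, K=2):
    """Append the dynamic (regression) features to the instantaneous ones."""
    deltas = regression_coefficients(frames, K)
    return [f + d for f, d in zip(frames, deltas)]
```

The combined vectors would then be passed to the template matcher in place of the cepstrum coefficients alone.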


Journal ArticleDOI
TL;DR: A unified framework for the exact maximum likelihood estimation of the parameters of superimposed exponential signals in noise, encompassing both the time series and the array problems, is presented and the present formulation is used to interpret previous methods.
Abstract: A unified framework for the exact maximum likelihood estimation of the parameters of superimposed exponential signals in noise, encompassing both the time series and the array problems, is presented. An exact expression for the ML criterion is derived in terms of the linear prediction polynomial of the signal, and an iterative algorithm for the maximization of this criterion is presented. The algorithm is equally applicable in the case of signal coherence in the array problem. Simulation shows the estimator to be capable of providing more accurate frequency estimates than currently existing techniques. The algorithm is similar to those independently derived by Kumaresan et al. In addition to its practical value, the present formulation is used to interpret previous methods such as Prony's, Pisarenko's, and modifications thereof.

791 citations


Journal ArticleDOI
TL;DR: It is shown that it is possible to design tree-structured analysis/reconstruction systems which meet the sampling rate condition and which result in exact reconstruction of the input signal.
Abstract: In recent years, tree-structured analysis/reconstruction systems have been extensively studied for use in subband coders for speech. In such systems, it is imperative that the individual channel signals be decimated in such a way that the number of samples coded and transmitted does not exceed the number of samples in the original speech signal. Under this constraint, the systems presented in the past have sought to remove the aliasing distortion while minimizing the overall analysis/reconstruction distortion. In this paper, it is shown that it is possible to design tree-structured analysis/reconstruction systems which meet the sampling rate condition and which result in exact reconstruction of the input signal. The conditions for exact reconstruction are developed and presented. Furthermore, it is shown that these conditions are not overly restrictive and high-quality frequency division may be performed in the analysis section. A filter design procedure is presented which allows high-quality filters to be easily designed.

785 citations
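
Exact reconstruction under critical sampling can be demonstrated with a small tree-structured example. The sketch below assumes the 4-tap Daubechies low-pass filter as a known filter that satisfies the conjugate quadrature conditions, and it uses circular (periodic) signal extension to keep the code short. Neither choice is taken from the paper, and the function names are hypothetical.

```python
import math

r3 = math.sqrt(3.0)
c = 4.0 * math.sqrt(2.0)
# Low-pass filter (4-tap Daubechies), used here only as a convenient example.
H = [(1 + r3) / c, (3 + r3) / c, (3 - r3) / c, (1 - r3) / c]
# High-pass filter: time-reversed low-pass with alternating signs.
G = [(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))]

def analyze(x):
    """Two-band split with decimation by 2, using circular extension."""
    N = len(x)
    low = [sum(H[n] * x[(2 * k + n) % N] for n in range(len(H))) for k in range(N // 2)]
    high = [sum(G[n] * x[(2 * k + n) % N] for n in range(len(G))) for k in range(N // 2)]
    return low, high

def synthesize(low, high):
    """Inverse of analyze (the transpose of the analysis operator)."""
    N = 2 * len(low)
    x = [0.0] * N
    for k in range(N // 2):
        for n in range(len(H)):
            x[(2 * k + n) % N] += H[n] * low[k] + G[n] * high[k]
    return x

def tree_analyze(x, levels):
    """Repeatedly split the low band; returns the high bands followed by the final low band."""
    bands, cur = [], list(x)
    for _ in range(levels):
        cur, high = analyze(cur)
        bands.append(high)
    bands.append(cur)
    return bands

def tree_synthesize(bands):
    cur = bands[-1]
    for high in reversed(bands[:-1]):
        cur = synthesize(cur, high)
    return cur
```

The total number of subband samples equals the input length, and `tree_synthesize(tree_analyze(x, levels))` recovers `x` up to rounding error when the input length at every level is even and at least 4.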


Journal Article
TL;DR: This investigation of the properties of stack filters produces several new, useful, and easily implemented filters, including two which are named asymmetric median filters.
Abstract: The median and other rank-order operators possess two properties called the threshold decomposition and the stacking properties. The first is a limited superposition property which leads to a new architecture for these filters; the second is an ordering property which allows an efficient VLSI implementation of the threshold decomposition architecture. Motivated by the success of rank-order filters in a wide variety of applications and by the ease with which they can now be implemented, we consider in this paper a new class of filters called stack filters. They share the threshold decomposition and stacking properties of rank-order filters but are otherwise unconstrained. They are shown to form a very large class of easily implemented nonlinear filters which includes the rank-order operators as well as all compositions of morphological operators. The convergence properties of these filters are investigated using techniques similar to those used to determine root signal behavior of median filters. The results obtained include necessary conditions for a stack filter to preserve monotone regions or edges in signals. The output distribution for these filters is also found. All the stack filters of window width 3 are determined along with their convergence properties. Among these filters are found two which we have named asymmetric median filters. They share all the properties of median filters except that they remove impulses of one sign only; that is, one removes only positive going edges, the other removes only negative going edges, while the median filter removes impulses of both signs. This investigation of the properties of stack filters thus produces several new, useful, and easily implemented filters.

615 citations
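
The threshold decomposition property described in this abstract can be checked on a small example: a window-3 median applied to a non-negative integer signal equals the sum of binary medians applied to each thresholded copy of the signal. The sketch below assumes end samples are repeated at the borders, and the function names are hypothetical.

```python
def median3(x):
    """Window-3 running median with the end samples repeated (assumed edge rule)."""
    p = [x[0]] + list(x) + [x[-1]]
    return [sorted(p[i:i + 3])[1] for i in range(len(x))]

def threshold_decompose(x, levels):
    """Binary signals t_m[n] = 1 if x[n] >= m, for m = 1..levels."""
    return [[1 if v >= m else 0 for v in x] for m in range(1, levels + 1)]

def binary_median3(b):
    """The median of three bits is their majority, a positive Boolean function."""
    p = [b[0]] + list(b) + [b[-1]]
    return [(p[i] & p[i + 1]) | (p[i + 1] & p[i + 2]) | (p[i] & p[i + 2])
            for i in range(len(b))]

def median3_by_threshold(x, levels):
    """Filter each binary layer separately, then add the layers back up."""
    layers = [binary_median3(t) for t in threshold_decompose(x, levels)]
    return [sum(col) for col in zip(*layers)]
```

Replacing the majority function in `binary_median3` with a different positive Boolean function of the three bits gives another filter with the same layered architecture.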


Journal ArticleDOI
TL;DR: An asymptotic statistical analysis of the null-spectra of two eigen-assisted methods, MUSIC and Minimum-Norm, for resolving independent closely spaced plane waves in noise finds an approximate expression for the resolution threshold of two plane waves with equal power in noise.
Abstract: This paper presents an asymptotic statistical analysis of the null-spectra of two eigen-assisted methods, MUSIC [1] and Minimum-Norm [2], for resolving independent closely spaced plane waves in noise. Particular attention is paid to the average deviation of the null-spectra from zero at the true angles of arrival for the plane waves. These deviations are expressed as functions of signal-to-noise ratios, number of array elements, angular separation of emitters, and the number of snapshots. In the case of MUSIC, an approximate expression is derived for the resolution threshold of two plane waves with equal power in noise. This result is validated by Monte Carlo simulations.

588 citations


Journal ArticleDOI
TL;DR: A single-sideband analysis/synthesis system is proposed which provides perfect reconstruction of a signal from a set of critically sampled analysis signals and allows overlap between adjacent time windows, implying that time domain aliasing is introduced in the analysis; however, this aliasing is cancelled in the synthesis process, and the system can provide perfect reconstruction.
Abstract: A single-sideband analysis/synthesis system is proposed which provides perfect reconstruction of a signal from a set of critically sampled analysis signals. The technique is developed in terms of a weighted overlap-add method of analysis/synthesis and allows overlap between adjacent time windows. This implies that time domain aliasing is introduced in the analysis; however, this aliasing is cancelled in the synthesis process, and the system can provide perfect reconstruction. Achieving perfect reconstruction places constraints on the time domain window shape which are equivalent to those placed on the frequency domain shape of analysis/synthesis channels used in recently proposed critically sampled systems based on frequency domain aliasing cancellation [7], [8]. In fact, a duality exists between the new technique and the frequency domain techniques of [7] and [8]. The proposed technique is more efficient than frequency domain designs for a given number of analysis/synthesis channels, and can provide reasonably band-limited channel responses. The technique could be particularly useful in applications where critically sampled analysis/synthesis is desirable, e.g., coding.

566 citations
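
Time domain aliasing cancellation can be demonstrated numerically. The sketch below uses the modified discrete cosine transform, the form in which this idea is most often implemented today; it is not claimed to be the paper's exact single-sideband formulation. A sine window is assumed because it satisfies the required condition `w[n]**2 + w[n + N]**2 == 1`, and the function names are hypothetical.

```python
import math

def mdct(block, N):
    """2N windowed samples -> N coefficients (critically sampled)."""
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(coeffs, N):
    """N coefficients -> 2N samples that still contain time domain aliasing."""
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                            for k in range(N)) for n in range(2 * N)]

def analysis_synthesis(x, N):
    """Windowed overlap-add with hop N; len(x) must be a multiple of N, N even."""
    w = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]
    padded = [0.0] * N + list(x) + [0.0] * N  # every real sample is covered by two blocks
    out = [0.0] * len(padded)
    n_coeffs = 0
    for start in range(0, len(padded) - N, N):
        block = [w[n] * padded[start + n] for n in range(2 * N)]
        X = mdct(block, N)
        n_coeffs += len(X)
        y = imdct(X, N)
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]  # aliasing cancels between adjacent blocks
    return out[N:-N], n_coeffs
```

Each hop of `N` new samples produces `N` coefficients, so apart from one extra block at the boundary the representation is critically sampled, yet the overlap-added output matches the input.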


Journal ArticleDOI
TL;DR: This paper treats analytically and experimentally the steady-state operation of RLS (recursive least squares) adaptive filters with exponential windows for stationary and nonstationary inputs and presents new RLS restart procedures applied to transversal structures for mitigating the disastrous results of the third source of noise.
Abstract: Adaptive signal processing algorithms derived from LS (least squares) cost functions are known to converge extremely fast and have excellent capabilities to "track" an unknown parameter vector. This paper treats analytically and experimentally the steady-state operation of RLS (recursive least squares) adaptive filters with exponential windows for stationary and nonstationary inputs. A new formula for the "estimation-noise" has been derived involving second- and fourth-order statistics of the filter input as well as the exponential windowing factor and filter length. Furthermore, it is shown that the adaptation process associated with "lag effects" depends solely on the exponential weighting parameter λ. In addition, the calculation of the excess mean square error due to the lag for an assumed Markov channel provides the necessary information about tradeoffs between speed of adaptation and steady-state error. It is also the basis for comparison to the simple LMS algorithm. In a simple case of channel identification, it is shown that the LMS and RLS adaptive filters have the same tracking behavior. Finally, in the last part, we present new RLS restart procedures applied to transversal structures for mitigating the disastrous results of the third source of noise, namely, finite precision arithmetic.

412 citations
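
For readers unfamiliar with the exponentially windowed RLS recursion analyzed in this abstract, a minimal textbook-style sketch for a transversal filter follows. The initialization `P = delta * I`, the default values of `lam` and `delta`, and the function name are illustrative assumptions; the restart procedures proposed in the paper are not shown.

```python
def rls_identify(x, d, order, lam=0.98, delta=1e4):
    """Exponentially windowed RLS; returns the final weight vector.

    x: input samples, d: desired samples, lam: forgetting factor (0 < lam <= 1).
    """
    w = [0.0] * order
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    for n in range(order - 1, len(x)):
        u = [x[n - i] for i in range(order)]                    # tap-input vector
        Pu = [sum(P[i][j] * u[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(u[i] * Pu[i] for i in range(order))
        k = [v / denom for v in Pu]                             # gain vector
        e = d[n] - sum(w[i] * u[i] for i in range(order))       # a priori error
        w = [w[i] + k[i] * e for i in range(order)]
        # P <- (P - k u^T P) / lam, where u^T P equals Pu^T because P is symmetric.
        P = [[(P[i][j] - k[i] * Pu[j]) / lam for j in range(order)] for i in range(order)]
    return w
```

A smaller `lam` shortens the effective memory, which improves tracking of a changing parameter vector at the cost of more estimation noise, the tradeoff the abstract refers to.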


Journal ArticleDOI
TL;DR: It is shown that an upper bound for the convergence time is the classical mean-square-error time constant, and examples are given to demonstrate that for broad signal classes the convergence time is reduced by a factor of up to 50 in noise canceller applications for the proper selection of variable step parameters.
Abstract: In recent work, a new version of an LMS algorithm has been developed which implements a variable feedback constant μ for each weight of an adaptive transversal filter. This technique has been called the VS (variable step) algorithm and is an extension of earlier ideas in stochastic approximation for varying the step size in the method of steepest descents. The method may be implemented in hardware with only modest increases in complexity (≈ 15 percent) over the LMS Widrow-Hoff algorithm. It is shown that an upper bound for the convergence time is the classical mean-square-error time constant, and examples are given to demonstrate that for broad signal classes (both narrow-band and broad-band) the convergence time is reduced by a factor of up to 50 in noise canceller applications for the proper selection of variable step parameters. Finally, the VS algorithm is applied to an IIR filter and simulations are presented for applications of the VS FIR and IIR adaptive filters.

Journal ArticleDOI
TL;DR: A new nonlinear, space-variant filtering algorithm is proposed which smooths jagged edges without blurring them, and smooths out abrupt intensity changes in monotone areas.
Abstract: An important application of spatial filtering techniques is in the postprocessing of images degraded by coding. Linear, space-invariant filters are inadequate to reduce the noise produced by block coders. The noise in block coded images is correlated with the local characteristics of the signal, and such filters are unable to exploit this correlation to reduce the noise. We propose a new nonlinear, space-variant filtering algorithm which smooths jagged edges without blurring them, and smooths out abrupt intensity changes in monotone areas. Edge sharpness is preserved because near edges the filtering of the signal is negligible. Consequently, in-band noise is not reduced, but the well-known masking effect reduces the visibility of this in-band noise. The algorithm is only slightly more complex to implement than simple linear filtering. We present examples of processed images and SNR figures to demonstrate that a significant improvement in subjective and objective quality is achieved.

Journal ArticleDOI
TL;DR: Various methods for measurement/computation of spectral correlation functions for time series that exhibit cyclostationarity are described in a unifying theoretical framework, and the interaction among reliability and temporal, spectral, and cycle resolutions is determined.
Abstract: Various methods for measurement/computation of spectral correlation functions for time series that exhibit cyclostationarity are described in a unifying theoretical framework. Some of these are amenable to digital hardware or software implementations, others are amenable to analog electrical or optical implementations, and other implementation types used for conventional spectral analysis are also possible. The interaction among reliability and temporal, spectral, and cycle resolutions is determined. Novel problems of computational complexity, cycle leakage and aliasing, cycle resolution, and cycle phasing are discussed. Sample spectral correlation functions are calculated with digital software for several simulated signals.

Journal ArticleDOI
TL;DR: Results of simulations indicate that the variances of the estimates are of the same order of magnitude as the CRB for sufficiently large data sets, and illustrate the performance in enhancing noisy artificial periodic signals.
Abstract: A new algorithm is presented for adaptive comb filtering and parametric spectral estimation of harmonic signals with additive white noise. The algorithm is composed of two cascaded parts. The first estimates the fundamental frequency and enhances the harmonic component in the input, and the second estimates the harmonic amplitudes and phases. Performance analysis provides new results for the asymptotic Cramer-Rao bound (CRB) on the parameters of harmonic signals with additive white noise. Results of simulations indicate that the variances of the estimates are of the same order of magnitude as the CRB for sufficiently large data sets, and illustrate the performance in enhancing noisy artificial periodic signals.

Journal ArticleDOI
TL;DR: The superiority of the AMNOR criterion over conventional LMS and constrained LMS criteria for reducing noise in speech signals was confirmed in subjective preference tests.
Abstract: This paper introduces a new adaptive microphone-array system for noise reduction (AMNOR system). It is first shown that there exists a tradeoff relationship between reducing the output noise power and reducing the frequency response degradation of a microphone-array to a desired signal. It is then shown that this tradeoff can be controlled by the introduction of a fictitious desired signal. A new optimization criterion is presented which minimizes the output noise power while maintaining the frequency response degradation below some pre-determined value (AMNOR criterion). AMNOR determines an optimal noise reduction filter based on this criterion by controlling the tradeoff utilizing the fictitious desired signal. Experiments on noise reduction processing were carried out in a room with a 0.4-s reverberation time. The superiority of the AMNOR criterion over conventional LMS and constrained LMS criteria for reducing noise in speech signals was confirmed in subjective preference tests. The AMNOR system improved the SNR by more than 15 dB in the 300-3200 Hz range.

Journal ArticleDOI
TL;DR: A signal synthesis algorithm that works directly with the real-valued high-resolution WD will be derived and examples of how this WD synthesis procedure can be used to perform time-varying filtering operations or signal separation will be given.
Abstract: The short-time Fourier transform (STFT), the ambiguity function (AF), and the Wigner distribution (WD) are mixed time-frequency signal representations that use Fourier transform techniques to map a one-dimensional function of time into a two-dimensional function of time and frequency. These mixed time-frequency mappings have been used to analyze the local frequency characteristics of a variety of signals and systems. Although much work has also been done to develop STFT and AF synthesis algorithms that can be used to implement a variety of time-varying signal processing operations, no such synthesis techniques have thus far been developed for the WD. In this paper, a signal synthesis algorithm that works directly with the real-valued high-resolution WD will be derived. Examples of how this WD synthesis procedure can be used to perform time-varying filtering operations or signal separation will be given.

Journal Article
TL;DR: It is found that nearly optimum performance can be obtained in a simple delay and sum beamformer by shading to reduce sidelobes and modest oversteering to reduce mainlobe width without too large a reduction in mainlobe sensitivity.
Abstract: The problem considered is that of designing endfire line array shadings which provide a useful amount of supergain without extreme sensitivity to random errors. Optimum shading weights are obtained subject to a constraint on the gain against uncorrelated white noise. The results of optimum array gain versus white noise gain constraint are presented parametrically for arrays of different interelement spacings, and different noise fields. Results are presented for spherically and cylindrically isotropic noise, and other wavenumber limited noise fields, used in modeling ocean ambient noise. It is found that nearly optimum performance can be obtained in a simple delay and sum beamformer by shading to reduce sidelobes and modest oversteering to reduce mainlobe width without too large a reduction in mainlobe sensitivity.

Journal ArticleDOI
Pierre Duhamel
TL;DR: This algorithm belongs to that class of recently proposed 2^n-FFT's which present the same arithmetic complexity (the lowest among any previously published one) and can easily be applied to real and real-symmetric data with reduced arithmetic complexity by removing all redundancy in the algorithm.
Abstract: A new algorithm is presented for the fast computation of the discrete Fourier transform. This algorithm belongs to that class of recently proposed 2^n-FFT's which present the same arithmetic complexity (the lowest among any previously published one). Moreover, this algorithm has the advantage of being performed "in-place," by repetitive use of a "butterfly"-type structure, without any data reordering inside the algorithm. Furthermore, it can easily be applied to real and real-symmetric data with reduced arithmetic complexity by removing all redundancy in the algorithm.
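
The split-radix decomposition behind this class of algorithms can be sketched recursively: the even-indexed samples go through one half-length transform, and the samples at indices 4n + 1 and 4n + 3 go through two quarter-length transforms. The version below is a plain recursive, out-of-place illustration for complex data. It is not the in-place butterfly structure the abstract describes, and the function names are hypothetical.

```python
import cmath

def split_radix_fft(x):
    """Recursive decimation-in-time split-radix FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    U = split_radix_fft(x[0::2])    # length N/2: even-indexed samples
    Z = split_radix_fft(x[1::4])    # length N/4: samples 4n + 1
    Zp = split_radix_fft(x[3::4])   # length N/4: samples 4n + 3
    X = [0j] * N
    for k in range(N // 4):
        a = cmath.exp(-2j * cmath.pi * k / N) * Z[k]
        b = cmath.exp(-2j * cmath.pi * 3 * k / N) * Zp[k]
        X[k] = U[k] + (a + b)
        X[k + N // 2] = U[k] - (a + b)
        X[k + N // 4] = U[k + N // 4] - 1j * (a - b)
        X[k + 3 * N // 4] = U[k + N // 4] + 1j * (a - b)
    return X

def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]
```

Comparing `split_radix_fft(x)` against `naive_dft(x)` on a short power-of-two input is a quick way to confirm the butterfly signs.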

Journal ArticleDOI
TL;DR: A new method of converting between the direct form predictor coefficients and line spectral frequencies is presented, which is highly accurate and can be used in a form that avoids the storage of trigonometric tables or the computation of trigonometric functions.
Abstract: Line spectral frequencies provide an alternate parameterization of the analysis and synthesis filters used in linear predictive coding (LPC) of speech. In this paper, a new method of converting between the direct form predictor coefficients and line spectral frequencies is presented. The system polynomial for the analysis filter is converted to two even-order symmetric polynomials with interlacing roots on the unit circle. The line spectral frequencies are given by the positions of the roots of these two auxiliary polynomials. The response of each of these polynomials on the unit circle is expressed as a series expansion in Chebyshev polynomials. The line spectral frequencies are found using an iterative root finding algorithm which searches for real roots of a real function. The algorithm developed is simple in structure and is designed to constrain the maximum number of evaluations of the series expansions. The method is highly accurate and can be used in a form that avoids the storage of trigonometric tables or the computation of trigonometric functions. The reconversion of line spectral frequencies to predictor coefficients uses an efficient algorithm derived by expressing the root factors as an expansion in Chebyshev polynomials.
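
The construction of the two auxiliary polynomials and the search for real roots of a real function can be sketched as follows. This is a simplified illustration, not the paper's method: it evaluates a cosine series in the frequency variable directly rather than a Chebyshev series in its cosine, so it does call trigonometric functions, and it uses a fixed grid followed by bisection. It assumes an even predictor order and coefficients given in powers of z^-1 with a leading 1. The function name and the `grid` parameter are hypothetical.

```python
import math

def lsf_from_predictor(a, grid=512):
    """Line spectral frequencies (radians, ascending) of A(z) = sum a[i] z^-i, even order."""
    p = len(a) - 1
    ext = list(a) + [0.0]
    P = [ext[i] + ext[p + 1 - i] for i in range(p + 2)]  # symmetric polynomial
    Q = [ext[i] - ext[p + 1 - i] for i in range(p + 2)]  # antisymmetric polynomial
    # Divide out the fixed roots at z = -1 (in P) and z = +1 (in Q).
    Pd, Qd = [P[0]], [Q[0]]
    for i in range(1, p + 1):
        Pd.append(P[i] - Pd[i - 1])
        Qd.append(Q[i] + Qd[i - 1])

    def real_response(c, w):
        # A symmetric polynomial of even order 2m is, up to a linear phase term,
        # this real cosine series on the unit circle.
        m = len(c) // 2
        return c[m] + 2.0 * sum(c[m - k] * math.cos(k * w) for k in range(1, m + 1))

    def roots(c):
        found = []
        ws = [math.pi * i / grid for i in range(grid + 1)]
        for lo, hi in zip(ws, ws[1:]):
            flo = real_response(c, lo)
            if flo * real_response(c, hi) < 0:
                for _ in range(60):  # bisection on the bracketing interval
                    mid = 0.5 * (lo + hi)
                    fmid = real_response(c, mid)
                    if flo * fmid <= 0:
                        hi = mid
                    else:
                        lo, flo = mid, fmid
                found.append(0.5 * (lo + hi))
        return found

    return sorted(roots(Pd) + roots(Qd))
```

For a stable predictor the roots of the two polynomials interlace, so the sorted list alternates between them.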

Journal ArticleDOI
TL;DR: The transient mean and second-moment behavior of the modified LMS (NLMS) algorithm are evaluated, taking into account the explicit statistical dependence of μ upon the input data.
Abstract: The LMS adaptive filter algorithm requires a priori knowledge of the input power level to select the algorithm gain parameter μ for stability and convergence. Since the input power level is usually one of the statistical unknowns, it is normally estimated from the data prior to beginning the adaptation process. It is then assumed that the estimate is perfect in any subsequent analysis of the LMS algorithm behavior. In this paper, the effects of the power level estimate are incorporated in a data dependent μ that appears explicitly within the algorithm. The transient mean and second-moment behavior of the modified LMS (NLMS) algorithm are evaluated, taking into account the explicit statistical dependence of μ upon the input data. The mean behavior of the algorithm is shown to converge to the Wiener weight. A constant coefficient matrix difference equation is derived for the weight fluctuations about the Wiener weight. The equation is solved for a white data covariance matrix and for the adaptive line enhancer with a single-frequency input in steady state for small μ. Expressions for the misadjustment error are also presented. It is shown for the white data covariance matrix case that the averaging of about ten data samples causes negligible degradation as compared to the LMS algorithm. In the ALE application, the steady-state weight fluctuations are shown to be mode dependent, being largest at the frequency of the input.
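
A data-dependent gain of the kind discussed in this abstract can be sketched with the familiar normalized LMS update, in which the step is divided by an input power estimate. The version below assumes the power estimate is the energy of the current tap vector plus a small constant `eps`; the paper's estimator, which averages over a chosen number of data samples, may differ. The function name and default values are hypothetical.

```python
def nlms_identify(x, d, order, mu=0.5, eps=1e-8):
    """Normalized LMS for a transversal filter; returns the final weight vector."""
    w = [0.0] * order
    for n in range(order - 1, len(x)):
        u = [x[n - i] for i in range(order)]              # tap-input vector
        e = d[n] - sum(wi * ui for wi, ui in zip(w, u))   # error before the update
        power = eps + sum(ui * ui for ui in u)            # assumed power estimate
        w = [wi + (mu / power) * e * ui for wi, ui in zip(w, u)]
    return w
```

Because the effective gain `mu / power` scales with the input level, no prior knowledge of the input power is needed to keep the update stable for `0 < mu < 2`.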

Journal ArticleDOI
TL;DR: In this article, the authors explore techniques for replacing missing speech with waveform segments from correctly received packets in order to increase the maximum tolerable missing packet rate in voice communications.
Abstract: Packet communication systems cannot, in general, guarantee accurate and prompt delivery of every packet. The effect of network congestion and transmission impairments on data packets is extended delay; in voice communications these problems lead to lost packets. When some speech packets are not available, the simplest response of a receiving terminal is to substitute silence for the missing speech. Here, we explore techniques for replacing missing speech with wave-form segments from correctly received packets in order to increase the maximum tolerable missing packet rate. After presenting a simple formula for predicting the probability of waveform substitution failure as a function of packet duration and packet loss rate, we introduce two techniques for selecting substitution waveforms. One method is based on pattern matching and the other technique explicitly estimates voicing and pitch. Both approaches achieve substantial improvements in speech quality relative to silence substitution. After waveform substitution, a significant component of the perceived distortion is due to discontinuities at packet boundaries. To reduce this distortion, we introduce a simple smoothing procedure.

Journal ArticleDOI
P. Delsarte, Y. Genin
TL;DR: The classical Levinson algorithm for computing the predictor polynomial relative to a real positive definite Toeplitz matrix is shown to be redundant in complexity and can be broken down into two simpler algorithms, either of which needs only to be processed.
Abstract: The classical Levinson algorithm for computing the predictor polynomial relative to a real positive definite Toeplitz matrix is shown to be redundant in complexity. It can be broken down into two simpler algorithms, either of which needs only to be processed. This result can be interpreted in the framework of the theory of orthogonal polynomials on the real line as follows: the symmetric and antisymmetric parts of the predictors relative to the sequence of Toeplitz matrices constitute two families of polynomials orthogonal on the interval [- 1,1] with respect to some even weight functions. It turns out that the recurrence relations for these orthogonal polynomials can be used efficiently to compute the desired predictor. The resulting "split Levinson algorithm" requires roughly one-half the number of multiplications and the same number of additions as the classical Levinson algorithm. A simple derivation of Cybenko's method for computing the Pisarenko frequencies is obtained from the recurrence relations underlying the split Levinson algorithm.
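
As a point of reference, the classical Levinson recursion that this abstract takes as its baseline can be sketched in a few lines. This is only the classical algorithm, not the split version proposed in the paper, and the function name and sign convention (prediction-error filter with leading coefficient 1) are assumptions made for the example.

```python
def levinson(r, order):
    """Classical Levinson recursion for a real positive definite Toeplitz system.

    r: autocorrelation lags r[0..order].
    Returns (a, err): prediction-error filter with a[0] = 1, and the error power.
    """
    a = [1.0]
    err = r[0]
    for m in range(1, order + 1):
        acc = sum(a[i] * r[m - i] for i in range(m))
        k = -acc / err                                   # reflection coefficient
        prev = a + [0.0]
        a = [prev[i] + k * prev[m - i] for i in range(m + 1)]
        err *= (1.0 - k * k)
    return a, err
```

For the lags `[1.0, 0.5, 0.25]` of a first-order autoregressive process, the second-order predictor has a zero second coefficient, which makes a convenient sanity check.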

Journal ArticleDOI
TL;DR: This paper presents an efficient Fortran program that computes the Duhamel-Hollmann split-radix FFT, which seems to require the least total arithmetic of any power-of-two DFT algorithm.
Abstract: This paper presents an efficient Fortran program that computes the Duhamel-Hollmann split-radix FFT. An indexing scheme is used that gives a three-loop structure for the split-radix FFT that is very similar to the conventional Cooley-Tukey FFT. Both a decimation-in-frequency and a decimation-in-time program are presented. An arithmetic analysis is made to compare the operation count of the Cooley-Tukey FFT for several different radixes to that of the split-radix FFT. The split-radix FFT seems to require the least total arithmetic of any power-of-two DFT algorithm.

Journal ArticleDOI
TL;DR: In this paper, a connection between fitting exponential models and pole-zero models to observed data is made, and the fitting problem is formulated as a constrained nonlinear minimization problem.
Abstract: An explicit connection between fitting exponential models and pole-zero models to observed data is made. The fitting problem is formulated as a constrained nonlinear minimization problem. This problem is then solved using a simplified iterative algorithm. The algorithm is applied to simulated data, and the performance of the algorithm is compared to previous results.

Journal ArticleDOI
TL;DR: An optimal statistical parameter estimation technique is presented for the identification of unknown image and blur model parameters and the proposed algorithms constitute a generalization of previous work on blur identification in that they are able to locate the zero loci of the blurred image spectrum on the entire z_1-z_2 plane.
Abstract: An optimal statistical parameter estimation technique is presented for the identification of unknown image and blur model parameters. The development leads to an autoregressive moving average (ARMA) model identification problem, where the image model coefficients define the AR part, and the blur parameters define the MA part. Conditional maximum-likelihood estimates of the unknown parameters are derived both in the absence and in the presence of observation noise. The proposed algorithms constitute a generalization of previous work on blur identification in that they are able to locate the zero loci of the blurred image spectrum on the entire z_1-z_2 plane. Simulation results, as well as photographically blurred images processed with the proposed algorithms, are shown as examples.

Journal ArticleDOI
TL;DR: Using the generalized baseband coder formulation, it is demonstrated that under reasonable assumptions concerning the weighting filter, an attractive low-complexity/high-quality coder can be obtained.
Abstract: This paper describes an effective and efficient time domain speech encoding technique that has an appealing low complexity, and produces toll quality speech at rates below 16 kbits/s. The proposed coder uses linear predictive techniques to remove the short-time correlation in the speech signal. The remaining (residual) information is then modeled by a low bit rate reduced excitation sequence that, when applied to the time-varying model filter, produces a signal that is "close" to the reference speech signal. The procedure for finding the optimal constrained excitation signal incorporates the solution of a few strongly coupled sets of linear equations and is of moderate complexity compared to competing coding systems such as adaptive transform coding and multipulse excitation coding. The paper describes the novel coding idea and the procedure for finding the excitation sequence. We then show that the coding procedure can be considered as an "optimized" baseband coder with spectral folding as high-frequency regeneration technique. The effect of various analysis parameters on the quality of the reconstructed speech is investigated using both objective and subjective tests. Further, modifications of the basic algorithm, and their impact on both the quality of the reconstructed speech signal and the complexity of the encoding algorithm, are discussed. Using the generalized baseband coder formulation, we demonstrate that under reasonable assumptions concerning the weighting filter, an attractive low-complexity/high-quality coder can be obtained.

Journal ArticleDOI
TL;DR: Based on the concept of a self-orthogonalizing algorithm in the transform domain, it is shown that the convergence speed of the TRLMS ADF can be improved significantly for the same excess MSE as that of the LMS ADF.
Abstract: In this paper we analyze the performance, particularly the convergence behavior, of the transform-domain least mean-square (LMS) adaptive digital filter (ADF) using the discrete Fourier transform and discrete orthogonal transforms such as discrete cosine and sine transforms. We first obtain the optimum Wiener solution and the minimum mean-squared error (MSE) in the transform domain. It is shown that the two minimum MSE's in the time and transform domains are identical regardless of the transform used. We then study the convergence conditions and the steady-state excess MSE's of the transform-domain LMS (TRLMS) algorithms for both a constant and a time-varying convergence factor. When a constant convergence factor is used, the convergence behaviors of the LMS and TRLMS ADF's appear to be almost identical, provided that each has an appropriate value of the convergence factor depending on the transform used. Also, based on the concept of a self-orthogonalizing algorithm in the transform domain, it is shown that the convergence speed of the TRLMS ADF can be improved significantly for the same excess MSE as that of the LMS ADF. In addition, we compare the computational complexities of the LMS and TRLMS ADF's. Finally, we investigate by computer simulation the effects of system parameter values and different transforms on the convergence behavior of the TRLMS ADF.
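
A transform-domain LMS update of the kind analyzed here can be sketched in a few lines of Python. The sketch assumes an orthonormal DCT-II as the transform and a per-bin power normalization as the time-varying convergence factor; the names `dct_matrix` and `trlms_identify`, the parameter values, and the system-identification setup are all hypothetical choices for illustration, not details taken from the paper.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        c = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([c * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows


def trlms_identify(x, d, n_taps, mu=0.05, beta=0.95, eps=1e-8):
    """Adapt transform-domain weights; return equivalent time-domain taps."""
    T = dct_matrix(n_taps)
    w = [0.0] * n_taps        # weights in the transform domain
    power = [1.0] * n_taps    # running power estimate per transform bin
    buf = [0.0] * n_taps      # most recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        u = [sum(T[k][m] * buf[m] for m in range(n_taps))
             for k in range(n_taps)]
        e = dn - sum(wk * uk for wk, uk in zip(w, u))
        for k in range(n_taps):
            # Per-bin normalization: each bin gets its own effective step size.
            power[k] = beta * power[k] + (1.0 - beta) * u[k] ** 2
            w[k] += mu * e * u[k] / (power[k] + eps)
    # The transform is orthonormal, so time-domain taps are T^T w.
    return [sum(T[k][m] * w[k] for k in range(n_taps))
            for m in range(n_taps)]


# Usage: identify a short FIR system from noise-free input/output data.
random.seed(0)
h_true = [0.5, -0.3, 0.2, 0.1]
x_in = [random.gauss(0.0, 1.0) for _ in range(3000)]
d_out = [sum(h_true[j] * x_in[n - j] for j in range(len(h_true)) if n - j >= 0)
         for n in range(len(x_in))]
h_est = trlms_identify(x_in, d_out, n_taps=4)
```

With a white input, as in this usage example, the normalization brings little benefit; the convergence gain described in the abstract is expected for correlated inputs, where the transform roughly decorrelates the bins.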

Journal ArticleDOI
TL;DR: In this paper, a speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification.
Abstract: In this paper a new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism which has been shown to produce synthetic speech that preserves the waveform shape and is perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is also capable of recovering and modifying nonspeech signals such as music, multiple speakers, marine biologic sounds, and speakers in the presence of interferences such as noise and musical backgrounds.

Journal ArticleDOI
TL;DR: This paper proposes a solution to this unknown noise covariance problem for the case when the noise field is invariant under two measurements of the array covariance, and presents a new algorithm for this case.
Abstract: In eigenstructure methods for direction of arrival estimation of signal wavefronts, the additive noise is assumed to be spatially white, i.e., of equal power and uncorrelated from sensor to sensor. When the noise is nonwhite but has a known covariance, we can still handle the problem through prewhitening. However, there are no techniques presently available to deal with completely unknown noise fields. In this paper, we propose a solution to this unknown noise covariance problem for the case when the noise field is invariant under two measurements of the array covariance; situations where this assumption is valid are not uncommon in sonar applications. In fact, the idea has been used in certain so-called "despoking" algorithms for conventional beamformers. Results of computer simulations carried out to compare the performance of the new algorithm to earlier methods are also presented.

Journal ArticleDOI
TL;DR: It is shown that nonlinear filters based on these means behave well for both additive and impulse noise and they preserve the edges better than linear filters, and they reject the noise better than median filters.
Abstract: The use of nonlinear means in image processing is introduced. The properties of these means in the presence of different types of noise are investigated. It is shown that nonlinear filters based on these means behave well for both additive and impulse noise. Their performance in the presence of signal dependent noise is satisfactory. They preserve the edges better than linear filters, and they reject the noise better than median filters.
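
A nonlinear mean filter can be sketched as follows, assuming the general form y = g^{-1}(average of g(x_i)) over a sliding window. The abstract does not say which means are studied, so the harmonic and geometric means below are illustrative choices, and the function names are hypothetical. Both choices assume strictly positive samples, and the example is one-dimensional for brevity.

```python
import math


def nonlinear_mean(window, g, g_inv):
    """Generalized mean: apply g, average, then map back with g_inv."""
    return g_inv(sum(g(v) for v in window) / len(window))


def harmonic_mean(window):
    # g(x) = 1/x; requires nonzero (here: positive) samples.
    return nonlinear_mean(window, lambda v: 1.0 / v, lambda s: 1.0 / s)


def geometric_mean(window):
    # g(x) = ln x; requires positive samples.
    return nonlinear_mean(window, math.log, math.exp)


def filter_1d(signal, mean_fn, radius=1):
    """Slide a window of up to 2*radius+1 samples, truncated at the edges."""
    out = []
    for i in range(len(signal)):
        lo = max(0, i - radius)
        hi = min(len(signal), i + radius + 1)
        out.append(mean_fn(signal[lo:hi]))
    return out


# Usage: a positive impulse in a flat signal is pulled toward the background.
noisy = [10.0, 10.0, 200.0, 10.0, 10.0]
smoothed = filter_1d(noisy, harmonic_mean)
```

Because 1/x compresses large values, the harmonic mean weights small samples more heavily, which is why it suppresses positive impulses more strongly than an arithmetic mean over the same window.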