Showing papers in "IEEE Transactions on Acoustics, Speech, and Signal Processing in 1989"


Journal ArticleDOI
TL;DR: Although discussed in the context of direction-of-arrival estimation, ESPRIT can be applied to a wide variety of problems including accurate detection and estimation of sinusoids in noise.
Abstract: An approach to the general problem of signal parameter estimation is described. The algorithm differs from its predecessor in that a total least-squares rather than a standard least-squares criterion is used. Although discussed in the context of direction-of-arrival estimation, ESPRIT can be applied to a wide variety of problems including accurate detection and estimation of sinusoids in noise. It exploits an underlying rotational invariance among signal subspaces induced by an array of sensors with a translational invariance structure. The technique, when applicable, manifests significant performance and computational advantages over previous algorithms such as MEM, Capon's MLM, and MUSIC.

6,273 citations
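
The rotational-invariance idea in the abstract above can be illustrated for the sinusoids-in-noise case. The following is a minimal sketch, not the authors' implementation: the function name, the Hankel-matrix construction, and the `window` parameter are assumptions made for illustration, and the number of sinusoids is taken as known.

```python
import numpy as np

def tls_esprit_frequencies(x, num_sinusoids, window):
    """Estimate angular frequencies (rad/sample) of complex sinusoids.

    Hypothetical helper: requires window - 1 >= 2 * num_sinusoids.
    """
    x = np.asarray(x, dtype=complex)
    # Hankel data matrix: column i is the snapshot x[i : i + window].
    cols = len(x) - window + 1
    data = np.array([x[i:i + window] for i in range(cols)]).T
    # Signal subspace: dominant left singular vectors of the data matrix.
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    es = u[:, :num_sinusoids]
    # Two overlapping row selections play the role of the displaced subarrays.
    e1, e2 = es[:-1], es[1:]
    # Total least-squares solution of e1 @ psi = e2.
    _, _, vh = np.linalg.svd(np.hstack([e1, e2]))
    v = vh.conj().T
    k = num_sinusoids
    v12, v22 = v[:k, k:], v[k:, k:]
    psi = -v12 @ np.linalg.inv(v22)
    # The eigenvalues of psi are exp(j * omega) for each sinusoid.
    return np.sort(np.angle(np.linalg.eigvals(psi)))
```

For a noiseless sum of two complex exponentials the returned angles match the true frequencies to numerical precision; with noise, the estimates depend on the choice of `window`.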


Journal ArticleDOI
TL;DR: The Cramer-Rao bound (CRB) for the estimation problems is derived, and some useful properties of the CRB covariance matrix are established.
Abstract: The performance of the MUSIC and ML methods is studied, and their statistical efficiency is analyzed. The Cramer-Rao bound (CRB) for the estimation problems is derived, and some useful properties of the CRB covariance matrix are established. The relationship between the MUSIC and ML estimators is investigated as well. A numerical study is reported of the statistical efficiency of the MUSIC estimator for the problem of finding the directions of two plane waves using a uniform linear array. An exact description of the results is included.

2,552 citations


Journal ArticleDOI
TL;DR: In this article, the authors presented a time-delay neural network (TDNN) approach to phoneme recognition, which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input.
Abstract: The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input. As a recognition task, the speaker-dependent recognition of the phonemes B, D, and G in varying phonetic contexts was chosen. For comparison, several discrete hidden Markov models (HMM) were trained to perform the same task. Performance evaluation over 1946 testing tokens from three speakers showed that the TDNN achieves a recognition rate of 98.5% correct while the rate obtained by the best of the HMMs was only 93.7%.

2,319 citations


Journal ArticleDOI
TL;DR: The author describes the mathematical properties of such decompositions and introduces the wavelet transform, which relates to the decomposition of an image into a wavelet orthonormal basis.
Abstract: The author reviews recent multichannel models developed in psychophysiology, computer vision, and image processing. In psychophysiology, multichannel models have been particularly successful in explaining some low-level processing in the visual cortex. The expansion of a function into several frequency channels provides a representation which is intermediate between a spatial and a Fourier representation. The author describes the mathematical properties of such decompositions and introduces the wavelet transform. He reviews the classical multiresolution pyramidal transforms developed in computer vision and shows how they relate to the decomposition of an image into a wavelet orthonormal basis. He discusses the properties of the zero crossings of multifrequency channels. Zero-crossing representations are particularly well adapted for pattern recognition in computer vision.

2,109 citations


Journal ArticleDOI
TL;DR: In this article, a time-frequency distribution of L. Cohen's (1966) class is introduced, which is called exponential distribution (ED) after its exponential kernel function, and the authors interpret the ED from the spectral density-estimation point of view.
Abstract: The authors introduce a time-frequency distribution of L. Cohen's (1966) class and examine its properties. This distribution is called exponential distribution (ED) after its exponential kernel function. First, the authors interpret the ED from the spectral-density-estimation point of view. They then show how the exponential kernel controls the cross terms as represented in the generalized ambiguity function domain, and they analyze the ED for two specific types of multicomponent signals: sinusoidal signals and chirp signals. Next, they define the ED for discrete-time signals and the running windowed exponential distribution (RWED), which is computationally efficient. Finally, the authors present numerical examples of the RWED using synthetically generated signals. It is found that the ED is very effective in diminishing the effects of cross terms while retaining most of the properties which are useful for a time-frequency distribution.

1,306 citations


Journal ArticleDOI
TL;DR: A spatial smoothing scheme is further investigated and it is shown that by making use of a set of forward and complex conjugated backward subarrays simultaneously, it is always possible to estimate any K directions of arrival using at most 3K/2 sensor elements.
Abstract: A spatial smoothing scheme is further investigated in the context of coherent signal classification. It is shown that by making use of a set of forward and complex conjugated backward subarrays simultaneously, it is always possible to estimate any K directions of arrival using at most 3K/2 sensor elements. This is achieved by creating a smoothed array output covariance matrix that is structurally identical to a covariance matrix in some noncoherent situation. By incorporating eigenstructure-based techniques on this smoothed covariance matrix, it then becomes possible to correctly identify all directions of arrival irrespective of their correlation.

1,105 citations
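
The smoothed covariance matrix described in the abstract above can be sketched as an average over forward subarray blocks and their complex-conjugated, index-reversed (backward) counterparts. This is an illustrative sketch for a uniform linear array; the function name and arguments are assumptions and do not come from the paper.

```python
import numpy as np

def forward_backward_smooth(cov, subarray_size):
    """Average forward and conjugated backward subarray covariances.

    `cov` is the full-array covariance (num_sensors x num_sensors) of a
    uniform linear array; the result is subarray_size x subarray_size.
    """
    cov = np.asarray(cov, dtype=complex)
    num_sensors = cov.shape[0]
    num_subarrays = num_sensors - subarray_size + 1
    smoothed = np.zeros((subarray_size, subarray_size), dtype=complex)
    for start in range(num_subarrays):
        stop = start + subarray_size
        block = cov[start:stop, start:stop]
        # Forward subarray contribution.
        smoothed += block
        # Backward contribution: J @ conj(block) @ J, where J is the
        # exchange matrix (reverses both rows and columns).
        smoothed += block.conj()[::-1, ::-1]
    return smoothed / (2 * num_subarrays)
```

With two fully coherent sources the unsmoothed signal covariance has rank one; after smoothing, the rank is restored to two, so eigenstructure-based methods can be applied to the smaller matrix.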


Journal ArticleDOI
TL;DR: The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data; because the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems.
Abstract: Hidden Markov modeling is extended to speaker-independent phone recognition. Using multiple codebooks of various linear-predictive-coding (LPC) parameters and discrete hidden Markov models (HMMs) the authors obtain a speaker-independent phone recognition accuracy of 58.8-73.8% on the TIMIT database, depending on the type of acoustic and language models used. In comparison, the performance of expert spectrogram readers is only 69% without use of higher level knowledge. The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data. Since the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems.

895 citations


Journal ArticleDOI
TL;DR: In this article, the authors analyzed the performance of Root-Music, a variation of the MUSIC algorithm, for estimating the direction of arrival (DOA) of plane waves in white noise in the case of a linear equispaced sensor array.
Abstract: The authors analyze the performance of Root-Music, a variation of the MUSIC algorithm, for estimating the direction of arrival (DOA) of plane waves in white noise in the case of a linear equispaced sensor array. The performance of the method is analyzed by examining the perturbation in the roots of the polynomial formed in the intermediate step of Root-Music. In particular, asymptotic results for the mean squared error in the estimates of the direction of arrival are derived. Simplified expressions are presented for the one- and two-source case and compared to those obtained for least-squares ESPRIT. Computer simulations are also presented, and they are in close agreement with the theory. An important outcome of this analysis is the fact that the error in the signal zeros has a largely radial component. This provides an explanation as to why the Root-Music is superior to the spectral MUSIC algorithm.

854 citations


Journal ArticleDOI
TL;DR: A novel frequency estimator for a single complex sinusoid in complex white Gaussian noise is proposed that is more computationally efficient than the optimal maximum-likelihood estimator yet attains equally good performance at moderately high signal-to-noise ratios.
Abstract: A novel frequency estimator for a single complex sinusoid in complex white Gaussian noise is proposed. The estimator is more computationally efficient than the optimal maximum-likelihood estimator yet attains equally good performance at moderately high signal-to-noise ratios. The estimator is shown to be related to the linear prediction estimator. This relationship is used to reveal why the linear prediction estimator does not attain the Cramer-Rao bound even at high signal-to-noise ratios.

732 citations
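
To make the relationship mentioned in the abstract above concrete, the sketch below contrasts a weighted phase-difference average with the one-step linear prediction estimate. The abstract does not state the estimator's form, so the parabolic weights used here are an assumption based on the weighted phase averager commonly associated with this work; the function names are hypothetical.

```python
import numpy as np

def weighted_phase_average(x):
    """Frequency estimate (rad/sample) from weighted phase differences.

    Assumed weights: a parabolic window over the N - 1 adjacent-sample
    phase differences, normalised so that the weights sum to one.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    t = np.arange(n - 1)
    weights = 1.5 * n / (n**2 - 1) * (1 - ((t - (n / 2 - 1)) / (n / 2)) ** 2)
    # Phase advance between consecutive samples, each in (-pi, pi].
    phase_steps = np.angle(np.conj(x[:-1]) * x[1:])
    return float(np.sum(weights * phase_steps))

def linear_prediction_estimate(x):
    """Angle of the lag-one sample autocorrelation (one-step predictor)."""
    x = np.asarray(x, dtype=complex)
    return float(np.angle(np.sum(np.conj(x[:-1]) * x[1:])))
```

The two differ only in where the averaging happens: the first averages the phases with non-uniform weights, the second averages the complex products uniformly and then takes a single angle. On noiseless data both return the true frequency.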


Journal ArticleDOI
TL;DR: The problem of image reconstruction and restoration is first formulated, and some of the current regularization approaches used to solve the problem are described, and a Bayesian interpretation of the regularization techniques is given.
Abstract: Developments in the theory of image reconstruction and restoration over the past 20 or 30 years are outlined. Particular attention is paid to common estimation structures and to practical problems not properly solved yet. The problem of image reconstruction and restoration is first formulated. Some of the current regularization approaches used to solve the problem are then described. The concepts of a priori information and compound criterion are introduced. A Bayesian interpretation of the regularization techniques is given which clarifies the role of the tuning parameters and indicates how they could be estimated. The practical aspects of computing the solution, first when the hyperparameters are known and second when they must be estimated, are then considered. Conclusions are drawn, and points that still need to be investigated are outlined.

716 citations


Journal ArticleDOI
TL;DR: An iterative descent algorithm based on a Lagrangian formulation for designing vector quantizers having minimum distortion subject to an entropy constraint is discussed and it is shown that for clustering problems involving classes with widely different priors, the ECVQ outperforms the k-means algorithm in both likelihood and probability of error.
Abstract: An iterative descent algorithm based on a Lagrangian formulation for designing vector quantizers having minimum distortion subject to an entropy constraint is discussed. These entropy-constrained vector quantizers (ECVQs) can be used in tandem with variable-rate noiseless coding systems to provide locally optimal variable-rate block source coding with respect to a fidelity criterion. Experiments on sampled speech and on synthetic sources with memory indicate that for waveform coding at low rates (about 1 bit/sample) under the squared error distortion measure, about 1.6 dB improvement in the signal-to-noise ratio can be expected over the best scalar and lattice quantizers when block entropy-coded with block length 4. Even greater gains are made over other forms of entropy-coded vector quantizers. For pattern recognition, it is shown that the ECVQ algorithm is a generalization of the k-means and related algorithms for estimating cluster means, in that the ECVQ algorithm estimates the prior cluster probabilities as well. Experiments on multivariate Gaussian distributions show that for clustering problems involving classes with widely different priors, the ECVQ outperforms the k-means algorithm in both likelihood and probability of error.

Journal ArticleDOI
TL;DR: An exact derivation of an optimal lapped orthogonal transform (LOT) is presented; the optimal LOT is related to the discrete cosine transform (DCT) in such a way that a fast algorithm for a nearly optimal LOT is derived.
Abstract: An exact derivation of an optimal lapped orthogonal transform (LOT) is presented. The optimal LOT is related to the discrete cosine transform (DCT) in such a way that a fast algorithm for a nearly optimal LOT is derived. Compared to the DCT, the fast LOT requires about 20-30% more computations, mostly additions. An image coding example demonstrates the effectiveness of the LOT in reducing blocking effects; the LOT actually leads to slightly smaller signal reconstruction errors than does the DCT.

Journal ArticleDOI
TL;DR: The authors model a finite-dimensional system as an ARMA (autoregressive moving-average) rational function of known orders, but the special cases of AR, MA, and all-pass models are also considered.
Abstract: A method is presented for identification of linear, time-invariant, nonminimum phase systems when only output data are available. The input sequence need not be independent, but it must be non-Gaussian, with some special properties described in the text. The authors model a finite-dimensional system as an ARMA (autoregressive moving-average) rational function of known orders, but the special cases of AR, MA, and all-pass models are also considered. To estimate the parameters of their model, the authors utilize both second- and higher-order statistics of the output, which may be contaminated by additive, zero-mean, Gaussian white noise of unknown variance. The parameter estimators obtained are proved, under mild conditions, to be consistent. Simulations verify the performance of the proposed method in the case of relatively low signal-to-noise ratios, and when there is a model-order mismatch.

Journal ArticleDOI
TL;DR: An approach is presented to the problem of detecting the number of sources impinging on a passive sensor array that is based on J. Rissanen's (1983) minimum description length (MDL) principle, and two slightly different detection criteria are derived, both requiring the estimation of the locations of the sources.
Abstract: An approach is presented to the problem of detecting the number of sources impinging on a passive sensor array that is based on J. Rissanen's (1983) minimum description length (MDL) principle. The approach is applicable to any type of sources, including the case of sources which are fully correlated, referred to as the coherent signals case. Two slightly different detection criteria are derived, both requiring the estimation of the locations of the sources. The first is tailored to the detection problem per se, whereas the second is tailored to the combined detection/estimation problem. Consistency of the two criteria is proved and their performance is demonstrated by computer simulations.

Journal ArticleDOI
W.H. Equitz
TL;DR: The pairwise nearest neighbor (PNN) algorithm is presented as an alternative to the Linde-Buzo-Gray (1980, LBG) (generalized Lloyd, 1982) algorithm for vector quantization clustering.
Abstract: The pairwise nearest neighbor (PNN) algorithm is presented as an alternative to the Linde-Buzo-Gray (1980, LBG) (generalized Lloyd, 1982) algorithm for vector quantization clustering. The PNN algorithm derives a vector quantization codebook in a diminishingly small fraction of the time previously required, without sacrificing performance. In addition, the time needed to generate a codebook grows only O(N log N) in training set size and is independent of the number of code words desired. Using this method, one can either minimize the number of code words needed subject to a maximum allowable distortion or minimize the distortion subject to a maximum rate. The PNN algorithm can be used with squared error and weighted squared error distortion measures. Simulations on a variety of images encoded at 1/2 b/pixel indicate that PNN codebooks can be developed in roughly 5% of the time required by the LBG algorithm.
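
The pairwise merging idea behind the PNN approach can be sketched as follows. This is a naive full-search illustration under squared-error distortion, with cost growing much faster than the O(N log N) reported in the abstract (the fast search strategy is not reproduced here); the function name and the merge-cost expression are assumptions for illustration.

```python
import numpy as np

def pnn_codebook(training, codebook_size):
    """Build a codebook by repeatedly merging the cheapest pair of clusters.

    Naive sketch: every training vector starts as its own cluster, and all
    pairs are searched exhaustively at each step.
    """
    centroids = [np.asarray(v, dtype=float) for v in training]
    counts = [1] * len(centroids)
    while len(centroids) > codebook_size:
        best = None
        for i in range(len(centroids)):
            for j in range(i + 1, len(centroids)):
                diff = centroids[i] - centroids[j]
                # Increase in total squared error caused by merging i and j.
                cost = counts[i] * counts[j] / (counts[i] + counts[j]) * (diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        total = counts[i] + counts[j]
        # The merged cluster is represented by the count-weighted centroid.
        centroids[i] = (counts[i] * centroids[i] + counts[j] * centroids[j]) / total
        counts[i] = total
        del centroids[j]
        del counts[j]
    return np.array(centroids)
```

Stopping at a target number of code words corresponds to the rate-constrained use; stopping when the cheapest merge exceeds a distortion budget would be the other variant.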

Journal ArticleDOI
TL;DR: Algorithms are presented for automatically constructing a binary decision tree designed to estimate the probability that a given word will be the next word uttered, which is compared to an equivalent trigram model and shown to be superior.
Abstract: The problem of predicting the next word a speaker will say, given the words already spoken, is discussed. Specifically, the problem is to estimate the probability that a given word will be the next word uttered. Algorithms are presented for automatically constructing a binary decision tree designed to estimate these probabilities. At each node of the tree there is a yes/no question relating to the words already spoken, and at each leaf there is a probability distribution over the allowable vocabulary. Ideally, these nodal questions can take the form of arbitrarily complex Boolean expressions, but computationally cheaper alternatives are also discussed. Some results obtained on a 5000-word vocabulary with a tree designed to predict the next word spoken from the preceding 20 words are included. The tree is compared to an equivalent trigram model and shown to be superior.

Journal ArticleDOI
TL;DR: In this article, the authors analyzed finite impulse response (FIR) filter banks in both the z-transform and time domains, showing the alternatives between designs in the two domains, and relations between previously known systems are given.
Abstract: Perfect reconstruction finite impulse-response (FIR) filter banks are analyzed both in the z-transform and time domains, showing the alternatives between designs in the two domains. Various classes of perfect reconstruction schemes are indicated, and relations between previously known systems are given. Windowed modulated filter banks with low computational complexity and perfect reconstruction are shown. New factorizations of polyphase filter matrices, leading in particular to linear-phase filters, are given. The computational complexity and the architecture of the new structures are indicated.

Journal ArticleDOI
TL;DR: A simplified maximum-likelihood Gauss-Newton algorithm which provides asymptotically efficient estimates of these parameters is proposed and initial estimates for this algorithm are obtained by a variation of the overdetermined Yule-Walker method and a periodogram-based procedure.
Abstract: The problem of estimating the frequencies, phases, and amplitudes of sinusoidal signals is considered. A simplified maximum-likelihood Gauss-Newton algorithm which provides asymptotically efficient estimates of these parameters is proposed. Initial estimates for this algorithm are obtained by a variation of the overdetermined Yule-Walker method and a periodogram-based procedure. Use of the maximum-likelihood Gauss-Newton algorithm is not, however, limited to this particular initialization method. Some other possibilities to get suitable initial estimates are briefly discussed. An analytical and numerical study of the shape of the likelihood function associated with the sinusoids-in-noise process reveals its multimodal structure and clearly establishes the importance of the initialization procedure. Some numerical examples are presented to illustrate the performance of the proposed estimation procedure. Comparison to the performance corresponding to the Cramer-Rao lower bound is also presented, using a simple expression for the asymptotic Cramer-Rao bound covariance matrix derived in the paper.

Journal ArticleDOI
TL;DR: Based on the scattered look-ahead technique, fully pipelined and fully hardware efficient linear bidirectional systolic arrays for recursive digital filters are presented and the decomposition technique is extended to time-varying recursive systems.
Abstract: A look-ahead approach (referred to as scattered look-ahead) to pipeline recursive loops is introduced in a way that guarantees stability. A decomposition technique is proposed to implement the nonrecursive portion (generated due to the scattered look-ahead process) in a decomposed manner to obtain concurrent stable pipelined realizations of logarithmic implementation complexity with respect to the number of loop pipeline stages (as opposed to linear). The upper bound on the roundoff error in these pipelined filters is shown to improve with an increase in the number of loop pipeline stages. Efficient pipelined realizations are studied of both direct-form and state-space-form recursive digital filters. Based on the scattered look-ahead technique, fully pipelined and fully hardware efficient linear bidirectional systolic arrays for recursive digital filters are presented. The decomposition technique is extended to time-varying recursive systems.
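
For a first-order recursion y[n] = a*y[n-1] + x[n], the look-ahead and decomposition ideas in the abstract above can be checked numerically. The sketch below is an illustration only, assuming a loop depth M that is a power of two; the function names and the `stages` parameter are hypothetical.

```python
import numpy as np

def direct_recursion(x, a):
    """Reference filter: y[n] = a * y[n-1] + x[n]."""
    y = np.zeros(len(x))
    prev = 0.0
    for n, sample in enumerate(x):
        prev = a * prev + sample
        y[n] = prev
    return y

def scattered_lookahead(x, a, stages):
    """Same filter with the feedback loop stretched to M = 2**stages delays.

    Non-recursive part: sum_{k<M} a^k z^-k, factored into `stages` sections
    (1 + a^(2^i) z^-(2^i)), i.e. log2(M) multiplications instead of M - 1.
    Recursive part: y[n] = a^M * y[n-M] + v[n].
    """
    v = np.array(x, dtype=float)
    for i in range(stages):
        shift = 2 ** i
        delayed = np.zeros_like(v)
        delayed[shift:] = v[:-shift]
        v = v + a ** shift * delayed
    m = 2 ** stages
    y = np.zeros(len(v))
    for n in range(len(v)):
        feedback = a ** m * y[n - m] if n >= m else 0.0
        y[n] = v[n] + feedback
    return y
```

The pole of the transformed loop sits at a^M in the M-fold delayed variable, so a stable original (|a| < 1) stays stable, and the M delays in the loop are what allow it to be pipelined.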

Journal ArticleDOI
TL;DR: The behavior of the delayed least-mean-square (DLMS) algorithm is studied and it is found that the step size in the coefficient update plays a key role in the convergence and stability of the algorithm.
Abstract: The behavior of the delayed least-mean-square (DLMS) algorithm is studied. It is found that the step size in the coefficient update plays a key role in the convergence and stability of the algorithm. An upper bound for the step size is derived that ensures the stability of the DLMS. The relationship between the step size and the convergence speed, and the effect of the delay on the convergence speed, are also studied. The analytical results are supported by computer simulations.
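
A delayed coefficient update of the kind studied in the abstract above can be sketched for a system-identification setup. This is a minimal illustration, assuming the usual DLMS form in which each error/regressor pair is applied to the weights `delay` samples after it was computed; the function name and arguments are hypothetical, and the step-size bound from the paper is not reproduced.

```python
from collections import deque

import numpy as np

def dlms_identify(x, d, num_taps, mu, delay):
    """Adapt FIR weights so that the filter output tracks d, given input x."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(num_taps)
    pending = deque()  # (error, regressor) pairs waiting to be applied
    for n in range(num_taps - 1, len(x)):
        # Regressor ordered newest sample first: [x[n], x[n-1], ...].
        u = x[n - num_taps + 1:n + 1][::-1]
        e = d[n] - w @ u
        pending.append((e, u))
        if len(pending) > delay:
            # Apply the update computed `delay` samples ago.
            e_old, u_old = pending.popleft()
            w = w + mu * e_old * u_old
    return w
```

Setting `delay=0` reduces this to the standard LMS update. Larger delays generally call for a smaller `mu` to remain stable, which is the trade-off the abstract refers to.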

Journal ArticleDOI
TL;DR: Necessary and sufficient conditions are derived for the unique localization of narrowband sources having the same known center frequency by passive sensor arrays; one condition guarantees uniqueness for every batch of sampled data, and a weaker one guarantees it for almost every batch.
Abstract: Necessary and sufficient conditions are derived for the unique localization of narrowband sources having the same known center frequency by passive sensor arrays. The conditions specify the maximum number of sources that can be uniquely localized by a general array that satisfies some mild geometric constraints. The conditions are expressed in terms of the number of sensors and the rank of the correlation matrix of the sources. Two different conditions are presented. The first guarantees uniqueness for every batch of sampled data. The second, which is weaker, guarantees uniqueness for almost every batch of sampled data, with the exception of a set of batches of measure zero. It is shown that a condition that is slightly weaker than the second one is also necessary.

Journal ArticleDOI
TL;DR: Two perfect-reconstruction structures for the two-channel quadrature mirror filter bank, free of aliasing and distortions of any kind, in which the analysis filters have linear phase, are described in this article.
Abstract: Two perfect-reconstruction structures for the two-channel quadrature mirror filter (QMF) bank, free of aliasing and distortions of any kind, in which the analysis filters have linear phase, are described. The structure in the first case is related to the linear prediction lattice structure. For the second case, new structures are developed by propagating the perfect-reconstruction and linear-phase properties. Design examples, based on optimization of the parameters in the lattice structures, are presented for both cases.

Journal ArticleDOI
TL;DR: A quantitative and qualitative comparison of the multistage filters and of other efficient detail-preserving filters is presented and the comparisons are made using the mean-squared-error and the mean-absolute-error criteria.
Abstract: The theoretical analysis of multistage median filters is developed. It is shown that multistage median filters are a combination of max/median and min/median filters. Since multistage median filters belong to the class of two-dimensional stack filters, they have threshold decomposition attributes making their theoretical analysis simple. Statistical threshold decomposition is applied to derive the statistical characteristics of these filters, and the results are used to evaluate the performance of these two types of multistage filters. Finally, a quantitative and qualitative comparison of the multistage filters and of other efficient detail-preserving filters is presented. The comparisons are made using the mean-squared-error and the mean-absolute-error criteria.

Journal ArticleDOI
TL;DR: An iterative maximum-likelihood method for simultaneously estimating directions of arrival (DOA) and sensor locations is developed and a distinctive feature of the algorithm is its ability to locate the sensors accurately without deploying calibration sources at known locations.
Abstract: Sensor location uncertainty can severely degrade the performance of direction-finding systems. An iterative maximum-likelihood method for simultaneously estimating directions of arrival (DOA) and sensor locations is developed to alleviate this problem. The case of nondisjoint sources, i.e., sources observed in the same frequency cell and at the same time, is emphasized. The algorithm converges to the global maximum of the likelihood function if the initial conditions are sufficiently good. Numerical examples are presented, illustrating the performance of the proposed technique. A distinctive feature of the algorithm is its ability to locate the sensors accurately without deploying calibration sources at known locations.

Journal ArticleDOI
TL;DR: The Wiener solution of a multichannel restoration scheme uses both the within-channel and between-channel correlation; hence, the restored result is a better estimate than that produced by independent channel restoration.
Abstract: The Wiener solution of a multichannel restoration scheme is presented. Using matrix diagonalization and block-Toeplitz to block-circulant approximation, the inversion of the multichannel, linear space-invariant imaging system becomes feasible by utilizing a fast iterative matrix inversion procedure. The restoration uses both the within-channel (spatial) and between-channel (spectral) correlation; hence, the restored result is a better estimate than that produced by independent channel restoration. Simulations are also presented.

Journal ArticleDOI
TL;DR: The authors introduce a novel approach to modeling variable-duration phonemes, called the stochastic segment model, which allows the incorporation in Y of acoustic-phonetic features derived from X, in addition to the usual spectral features used in hidden Markov modeling and dynamic time warping approaches to speech recognition.
Abstract: The authors introduce a novel approach to modeling variable-duration phonemes, called the stochastic segment model. A phoneme X is observed as a variable-length sequence of frames, where each frame is represented by a parameter vector and the length of the sequence is random. The stochastic segment model consists of (1) a time warping of the variable-length segment X into a fixed-length segment Y called a resampled segment and (2) a joint density function of the parameters of X which in this study is a Gaussian density. The segment model represents spectral/temporal structure over the entire phoneme. The model also allows the incorporation in Y of acoustic-phonetic features derived from X, in addition to the usual spectral features that have been used in hidden Markov modeling and dynamic time warping approaches to speech recognition. The authors describe the stochastic segment model, the recognition algorithm, and an iterative training algorithm for estimating segment models from continuous speech. They present several results using segment models in two speaker-dependent recognition tasks and compare the performance of the stochastic segment model to the performance of the hidden Markov models.

Journal ArticleDOI
TL;DR: An enhanced analysis feature set consisting of both instantaneous and transitional spectral information is used and the hidden-Markov-model (HMM)-based connected-digit recognizer in speaker-trained, multispeaker, and speaker-independent modes is tested.
Abstract: The authors use an enhanced analysis feature set consisting of both instantaneous and transitional spectral information and test the hidden-Markov-model (HMM)-based connected-digit recognizer in speaker-trained, multispeaker, and speaker-independent modes. For the evaluation, both a 50-talker connected-digit database recorded over local, dialed-up telephone lines, and the Texas Instruments, 225-adult-talker, connected-digits database are used. Using these databases, the performance achieved was 0.35, 1.65, and 1.75% string error rates for known-length strings, for speaker-trained, multispeaker, and speaker-independent modes, respectively, and 0.78, 2.85, and 2.94% string error rates for unknown-length strings of up to seven digits in length for the three modes. Several experiments were carried out to determine the best set of conditions (e.g., training, recognition, parameters, etc.) for recognition of digits. The results and the interpretation of these experiments are described.

Journal ArticleDOI
G. Bi, E.V. Jones
TL;DR: A modified fast Fourier transform algorithm is described together with a real-time pipelined implementation that requires less data memory and only 1/3 of the number of complex multipliers of a conventional design.
Abstract: A modified fast Fourier transform algorithm is described together with a real-time pipelined implementation. The approach is particularly suited to sequentially presented input data. The method can be used for both mixed and uniform radix number implementations. For example, for the radix-4 implementation, the method requires less data memory and only 1/3 of the number of complex multipliers of a conventional design.

Journal ArticleDOI
A. Nadas, David Nahamoo, Michael Picheny
TL;DR: A probabilistic mixture model is described for a frame (the short term spectrum) of speech to be used in speech recognition and each component is regarded as a prototype for the labeling phase of a hidden Markov model based speech recognition system.
Abstract: A probabilistic mixture model is described for a frame (the short term spectrum) of speech to be used in speech recognition. Each component of the mixture is regarded as a prototype for the labeling phase of a hidden Markov model based speech recognition system. Since the ambient noise during recognition can differ from that present in the training data, the model is designed for convenient updating in changing noise. Based on the observation that the energy in a frequency band is at any fixed time dominated either by signal energy or by noise energy, the energy is modeled as the larger of the separate energies of signal and noise in the band. Statistical algorithms are given for training this as a hidden variables model. The hidden variables are the prototype identities and the separate signal and noise components. Speech recognition experiments that successfully utilize this model are described.

Journal ArticleDOI
TL;DR: A maximum-a-posteriori approach for enhancing speech signals which have been degraded by statistically independent additive noise is proposed, based on statistical modeling of the clean speech signal and the noise process using long training sequences from the two processes.
Abstract: A maximum-a-posteriori approach for enhancing speech signals which have been degraded by statistically independent additive noise is proposed. The approach is based on statistical modeling of the clean speech signal and the noise process using long training sequences from the two processes. Hidden Markov models (HMMs) with mixtures of Gaussian autoregressive (AR) output probability distributions (PDs) are used to model the clean speech signal. The model for the noise process depends on its nature. The parameter set of the HMM model is estimated using the Baum or the EM (estimation-maximization) algorithm. The noisy speech is enhanced by reestimating the clean speech waveform using the EM algorithm. Efficient approximations of the training and enhancement procedures are examined. This results in the segmental k-means approach for hidden Markov modeling, in which the state sequence and the parameter set of the model are alternately estimated. Similarly, the enhancement is done by alternate estimation of the state and observation sequences. An approximate improvement of 4.0-6.0 dB in signal-to-noise ratio (SNR) is achieved at 10-dB input SNR.