
Showing papers on "Cepstrum" published in 1991


BookDOI
01 May 1991
TL;DR: This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment, including SNR-Dependent Cepstral Normalization (SDCN) and Codeword-Dependent Cepstral Normalization (CDCN).
Abstract: This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of environmental variability are introduced by the use of desk-top microphones and different training and testing conditions: additive noise and spectral tilt introduced by linear filtering. An important attribute of the novel compensation algorithms described in this thesis is that they provide joint rather than independent compensation for these two types of degradation. Acoustical compensation is applied in our algorithms as an additive correction in the cepstral domain. This allows a higher degree of integration within SPHINX, the Carnegie Mellon speech recognition system, that uses the cepstrum as its feature vector. Therefore, these algorithms can be implemented very efficiently. Processing in many of these algorithms is based on instantaneous signal-to-noise ratio (SNR), as the appropriate compensation represents a form of noise suppression at low SNRs and spectral equalization at high SNRs. The compensation vectors for additive noise and spectral transformations are estimated by minimizing the differences between speech feature vectors obtained from a "standard" training corpus of speech and feature vectors that represent the current acoustical environment. In our work this is accomplished by minimizing the distortion of vector-quantized cepstra that are produced by the feature extraction module in SPHINX. 
In this dissertation we describe several algorithms, including SNR-Dependent Cepstral Normalization (SDCN) and Codeword-Dependent Cepstral Normalization (CDCN). With CDCN, the accuracy of SPHINX when trained on speech recorded with a close-talking microphone and tested on speech recorded with a desk-top microphone is essentially the same as that obtained when the system is trained and tested on speech from the desk-top microphone. An algorithm for frequency normalization has also been proposed in which the parameter of the bilinear transformation that is used by the signal-processing stage to produce frequency warping is adjusted for each new speaker and acoustical environment. The optimum value of this parameter is again chosen to minimize the vector-quantization distortion between the standard environment and the current one. In preliminary studies, use of this frequency normalization produced a moderate additional decrease in the observed error rate.
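
The abstract above applies compensation as an additive correction in the cepstral domain. The following is a minimal sketch of why an additive correction can undo linear filtering. It is not SDCN or CDCN. It assumes circular convolution (so the identity is exact), uses a random frame as a stand-in for speech, and the two-tap `channel` is a hypothetical microphone response chosen for illustration.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(x, n_fft)
    return np.fft.irfft(np.log(np.abs(spectrum)), n_fft)

def circular_filter(x, h, n_fft):
    """Filter x with h by multiplying DFTs (circular convolution)."""
    return np.fft.irfft(np.fft.rfft(x, n_fft) * np.fft.rfft(h, n_fft), n_fft)

rng = np.random.default_rng(0)
n_fft = 256
clean = rng.standard_normal(n_fft)      # stand-in for one speech frame (assumption)
channel = np.array([1.0, 0.5])          # hypothetical microphone response

filtered = circular_filter(clean, channel, n_fft)
offset = real_cepstrum(channel, n_fft)  # the filter appears as a fixed additive vector

# Subtracting the channel's cepstrum recovers the clean cepstrum.
compensated = real_cepstrum(filtered, n_fft) - offset
```

Because multiplication of spectra becomes addition after the logarithm, a fixed filter shifts every cepstral vector by the same offset. Additive noise does not behave this way, which is consistent with the abstract's point that the two degradations need joint treatment.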

474 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: Several algorithms are presented that increase the robustness of SPHINX, the CMU (Carnegie Mellon University) continuous-speech speaker-independent recognition system, by normalizing the acoustic space via minimization of the overall VQ distortion.
Abstract: Several algorithms are presented that increase the robustness of SPHINX, the CMU (Carnegie Mellon University) continuous-speech speaker-independent recognition system, by normalizing the acoustic space via minimization of the overall VQ distortion. The authors propose an affine transformation of the cepstrum in which a matrix multiplication performs frequency normalization and a vector addition attempts environment normalization. The algorithms for environment normalization are efficient and improve the recognition accuracy when the system is tested on a microphone other than the one on which it was trained. The frequency normalization algorithm applies a different warping on the frequency axis to different speakers, and it achieves a 10% decrease in error rate.

229 citations


Journal ArticleDOI
TL;DR: It is demonstrated, by means of extensive simulations, that the proposed tricepstrum-based equalization scheme performs well and outperforms other existing blind equalizers, at the expense of higher computational complexity.
Abstract: An adaptive blind equalization method is introduced for nonminimum phase communication channels. The method estimates the inverse channel impulse response, by using the complex cepstrum of the fourth-order cumulants (tricepstrum) of the synchronously sampled received signal. As such, the proposed adaptive method depends only on the statistics of the received sequence, and is capable of reconstructing separately both the minimum and maximum phase response of the channel. It is demonstrated, by means of extensive simulations, that the proposed tricepstrum-based equalization scheme performs well and outperforms other existing blind equalizers, at the expense of higher computational complexity.

211 citations


Journal ArticleDOI
TL;DR: Simulation results indicate that the proposed bispectrum-based approach performs significantly better than the classical power spectrum based approach at low signal-to-noise ratios.
Abstract: The authors propose inspecting the zero crossings in the central slice of the bispectrum of the observed image for blur identification. This method is an extension of the classical methods for blur identification in which the power spectrum (or the power cepstrum) of the blurred image is applied to the bispectrum domain. The proposed bispectrum-based approach utilizes the ability of the bispectrum to suppress additive, signal-independent, Gaussian observation noise. Simulation results indicate that the method performs significantly better than the classical power spectrum based approach at low signal-to-noise ratios.

177 citations


PatentDOI
TL;DR: A flexible vocabulary speech recognition system is provided for recognizing speech transmitted via the public switched telephone network; the system is phoneme based, and the phonemes are modelled as hidden Markov models.
Abstract: A flexible vocabulary speech recognition system is provided for recognizing speech transmitted via the public switched telephone network. The flexible vocabulary recognition (FVR) system is a phoneme based system. The phonemes are modelled as hidden Markov models. The vocabulary is represented as concatenated phoneme models. The phoneme models are trained using Viterbi training enhanced by: substituting the covariance matrix of given phonemes by others, applying energy level thresholds and voiced, unvoiced, silence labelling constraints during Viterbi training. Specific vocabulary members, such as digits, are represented by allophone models. A* searching of the lexical network is facilitated by providing a reduced network which provides estimate scores used to evaluate the recognition path through the lexical network. Joint recognition and rejection of out-of-vocabulary words are provided by using both cepstrum and LSP parameter vectors.

132 citations


Journal ArticleDOI
TL;DR: The algorithm is similar to the cepstral smoothing approach for formant extraction using homomorphic deconvolution but the logarithmic operation is replaced by ()' operation and the additive and high resolution properties of group delay functions are exploited to emphasize formant peaks.

99 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: An algorithm for speech dereverberation which incorporates a novel approach to the segmentation and windowing procedure for speech is presented, and averaging in the cepstrum is exploited to increase the separation between the speech and impulse response.
Abstract: Complex cepstral deconvolution is applied to acoustic dereverberation. It is found that traditional cepstral techniques fail in acoustic dereverberation because segmentation errors in the time domain prevent accurate cepstral computation. An algorithm for speech dereverberation which incorporates a novel approach to the segmentation and windowing procedure for speech is presented. Averaging in the cepstrum is exploited to increase the separation between the speech and impulse response. An estimate of the room impulse response is built, and a least squared error inverse filter is used to remove the estimated impulse response from the reverberant speech. Reduction of reverberation with this technique is demonstrated.

74 citations


Journal ArticleDOI
TL;DR: A technique is described for generating guaranteed stable control laws for uncertain, modally dense structures with collocated sensors and actuators by ignoring the reverberant response created by reflections from other parts of the structure, which guarantees that the controller is positive real and the system will remain stable for any uncertainty.
Abstract: A technique is described for generating guaranteed stable control laws for uncertain, modally dense structures with collocated sensors and actuators. By ignoring the reverberant response created by reflections from other parts of the structure, a dereverberated mobility model can be developed that accurately models the local dynamics of the structure. This is similar in many respects to a wave-based model, but can treat more general structures, not only those that can be represented as a collection of waveguides. This model can be determined directly from transfer function data using an analysis technique based on the complex cepstrum. In order to minimize the effect of disturbances propagating through the structure, the power dissipated by the controller is maximized in an H∞ sense. This guarantees that the controller is positive real and, thus, that the system will remain stable for any uncertainty, provided that the power flow is correctly modeled. The approach is demonstrated for two examples. The resulting controllers are much more effective than simple collocated rate feedback.

71 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: The problem of speech recognition in the presence of interfering nonstationary noise is addressed, and a method for noise reduction in the cepstral domain based on a multilayer network is proposed and tested on a large database of isolated words contaminated with nonstationary F-16 jet noise.
Abstract: The problem of speech recognition in the presence of interfering nonstationary noise is addressed. A method for noise reduction in the cepstral domain based on a multilayer network is proposed and tested on a large database of isolated words contaminated with nonstationary F-16 jet noise. The speech recognition system consists of an auditory preprocessing module, the cepstral noise reduction multilayer network, and a neural network classifier. The noise reduction network performs a nonlinear autoassociative mapping in the cepstral domain between a set of noisy cepstral coefficients and a set of noise-free cepstral coefficients. The average recognition rate on a test database was improved up to 65% when the noise reduction network was added to the speech recognition system.

51 citations


PatentDOI
Joji Kane, Akira Nohara
TL;DR: In this patent, a signal detection apparatus for detecting a noise-suppressed speech signal is described, in which a band division process including a Fourier transformation is performed for an inputted speech signal, thereby outputting spectrum signals of plural channels.
Abstract: There is disclosed a signal detection apparatus for detecting a noise-suppressed speech signal. In the signal detection apparatus, a band division process including a Fourier transformation is performed for an inputted speech signal, thereby outputting spectrum signals of plural channels. A cepstrum analysis process is performed for the spectrum signals, and a peak of the obtained cepstrum is detected in response to the cepstrum analysis result. Thereafter, a speech signal interval of the inputted noisy speech signal is detected in response to the detected peak, and a noise is predicted in the speech signal in response to the detected speech signal interval. Then, the predicted noise is canceled in the spectrum signals thereby outputting noise-suppressed spectrum signals. Finally, the noise-suppressed spectrum signals are combined and are inverse Fourier-transformed, thereby outputting a noise-suppressed speech signal.

46 citations


PatentDOI
Joji Kane, Akira Nohara
TL;DR: In this patent, a signal detection device is described that comprises cepstrum calculating means, peak detection means for detecting the peak of the cepstrum, analysis interval setting means that enable an optimum analysis interval to be set, and voice detection means, to which the peak detection output is supplied, for detecting voice.
Abstract: A signal detection device comprises cepstrum calculating means (71, 75, 81); peak detection means (72, 76, 82) for detecting the peak of the cepstrum; analysis interval setting means (73, 78, 84) enabling the setting of an optimum analysis interval; and voice detection means (74, 714, 83), to which the peak detected output is supplied, for detecting voice, wherein the peak detection interval of said peak detection means (72, 76, 82) is controlled by the set output from said analysis interval setting means (73, 78, 84).

Proceedings ArticleDOI
14 Apr 1991
TL;DR: It is shown that for recognition based on the combination of the first two regression features with the static cepstral coefficients, increasing the time length to more than 200 ms, using all of the frames in this time interval, resulted in the highest recognition rates for noisy-Lombard test speech.
Abstract: It is proposed that the number of speech analysis frames used in calculating regression features should be controlled separately from the time length over which the features are calculated. Regression features are used to represent the first two time derivatives of the speech cepstrum in a speaker-independent, isolated-word recognition task. The recognition system is trained on normal (noise-free, non-Lombard) speech, but tested on normal, noisy, Lombard, or noisy-Lombard speech. It is shown that for recognition based on the combination of the first two regression features with the static cepstral coefficients, increasing the time length to more than 200 ms, using all of the frames in this time interval, resulted in the highest recognition rates for noisy-Lombard test speech.
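
As an illustration of the regression features discussed above, here is a minimal sketch of a least-squares slope computed over a window of cepstral frames. It is a generic regression-coefficient calculation, not the paper's exact procedure. The `step` parameter is a hypothetical knob added here to show how the number of frames used can be varied independently of the window length, and edge frames are repeated at the boundaries as an assumption.

```python
import numpy as np

def regression_delta(cepstra, half_width, step=1):
    """Least-squares slope of each cepstral track around every frame.

    cepstra    : array of shape (n_frames, n_coeffs)
    half_width : window reaches half_width frames to each side (the time length)
    step       : use every step-th frame inside the window (the frame count)
    The window must contain at least two frames.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    n_frames = cepstra.shape[0]
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(cepstra, ((half_width, half_width), (0, 0)), mode="edge")
    offsets = np.arange(-half_width, half_width + 1, step)
    centered = offsets - offsets.mean()      # keeps the slope unbiased if offsets are asymmetric
    denom = np.sum(centered ** 2)
    delta = np.zeros_like(cepstra)
    for k, weight in zip(offsets, centered):
        start = half_width + k
        delta += weight * padded[start : start + n_frames]
    return delta / denom
```

Applying the function to its own output is one common way to approximate a second time derivative; whether the paper computes its second regression feature this way is not stated in the abstract.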

Journal Article
TL;DR: In this article, cepstral analysis is applied to the vibrations of a toothed gearing, with the signal modeled as an amplitude modulated oscillation.
Abstract: This paper presents the application of cepstral analysis to the vibrations of a toothed gearing. The signal is modeled as an amplitude modulated oscillation and the effect of cepstrum is detailed. Cepstrum and autocorrelation are compared and the resolution of cepstrum is discussed.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A novel cepstral function, the cepstrum (CEP) of the one-sided autocorrelation sequence (COSA), is presented and applied to pitch determination of speech signals; compared with the autocorrelation-with-center-clipping and CEP algorithms, it significantly reduces pitch-period errors at transitional speech segments and in speech signals contaminated by noise.
Abstract: A novel cepstral function, the cepstrum (CEP) of the one-sided autocorrelation sequence (COSA), is presented and applied to pitch determination of speech signals. This pitch determination algorithm (PDA) starts from the autocorrelation sequence in lieu of the speech signal. Although the COSA pitch determination algorithm does not improve the performance of the autocorrelation-with-center-clipping and CEP algorithms in quasiperiodic speech frames, it significantly reduces their pitch-period errors at transitional speech segments as well as in speech signals contaminated by noise. The PDA's better performance is based on its accuracy at nonstationary segments of speech signals and its robustness to noise.
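
For context, the baseline CEP approach that the abstract compares against can be sketched as follows. This is the conventional cepstral peak-picking method, not the COSA algorithm itself. The search band (`f_min`, `f_max`), the Hamming window, and the synthetic test frame are all assumptions made for illustration.

```python
import numpy as np

def cepstral_pitch_period(frame, fs, f_min=80.0, f_max=400.0):
    """Return the pitch period in samples from the peak of the real cepstrum."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag, len(frame))
    lo = int(fs / f_max)    # shortest period considered
    hi = int(fs / f_min)    # longest period considered
    return lo + int(np.argmax(cepstrum[lo : hi + 1]))

# Synthetic voiced frame: a 100 Hz impulse train through a one-pole filter.
fs = 8000
period = 80
excitation = np.zeros(1024)
excitation[::period] = 1.0
frame = np.zeros_like(excitation)
for n in range(len(excitation)):    # y[n] = x[n] + 0.9 * y[n - 1]
    frame[n] = excitation[n] + (0.9 * frame[n - 1] if n > 0 else 0.0)

estimate = cepstral_pitch_period(frame, fs)
```

Restricting the search to a plausible pitch range keeps the low-quefrency envelope terms and the peak at twice the period out of the comparison.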

Journal ArticleDOI
TL;DR: Both the real and the power pseudocepstra using discrete cosine transform and discrete sine transform are reported and they are applied to speech pitch period extraction.
Abstract: Some empirical results in the use of discrete trigonometric transforms for cepstrum analysis are presented. Both the real and the power pseudocepstra using discrete cosine transform and discrete sine transform are reported. They are applied to speech pitch period extraction. Comparisons are made with standard Fourier transform cepstrum analysis.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A fast self-adapting algorithm to remove broadband noise in the cepstral domain is presented and yields good speech quality with little spectral distortion in the fast-changing F-15/F-16 aircraft cockpit environment.
Abstract: A fast self-adapting algorithm to remove broadband noise in the cepstral domain is presented. Noise removal is accomplished by cepstral subtraction in a manner which allows subtraction factors to be derived adaptively, based on the local signal-to-noise ratio (SNR) of each signal frame. Finite averaging noise estimates are used to minimize the adaptation time. The algorithm yields good speech quality with little spectral distortion. The algorithm has proven to be a useful and efficient tool for speech processing in the fast-changing F-15/F-16 aircraft cockpit environment.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A least-squares method is presented to estimate the cross-bicepstral parameters of the three signals simultaneously, assuming only that the signals are exponential, stable, and have no zeros on the unit circle.
Abstract: Complex cepstrum techniques are applied to cross-bispectra in order to simultaneously reconstruct three independent finite-time signals, given the cross-bispectrum of the three signals only. The method is applied to simultaneously identify three systems via the cross-bispectrum of their outputs, knowing only the statistics of the input. A least-squares method is presented to estimate the cross-bicepstral parameters of the three signals simultaneously. The method is nonparametric and noniterative, assuming only that the signals (or impulse responses) are exponential, stable, and have no zeros on the unit circle. It takes full advantage of the two-dimensional nature of the bispectrum to reconstruct the three signals (or identify the three systems) simultaneously.

Book ChapterDOI
01 Jan 1991
TL;DR: This chapter reviews the properties and behavior of cochlear models and their importance to ASR, and emphasizes the benefits gained from better models of “early” signal processing in mammals.
Abstract: The initial stages in speech processing, discussed in Chapter 4, are commonly performed using a short-time Fourier transformation (STFT) of the digitally-sampled acoustic time series. Several representations of the STFT have been employed for automatic speech recognition, including linear, logarithmic scale, logarithmic mel-scale, cepstral and differenced-cepstral coefficients. However, recent investigations of mammalian auditory processing have determined that the cochlea is a time-domain analyzer, and that the STFT representation is not always the most appropriate method of signal analysis. Therefore, this chapter reviews the properties and behavior of cochlear models and their importance to ASR. It emphasizes the benefits gained from better models of “early” signal processing in mammals. A discussion of artificial neural network applications for conventional signal processing problems follows. The remainder of this chapter discusses how low-level “feature maps” may be created and used in ASR applications.


Proceedings ArticleDOI
30 Aug 1991
TL;DR: A method for warping the frequency axis of cepstrum coefficients in a way analogous to the preprocessing performed by the human ear, using the bilinear transform to represent the LPC coefficients on a warped frequency scale is described.
Abstract: This paper describes a method for warping the frequency axis of cepstrum coefficients in a way analogous to the preprocessing performed by the human ear. The equations are derived and historical background relating to different warping scales is discussed. The calculation is a two-step procedure in which the bilinear transform is used to represent the LPC coefficients on a warped frequency scale. A warping constant determines the degree of transformation. This results in an ARMA representation of the filter transfer function. The second step determines recursively the cepstrum coefficients corresponding to this ARMA transfer function.
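
To show what the warping constant does to the frequency axis, here is a minimal sketch of the frequency mapping induced by a first-order all-pass (bilinear) substitution. It assumes the common convention in which z^-1 is replaced by (z^-1 - alpha) / (1 - alpha z^-1); the paper's own sign convention and its ARMA cepstrum recursion are not reproduced here, and the name `warped_frequency` is introduced only for this example.

```python
import numpy as np

def warped_frequency(omega, alpha):
    """Warped frequency for a first-order all-pass substitution.

    omega : frequency in radians per sample, between 0 and pi
    alpha : warping constant, with |alpha| < 1
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega), 1.0 - alpha * np.cos(omega))

omega = np.linspace(0.0, np.pi, 5)
linear = warped_frequency(omega, 0.0)    # alpha = 0 leaves the axis unchanged
warped = warped_frequency(omega, 0.5)    # alpha > 0 gives low frequencies more of the axis
```

The mapping fixes the endpoints 0 and pi and is monotonic, so it redistributes resolution along the axis without reordering frequencies.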

Dissertation
01 May 1991
TL;DR: The research work comprising this thesis presents a new motion stereo model that is computationally less demanding and yields more accurate depth information than the existing methods and provides a unique method of range data acquisition and visualization of 3-D data.
Abstract: Currently existing methods for three-dimensional (3-D) reconstruction of an object are computationally intensive and lacking in accuracy. The research work comprising this thesis presents a new motion stereo model that is computationally less demanding and yields more accurate depth information than the existing methods. One of the traditional techniques for extracting depth information is to find the disparities of corresponding points in stereo images following a biological model of 3-D vision. A normal binocular stereo system uses two images to determine which point in one image corresponds to a given point in the other, i.e., to find the correspondence between two images. The resolution of the disparity depends on the baseline used. High resolution in disparity is achieved by increasing the baseline and decreasing the window size. Based on this idea, a new motion stereo model using a sequence of a number of images has been developed that can provide accurate depth information that is not available from a stereo vision system. The disparity, i.e., the translational difference between an image pair, has been computed precisely using a recently developed power cepstrum technique that is more robust and noise tolerant than the usual phase-correlation technique. The computation time required by the power cepstrum has been further reduced by using a Hartley-like transform that maps a real-valued sequence to a real-valued spectrum while preserving the useful properties of the Fourier transform. This new motion stereo vision model matches the corresponding points in two images with several intermediate images to reduce the error in matching from widely different perspectives and uses a Hartley-like transform to compute the power cepstrum for finding the disparities. The depth information extracted from the disparities of a sequence of images by the cepstrum technique is less computationally intensive yet avoids the occlusion problem in a stereo vision model. 
This new motion stereo model provides a unique method of range data acquisition and visualization of 3-D data.

Proceedings ArticleDOI
08 Jul 1991
TL;DR: The problem of speech recognition in the presence of interfering nonstationary noise is addressed, and a method for noise reduction in the cepstral domain based on a universal approximator is proposed and tested on a large database of isolated words contaminated with nonstationary F-16 jet cockpit noise.
Abstract: The problem of speech recognition in the presence of interfering nonstationary noise is addressed. A method for noise reduction in the cepstral domain based on a universal approximator is proposed and tested on a large database of isolated words contaminated with nonstationary F-16 jet cockpit noise. The speech recognition system consists of a concatenation of an auditory preprocessing module, the cepstral noise reduction network (CNR network), and a neural network classifier. The proposed architecture performs a nonlinear autoassociative mapping in the cepstral domain between a set of noisy cepstral coefficients from the preprocessing module and a set of noise-free cepstral coefficients. The output from the CNR network is input to the neural network classifier, in which the output functions are approximations to the Bayes optimal discriminant functions. Noise reduction is possible in the preprocessing module and in the classifier, essentially making the system a three-stage noise reduction system. The average recognition rate on a test database was improved up to 65% when the CNR network was added to the speech recognition system.

Proceedings ArticleDOI
TL;DR: A new adaptive blind equalization algorithm, the power cepstrum and tricoherence equalization algorithm (POTEA), is introduced; it is based on second- and fourth-order statistics of the received sequence and performs simultaneous identification and equalization of a nonminimum phase channel from its output only, without using training sequences.
Abstract: This paper introduces a new adaptive blind equalization algorithm, the power cepstrum and tricoherence equalization algorithm (POTEA), based on second- and fourth-order statistics of the received sequence. The algorithm performs simultaneous identification and equalization of a nonminimum phase channel from its output only, without using training sequences. POTEA is based on adaptive computations of the channel's power cepstrum and cepstrum of tricoherence by employing second- and fourth-order statistics, respectively. Extensive simulation results, with QAM signals, are presented to demonstrate the effectiveness of POTEA.

Proceedings ArticleDOI
28 Aug 1991
TL;DR: The entire system of speech recognition using a back-propagation neural network was successfully implemented on a personal computer for words "one" through "five" with an overall accuracy rate of 89%.
Abstract: Back-propagation neural networks are used for speaker-dependent speech recognition of isolated words. The speech is digitized and its features are extracted by using Fast Fourier Transforms. Linear Frequency Cepstrum coefficients are then calculated. The entire system of speech recognition using a back-propagation neural network was successfully implemented on a personal computer (80x86 microprocessor based) for words "one" through "five" with an overall accuracy rate of 89%.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: The results obtained demonstrate that homomorphic deconvolution by the TDCT method is a promising general approach.
Abstract: A description is given of the time domain cepstral transformations (TDCT) method, which uses a time domain approach to calculate the cepstral signal representation. It offers explicit formulas to map a general mixed phase time sequence from original time into the cepstral domain and vice versa. The TDCT method avoids or minimizes the restrictions of the Fourier transform method. Explicit and unique time domain cepstral transformations are derived. No specific preconditioning window is required to obtain accurate complex cepstra. The results obtained demonstrate that homomorphic deconvolution by the TDCT method is a promising general approach.

Proceedings ArticleDOI
16 Jun 1991
TL;DR: The authors show that cepstrum analysis provides a robust and fast method for sequential image matching, even if the input is raw images, and that the power cepstrum computation cost can be reduced by replacing the two-dimensional Fourier transform with the two-dimensional Hartley transform.
Abstract: Sequential image matching is a basic problem in image analysis and computer vision. Among the existing methods, there are intensity-based approaches and feature-based approaches; however, there are still some problems with the existing methods. It is hoped that the cepstrum method provides a robust and accurate way to match sequential images instead of correlation algorithms. In the present paper, the authors report their work on the cepstrum approach to the matching of sequential images. They show that cepstrum analysis provides a robust and fast method for sequential image matching, even if the input is raw images. It is also shown that the power cepstrum computation cost can be reduced by replacing the two-dimensional Fourier transform with the two-dimensional Hartley transform. The cepstrum can also be obtained directly from the Hartley transformation. Experimental results are given to support the analysis.
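
The claim that the cepstrum can be obtained directly from the Hartley transform can be illustrated in one dimension. The sketch below is an assumption-laden simplification of the two-dimensional case in the abstract: it defines the power cepstrum as the inverse transform of the log power spectrum (some authors additionally square the result), and it computes the Hartley transform through `np.fft.fft` purely for brevity, where a real implementation would use a fast Hartley routine to get the computational saving.

```python
import numpy as np

def hartley(x):
    """Discrete Hartley transform of a real sequence (cas kernel = cos + sin)."""
    spectrum = np.fft.fft(x)
    return spectrum.real - spectrum.imag

def power_cepstrum_from_hartley(x):
    """Power cepstrum of a real sequence using Hartley coefficients only."""
    h = hartley(x)
    h_mirror = np.roll(h[::-1], 1)            # H[(N - k) mod N]
    power = 0.5 * (h ** 2 + h_mirror ** 2)    # equals |X[k]|^2 for real input
    # log(power) is real and even, so DHT / N acts as the inverse DFT here.
    return hartley(np.log(power)) / len(x)
```

Every intermediate quantity stays real-valued, which is the property the abstract relies on to reduce the cost relative to a complex Fourier transform.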

Journal ArticleDOI
TL;DR: A few applications of a separable Hartley-like (CAS-CAS) transform in two-dimensional signal processing are presented and the computational advantage of the proposed methods over the algorithms using 2-D FFT is discussed.
Abstract: A few applications of a separable Hartley-like (CAS-CAS) transform in two-dimensional (2-D) signal processing are presented. The applications discussed include (i) the interpolation of signals, (ii) the computation of the Hilbert transform, and (iii) the complex cepstrum computation. The computational advantage of the proposed methods over the algorithms using 2-D FFT is discussed.

Journal ArticleDOI
TL;DR: The smoothed group delay spectrum (SGDS) distance measure is evaluated through isolated word speech recognition experiments with specified speakers; the vowel recognition rate is improved, which improves the recognition rates for the syllable and the word by 2 percent or more on a relative scale.
Abstract: This paper first evaluates the smoothed group delay spectrum (SGDS) distance measure through an isolated word speech recognition experiment with specified speakers. The experiment was performed for the following three cases, considering speech recognition in the actual environment: 1) the case where the channels have different characteristics; 2) the case where white noise is added to the input speech; and 3) the case where telephone speech is used as the input. In all three cases, the recognition rate is improved drastically compared to the traditional LPC cepstrum distance measure. An improvement of the recognition rate by 16 percent was realized under noise with a segmental SN ratio of 20 dB. Then the distance measure is evaluated for the case where the FFT cepstrum is converted into the group delay spectrum. The proposed method gives a better recognition rate compared to the conventional FFT cepstrum distance measure, but the result is worse than the SGDS measure by approximately 3 percent since the higher-order FFT cepstrum coefficients have a larger variance on the time axis. Finally, the SGDS distance measure is evaluated by the isolated word speech recognition system with the monosyllable as the registered speech. The vowel recognition rate is improved, which improves the recognition rates for the syllable and the word by 2 percent or more on a relative scale.

Patent
31 Oct 1991
TL;DR: In this patent, the pitch period is corrected using the logarithmic power spectrum, and the exact PSE is obtained by interpolation of 3 to 5 points when the power spectrum is sampled at the pitch period interval in a pitch extracting section 8.
Abstract: PURPOSE: To assure high-quality voice information compression by correcting a pitch period using a logarithmic power spectrum, searching for the near-maximum value on the logarithmic power spectrum in accordance with the corrected pitch period, and obtaining a normal PSE by interpolation. CONSTITUTION: The pitch period obtained from the point indicating the maximum value of the cepstrum obtained in a cepstrum section 7 is corrected by the point indicating the maximum value of the logarithmic power spectrum to obtain an exact pitch period. When the power spectrum is sampled at the pitch period interval in a pitch extracting section 8 in order to obtain the power spectrum envelope (PSE), the near-maximum value is searched for and the exact PSE is obtained by interpolation of 3 to 5 points. High-quality voice information compression is assured in this way.

Proceedings ArticleDOI
04 Nov 1991
TL;DR: Comparisons are made of the proposed processing methods using a standard broadband rectangular function with random noise added in addition to actual echoic data and the problems associated with complex cepstrum processing are discussed.
Abstract: Adaptive finite impulse response (FIR) and infinite impulse response (IIR) frequency estimation methods are utilized in a detection scheme to identify frequency in the log power spectrum. In addition, an adjustable window based on a Butterworth design is suggested for use in isolating the cosinusoidal frequency. The advantages and disadvantages of windowing are discussed. Also the problems associated with complex cepstrum processing are discussed as they relate to echo detection and delay estimation. Comparisons are made of the proposed processing methods using a standard broadband rectangular function with random noise added in addition to actual echoic data.
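
An echo adds a cosinusoidal ripple to the log power spectrum, and the ripple's frequency encodes the delay. The sketch below shows the classical power cepstrum way of reading off that delay; it is not the adaptive FIR/IIR frequency estimator or the Butterworth-based window proposed in the abstract. The white-noise source, the echo gain, and the delay are all made-up test values.

```python
import numpy as np

def echo_delay(x, min_lag=1):
    """Estimate an echo delay in samples from the peak of the power cepstrum."""
    log_power = np.log(np.abs(np.fft.rfft(x)) ** 2 + 1e-12)
    cepstrum = np.fft.irfft(log_power, len(x))
    half = len(x) // 2
    # Skip quefrency 0, which only holds the mean log power.
    return min_lag + int(np.argmax(cepstrum[min_lag:half]))

rng = np.random.default_rng(1)
source = rng.standard_normal(512)    # hypothetical broadband source
true_delay, gain = 37, 0.8           # illustrative echo parameters
received = np.zeros(1024)
received[:512] += source
received[true_delay : true_delay + 512] += gain * source

estimated_delay = echo_delay(received)
```

The source is zero-padded so the delayed copy is not truncated, which keeps the relation between the two spectra exact for this frame length.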