Cepstrum

About: Cepstrum is a research topic. Over the lifetime, 3346 publications have been published within this topic receiving 55742 citations.
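The cepstrum itself is the inverse Fourier transform of the logarithm of the spectrum. A minimal real-cepstrum sketch in Python; the echo delay, signal length, and NumPy implementation below are illustrative assumptions, not taken from any paper on this page:

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)  # small floor avoids log(0)
    return np.fft.ifft(log_mag).real

# Classic use: an echo at lag d shows up as a peak at quefrency d.
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
d = 100                        # echo delay in samples (illustrative)
x = s.copy()
x[d:] += 0.5 * s[:-d]          # add an attenuated echo
c = real_cepstrum(x)
peak = 20 + np.argmax(c[20:len(c) // 2])   # skip the low-quefrency region
```

The low-quefrency bins are skipped because they carry the smooth spectral envelope rather than periodicity; the remaining maximum lands at the echo delay.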


Papers
Journal ArticleDOI
TL;DR: In this paper, the authors examined the spectrum and cepstrum content of vibration signals taken from a helicopter gearbox with two different configurations (3 and 4 planets) and presented a signal processing algorithm to separate synchronous and nonsynchronous components for complete extraction and removal of shaft harmonics.
Abstract: This paper examines the spectrum and cepstrum content of vibration signals taken from a helicopter gearbox with two different configurations (3 and 4 planets). It presents a signal processing algorithm to separate synchronous and nonsynchronous components for complete extraction and removal of shaft harmonics. The spectrum and cepstrum of the vibration signal for the two configurations are first analyzed and discussed. The effect of changing the number of planets on the fundamental gear mesh frequency (epicyclic mesh frequency) and its sidebands is then examined. The paper explains the differences between the two configurations and discusses, in particular, the asymmetry of the modulation sidebands about the epicyclic mesh frequency in the 4-planet arrangement. Finally, a separation algorithm based on resampling the order-tracked signal to an integer number of samples per revolution for a specific shaft is proposed for complete removal of the shaft harmonics. The results of the proposed separation algorithm are compared with other separation schemes such as discrete random separation (DRS) and time synchronous averaging (TSA), showing clear improvements.
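Time synchronous averaging (TSA), one of the baselines the paper compares against, can be sketched as follows. The signal is assumed to be already order-tracked to an integer number of samples per revolution; all signal parameters below are invented for illustration:

```python
import numpy as np

def tsa(x, samples_per_rev):
    """Time synchronous average: average whole revolutions, keeping
    only components synchronous with the chosen shaft."""
    n_revs = len(x) // samples_per_rev
    frames = x[:n_revs * samples_per_rev].reshape(n_revs, samples_per_rev)
    return frames.mean(axis=0)

spr = 64                                      # samples per revolution (assumed)
n = np.arange(spr * 200)                      # 200 revolutions
sync = np.cos(2 * np.pi * 8 * n / spr)        # 8th order: synchronous
nonsync = np.cos(2 * np.pi * 8.37 * n / spr)  # non-integer order
avg = tsa(sync + nonsync, spr)                # nonsynchronous part averages out
```

The integer-order component repeats identically each revolution and survives the average; the non-integer-order component accumulates a different phase each revolution and cancels.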

10 citations

Journal ArticleDOI
TL;DR: This study extracted a speech feature using an equivalent rectangular bandwidth (ERB) filter bank cepstrum and constructed a learning model using the acoustic model to improve the speech recognition rate.
Abstract: A range of speech extraction techniques have been applied to improve speech recognition when signals are mixed with noise. Speech recognition performance degrades when the model training environment differs from the recognition environment, because voice versus non-voice classification becomes inaccurate at low signal-to-noise ratios (SNRs). Problems also arise because voice activity detection is inaccurate when the noise changes inconsistently between the recognition environment and the learning model. One remedy is to extract a speech feature that is resistant to noise. This study extracted such a feature using an equivalent rectangular bandwidth (ERB) filter bank cepstrum and constructed a learning model using the acoustic model to improve the speech recognition rate. The ERB filter bank cepstrum was examined in a computational auditory scene analysis system, which analyzes the properties of the speech signal. The proposed model was evaluated with train and train-station noises. Distortion was measured by performing noise reduction at SNRs of -10 and -5 dB in noisy environments, showing respective improvements of 1.67 and 1.74 dB.
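The ERB scale mentioned above is commonly parameterized by the Glasberg and Moore (1990) formula. A sketch of computing ERB-spaced filter-bank center frequencies; the band edges and band count are arbitrary examples, not the paper's configuration:

```python
import numpy as np

def hz_to_erb_rate(f):
    """Glasberg & Moore (1990) ERB-rate scale, in Cams."""
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_center_freqs(f_lo, f_hi, n_bands):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    e = np.linspace(hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi), n_bands)
    return erb_rate_to_hz(e)

centers = erb_center_freqs(100.0, 8000.0, 32)  # arbitrary example band
```

Uniform spacing on the ERB-rate scale places filters densely at low frequencies and sparsely at high frequencies, mimicking cochlear frequency resolution.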

10 citations

Proceedings Article
01 Oct 1999
TL;DR: A higher-level approach to spectral envelopes is taken, which avoids the dilemma of how to control hundreds of partials and lets the residual noise part be treated with the same manipulations as the sinusoidal part by using the same representation.
Abstract: A spectral envelope is a curve in the frequency-magnitude plane which envelopes the short-time spectrum of a signal, e.g. connecting the peaks which represent sinusoidal partials, or modeling the spectral density of a noise signal. It describes the perceptually pertinent distribution of energy over frequency, which determines a large part of timbre for instruments, and the type of vowel for speech. Because of the importance of spectral envelopes for sound synthesis, a higher-level approach to their handling is taken here. We present programs developed using spectral envelopes for analysis, representation, manipulation, and synthesis. Spectral envelopes can be estimated by linear prediction, the cepstrum, or the discrete cepstrum. The strong and weak points of each are discussed relative to the requirements for estimation, such as robustness and regularity. Improvements of discrete cepstrum estimation (regularization, statistical smoothing, logarithmic frequency scale, adding control points) are presented. For speech signals, a composite envelope is shown to be advantageous. It is estimated from the sinusoidal partials and from the noise part above the maximum partial frequency. The representation of spectral envelopes is the central point for their handling. A good representation is crucial for the ease and flexibility with which they can be manipulated. Several requirements are laid out, such as stability, locality, and flexibility. The representations (filter coefficients, sampled, break-point functions, splines, formants) are then discussed relative to these requirements. The notion of fuzzy formants based on formant regions is introduced. Some general forms of manipulation and morphing are presented. For morphing between two or more spectral envelopes over time, linear interpolation and formant shifting, which preserves valid vocal tract characteristics, are considered.
For synthesis, spectral envelopes are applied to sinusoidal additive synthesis and are used for filtering the residual noise component. This is especially easy and efficient for both components in the FFT⁻¹ technique. Finally, in additive analysis, spectral envelopes can be generalized to apply not only to magnitude, but also to frequency and phase, while keeping the same representation. The frequency envelope expresses the harmonicity of partials over frequency; the phase envelope expresses phase relations between harmonic partials. With this high-level approach to spectral envelopes, additive synthesis can avoid the dilemma of how to control hundreds of partials, and the residual noise part can be treated with the same manipulations as the sinusoidal part by using the same representation. Also, high-quality singing voice synthesis can use morphing between sampled spectral envelopes and formants to combine natural-sounding transitions with a precisely modeled sustained part. The above-mentioned methods have been implemented in a C library using the SDIF standard for sound description data as the file format and are used in various real-time and non-real-time programs on Unix and Macintosh.
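Cepstral estimation of a spectral envelope, one of the estimation methods mentioned, amounts to low-pass liftering the real cepstrum. A minimal sketch; the frame length, lifter order, and harmonic test signal are illustrative assumptions:

```python
import numpy as np

def cepstral_envelope(frame, order):
    """Spectral envelope by cepstral smoothing: keep only the first
    `order` cepstral coefficients, then return to log magnitude."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
    ceps = np.fft.irfft(log_mag)
    lifter = np.zeros_like(ceps)
    lifter[:order] = 1.0
    lifter[-(order - 1):] = 1.0          # keep the symmetric counterpart
    return np.fft.rfft(ceps * lifter).real

# A harmonic frame: the envelope should smooth over the partials.
fs, f0, n = 16000, 200, 1024
t = np.arange(n) / fs
frame = np.hanning(n) * sum(np.cos(2 * np.pi * f0 * k * t) / k
                            for k in range(1, 9))
log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
env = cepstral_envelope(frame, 20)
```

Discarding high-quefrency coefficients removes the fine harmonic ripple, leaving the smooth envelope; a low `order` gives a smoother but less detailed curve.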

10 citations

Proceedings ArticleDOI
02 Dec 1997
TL;DR: This paper presents an acoustic front-end that uses the properties of auditory masking to extract acoustic features from the speech signal, computing a masking threshold as a function of frequency for each speech frame from its power spectrum.
Abstract: This paper presents an acoustic front-end which uses the properties of auditory masking for extracting acoustic features from the speech signal. Using the properties of simultaneous masking found in the human auditory system, we compute a masking threshold as a function of frequency for a given speech frame from its power spectrum. All those portions of the power spectrum which are below the auditory threshold are not heard by the human auditory system due to masking effects and hence can be discarded. These portions are replaced by the corresponding portions in the masking threshold spectrum. This modified power spectrum is processed by the linear prediction analysis or homomorphic analysis procedure to derive cepstral features for each speech frame. We study the performance of this front-end for speech recognition in noisy environments. This front-end performs significantly better than the conventional linear-prediction-based or homomorphic-analysis-based front-ends for noisy speech. In terms of signal-to-noise ratio, simultaneous masking offers an advantage of more than 5 dB over the LPCC front-end in isolated word recognition experiments and 3 dB in continuous speech recognition experiments.
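The central operation described, discarding spectral components below the masking threshold and substituting the threshold itself, is an elementwise maximum. A sketch with a made-up flat threshold; real front-ends derive the threshold from the power spectrum with a bark-scale spreading function, which is omitted here:

```python
import numpy as np

def apply_masking_floor(power_spectrum, masking_threshold):
    """Replace components below the masking threshold (inaudible due
    to simultaneous masking) with the threshold itself."""
    return np.maximum(power_spectrum, masking_threshold)

# Made-up example: a noisy spectrum floored by a hypothetical threshold.
power = np.array([10.0, 0.2, 5.0, 0.05, 8.0])
threshold = np.full(5, 1.0)
modified = apply_masking_floor(power, threshold)
```

The modified spectrum then feeds the usual linear prediction or homomorphic (cepstral) analysis; low-level noise below the threshold can no longer perturb the derived features.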

10 citations

Proceedings ArticleDOI
12 May 1998
TL;DR: A novel speech feature, the modified Mellin transform of the log-spectrum of the speech signal (MMTLS), is presented, which is more appropriate for speaker-independent speech recognition than the widely used cepstrum.
Abstract: This paper presents a novel kind of speech feature: the modified Mellin transform of the log-spectrum of the speech signal (MMTLS for short). Because of the scale-invariance property of the modified Mellin transform, the new feature is insensitive to variation of the vocal tract length among individual speakers, and it is thus more appropriate for speaker-independent speech recognition than the widely used cepstrum. Preliminary experiments show that the performance of the MMTLS-based method is much better than that of the LPC- and MFC-based methods. Moreover, the error rate of this method is consistent across different outlier speakers.
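The scale-invariance argument can be illustrated: warping the spectrum onto a logarithmic frequency grid turns a scaling of the frequency axis (a vocal tract length change) into a shift, which the Fourier magnitude then discards. The sketch below is a simplified stand-in for the modified Mellin transform, not the paper's exact feature; the grid sizes and the Gaussian test envelope are invented for the demonstration:

```python
import numpy as np

def scale_invariant_feature(log_spectrum, freqs, n_out=64):
    """Interpolate the log-spectrum onto a log-frequency grid and keep
    the FFT magnitude; frequency-axis scaling becomes a discarded shift."""
    log_grid = np.geomspace(freqs[0], freqs[-1], 256)
    warped = np.interp(log_grid, freqs, log_spectrum)
    warped = warped - warped.mean()        # remove overall gain
    return np.abs(np.fft.rfft(warped))[:n_out]

# A smooth Gaussian envelope in log-frequency, and a version with the
# frequency axis stretched by 20% (a crude vocal-tract-length change).
freqs = np.arange(1.0, 4097.0)
env = lambda f: np.exp(-((np.log(f) - np.log(500.0)) ** 2) / (2 * 0.3 ** 2))
f1 = scale_invariant_feature(env(freqs), freqs)
f2 = scale_invariant_feature(env(freqs / 1.2), freqs)
```

The two feature vectors come out nearly identical even though the underlying spectra are stretched relative to each other, which is the property the abstract exploits.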

10 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Robustness (computer science): 94.7K papers, 1.6M citations, 80% related
Feature (computer vision): 128.2K papers, 1.7M citations, 79% related
Deep learning: 79.8K papers, 2.1M citations, 79% related
Support vector machine: 73.6K papers, 1.7M citations, 78% related
Performance Metrics
No. of papers in the topic in previous years:
Year   Papers
2023   86
2022   206
2021   60
2020   96
2019   135
2018   130