
Showing papers on "Cepstrum published in 2013"


Proceedings ArticleDOI
26 May 2013
TL;DR: The proposed feature enhancement algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask.
Abstract: We propose a feature enhancement algorithm to improve robust automatic speech recognition (ASR). The algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask. The estimated IRM is used to filter out noise from a noisy Mel spectrogram before performing cepstral feature extraction for ASR. On the noisy subset of the Aurora-4 robust ASR corpus, the proposed enhancement obtains a relative improvement of over 38% in terms of word error rates using ASR models trained in clean conditions, and an improvement of over 14% when the models are trained using the multi-condition training data. In terms of instantaneous SNR estimation performance, the proposed system obtains a mean absolute error of less than 4 dB in most frequency channels.
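The masking step at the heart of this method (pointwise filtering of a noisy Mel spectrogram with an estimated ratio mask) can be sketched in a few lines. The NumPy code below is illustrative only, not the authors' implementation; the array shapes and mask values are hypothetical.

```python
import numpy as np

def apply_ratio_mask(noisy_mel, mask):
    """Pointwise soft-mask a noisy Mel spectrogram.

    noisy_mel : (frames, channels) Mel-domain power spectrogram
    mask      : (frames, channels) ratio mask, clipped to [0, 1]
    """
    return noisy_mel * np.clip(mask, 0.0, 1.0)

# Hypothetical 2-frame, 3-channel example
noisy = np.array([[4.0, 9.0, 1.0],
                  [16.0, 0.5, 2.0]])
mask = np.array([[0.5, 1.0, 0.0],
                 [0.25, 1.2, 0.5]])   # the 1.2 is clipped to 1.0
cleaned = apply_ratio_mask(noisy, mask)
```

Cepstral features would then be extracted from `cleaned` exactly as from a clean Mel spectrogram.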

557 citations


Journal Article
TL;DR: In this paper, a new technique for pre-whitening has been proposed, based on cepstral analysis, which seems a good candidate to perform the intermediate pre-whitening step in an automatic damage recognition algorithm.
Abstract: Diagnostics of rolling element bearings involves a combination of different techniques of signal enhancing and analysis. The most common procedure presents a first step of order tracking and synchronous averaging, able to remove the undesired components, synchronous with the shaft harmonics, from the signal, and a final step of envelope analysis to obtain the squared envelope spectrum. This indicator has been studied thoroughly, and statistically based criteria have been obtained, in order to identify damaged bearings. The statistical thresholds are valid only if all the deterministic components in the signal have been removed. Unfortunately, in various industrial applications, characterized by heterogeneous vibration sources, the first step of synchronous averaging is not sufficient to completely eliminate the deterministic components and an additional step of pre-whitening is needed before the envelope analysis. Different techniques have been proposed in the past with this aim: the most widespread are linear prediction filters and spectral kurtosis. Recently, a new technique for pre-whitening has been proposed, based on cepstral analysis: the so-called cepstrum pre-whitening. Owing to its low computational requirements and its simplicity, it seems a good candidate to perform the intermediate pre-whitening step in an automatic damage recognition algorithm. In this paper, the effectiveness of the new technique will be tested on the data measured on a full-scale industrial bearing test-rig, able to reproduce the harsh conditions of operation. A benchmark comparison with the traditional pre-whitening techniques will be made, as a final step for the verification of the potentiality of the cepstrum pre-whitening.
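Part of the appeal of cepstrum pre-whitening is how little it takes to implement: the DFT magnitude is flattened to unity while the phase is kept, which is equivalent to discarding the entire real cepstrum. A minimal NumPy sketch (illustrative, not the benchmark code from the paper):

```python
import numpy as np

def cepstrum_prewhiten(x, eps=1e-12):
    """Cepstrum pre-whitening: keep the phase of the DFT and flatten
    the magnitude spectrum to unity. `eps` guards against division
    by zero at (near-)null spectral bins."""
    X = np.fft.fft(x)
    return np.real(np.fft.ifft(X / (np.abs(X) + eps)))

# Deterministic toy signal: noise plus a strong periodic component
rng = np.random.default_rng(0)
n = np.arange(1024)
x = rng.standard_normal(1024) + np.sin(2 * np.pi * 50 * n / 1024)
xw = cepstrum_prewhiten(x)
mag = np.abs(np.fft.fft(xw))   # essentially flat (all ones)
```

After this step the periodic shaft components no longer dominate the spectrum, and envelope analysis can proceed on the whitened signal.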

159 citations




Proceedings ArticleDOI
25 Aug 2013
TL;DR: This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space.
Abstract: This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. DBNs have a deep architecture that automatically discovers abstractions to maximally express the original input features. If we train the DBNs using only the speech of an individual speaker, it can be considered that there is less phonological information and relatively more speaker individuality in the output features at the highest layer. Training the DBNs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NNs). The converted abstraction of the source speaker is then brought back to the cepstrum space using an inverse process of the DBNs of the target speaker. We conducted speaker voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method.

140 citations


Journal ArticleDOI
TL;DR: In this paper, the potential for using the power spectrum, cepstrum, bispectrum, and neural network as a means for differentiating between healthy and faulty induction motor operation is examined.

106 citations


Journal ArticleDOI
TL;DR: This article presents a fully parametric formulation of a frequency warping plus amplitude scaling method in which bilinear frequency warping functions are used and achieves similar performance scores to state-of-the-art statistical methods involving dynamic features and global variance.
Abstract: Voice conversion methods based on frequency warping followed by amplitude scaling have been recently proposed. These methods modify the frequency axis of the source spectrum in such manner that some significant parts of it, usually the formants, are moved towards their image in the target speaker's spectrum. Amplitude scaling is then applied to compensate for the differences between warped source spectra and target spectra. This article presents a fully parametric formulation of a frequency warping plus amplitude scaling method in which bilinear frequency warping functions are used. Introducing this constraint allows for the conversion error to be described in the cepstral domain and to minimize it with respect to the parameters of the transformation through an iterative algorithm, even when multiple overlapping conversion classes are considered. The paper explores the advantages and limitations of this approach when applied to a cepstral representation of speech. We show that it achieves significant improvements in quality with respect to traditional methods based on Gaussian mixture models, with no loss in average conversion accuracy. Despite its relative simplicity, it achieves similar performance scores to state-of-the-art statistical methods involving dynamic features and global variance.
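The bilinear warping functions mentioned above have a standard closed form: a first-order all-pass map warps frequency as ω̃(ω) = ω + 2·arctan(α·sin ω / (1 − α·cos ω)), with α ∈ (−1, 1) controlling the direction and strength of the warp. A small sketch (the parameter values are arbitrary, not the paper's):

```python
import math

def bilinear_warp(omega, alpha):
    """Warped frequency (radians) under a first-order all-pass
    (bilinear) map with parameter alpha in (-1, 1).
    alpha = 0 leaves the frequency axis unchanged; the endpoints
    0 and pi are always fixed points."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

# Positive alpha pushes mid frequencies upward, e.g.:
w = bilinear_warp(1.0, 0.3)
```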

85 citations


Journal ArticleDOI
TL;DR: Speaker verification results on the telephone and microphone speech of the latest NIST 2010 SRE corpus indicate that the multi-taper methods outperform the conventional periodogram technique.

73 citations


Journal ArticleDOI
TL;DR: In this paper, a minimum variance cepstrum (MVC) was introduced for the observation of periodic impulse signal under noisy environments, which was obtained by liftering a logarithmic power spectrum, and the lifter bank was designed by the minimum variance algorithm.

68 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed cepstral approaches for the classification of a ship's radiated signal in the shallow underwater channel, where these classification systems are most likely to operate and where severe time-varying multi-path presents the greatest challenge.
Abstract: Marine vessel classification is complicated by the variability in the radiated signal of the marine vessel because of changing machinery configuration for the same class of vessels. Further, the radiated signal of the marine vessel propagating towards a distant receiver undergoes random fluctuations in phase, amplitude and frequency. The ambient noise at the receiver will further complicate the authors’ classification problem. The shallow underwater channel, in particular, where these classification systems are more likely to operate presents the most challenges because of severe time-varying multi-path. Cepstral approaches are proposed in this study, including cepstral features and average cepstral features to augment existing feature sets that are mostly based on spectral analysis. Analytical studies have been supported by simulation experiments and tests on real ship recorded data. The cepstral features with cepstral liftering process is able to significantly reduce the multipath distortion effects of shallow underwater channel whereas the average cepstral feature is able to notably reduce the time-varying channel effects.
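The cepstral liftering mentioned above amounts to zeroing cepstral coefficients beyond a cutoff quefrency, since multipath echoes appear as peaks at the echo-delay quefrencies. An illustrative sketch on a one-sided cepstrum vector (not the authors' exact lifter design):

```python
import numpy as np

def low_time_lifter(cepstrum, cutoff):
    """Zero cepstral coefficients at quefrencies >= cutoff.

    Channel echoes show up as cepstral peaks at the echo-delay
    quefrency, so a low-time lifter discards the coefficients
    where that distortion lives and keeps the smooth spectral
    envelope information in the low-quefrency bins."""
    out = np.array(cepstrum, dtype=float)
    out[cutoff:] = 0.0
    return out

# Toy example: keep only the first 4 quefrency bins
liftered = low_time_lifter(list(range(10)), 4)
```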

57 citations


Journal ArticleDOI
TL;DR: SDF-based feature extraction is compared with that of two commonly used feature extractors, namely Cepstrum and principal component analysis (PCA), for target detection and classification and shows consistently superior performance in terms of successful detection, false alarm, and misclassification rates.

53 citations


Journal ArticleDOI
TL;DR: Speech emotion recognition systems based on previous technologies, which use different methods of feature extraction and different classifiers for emotion recognition, are reviewed.
Abstract: The field of emotional content recognition of speech signals has been gaining increasing interest during recent years. Several emotion recognition systems have been constructed by different researchers for recognition of human emotions in spoken utterances. This paper reviews speech emotion recognition systems based on previous technologies, which use different methods of feature extraction and different classifiers for emotion recognition. The database for the speech emotion recognition system is the emotional speech samples, and the features extracted from these speech samples are the energy, pitch, linear prediction cepstrum coefficient (LPCC) and Mel frequency cepstrum coefficient (MFCC). Different wavelet decomposition structures can also be used for feature vector extraction. The classifiers are used to differentiate emotions such as anger, happiness, sadness, surprise, fear, neutral state, etc. The classification performance is based on the extracted features. Conclusions drawn from the performance and limitations of speech emotion recognition systems based on different methodologies are also discussed.

Journal ArticleDOI
TL;DR: The experimental results for both subjective and objective tests confirm the superiority of the proposed HMM-based speech enhancement methods in the Mel-frequency domain over the reference methods, particularly for non-stationary noises.

01 Jan 2013
TL;DR: The ability of the HPS (Harmonic Product Spectrum) algorithm and MFCC for gender and speaker recognition is explored, and the speaker recognition and gender recognition system is tested and analysed.
Abstract: Speaker recognition software using MFCC (Mel Frequency Cepstral Coefficients) and vector quantization has been designed, developed and tested satisfactorily for male and female voices. In this paper the ability of the HPS (Harmonic Product Spectrum) algorithm and MFCC for gender and speaker recognition is explored. The HPS algorithm can be used to find the pitch of the speaker, which can be used to determine the gender of the speaker. In this algorithm the speech signals for male and female speakers were recorded in .wav files at an 8 kHz sampling rate and then modified. The modified wav file was processed using MATLAB software for computing and plotting the autocorrelation of the speech signal. The software reliably computes the pitch of male and female voices. The MFCC algorithm and the vector quantization algorithm are used for the speech recognition process. Using the autocorrelation technique and the FFT, the pitch of the signal is calculated, which is used to identify the gender. In this paper the speaker recognition and gender recognition system is tested and analysed.
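The HPS algorithm itself downsamples the magnitude spectrum by successive integer factors and multiplies the results, so the harmonics of the true fundamental reinforce one another at the fundamental bin. A minimal NumPy sketch (the test tone, sampling rate and harmonic count are arbitrary, not from the paper):

```python
import numpy as np

def hps_pitch(x, fs, num_harmonics=4):
    """Harmonic Product Spectrum pitch estimate (coarse, bin resolution).

    spec[::k] aligns the k-th harmonic with the fundamental bin, so the
    elementwise product peaks at the fundamental frequency."""
    spec = np.abs(np.fft.rfft(x))
    n = len(spec) // num_harmonics
    prod = spec[:n].copy()
    for k in range(2, num_harmonics + 1):
        prod = prod * spec[::k][:n]
    peak = int(np.argmax(prod[1:])) + 1    # skip the DC bin
    return peak * fs / len(x)

# Harmonic-rich test tone at 200 Hz (integer cycles in the window,
# so there is no spectral leakage)
fs = 8000
t = np.arange(4000) / fs
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 5))
f0 = hps_pitch(x, fs)
```

A pitch well above ~160 Hz would then be classified as a likely female voice, and one below as male, per the gender-decision idea in the abstract.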

Journal ArticleDOI
TL;DR: The use of the complex cepstrum is proposed to model the mixed phase characteristics of speech through the incorporation of phase information in statistical parametric synthesis.


Journal ArticleDOI
TL;DR: It is demonstrated how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models.
Abstract: In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. In addition to this, a measure of estimate reliability is also attained which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated to a Wiener filter through the feature extraction using the STFT Uncertainty Propagation formulas. It is also shown that non-linear estimators in the STFT domain like the Ephraim-Malah filters can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE-Mel-frequency Cepstral Coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.

Journal ArticleDOI
TL;DR: Investigation of low-variance multitaper spectrum estimation methods to compute the mel-frequency cepstral coefficient (MFCC) features for robust speech and speaker recognition systems shows that the multitaper methods perform better compared with the Hamming-windowed spectrum estimation method.
Abstract: In this paper, we investigate low-variance multitaper spectrum estimation methods to compute the mel-frequency cepstral coefficient (MFCC) features for robust speech and speaker recognition systems. In speech and speaker recognition, MFCC features are usually computed from a single-tapered (e.g., Hamming window) direct spectrum estimate, that is, the squared magnitude of the Fourier transform of the observed signal. Compared with the periodogram, a power spectrum estimate that uses a smooth window function, such as Hamming window, can reduce spectral leakage. Windowing may help to reduce spectral bias, but variance often remains high. A multitaper spectrum estimation method that uses well-selected tapers can gain from the bias-variance trade-off, giving an estimate that has small bias compared with a single-taper spectrum estimate but substantially lower variance. Speech recognition and speaker verification experimental results on the AURORA-2 and AURORA-4 corpora and the NIST 2010 speaker recognition evaluation corpus (telephone as well as microphone speech), respectively, show that the multitaper methods perform better compared with the Hamming-windowed spectrum estimation method. In a speaker verification task, compared with the Hamming window technique, the sinusoidal weighted cepstrum estimator, multi-peak, and Thomson multitaper techniques provide a relative improvement of 20.25, 18.73, and 12.83 %, respectively, in equal error rate.
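The core of any multitaper estimate is an average of eigenspectra obtained with orthonormal tapers. A minimal sketch using the sinusoidal (Riedel-Sidorenko) taper family, which is easy to generate in closed form and is related to the sinusoidal weighted cepstrum estimator the paper compares (illustrative, not the authors' code):

```python
import numpy as np

def sine_tapers(N, K):
    """Riedel-Sidorenko sinusoidal tapers: K orthonormal sine windows."""
    n = np.arange(N)
    return np.array([np.sqrt(2.0 / (N + 1)) *
                     np.sin(np.pi * (k + 1) * (n + 1) / (N + 1))
                     for k in range(K)])

def multitaper_psd(x, K=6):
    """Average the K tapered eigenspectra; the variance of the estimate
    drops roughly by a factor of K relative to a single-taper estimate."""
    tapers = sine_tapers(len(x), K)
    eig = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return eig.mean(axis=0)

T = sine_tapers(64, 3)             # tapers are mutually orthonormal
p = multitaper_psd(np.ones(64), K=3)
```

MFCCs would then be computed from `p` in place of the usual Hamming-windowed periodogram.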

Patent
01 May 2013
TL;DR: In this paper, rolling bearing data collected by an acceleration sensor are decomposed into three layers of wavelet packets, and the energy of the third-layer coefficients is computed to reconstruct the original signal according to the variation of the energy values of each frequency band.
Abstract: The invention relates to a fault diagnosis method, in particular to a rolling bearing fault diagnosis method based on vibration detection. The method comprises the following steps: firstly, decomposing the rolling bearing data collected by an acceleration sensor into three layers of wavelet packets, computing the energy of the third-layer wavelet packet coefficient reconstruction signals, and selecting a frequency band with concentrated energy to reconstruct an approximation of the original signal according to the variation of the energy values of each frequency band of the third layer; and then utilizing the cepstrum to further analyze the reconstructed signal, comparing it with the theoretically computed fault characteristic frequency and sideband frequency characteristics. By combining the multiple resolutions of the wavelet packet with the cepstrum, the periodic components in the power spectrum, the separated sideband signals and the characteristics that are only slightly affected by the transmission path can be well detected. Meanwhile, the method is highly practicable and easy to apply.

Journal ArticleDOI
TL;DR: In this letter, a novel algorithm based on homomorphic deconvolution is proposed to give a theoretically unbiased estimation of transmitted nonlinearity in the beat signal in frequency-modulated continuous-wave synthetic aperture radar.
Abstract: In this letter, a novel algorithm based on homomorphic deconvolution is proposed to give a theoretically unbiased estimation of the transmitted nonlinearity in the beat signal in frequency-modulated continuous-wave synthetic aperture radar. Transmitted and received nonlinearities in the beat signal are separated in the cepstrum domain by using a controllable delay line, and the transmitted nonlinearity can be extracted by a comb notch filter. Moreover, an integrated nonlinearity correction strategy with residual-video-phase removal is presented by combining it with the proposed estimation algorithm. Extensive simulations are performed to validate the generality and the robustness of the proposed algorithm.
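The cepstral separation this letter relies on rests on a classical property of homomorphic deconvolution: a convolutional echo in the signal becomes an additive peak at the delay quefrency in the cepstrum, where it can be isolated by a notch or lifter. A small synthetic demonstration (circular echo on white noise, not radar data):

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    return np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + 1e-12)))

rng = np.random.default_rng(1)
s = rng.standard_normal(512)
delay, a = 64, 0.8
echoed = s + a * np.roll(s, delay)   # circular echo keeps the algebra exact
c = real_cepstrum(echoed)
# Theory: c[delay] ~= a/2 = 0.4, far above the noise cepstrum floor,
# with decaying rahmonics at 2*delay, 3*delay, ...
```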

Journal ArticleDOI
TL;DR: Experimental results show that the proposed algorithm could track non-stationary noise effectively without overestimating the NPSD, and achieves better performance in terms of both the segmental signal-to-noise-ratio improvement and the PESQ improvement.

Journal ArticleDOI
TL;DR: A feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems and gains 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel‐frequency cepstral coefficient FE method.
Abstract: In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.
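The reconstructed phase space (RPS) used above is built by time-delay embedding of the scalar speech signal. A minimal sketch of the embedding step (the dimension and lag here are arbitrary, not the paper's choices):

```python
import numpy as np

def rps_embed(x, dim=3, tau=2):
    """Time-delay embedding of a scalar series into a reconstructed
    phase space: row i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

E = rps_embed(np.arange(10.0), dim=3, tau=2)
```

Each row of `E` is one point in the RPS; the paper's GMM attractor models are then fit to clouds of such points.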

Proceedings ArticleDOI
09 Jun 2013
TL;DR: Using a Gaussian Mixture Model (GMM), two effective features are identified, namely Mel Frequency Cepstrum Coefficients (MFCCs) and Auto Correlation Function Coefficients (ACFC), extracted directly from the speech signal, and a GMM supervector formed by the values of the MFCCs, delta MFCCs and ACFC is used.
Abstract: In this paper, we study how speech features' numbers and statistical values impact the recognition accuracy of emotions present in speech. With a Gaussian Mixture Model (GMM), we identify two effective features, namely Mel Frequency Cepstrum Coefficients (MFCCs) and Auto Correlation Function Coefficients (ACFC), extracted directly from the speech signal. Using a GMM supervector formed by the values of the MFCCs, delta MFCCs and ACFC, we conduct experiments with the Berlin emotional database considering six previously proposed emotions: anger, disgust, fear, happy, neutral and sad. Our method achieves an emotion recognition rate of 74.45%, significantly better than the 59.00% achieved previously. To prove the broad applicability of our method, we also conduct experiments considering a different set of emotions: anger, boredom, fear, happy, neutral and sad. Our emotion recognition rate of 75.00% is again better than the 71.00% of the hidden Markov model method with MFCC, delta MFCC, cepstral coefficient and speech energy features.
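A GMM supervector is built by stacking per-component adapted means of an utterance's frames into one long vector. The sketch below is a deliberately simplified version (shared diagonal variance, plain responsibility-weighted means rather than full MAP adaptation), not the authors' exact procedure:

```python
import numpy as np

def gmm_supervector(frames, means, var, weights):
    """Stack responsibility-weighted component means into one vector.

    frames  : (T, D) feature vectors of one utterance
    means   : (K, D) UBM component means
    var     : shared diagonal variance (scalar here, for simplicity)
    weights : (K,) component weights
    """
    # log N(x | mu_k, var) up to a constant, diagonal covariance
    d2 = ((frames[:, None, :] - means[None, :, :]) ** 2 / var).sum(-1)
    logp = np.log(weights) - 0.5 * d2
    logp -= logp.max(axis=1, keepdims=True)      # numerical stability
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)      # responsibilities
    nk = resp.sum(axis=0) + 1e-9
    adapted = (resp.T @ frames) / nk[:, None]    # per-component means
    return adapted.ravel()

# Toy example: two well-separated components, four frames
means = np.array([[0.0, 0.0], [10.0, 10.0]])
frames = np.array([[0.1, 0.0], [9.9, 10.0], [10.0, 9.9], [0.0, 0.1]])
sv = gmm_supervector(frames, means, 1.0, np.array([0.5, 0.5]))
```

The resulting fixed-length vector can be fed to a classifier regardless of the utterance length T.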

Proceedings ArticleDOI
Mitchell McLaren1, Victor Abrash1, Martin Graciarena1, Yun Lei1, Jan Pesan 
25 Aug 2013
TL;DR: It was found that robustness to compressed speech was marginally improved by exposing PLDA to noisy and reverberant speech, with little improvement using transcoded speech in PLDA based on codecs mismatched to the evaluation conditions.
Abstract: The goal of this paper is to analyze the impact of codec-degraded speech on a state-of-the-art speaker recognition system and propose mitigation techniques. Several acoustic features are analyzed, including the standard Mel filterbank cepstral coefficients (MFCC), as well as the noise-robust medium duration modulation cepstrum (MDMC) and power normalized cepstral coefficients (PNCC), to determine whether robustness to noise generalizes to audio compression. Using a speaker recognition system based on i-vectors and probabilistic linear discriminant analysis (PLDA), we compared four PLDA training scenarios. The first involves training PLDA on clean data, the second included additional noisy and reverberant speech, a third introduces transcoded data matched to the evaluation conditions and the fourth, using codec-degraded speech mismatched to the evaluation conditions. We found that robustness to compressed speech was marginally improved by exposing PLDA to noisy and reverberant speech, with little improvement using transcoded speech in PLDA based on codecs mismatched to the evaluation conditions. Noise-robust features offered a degree of robustness to compressed speech while more significant improvements occurred when PLDA had observed the codec matching the evaluation conditions. Finally, we tested i-vector fusion from the different features, which increased overall system performance but did not improve robustness to codec-degraded speech. Index Terms: speaker recognition, speech coding, codec degradation, speaker verification.

Journal ArticleDOI
TL;DR: A new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010 and employs a Gaussian mixture model instead of a vector quantizer in the phoneme recognition front-end is presented.
Abstract: We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
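Cepstral smoothing of the kind mentioned in the third modification can be realized by low-time liftering the real cepstrum of a magnitude spectrum: transform the log spectrum to the quefrency domain, keep only the low-quefrency bins, and transform back. An illustrative sketch (the lifter length is arbitrary, not the paper's setting):

```python
import numpy as np

def cepstral_smooth(mag, keep=16):
    """Smooth a full-length magnitude spectrum by low-time liftering
    its real cepstrum (keep the first and last `keep` quefrency bins,
    which together form the symmetric low-quefrency region)."""
    c = np.fft.ifft(np.log(np.asarray(mag, dtype=float) + 1e-12))
    c[keep:len(c) - keep] = 0.0
    return np.exp(np.real(np.fft.fft(c)))

rng = np.random.default_rng(3)
mag = np.abs(np.fft.fft(rng.standard_normal(128))) + 1.0
smooth = cepstral_smooth(mag, keep=8)
```

With `keep` set to half the length, nothing is zeroed and the spectrum passes through unchanged, which makes the round trip easy to verify.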

Patent
11 Dec 2013
TL;DR: In this paper, a music similarity detection method based on mixed characteristics and a Gaussian mixture model is proposed, which comprises the steps of using a Gammatone cepstrum coefficient for conducting similarity detection, and using weighted similarities of the various characteristics as the final detection result.
Abstract: The invention discloses a music similarity detection method based on mixed characteristics and a Gaussian mixture model. The basic idea is as follows: a Gammatone cepstrum coefficient is used for conducting similarity detection, and the weighted similarities of the various characteristics serve as the final detection result; a modulation spectrum characteristic based on the frame axis is provided, the characteristic is used to represent a long-time music characteristic, and the combination of the long-time characteristic and a short-time characteristic serves as the input of the next modeling step; the Gaussian mixture model is used for modeling the music characteristics: firstly, a dynamic K-means method is used to initialize the model, then an expectation-maximization algorithm is used for model training to obtain accurate model parameters, and finally a log-likelihood ratio algorithm is used to obtain the similarities between the pieces of music. According to the music similarity detection method, the extraction of the music characteristics is more sufficient and thorough, the accuracy of music recommendation is improved, the feature vector dimensionality can be reduced, and the memory footprint of a music database is reduced.

Patent
11 Dec 2013
TL;DR: In this article, a method for suppressing transient noise in voice is proposed, comprising a Gammatone frequency cepstral coefficient extraction module, a transient noise detection module and a voice signal reconstruction module.
Abstract: The invention discloses a method for suppressing transient noise in voice, and belongs to the technical field of signal processing. The method is characterized by comprising a Gammatone frequency cepstral coefficient extraction module, a transient noise detection module and a voice signal reconstruction module, wherein the input end of the Gammatone frequency cepstral coefficient extraction module receives a voice signal containing noise, the output end of the Gammatone frequency cepstral coefficient extraction module is connected with the input end of the transient noise detection module, the output end of the transient noise detection module is connected with the input end of the voice signal reconstruction module, and the input end of the voice signal reconstruction module receives the voice signal containing noise and is also connected with the output end of the transient noise detection module; the voice signal reconstruction module outputs the voice with the noise removed.

Journal ArticleDOI
TL;DR: This study compared the performance of linear, logarithmic, and Mel frequency scale TE-FB-CEPs using radial basis function neural network in general epileptic seizure detection and found that the composite vectors maintain excellent overall accuracy in all the eight classification problems.
Abstract: About 1–3% of the world population suffers from epilepsy. Epileptic seizures are abnormal sudden discharges in the brain with signatures manifesting in the electroencephalograph (EEG) recordings by frequency changes and increased amplitudes. These changes, in this work, are captured through static and dynamic features derived from three Teager energy based filter-bank cepstra (TE-FB-CEPs). We compared the performance of linear, logarithmic, and Mel frequency scale TE-FB-CEPs using radial basis function neural network in general epileptic seizure detection. The comparison is tried on eight different classification problems which encompass all the possible discriminations in the medical field related to epilepsy. In a previous study, using traditional cepstrum on the same database, we had found that the composite vectors showed a degraded performance in seizure detection. In this study, however, irrespective of frequency scaling used, it is found that the composite vectors of TE-FB-CEPs maintain excellent overall accuracy in all the eight classification problems.
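The discrete Teager energy operator underlying the TE-FB-CEPs is ψ[x(n)] = x²(n) − x(n−1)·x(n+1). For a pure sinusoid A·sin(Ωn) this is exactly constant and equal to A²·sin²(Ω), which makes it a cheap instantaneous energy-frequency tracker. A minimal sketch (the tone parameters are arbitrary):

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator:
    psi[n] = x[n]^2 - x[n-1] * x[n+1], valid for 1 <= n <= N-2."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

fs = 1000.0
t = np.arange(256) / fs
tone = np.sin(2 * np.pi * 50 * t)    # unit-amplitude 50 Hz tone
psi = teager_energy(tone)            # constant: sin^2(2*pi*50/fs)
```

In the paper's pipeline this operator is applied per filter-bank channel before the cepstral transform.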

Patent
20 Mar 2013
TL;DR: In this paper, an isolated word speech recognition method based on an HRSF (half raised sine function) and an improved DTW (Dynamic Time Warping) algorithm was proposed.
Abstract: The invention discloses an isolated word speech recognition method based on an HRSF (Half Raised Sine Function) and an improved DTW (Dynamic Time Warping) algorithm. The isolated word speech recognition method comprises the following steps: (1) a received analog voice signal is preprocessed; preprocessing comprises pre-filtering, sampling, quantification, pre-emphasis, windowing, short-time energy analysis, short-time average zero crossing rate analysis and end-point detection; (2) a power spectrum X(n) of a frame signal is obtained by FFT (Fast Fourier Transform) and is converted into a power spectrum under a Mel frequency; an MFCC (Mel Frequency Cepstrum Coefficient) parameter is calculated; the calculated MFCC parameter is subjected to HRSF cepstrum raising after a first order difference and a second order difference are calculated; and (3) the improved DTW algorithm is adopted to match test templates with reference templates, and the reference template with the maximum matching score serves as the identification result. According to the isolated word speech recognition method, the identification of a single Chinese character is achieved through the improved DTW algorithm, and the identification rate and the identification speed of the single Chinese character are increased.
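The DTW matching of step (3) can be sketched with the classic dynamic program: an unconstrained warping path with absolute-difference local cost (the patent's improved variant adds path constraints not shown here, and would operate on MFCC vectors rather than scalars):

```python
def dtw_distance(a, b):
    """Classic DTW with local cost |a_i - b_j| and
    (match, insert, delete) transitions; O(len(a) * len(b))."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

The test template is compared against every reference template this way, and the reference with the smallest accumulated distance (i.e., the best matching score) wins.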

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a low-variance and adaptive-bandwidth spectral estimator for spectral subtraction, which is based on the two-stage spectral estimation (TSSE) and the modified cepstrum thresholding (MCT).

Patent
25 Dec 2013
TL;DR: In this paper, a rolling bearing fault feature extraction method based on independent component analysis and cepstrum theory is proposed; it is easy to implement and performs well in real time.
Abstract: The invention provides a rolling bearing fault feature extraction method based on independent component analysis and cepstrum theory. The rolling bearing fault feature extraction method comprises the steps of acquiring a vibration acceleration testing signal of a rolling bearing by using an acceleration sensor; decoupling and separating the vibration acceleration testing signal by using FastICA based on negentropy maximization; selecting the separated signal capable of representing fault feature information to the maximum extent; carrying out cepstrum analysis on the selected separated signal, and drawing a cepstrum chart; and observing whether the cepstrum chart has a fault feature frequency or an obvious peak value at a frequency multiplication position, and thereby judging whether the rolling bearing has a fault. By using the rolling bearing fault feature extraction method, the feature information of a fault signal of the rolling bearing can be effectively recognized from a complex sideband signal, a periodic fault component in a sideband can be conveniently extracted, the fault information is remarkably enhanced, the fault diagnosis precision is greatly improved, the fault diagnosis time period is shortened, and the spectral analysis difficulty is reduced; in addition, the rolling bearing fault feature extraction method is easy to implement and has good real-time performance.