
Showing papers on "Cepstrum published in 2011"


Journal ArticleDOI
TL;DR: In this paper, two approaches are proposed to enhance the entry event while retaining the impulse response, enabling a clear separation of the two events and producing an averaged estimate of the size of the fault.

237 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: A novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable is presented and initial results demonstrate phoneme recognition performance better than the current state-of-the-art for methods based on Mel cepstrum coefficients.
Abstract: State of the art speech recognition systems rely on preprocessed speech features such as Mel cepstrum or linear predictive coding coefficients that collapse high dimensional speech sound waves into low dimensional encodings. While these have been successfully applied in speech recognition systems, such low dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher dimensional encodings could both improve performance in recognition tasks, and also be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable and we report initial results demonstrating phoneme recognition performance better than the current state-of-the-art for methods based on Mel cepstrum coefficients.

223 citations


Journal ArticleDOI
TL;DR: In this article, a merit index is introduced that allows the automatic selection of the intrinsic mode functions that should be used for the calculation of the Hilbert-Huang spectrum of a spiral bevel gearbox.

184 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: This work describes a modified feature-extraction procedure in which the time-difference operation is performed in the spectral domain, rather than the cepstral domain as is generally presently done, and finds the use of delta-spectral features improves the effective SNR for background music and white noise and recognition accuracy in reverberant environments is improved.
Abstract: Almost all current automatic speech recognition (ASR) systems conventionally append delta and double-delta cepstral features to static cepstral features. In this work we describe a modified feature-extraction procedure in which the time-difference operation is performed in the spectral domain, rather than in the cepstral domain as is presently done. We argue that this approach based on “delta-spectral” features is needed because even though delta-cepstral features capture dynamic speech information and generally improve ASR recognition accuracy greatly, they are not robust to noise and reverberation. We support the validity of the delta-spectral approach both with observations about the modulation spectrum of speech and noise, and with objective experiments that document the benefit that the delta-spectral approach brings to a variety of currently popular feature extraction algorithms. We found that the use of delta-spectral features, rather than the more traditional delta-cepstral features, improves the effective SNR by between 5 and 8 dB for background music and white noise, and that recognition accuracy in reverberant environments is improved as well.
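The spectral-versus-cepstral ordering of the time-difference operation can be illustrated with a toy regression delta. This is a generic numpy sketch, not the paper's exact pipeline (which also involves DCT, compression, and normalization steps); the function name and toy spectrogram are invented for the example:

```python
import numpy as np

def delta(feat, M=2):
    """Regression-based delta along the frame (row) axis."""
    T = len(feat)
    pad = np.pad(feat, ((M, M), (0, 0)), mode="edge")
    denom = 2 * sum(m * m for m in range(1, M + 1))
    out = np.zeros_like(feat, dtype=float)
    for m in range(1, M + 1):
        out += m * (pad[M + m:M + m + T] - pad[M - m:M - m + T])
    return out / denom

rng = np.random.default_rng(0)
spec = rng.random((50, 20)) + 1.0   # toy mel power spectrogram (frames x bands)
d_spectral = delta(spec)            # "delta-spectral": differentiate BEFORE compression
d_cepstral = delta(np.log(spec))    # "delta-cepstral": differentiate AFTER log compression
```

Because the log (and any subsequent transform) is nonlinear, the two orderings give genuinely different features, which is where the robustness difference discussed in the abstract arises.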

124 citations


Journal ArticleDOI
TL;DR: This paper considers a fuzzy clustering approach for time series based on the estimated cepstrum (the spectrum of the logarithm of the spectral density function of a time series), which performs very well compared to clustering based on other features.
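For readers unfamiliar with the term, the real cepstrum underlying such features can be computed in a few lines. This is a generic numpy sketch, not code from the paper; the signal and search range are invented for the illustration:

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_spectrum = np.log(np.abs(np.fft.fft(x)) + 1e-12)  # small floor avoids log(0)
    return np.fft.ifft(log_spectrum).real

# A periodic signal produces a cepstral peak at its period (in samples):
fs = 8000
t = np.arange(2048) / fs
x = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 5))  # 100 Hz + harmonics
c = real_cepstrum(x * np.hanning(len(x)))
peak = 40 + np.argmax(c[40:120])  # search a 66-200 Hz pitch quefrency range
# peak lies near 80 samples = 10 ms, the 100 Hz fundamental period
```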

105 citations


Journal ArticleDOI
TL;DR: In this article, a low-pass cepstrum filter is used to reduce noise due to the random rough surface, with the cepstra of both the rough surface and the target material's properties used to choose an optimal cutoff frequency for the filter; the techniques are evaluated using laboratory measurements and Monte Carlo simulations over many sets of random surface realizations.
Abstract: The potential for terahertz (THz) spectroscopy to detect explosives and other materials of interest is complicated by rough surface scattering. Our previous work has demonstrated that by averaging over diffuse observation angles and surfaces, spectral features could be recovered from laboratory measurements and numerical computer simulations. In addition to averaging, a low-pass cepstrum filter was used to reduce noise due to the random rough surface. This paper expands on these concepts by using the cepstrum of both the random rough surface and the material properties of the target material to choose an optimal cutoff frequency for the filter. The utility of these techniques is evaluated using laboratory measurements and Monte Carlo simulations for many sets of random surface realizations. The Kirchhoff Approximation is used to quickly model diffuse scattering from dielectric materials with gradually undulating rough surfaces when the incident and diffuse scattering angles are near the surface normal. Th...
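Low-pass cepstrum filtering ("liftering") of the kind described can be sketched as follows. This is a generic numpy illustration; the synthetic spectrum, cutoff value, and function name are invented for the example and are not taken from the paper:

```python
import numpy as np

def lowpass_lifter(spectrum, cutoff):
    """Low-pass cepstral filter: keep only quefrencies below the cutoff,
    smoothing fast ripple (e.g. rough-surface noise) out of the log spectrum."""
    cep = np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real
    keep = np.zeros_like(cep)
    keep[:cutoff] = cep[:cutoff]
    if cutoff > 1:
        keep[-(cutoff - 1):] = cep[-(cutoff - 1):]  # mirrored half (real symmetry)
    return np.exp(np.fft.fft(keep).real)

# A smooth spectral feature plus fast multiplicative ripple: liftering
# recovers the smooth component.
f = np.linspace(0.0, 1.0, 512)
smooth = np.exp(-(f - 0.5) ** 2 / 0.02)
rippled = smooth * np.exp(0.2 * np.sin(2 * np.pi * 60 * f))
filtered = lowpass_lifter(rippled, 20)
```

The ripple lives at high quefrency (here around index 60) while the broad spectral feature lives at low quefrency, so a cutoff between the two separates them.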

66 citations


Journal ArticleDOI
TL;DR: It is shown that the complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation when specific windowing criteria are met and has the potential to be used for voice quality analysis.

57 citations


Journal Article
TL;DR: In this article, the real cepstrum is used to localize and edit the log amplitude of the original signal, removing unwanted discrete frequency components; the edited amplitude is then combined with the original phase spectrum to return to the time domain.
Abstract: A new procedure is proposed that uses the real cepstrum to localize and edit the log amplitude of the original signal, removing unwanted discrete frequency components, and then combines the edited amplitude with the original phase spectrum to return to the time domain. This cepstral editing procedure (CEP) is used to remove discrete frequency components from signals measured on two machines with a faulty bearing, and then perform envelope analysis on the residual signal to diagnose the bearing fault. Signal processing used for condition monitoring purposes is usually concerned with separating various signal components from each other to identify changes in any one of them. This has to be done blind, since measured responses are a sum of components from a multitude of sources, and include deterministic (discrete frequency at constant speed), stationary random, and cyclostationary random components. The latter are typically produced by modulation of random signals by discrete frequencies and are often produced by rotating and reciprocating machines. A fundamental division is into discrete frequency and random components (both stationary and cyclostationary) for which a number of techniques have been developed over the years. This will normally separate gear from bearing signals, for example, since the former are deterministic and phase-locked to shaft speeds, while the latter can be treated as cyclostationary. As pointed out in Reference 1, the signals generated by local faults in rolling-element bearings are actually “pseudo-cyclostationary,” since the repetition frequency is affected by random slip and is not known exactly in advance, but the signals can still be treated as cyclostationary for envelope analysis. Signals produced by bearings with extended spalls are truly cyclostationary if the discrete carriers (such as gearmesh harmonics) are modulated at a fixed cyclic frequency (shaft speed for an inner race fault). 
The roughness of the spall surface introduces randomness in the modulation and allows separation from deterministic gear faults. Where the deterministic components are of primary interest, such as in gear diagnostics, the best separation method is undoubtedly time-synchronous averaging (TSA), 2,3 where a separate signal
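The cepstral editing procedure described above (edit the log amplitude via the real cepstrum, then recombine with the original, unedited phase) can be sketched roughly in numpy. The notching scheme and names here are illustrative, not the authors' implementation:

```python
import numpy as np

def cepstral_edit(x, notch_quefrencies, width=1):
    """Cepstral editing: zero selected quefrencies in the real cepstrum of
    the log amplitude, then recombine the edited amplitude with the
    ORIGINAL phase spectrum to return to the time domain."""
    X = np.fft.fft(x)
    log_amp = np.log(np.abs(X) + 1e-12)
    phase = np.angle(X)
    cep = np.fft.ifft(log_amp).real           # real cepstrum (symmetric)
    N = len(cep)
    for q in notch_quefrencies:               # notch a harmonic/sideband family
        cep[q - width:q + width + 1] = 0.0
        cep[N - q - width:N - q + width + 1] = 0.0  # mirrored half
    edited_log_amp = np.fft.fft(cep).real
    return np.fft.ifft(np.exp(edited_log_amp) * np.exp(1j * phase)).real

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
residual = cepstral_edit(x, [64, 128])  # e.g. notch a 64-sample periodic family
```

With an empty notch list the procedure is an identity, which makes a convenient sanity check that the amplitude/phase recombination is correct.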

45 citations


Book
26 May 2011
TL;DR: This book surveys speech spectrum analysis, from the Fourier power spectrum and spectrogram through wavelet and reassigned time-frequency representations to linear prediction, homomorphic (cepstral) analysis, and formant tracking.
Abstract: Contents: Introduction; Historical perspective on speech spectrum analysis; The Fourier power spectrum and spectrogram; Other time-frequency and wavelet representations; The new frontier: reassigned spectrograms and power spectra; Linear prediction of the speech spectrum; Homomorphic analysis and the cepstrum; Formant tracking methods.

41 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: It is shown that modulation features are more robust against room reverberation than conventional cepstral and dynamic features and that they strongly benefit from a high early-to-late energy ratio of the characteristic RIR.
Abstract: In this contribution we present a feature extraction method that relies on the modulation-spectral analysis of amplitude fluctuations within sub-bands of the acoustic spectrum by an STFT. The experimental results indicate that the optimal temporal filter extension for amplitude modulation analysis is around 310 ms. It is also demonstrated that the phase information of the modulation spectrum contains important cues for speech recognition. In this context, the advantage of an odd analysis basis function is considered. The best presented features reached a total relative improvement of 53.5% for clean-condition training on Aurora-2. Furthermore, it is shown that modulation features are more robust against room reverberation than conventional cepstral and dynamic features and that they strongly benefit from a high early-to-late energy ratio of the characteristic RIR.
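The sub-band amplitude-modulation analysis described can be sketched as follows. This is a simplified numpy illustration: the frame sizes and demo signal are invented, and the paper's temporal filtering and phase handling are not reproduced:

```python
import numpy as np

def modulation_spectrum(x, fs, frame=256, hop=64):
    """STFT magnitude envelopes per acoustic band, then an FFT along
    time gives the amplitude-modulation spectrum of each band."""
    win = np.hanning(frame)
    env = np.array([np.abs(np.fft.rfft(x[i:i + frame] * win))
                    for i in range(0, len(x) - frame + 1, hop)])
    env = env - env.mean(axis=0)                 # remove per-band DC
    mod = np.abs(np.fft.rfft(env, axis=0))       # (modulation freq, acoustic band)
    mod_freqs = np.fft.rfftfreq(env.shape[0], d=hop / fs)
    return mod, mod_freqs

# A 1 kHz carrier amplitude-modulated at 8 Hz shows an 8 Hz modulation
# peak in the 1 kHz acoustic band:
fs = 8000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 1000 * t)
mod, mod_freqs = modulation_spectrum(x, fs)
band = int(1000 * 256 / fs)                      # rfft bin of 1 kHz
peak_hz = mod_freqs[np.argmax(mod[:, band])]
```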

39 citations


Proceedings ArticleDOI
14 May 2011
TL;DR: A viable scheme to identify insect sounds automatically is proposed by using sound parameterization techniques that dominate speaker recognition technology, and the test results proved the efficiency of the proposed method.
Abstract: This study aims to provide general technicians who manage pests in production with a convenient way to recognize insects. A viable scheme to identify insect sounds automatically is proposed by using sound parameterization techniques that dominate speaker recognition technology. The acoustic signal is preprocessed and segmented into a series of sound samples. Mel-frequency cepstrum coefficients (MFCC) are extracted from each sound sample as features, and a probabilistic neural network (PNN) is trained with the given features. The testing samples are finally classified by the PNN. The proposed method is evaluated on a database of acoustic samples of 50 different insect sounds. The recognition rate was above 96%. The test results prove the efficiency of the proposed method.
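The MFCC front end used here follows the standard recipe: frame, window, power spectrum, mel filterbank, log, DCT. Below is a compact numpy sketch of that recipe; all parameter values are illustrative defaults rather than taken from the paper, and the PNN classifier is not shown:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(0.0, mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(x, fs, n_filters=26, n_coef=13, frame=400, hop=160):
    """Frame -> window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    fb = mel_filterbank(n_filters, frame, fs)
    win = np.hamming(frame)
    j = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coef), 2 * j + 1) / (2 * n_filters))
    feats = []
    for start in range(0, len(x) - frame + 1, hop):
        power = np.abs(np.fft.rfft(x[start:start + frame] * win)) ** 2
        feats.append(dct @ np.log(fb @ power + 1e-10))
    return np.array(feats)

rng = np.random.default_rng(0)
m = mfcc(rng.standard_normal(8000), 8000)  # one second at 8 kHz
```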

Proceedings ArticleDOI
22 May 2011
TL;DR: This paper presents a noise estimation technique based on knowledge of pitch information for robust speech recognition that is compared in Aurora-2 with other similar techniques like cepstral SS (Spectral Subtraction).
Abstract: This paper presents a noise estimation technique based on knowledge of pitch information for robust speech recognition. In the first stage, the noise is estimated by extrapolating it from frames where speech is believed to be absent. These frames are detected with a proposed pitch-based VAD (Voice Activity Detector). In the second stage, the noise estimate is revised in voiced frames using the harmonic tunnelling technique. The tunnelling noise estimate is used at high SNRs as an upper bound on the noise rather than as a suitable estimate in itself. A spectrogram MD (Missing Data) recognition system is chosen to evaluate the proposed noise estimation. The proposed system is compared on Aurora-2 with other similar techniques such as cepstral SS (Spectral Subtraction).

Journal ArticleDOI
TL;DR: Experiments are presented suggesting that the combination of the SNR-cepstrum with the well known perceptual linear prediction method can be beneficial in noisy environments.

Proceedings ArticleDOI
24 Mar 2011
TL;DR: Perceptual linear predictive cepstrum yields accuracies of 86% and 93% for speaker-independent isolated digit recognition using VQ and a combination of VQ and HMM speech models, respectively.
Abstract: The main objective of this paper is to explore the effectiveness of perceptual features for performing isolated digit and continuous speech recognition. The proposed perceptual features are captured and code book indices are extracted. The expectation maximization algorithm is used to generate HMM models for the speeches. The speech recognition system is evaluated on clean test speeches, and the experimental results reveal the performance of the proposed algorithm in recognizing isolated digits and continuous speech based on the maximum log likelihood value between test features and the HMM model for each speech. Performance of these features is tested on speeches randomly chosen from the “TI Digits_1”, “TI Digits_2”, and “TIMIT” databases. The algorithm is tested for VQ and for a combination of VQ and HMM speech modeling techniques. Perceptual linear predictive cepstrum yields accuracies of 86% and 93% for speaker-independent isolated digit recognition using VQ and a combination of VQ and HMM speech models, respectively. This feature also gives 99% and 100% accuracy for speaker-independent continuous speech recognition using VQ and a combination of VQ and HMM speech modeling techniques.

Journal ArticleDOI
TL;DR: Experiments on NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA results in lower equal error rate (EER) and minimum average cost (Cavg) than either TFC or SDC approaches.
Abstract: The shifted delta cepstrum (SDC) is a widely used feature extraction for language recognition (LRE). With a high context width due to incorporation of multiple frames, SDC outperforms traditional delta and acceleration feature vectors. However, it also introduces correlation into the concatenated feature vector, which increases redundancy and may degrade the performance of backend classifiers. In this paper, we first propose a time-frequency cepstral (TFC) feature vector, which is obtained by performing a temporal discrete cosine transform (DCT) on the cepstrum matrix and selecting the transformed elements in a zigzag scan order. Beyond this, we increase discriminability through a heteroscedastic linear discriminant analysis (HLDA) on the full cepstrum matrix. By utilizing block diagonal matrix constraints, the large HLDA problem is then reduced to several smaller HLDA problems, creating a block diagonal HLDA (BDHLDA) algorithm which has much lower computational complexity. The BDHLDA method is finally extended to the GMM domain, using the simpler TFC features during re-estimation to provide significantly improved computation speed. Experiments on NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA results in lower equal error rate (EER) and minimum average cost (Cavg) than either TFC or SDC approaches.
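The SDC stacking that the paper builds on can be sketched as follows. This is a generic numpy sketch of the standard N-d-P-k scheme (not the authors' code); the toy feature matrix is invented:

```python
import numpy as np

def sdc(cep, d=1, P=3, k=7):
    """Shifted delta cepstra (N-d-P-k): for each frame, stack k delta
    vectors taken at shifts of P frames, each computed over +/- d frames."""
    T, N = cep.shape
    pad = np.pad(cep, ((d, d + (k - 1) * P), (0, 0)), mode="edge")
    blocks = [pad[2 * d + i * P:2 * d + i * P + T] - pad[i * P:i * P + T]
              for i in range(k)]
    return np.hstack(blocks)  # shape (T, N * k)

feat = np.outer(np.arange(20.0), np.ones(7))  # toy 7-dim cepstra over 20 frames
s = sdc(feat, d=1, P=3, k=2)
```

The concatenation across k shifted blocks is exactly what introduces the inter-block correlation that the paper's TFC/BDHLDA features are designed to remove.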

Journal ArticleDOI
TL;DR: A background accompaniment removal approach is proposed that exploits the underlying relationships between solo singing voices and their accompanied versions in the cepstrum; it improves SID accuracy significantly, even when a test music recording involves a sung language not covered in the data used for estimating the transformation.
Abstract: One major challenge of identifying singers in popular music recordings lies in how to reduce the interference of background accompaniment in trying to characterize the singer voice. Although a number of studies on automatic Singer IDentification (SID) from acoustic features have been reported, most systems to date, however, do not explicitly deal with the background accompaniment. This study proposes a background accompaniment removal approach for SID by exploiting the underlying relationships between solo singing voices and their accompanied versions in cepstrum. The relationships are characterized by a transformation estimated using a large set of accompanied singing generated by manually mixing solo singing with the accompaniments extracted from Karaoke VCDs. Such a transformation reflects the cepstrum variations of a singing voice before and after it is added with accompaniments. When an unknown accompanied voice is presented to our system, the transformation is performed to convert the cepstrum of the accompanied voice into a solo-voice-like one. Our experiments show that such a background removal approach improves the SID accuracy significantly; even when a test music recording involves sung language not covered in the data for estimating the transformation.

Proceedings ArticleDOI
03 Nov 2011
TL;DR: MFCC extracted from the speaking-frequency envelope (SMFCC) performs better than MFCC extracted from the amplitude spectrum, and, considering the importance of the cepstrum in speech recognition, applying a weighting window to the MFCC (or SMFCC) can improve the speech recognition rate effectively.
Abstract: Every speaker has an individual pitch frequency, but traditional MFCC extraction ignores the impact of pitch frequency, reducing the accuracy of the speech-signal description and thus the performance of speech recognition. MFCC based on the extracted speaking-frequency envelope (referred to as SMFCC) performs better than MFCC based on the extracted amplitude spectrum. Moreover, considering the importance of the cepstrum in speech recognition, a weighting window applied to the MFCC (or SMFCC) can improve the speech recognition rate effectively.

Journal ArticleDOI
TL;DR: Noise compensation can be applied successfully to prediction with best performance given by a model adaptation method that performs only slightly worse than matched training and testing, and human listening tests show that the predicted features are sufficient for speech reconstruction and that noise compensation improves speech quality in noisy conditions.
Abstract: This paper examines the effect of applying noise compensation to acoustic speech feature prediction from noisy mel-frequency cepstral coefficient (MFCC) vectors within a distributed speech recognition architecture. An acoustic speech feature (comprising fundamental frequency, formant frequencies, speech/nonspeech classification, and voicing classification) is predicted from an MFCC vector in a maximum a posteriori (MAP) framework using phoneme-specific or global models of speech. The effect of noise is considered and three different noise compensation methods, that have been successful in robust speech recognition, are integrated within the MAP framework. Experiments show that noise compensation can be applied successfully to prediction with best performance given by a model adaptation method that performs only slightly worse than matched training and testing. Further experiments consider application of the predicted acoustic features to speech reconstruction. A series of human listening tests show that the predicted features are sufficient for speech reconstruction and that noise compensation improves speech quality in noisy conditions.

Proceedings Article
01 Jan 2011
TL;DR: An advanced uniform speech parameterization scheme for statistical model segments and waveform segments employed in the multi-form segment synthesis system is introduced and a new adaptive enhancement technique for model segments is presented that reduces the perceived gap in quality and similarity between model and template segments.
Abstract: In multi-form segment synthesis speech is constructed by sequencing speech segments of different nature: model segments, i.e. mathematical abstractions of speech and template segments, i.e. speech waveform fragments. These multi-form segments can have shared, layered or alternate speech parameterization schemes. This paper introduces an advanced uniform speech parameterization scheme for statistical model segments and waveform segments employed in our multi-form segment synthesis system. Mel-Regularized Cepstrum derived from amplitude and phase spectra forms its basic framework. Furthermore, a new adaptive enhancement technique for model segments is presented that reduces the perceived gap in quality and similarity between model and template segments.

Book
09 Aug 2011
TL;DR: This brief book offers a comprehensive description, comparative analysis, and empirical performance evaluation of eleven contemporary speech parameterization methods, which compute short-time cepstrum-based speech features.
Abstract: This brief book offers a general view of short-time cepstrum-based speech parameterization and provides a common ground for further in-depth studies on the subject. Specifically, it offers a comprehensive description, comparative analysis, and empirical performance evaluation of eleven contemporary speech parameterization methods, which compute short-time cepstrum-based speech features. Among these are five discrete wavelet packet transform (DWPT)-based and six discrete Fourier transform (DFT)-based speech features and some of their variants, which have been used in speech recognition, speaker recognition, and other related speech processing tasks. The main similarities and differences in their computation are discussed, and empirical results from performance evaluation in common experimental conditions are presented. The recognition accuracy obtained on the monophone recognition, continuous speech recognition, and speaker recognition tasks is contrasted against that obtained with the well-known and widely used Mel Frequency Cepstral Coefficients (MFCC). It is shown that many of these methods lead to speech features that offer competitive performance on a certain speech processing setup when compared to the venerable MFCC. The book does not aim to promote certain speech features but instead to enhance the common understanding of the advantages and disadvantages of the various speech parameterization techniques available today, and to provide a basis for selecting an appropriate speech parameterization in each particular case. In brief, this volume consists of nine sections. Section 1 summarizes the main concepts on which contemporary speech parameterization is based and offers some background information about their origins. Section 2 introduces the objectives of speech pre-processing and describes the processing steps that are commonly used in contemporary speech parameterization methods.
Sections 3 and 4 offer a comprehensive description and a comparative analysis of the DFT- and DWPT-based speech parameterization methods of interest. Sections 5–7 present results from experimental evaluation on the monophone recognition, continuous speech recognition, and speaker recognition tasks, respectively. Section 8 offers concluding remarks and an outlook on possible future targets of speech parameterization research. Finally, Section 9 provides links to other sources of information and to publicly available software offering ready-to-use implementations of these speech features.

Proceedings ArticleDOI
22 May 2011
TL;DR: This paper proposes the use of acoustic voice source features extracted directly from the speech spectrum (or cepstrum) for cognitive load classification and proposes pre- and post-processing techniques to improve the estimation of the cepstral peak prominence (CPP).
Abstract: Previous work in speech-based cognitive load classification has shown that the glottal source contains important information for cognitive load discrimination. However, the reliability of glottal flow features depends on the accuracy of the glottal flow estimation, which is a non-trivial process. In this paper, we propose the use of acoustic voice source features extracted directly from the speech spectrum (or cepstrum) for cognitive load classification. We also propose pre- and post-processing techniques to improve the estimation of the cepstral peak prominence (CPP). Three-class classification results on two databases showed CPP to be a promising cognitive load classification feature that outperforms glottal flow features. Score-level fusion of the CPP-based classification system with a formant frequency-based system yielded a final improved accuracy of 62.7%, suggesting that CPP contains useful voice source information that complements the information captured by vocal tract features.
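CPP can be approximated in a few lines: windowed log spectrum, real cepstrum, peak search in a pitch quefrency range, and a regression-line baseline across quefrency. This is a simplified numpy sketch with invented parameters; the paper's specific pre- and post-processing is not reproduced:

```python
import numpy as np

def cpp(x, fs, fmin=60.0, fmax=330.0):
    """Cepstral peak prominence: height of the pitch-range cepstral peak
    above a regression line fitted across quefrency."""
    spec = np.abs(np.fft.fft(x * np.hamming(len(x))))
    cep = np.fft.ifft(np.log(spec + 1e-12)).real
    half = cep[:len(cep) // 2]
    q = np.arange(len(half)) / fs               # quefrency in seconds
    lo, hi = int(fs / fmax), int(fs / fmin)
    peak_idx = lo + np.argmax(half[lo:hi])
    sel = q >= 0.001                            # fit from 1 ms upward
    slope, intercept = np.polyfit(q[sel], half[sel], 1)
    return half[peak_idx] - (slope * q[peak_idx] + intercept)

# Voiced (harmonic) signals give a markedly higher CPP than noise:
fs = 8000
t = np.arange(4096) / fs
voiced = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 11))
noise = np.random.default_rng(1).standard_normal(4096)
```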

Journal ArticleDOI
TL;DR: The paper presents two methods of noise reduction of speech signal recorded in an MRI device during phonation for the human vocal tract modelling using real cepstrum limitation and clipping the "peaks" corresponding to the harmonic frequencies of mechanical noise.
Abstract: The paper presents two methods of noise reduction for speech signals recorded in an MRI device during phonation, for human vocal tract modelling. The applied approach to cleaning the noisy speech signal is based on cepstral speech analysis and synthesis, because the noise is mainly produced by the gradient coils, has a mechanical character, and can be processed in the spectral domain. The first noise reduction method uses real cepstrum limitation and clipping of the “peaks” corresponding to the harmonic frequencies of the mechanical noise. The second method is based on subtraction of the short-time spectra of two simultaneously recorded signals: the first contains speech and noise, and the second noise only. The resulting speech quality was compared by spectrogram and mean periodogram methods.
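The second method, short-time spectral subtraction with overlap-add resynthesis, can be sketched as follows. This is a generic numpy illustration with invented parameters (frame size, hop, spectral floor), not the authors' implementation:

```python
import numpy as np

def spectral_subtraction(noisy, noise, frame=256, hop=128, beta=0.01):
    """Subtract an average noise magnitude spectrum from the short-time
    spectra of the noisy signal, keeping the noisy phase (overlap-add)."""
    win = np.hanning(frame)
    # average noise magnitude spectrum from the noise-only recording
    nmag = np.mean([np.abs(np.fft.rfft(noise[i:i + frame] * win))
                    for i in range(0, len(noise) - frame + 1, hop)], axis=0)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame + 1, hop):
        X = np.fft.rfft(noisy[i:i + frame] * win)
        mag = np.maximum(np.abs(X) - nmag, beta * np.abs(X))  # spectral floor
        seg = np.fft.irfft(mag * np.exp(1j * np.angle(X)), frame)
        out[i:i + frame] += seg * win
        norm[i:i + frame] += win ** 2
    return out / np.maximum(norm, 1e-8)

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.3 * rng.standard_normal(2 * fs)
enhanced = spectral_subtraction(clean + noise, noise)
```

The spectral floor (`beta`) limits musical-noise artifacts by preventing the subtracted magnitude from going to zero.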

Proceedings ArticleDOI
12 Feb 2011
TL;DR: A gender classification system is proposed based on pitch, formants, and a combination of both; a feature vector consisting of pitches derived from several pitch determination methods is also used for gender classification.
Abstract: A gender classification system is proposed based on pitch, formants, and a combination of both. A database of ten Hindi digits has been prepared for fifty speakers, with each speaker uttering each digit ten times. Formants derived from the speech samples have been used for gender classification. Gender classification has also been performed using pitch extracted by different methods: autocorrelation, cepstrum, and Average Magnitude Difference Function (AMDF) methods have been used for pitch determination from the speech samples. Formants in combination with pitch are also used for gender classification, as is a feature vector consisting of the pitches derived from all the above-mentioned pitch determination methods. Experiments were performed for both open-set and closed-set gender classification. The autocorrelation method performed best for open-set gender classification, while a hybrid method (autocorrelation + AMDF + cepstrum) performed best for closed-set gender classification.
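The autocorrelation and cepstrum pitch determination methods compared here can be sketched minimally in numpy. The search ranges and test signal are illustrative, not the paper's settings:

```python
import numpy as np

def pitch_autocorr(x, fs, fmin=70, fmax=350):
    """Pitch from the dominant autocorrelation peak in the lag range."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    return fs / (lo + np.argmax(r[lo:hi]))

def pitch_cepstrum(x, fs, fmin=70, fmax=350):
    """Pitch from the dominant cepstral peak in the quefrency range."""
    cep = np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + 1e-12)).real
    lo, hi = int(fs / fmax), int(fs / fmin)
    return fs / (lo + np.argmax(cep[lo:hi]))

fs = 8000
t = np.arange(2048) / fs
x = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in range(1, 9))  # 125 Hz voice-like
```

Both estimators locate the 64-sample period of the 125 Hz test signal; in practice they fail differently on real speech, which is why the paper also evaluates a hybrid of the methods.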

Patent
13 Jul 2011
TL;DR: In this paper, a speech error detection method with front-end processing using an artificial neural network (ANN) is presented. The method comprises the following steps: extracting new 64-dimensional features with strong pattern recognition capability and good discrimination properties from 39-dimensional mel-frequency cepstrum coefficient (MFCC) parameters by utilizing a multilayer perceptron (MLP); performing speech recognition on test data by machine and generating a goodness of pronunciation (GOP) score; and pointing out pronunciation errors and their degree according to a set threshold, with directed learning performed for the pronunciation errors.
Abstract: The invention provides a speech error detection method by front-end processing using an artificial neural network (ANN). The method comprises the following steps: extracting new 64-dimensional features with strong pattern recognition capability and good discrimination property from 39-dimensional mel-frequency cepstrum coefficient (MFCC) parameters by utilizing a multilayer perceptron (MLP); performing speech recognition on test data by a machine, and generating a goodness of pronunciation (GOP) score; and pointing out pronunciation errors and error degree according to a set threshold, and performing directed-learning for pronunciation errors.

Proceedings ArticleDOI
22 May 2011
TL;DR: A model for estimating TVs trained on natural speech and a Dynamic Bayesian Network (DBN) based speech recognition architecture that treats vocal tract constriction gestures as hidden variables are proposed, eliminating the necessity for explicit gesture recognition.
Abstract: Previously we have proposed different models for estimating articulatory gestures and vocal tract variable (TV) trajectories from synthetic speech. We have shown that when deployed on natural speech, such models can help to improve the noise robustness of a hidden Markov model (HMM) based speech recognition system. In this paper we propose a model for estimating TVs trained on natural speech and present a Dynamic Bayesian Network (DBN) based speech recognition architecture that treats vocal tract constriction gestures as hidden variables, eliminating the necessity for explicit gesture recognition. Using the proposed architecture we performed a word recognition task for the noisy data of Aurora-2. Significant improvement was observed in using the gestural information as hidden variables in a DBN architecture over using only the mel-frequency cepstral coefficient based HMM or DBN backend. We also compare our results with other noise-robust front ends.

Book ChapterDOI
01 Jan 2011
TL;DR: The paper shows how the cepstrum technique can be used to remove discrete frequency components from signals measured on two machines with a faulty bearing, and then perform envelope analysis on the residual signal to diagnose the bearing fault.
Abstract: In machine diagnostics there are a number of tools for separating discrete frequency components from random and cyclostationary components. This is the basis of separating gear (deterministic) from bearing (second order cyclostationary) signals for example. Time synchronous averaging (TSA) requires a separate operation, including resampling, to be carried out for each periodic frequency, and the method cannot be used for discrete frequency sidebands, or partial bandwidth spectra. Self adaptive noise cancellation (SANC) and discrete/random separation (DRS) remove all discrete frequencies, whether harmonics or sidebands, and it is not possible to decide if some should be left. The method proposed here uses the cepstrum to localise discrete frequency components, which manifest themselves as harmonic or sideband families. Selected families can be removed in the cepstrum, leaving any it might be desirable to retain, and generating a notch filter that is flexible enough to allow for small speed fluctuations, or even narrow band noise peaks that sometimes result from slight random modulation of periodic signals. Normally, to edit the cepstrum and return to the time domain, it is necessary to use the complex cepstrum, but the latter requires the phase signal to be unwrapped. This is not possible for response signals containing discrete frequencies and noise, where the phase is not continuous. The procedure proposed here uses the real cepstrum to localise and edit the log amplitude of the original signal, removing the unwanted discrete frequency components, and then combines the edited amplitude with the original phase spectrum to return to the time domain. The paper shows how this technique can be used to remove discrete frequency components from signals measured on two machines with a faulty bearing, and then perform envelope analysis on the residual signal to diagnose the bearing fault. 
One is a gear test rig for which the discrete frequencies are harmonics of the shaft speed and gearmesh frequency, and the other a bladed disc test rig for which the discrete frequencies are harmonics of the shaft speed and bladepass frequency. Envelope analysis can be done on both full bandwidth and partial (zoom) bandwidth signals, the latter to save on computation, and restrict the amount of discrete frequency components to be removed, since the signal envelope is independent of frequency shifts.

Proceedings Article
01 Aug 2011
TL;DR: The proposed algorithm employs neither a voiced/unvoiced detection nor a fundamental period estimator and is shown to outperform an algorithm without cepstral processing in terms of a higher signal-to-interference ratio, a lower Bark spectral distortion, and a lower log kurtosis ratio, indicating a reduction of musical noise.
Abstract: We present an effective way to reduce musical noise in binaural speech dereverberation algorithms based on an instantaneous weighting of the cepstrum. We propose this instantaneous technique because temporal smoothing techniques result in a smearing of the signal over time and are thus expected to reduce the dereverberation performance. For the instantaneous weighting function we compute the a posteriori probability that a cepstral coefficient represents the speech spectral structure. The proposed algorithm incorporates a priori knowledge about the speech spectral structure by training the parameters of the respective likelihood function offline using a speech database. The proposed algorithm employs neither a voiced/unvoiced detection nor a fundamental period estimator and is shown to outperform an algorithm without cepstral processing in terms of a higher signal-to-interference ratio, a lower Bark spectral distortion, and a lower log kurtosis ratio, indicating a reduction of musical noise.
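A minimal stand-in for the cepstral weighting step can illustrate the mechanism. Here a fixed weight profile (keep the low-quefrency envelope, attenuate the rest) replaces the paper's trained a posteriori speech-presence probability; the function name, cutoff, and floor values are arbitrary assumptions.

```python
import numpy as np

def weight_cepstrum_frame(frame, env_cutoff=20, floor=0.2):
    """Attenuate high-quefrency cepstral coefficients of one frame and
    resynthesise with the original phase (instantaneous, i.e. no
    smoothing across frames)."""
    n = len(frame)
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real
    # Fixed weights: 1 for the envelope region, `floor` elsewhere.
    # The paper instead weights each coefficient by the a posteriori
    # probability that it carries speech spectral structure.
    weights = np.full(n, floor)
    weights[:env_cutoff] = 1.0
    weights[n - env_cutoff + 1:] = 1.0  # mirror region, keeps the output real
    weighted_log_mag = np.fft.fft(cepstrum * weights).real
    return np.fft.ifft(np.exp(weighted_log_mag)
                       * np.exp(1j * np.angle(spectrum))).real
```

Attenuating the high-quefrency coefficients smooths the log magnitude spectrum of the frame, which is the effect that suppresses isolated spectral peaks perceived as musical noise.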

Journal ArticleDOI
TL;DR: A methodology for quality estimation of speech features is presented, and the most appropriate metric is chosen in combination with a Dynamic Time Warping (DTW) classifier.
Abstract: Selection of the best feature set is key to a successful speech recognition system. A quality measure is needed to characterize the chosen feature set. A variety of feature quality metrics have been proposed by other authors; however, no guidance is given on choosing the appropriate metric, nor have these metrics been investigated for speech features. In this paper a methodology for quality estimation of speech features is presented. Metrics are to be chosen on the basis of their correlation with classification results. Linear Frequency Cepstrum (LFCC), Mel Frequency Cepstrum (MFCC) and Perceptual Linear Prediction (PLP) analyses were selected for the experiment. The most appropriate metric was chosen in combination with a Dynamic Time Warping (DTW) classifier. Experimental results are presented. http://dx.doi.org/10.5755/j01.eee.110.4.302
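The DTW classifier used here aligns feature sequences of unequal length by dynamic programming. A minimal version (plain NumPy, Euclidean local cost; the toy ramp sequences in the usage below stand in for LFCC/MFCC/PLP frame vectors) might look like:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between feature sequences
    a of shape (n, d) and b of shape (m, d), Euclidean local cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessors
            acc[i, j] = cost + min(acc[i - 1, j],
                                   acc[i, j - 1],
                                   acc[i - 1, j - 1])
    return acc[n, m]
```

A template classifier then assigns an utterance to the class of the nearest reference sequence under this distance; the warping makes the match insensitive to differences in speaking rate.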

Proceedings ArticleDOI
14 Jun 2011
TL;DR: A novel method for estimating the parameters of motion blur is proposed, which uses the difference of Gaussians as the input to the cepstrum transform, without an additional transform such as the Hough or Radon transform.
Abstract: In this paper, we propose a novel method for estimating the parameters of motion blur, which is caused by camera or object motion during image and video capture. In the proposed method, the motion blur is characterized by its orientation and length in the cepstrum domain. To better estimate the orientation and length of the motion blur in the cepstrum domain, we use the difference of Gaussians as the input to the cepstrum transform, without an additional transform such as the Hough or Radon transform. We propose a simple and effective method that separates blur components from image components. Simulations with various sets of synthetic and real motion-blurred images show the effectiveness of the proposed algorithm in terms of estimation quality. The proposed algorithm is computationally efficient and easy to use.
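The cepstral signature exploited here is easy to see in one dimension: a uniform motion blur of length L puts periodic nulls in the spectrum, which appear as a negative cepstral peak near quefrency L. A 1-D sketch follows (the paper's 2-D method additionally estimates orientation and pre-filters with a difference of Gaussians; the synthetic image row, blur length, and search range below are assumptions):

```python
import numpy as np

def estimate_blur_length_1d(row, max_len=50):
    """Locate the negative cepstral peak left by a uniform motion blur."""
    spectrum = np.abs(np.fft.fft(row)) + 1e-12
    cepstrum = np.fft.ifft(np.log(spectrum)).real
    q = np.arange(2, max_len)  # skip the low-quefrency envelope
    return int(q[np.argmin(cepstrum[2:max_len])])

# Simulate a horizontal blur of length 9 by circular convolution.
rng = np.random.default_rng(0)
row = rng.normal(0.0, 1.0, 256)
kernel = np.zeros(256)
kernel[:9] = 1.0 / 9.0
blurred = np.fft.ifft(np.fft.fft(row) * np.fft.fft(kernel)).real
```

Because the cepstrum turns the multiplicative blur into an additive component, the blur's rahmonic stands out from the image content without any further transform.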

Book ChapterDOI
07 Nov 2011
TL;DR: Low-variance multi-taper spectrum estimation methods for computing mel-frequency cepstral coefficient (MFCC) features are studied and shown to perform better than Hamming-windowed spectrum estimation for robust speech recognition.
Abstract: In this paper we study low-variance multi-taper spectrum estimation methods for computing mel-frequency cepstral coefficient (MFCC) features for robust speech recognition. In speech recognition, MFCC features are usually computed from a Hamming-windowed DFT spectrum. Although windowing helps reduce the bias of the spectrum, the variance remains high. Multi-taper spectrum estimation methods can be used to correct the shortcomings of single-taper (or single-window) spectrum estimation. Experimental results on the AURORA-2 corpus show that the multi-taper methods, specifically the multi-peak multi-taper method, perform better than the Hamming-windowed spectrum estimation method.
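The variance reduction from multi-tapering is straightforward to demonstrate. The sketch below uses orthonormal sine tapers as a simple stand-in for the multi-peak tapers evaluated in the paper; the taper count and the white-noise test signal are assumptions:

```python
import numpy as np

def sine_tapers(n, k):
    """First k orthonormal sine tapers of length n."""
    m = np.arange(1, n + 1)
    return np.array([np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * j * m / (n + 1))
                     for j in range(1, k + 1)])

def multitaper_spectrum(x, n_tapers=6):
    """Average the eigenspectra from several orthogonal tapers; averaging
    reduces the variance roughly in proportion to the number of tapers."""
    tapers = sine_tapers(len(x), n_tapers)
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return eigenspectra.mean(axis=0)
```

In the MFCC pipeline this estimate simply replaces the Hamming-windowed power spectrum before the mel filterbank, log, and DCT stages.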