Cepstrum

About: Cepstrum is a research topic. Over its lifetime, 3,346 publications have been published within this topic, receiving 55,742 citations.


Papers
Proceedings Article
18 Mar 2005
TL;DR: Features based on the proposed technique yield a significant increase in speech recognition performance in non-stationary noise conditions when compared directly to the MFCC and RASTA-PLP features.
Abstract: It is well known that the peaks in the spectrum of a log Mel-filter bank are important cues in characterizing speech sounds. However, low energy perturbations in the power spectrum may become numerically significant after the log compression. We show that even if the spectral peaks are kept constant, the low energy perturbations in the power spectrum can create huge variations in the cepstral coefficients. We show, both analytically and experimentally, that exponentiating the log Mel-filter bank spectrum before the cepstrum computation can significantly reduce the sensitivity of the cepstra to spurious low energy perturbations. The Mel-cepstrum modulation spectrum (Tyagi, V. et al., Proc. IEEE ASRU, 2003) is computed from the processed cepstra, which results in further noise robustness of the composite feature vector. In experiments with speech signals, it is shown that features based on the proposed technique yield a significant increase in speech recognition performance in non-stationary noise conditions when compared directly to the MFCC and RASTA-PLP features.
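The sensitivity described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the filter-bank energies are made-up values, the cepstrum is taken as an unnormalised DCT-II of the log energies (a common convention, assumed here), and the names `dct_ii`, `cepstrum_from_energies`, and `euclid` are hypothetical helpers. The sketch only shows the problem (log compression amplifying low-energy perturbations), not the proposed exponentiation remedy.

```python
import math

def dct_ii(x):
    """Unnormalised DCT-II: c[k] = sum_n x[n] * cos(pi * k * (n + 0.5) / N)."""
    n_bands = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (n + 0.5) / n_bands) for n in range(n_bands))
        for k in range(n_bands)
    ]

def cepstrum_from_energies(energies):
    """Cepstral coefficients as the DCT of the log filter-bank energies."""
    return dct_ii([math.log(e) for e in energies])

def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical filter-bank energies with one dominant spectral peak (100.0).
clean = [1e-4, 1e-4, 1.0, 100.0, 1.0, 1e-4, 1e-4, 1e-4]

# Perturb only two low-energy bands; the peak itself is left untouched.
perturbed = list(clean)
perturbed[0] = 1e-2
perturbed[7] = 1e-2

# Change measured in the power domain versus the cepstral domain.
power_change = euclid(clean, perturbed)
cepstral_change = euclid(cepstrum_from_energies(clean),
                         cepstrum_from_energies(perturbed))

print(f"power-domain change:    {power_change:.4f}")
print(f"cepstral-domain change: {cepstral_change:.4f}")
```

The perturbation is tiny relative to the peak in the power domain, yet after the logarithm it moves the cepstral vector by a much larger amount, which is the effect the abstract sets out to reduce.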

79 citations

Patent
TL;DR: A method and system for transforming a sampling rate in speech recognition systems: cepstral vector coefficients representing utterance segments at a reference frequency are converted to energy bands in logarithmic spectra, energy bands above a predetermined portion of a target frequency are filtered out, and the filtered logarithmic spectra are converted to modified cepstral vector coefficients at the target frequency.
Abstract: A method and system for transforming a sampling rate in speech recognition systems, in accordance with the present invention, includes the steps of providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients, converting the cepstral vector coefficients to energy bands in logarithmic spectra, filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency and converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency. Another method and system convert system prototypes for speech recognition systems from a reference frequency to a target frequency.
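The three steps in this abstract (cepstra to log-spectral bands, band filtering, back to cepstra) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: it assumes the cepstra are an orthonormal DCT-II of the log band energies, that each band has a known centre frequency, and that the "predetermined portion" is a fraction of the target sampling rate (0.5, i.e. the target Nyquist frequency, by default). The names `dct_ortho`, `idct_ortho`, `convert_cepstra`, and `portion` are hypothetical.

```python
import math

def dct_ortho(x):
    """Orthonormal DCT-II (log band energies -> cepstral coefficients)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n)
                               for i in range(n)))
    return out

def idct_ortho(c):
    """Inverse of dct_ortho (cepstral coefficients -> log band energies)."""
    n = len(c)
    out = []
    for i in range(n):
        total = math.sqrt(1.0 / n) * c[0]
        total += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * k * (i + 0.5) / n)
                     for k in range(1, n))
        out.append(total)
    return out

def convert_cepstra(cepstra, band_centres_hz, target_rate_hz, portion=0.5):
    """Map cepstra from a reference rate to a lower target rate.

    1. Convert cepstra back to log-spectral energy bands.
    2. Drop bands whose centre lies above portion * target_rate_hz.
    3. Convert the remaining bands to modified cepstra.
    """
    log_bands = idct_ortho(cepstra)
    cutoff_hz = portion * target_rate_hz
    kept = [e for e, f in zip(log_bands, band_centres_hz) if f <= cutoff_hz]
    return dct_ortho(kept)

# Hypothetical example: 8 linearly spaced bands for a 16 kHz reference rate.
centres = [500.0 + 1000.0 * i for i in range(8)]
reference_log_bands = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
reference_cepstra = dct_ortho(reference_log_bands)

# Convert to an 8 kHz target: only bands up to 4 kHz survive.
target_cepstra = convert_cepstra(reference_cepstra, centres, 8000.0)
print(len(target_cepstra))
```

The inversion is exact only when the full set of cepstral coefficients is available; truncated cepstra would yield a smoothed approximation of the log spectrum before filtering.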

78 citations

Proceedings Article
18 Mar 2005
TL;DR: Experiments performed on the large-vocabulary task VerbMobil II (German conversational speech) show that the accuracy of automatic speech recognition systems can be improved by the combination of different acoustic features.
Abstract: In this paper, we consider the use of multiple acoustic features of the speech signal for robust speech recognition. We investigate the combination of various auditory based (mel frequency cepstrum coefficients, perceptual linear prediction, etc.) and articulatory based (voicedness) features. Features are combined by linear discriminant analysis and log-linear model combination based techniques. We describe the two feature combination techniques and compare the experimental results. Experiments performed on the large-vocabulary task VerbMobil II (German conversational speech) show that the accuracy of automatic speech recognition systems can be improved by the combination of different acoustic features.
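A log-linear model combination of the kind mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the system evaluated in the paper: each feature stream is assumed to supply a log score per class, the combined posterior is proportional to the exponentiated weighted sum of those scores, and the scores, weights, and the name `log_linear_posteriors` are hypothetical.

```python
import math

def log_linear_posteriors(stream_log_scores, weights):
    """Combine per-class log scores from several feature streams.

    posterior(c) is proportional to exp(sum_i weights[i] * stream_log_scores[i][c]).
    """
    classes = list(stream_log_scores[0])
    combined = {
        c: sum(w * scores[c] for w, scores in zip(weights, stream_log_scores))
        for c in classes
    }
    # Normalise with the log-sum-exp trick for numerical stability.
    peak = max(combined.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in combined.values()))
    return {c: math.exp(v - log_norm) for c, v in combined.items()}

# Hypothetical log scores for two classes from two acoustic feature streams.
mfcc_scores = {"a": math.log(0.6), "b": math.log(0.4)}
voicing_scores = {"a": math.log(0.2), "b": math.log(0.8)}

both = log_linear_posteriors([mfcc_scores, voicing_scores], [1.0, 1.0])
mfcc_only = log_linear_posteriors([mfcc_scores, voicing_scores], [1.0, 0.0])
print(both, mfcc_only)
```

Setting a stream's weight to zero removes its influence, so the weights act as tunable measures of how much each feature stream is trusted.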

78 citations

Proceedings Article
06 Sep 2009
TL;DR: This paper shows that the complex cepstrum can be used for glottal flow estimation by separating the causal and anticausal components of a windowed speech signal, as done by the Zeros of the Z-Transform (ZZT) decomposition.
Abstract: Homomorphic analysis is a well-known method for the separation of non-linearly combined signals. More particularly, the use of complex cepstrum for source-tract deconvolution has been discussed in various articles. However, there exists no study which proposes a glottal flow estimation methodology based on cepstrum and reports effective results. In this paper, we show that complex cepstrum can be effectively used for glottal flow estimation by separating the causal and anticausal components of a windowed speech signal as done by the Zeros of the Z-Transform (ZZT) decomposition. Based on exactly the same principles presented for ZZT decomposition, windowing should be applied such that the windowed speech signals exhibit mixed-phase characteristics which conform to the speech production model, in which the anticausal component is mainly due to the glottal flow open phase. The advantage of the complex cepstrum-based approach compared to the ZZT decomposition is its much higher speed. Index Terms: Speech Analysis, Homomorphic Processing, Glottal Source Estimation
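The causal/anticausal separation can be illustrated on a toy mixed-phase sequence. This is a minimal sketch, not the paper's procedure: it uses a naive DFT from the standard library, a hand-made three-sample signal instead of windowed speech, and hypothetical helper names (`dft`, `unwrap`, `complex_cepstrum`, `anticausal_component`). The anticausal part is taken to be the negative-quefrency half of the complex cepstrum.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def unwrap(phases):
    """Remove 2*pi jumps so the phase is continuous across bins."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        delta -= 2 * math.pi * round(delta / (2 * math.pi))
        out.append(out[-1] + delta)
    return out

def complex_cepstrum(frame, n_fft=256):
    """Inverse DFT of log|X| + j * (unwrapped phase with linear term removed)."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spectrum = dft(padded)
    phase = unwrap([cmath.phase(v) for v in spectrum])
    half = n_fft // 2
    delay = round(phase[half] / math.pi)  # integer delay causing linear phase
    phase = [p - math.pi * delay * k / half for k, p in enumerate(phase)]
    log_spec = [complex(math.log(abs(v)), p) for v, p in zip(spectrum, phase)]
    return [v.real for v in dft(log_spec, inverse=True)]

def anticausal_component(cepstrum):
    """Keep negative quefrencies only, then map back to the time domain."""
    half = len(cepstrum) // 2
    kept = [c if k > half else 0.0 for k, c in enumerate(cepstrum)]
    spectrum = [cmath.exp(v) for v in dft(kept)]
    return [v.real for v in dft(spectrum, inverse=True)]

# Toy mixed-phase signal: (1 - 0.5 z^-1) is causal (zero inside the unit
# circle); (0.5 + z^-1) has its zero outside it and is anticausal up to a delay.
frame = [0.5, 0.75, -0.5]
ceps = complex_cepstrum(frame)
anti = anticausal_component(ceps)
print(ceps[1], ceps[-1], anti[0], anti[-1])
```

Positive quefrencies (`ceps[1]`, `ceps[2]`, ...) carry the causal factor and negative quefrencies (`ceps[-1]`, `ceps[-2]`, ...) the anticausal one, so zeroing one half and inverting recovers each component separately. A production version would use an FFT rather than the naive DFT.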

78 citations

Proceedings Article
01 Jan 1997
TL;DR: A new method of formant analysis is described which includes techniques to overcome both of the above difficulties and shows that including formant features can offer increased accuracy over using cepstrum features only.
Abstract: Formant frequencies have rarely been used as acoustic features for speech recognition, in spite of their phonetic significance. For some speech sounds one or more of the formants may be so badly defined that it is not useful to attempt a frequency measurement. Also, it is often difficult to decide which formant labels to attach to particular spectral peaks. This paper describes a new method of formant analysis which includes techniques to overcome both of the above difficulties. Using the same data and HMM model structure, results are compared between a recognizer using conventional cepstrum features and one using three formant frequencies, combined with fewer cepstrum features to represent general spectral trends. For the same total number of features, results show that including formant features can offer increased accuracy over using cepstrum features only.

77 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
82% related
Robustness (computer science)
94.7K papers, 1.6M citations
80% related
Feature (computer vision)
128.2K papers, 1.7M citations
79% related
Deep learning
79.8K papers, 2.1M citations
79% related
Support vector machine
73.6K papers, 1.7M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    86
2022    206
2021    60
2020    96
2019    135
2018    130