Topic

Cepstrum

About: Cepstrum is a research topic. Over its lifetime, 3346 publications have appeared within this topic, receiving 55742 citations.


Papers
01 Jan 1997
TL;DR: This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming to alleviate the effects of additive noise, room reverberation, and transmission channel distortions, and analytically derives and discusses some properties and merits of temporal processing for speech signals.
Abstract: The performance of speech communication systems often degrades under realistic environmental conditions. Adverse environmental factors include additive noise sources, room reverberation, and transmission channel distortions. This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming for alleviation of the effects of such disturbances. Speech reflects the geometry of the vocal organs, and the linguistically dominant component is in the shape of the vocal tract. At any given point in time, the shape of the vocal tract is reflected in the short-time spectral envelope of the speech signal. The rate of change of the vocal tract shape appears to be important for the identification of linguistic components. This rate of change, or the rate of change of the short-time spectral envelope can be described by the modulation spectrum, i.e. the spectrum of the time trajectories described by the short-time spectral envelope. For a wide range of frequency bands, the modulation spectrum of speech exhibits a maximum at about 4 Hz, the average syllabic rate. Disturbances often have modulation frequency components outside the speech range, and could in principle be attenuated without significantly affecting the range with relevant linguistic information. Early efforts for exploiting the modulation spectrum domain (temporal processing), such as the dynamic cepstrum or the RASTA processing, used ad hoc designed processing and appear to be suboptimal. As a major contribution, in this dissertation we aim for a systematic data-driven design of temporal processing. First we analytically derive and discuss some properties and merits of temporal processing for speech signals. We attempt to formalize the concept and provide a theoretical background which has been lacking in the field. 
In the experimental part we apply temporal processing to a number of problems including adaptive noise reduction in cellular telephone environments, reduction of reverberation for speech enhancement, and improvements on automatic recognition of speech degraded by linear distortions and reverberation.
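The definition of the modulation spectrum given in the abstract (the spectrum of the time trajectory of the short-time spectral envelope) can be illustrated with a minimal sketch. It is not the dissertation's method: it assumes a single full-band log-energy trajectory instead of per-band envelopes, a synthetic noise signal amplitude-modulated at 4 Hz in place of speech, and hypothetical frame settings (25 ms frames, 10 ms hop). The function name `modulation_spectrum` is made up for this example.

```python
import numpy as np

def modulation_spectrum(x, fs, frame_len=0.025, hop=0.010):
    """Spectrum of the short-time log-energy trajectory of x (illustrative only)."""
    n = int(frame_len * fs)          # samples per frame
    h = int(hop * fs)                # samples per hop
    n_frames = 1 + (len(x) - n) // h
    frames = np.stack([x[i * h : i * h + n] for i in range(n_frames)])
    # One envelope value per frame: log of the frame energy.
    env = np.log(np.sum(frames ** 2, axis=1) + 1e-12)
    env = env - env.mean()           # remove the DC component of the trajectory
    spec = np.abs(np.fft.rfft(env * np.hanning(n_frames)))
    freqs = np.fft.rfftfreq(n_frames, d=hop)   # modulation frequencies in Hz
    return freqs, spec

# Synthetic stand-in for speech: white noise modulated at 4 Hz (assumed values).
fs = 8000
t = np.arange(0, 4.0, 1 / fs)
rng = np.random.default_rng(0)
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(t.size)

freqs, spec = modulation_spectrum(x, fs)
mask = freqs > 0                      # ignore the DC bin when locating the peak
peak_hz = freqs[mask][np.argmax(spec[mask])]
print(f"modulation peak near {peak_hz:.2f} Hz")
```

With a 10 ms hop the envelope is sampled at 100 Hz, so modulation frequencies up to 50 Hz are observable; temporal processing of the kind the abstract describes would filter this trajectory rather than the waveform itself.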

34 citations

Journal ArticleDOI
TL;DR: It is demonstrated that the presence of a cepstral peak depends on the form of the probability density function (pdf) of the separation between reflectors; when the pdf is uniform from 0 to SM, the cepstral peak is found to occur at the quefrency corresponding to SM.
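For orientation, the simplest case of a cepstral peak is a single reflection at a fixed delay, which produces a peak at the quefrency equal to that delay. The sketch below shows only this textbook case, not the random-separation model studied in the paper; the delay, echo gain, and search range are assumed values, and `real_cepstrum` is a name chosen for this example.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)
    return np.fft.ifft(log_mag).real

fs = 8000                 # sampling rate in Hz (assumed)
delay = 200               # echo delay in samples (assumed)
alpha = 0.6               # echo gain (assumed)

rng = np.random.default_rng(1)
s = rng.standard_normal(4096)        # broadband source signal
x = s.copy()
x[delay:] += alpha * s[:-delay]      # add one delayed, attenuated copy

c = real_cepstrum(x)
start = 50                           # skip the low-quefrency region
peak_q = start + int(np.argmax(c[start : len(c) // 2]))
print(f"cepstral peak at {peak_q} samples = {peak_q / fs * 1000:.1f} ms")
```

The echo multiplies the spectrum by a periodic ripple; taking the logarithm turns that product into a sum, so the ripple appears as an isolated peak at the echo delay.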

34 citations

Journal ArticleDOI
TL;DR: In this article, the authors compared two approaches for varying speed machines and found that the simpler approach gave better results for simulated gear signals with speed variations of both ±5% and ±15%.

34 citations

PatentDOI
Joji Kane1, Akira Nohara1
TL;DR: A signal detection device comprising cepstrum calculating means (71, 75, 81), peak detection means (72, 76, 82) for detecting the peak of the cepstrum, analysis interval setting means (73, 78, 84) enabling the setting of an optimum analysis interval, and voice detection means, to which the detected peak output is supplied, for detecting voice.
Abstract: A signal detection device comprises cepstrum calculating means (71, 75, 81); peak detection means (72, 76, 82) for detecting the peak of the cepstrum; analysis interval setting means (73, 78, 84) enabling the setting of an optimum analysis interval; and voice detection means (74, 714, 83), to which the peak detected output is supplied, for detecting voice, wherein the peak detection interval of said peak detection means (72, 76, 82) is controlled by the set output from said analysis interval setting means (73, 78, 84).

34 citations

Proceedings Article
01 Jan 2001
TL;DR: This paper compares Root-cepstrum to Mel-Frequency cepstrum Coefficients (MFCC) in terms of their noise immunity during modeling and decoding speed, and observes that for 84% of the phonemes, the average distance to all other acoustic units is increased in the Root-cepstrum domain compared to MFCC, resulting in a sharp acoustic model set.
Abstract: Root-cepstral analysis has been proposed previously for speech recognition in car environments [9]. In this paper, we focus on an alternative aspect of Root-cepstrum as it applies to discriminative acoustic modeling and fast speech recognizer decoding. We compare Root-cepstrum to Mel-Frequency cepstrum Coefficients (MFCC) in terms of their noise immunity during modeling and decoding speed. Our experiments use the SPINE [5] corpus which is composed of clean and noisy data with a 5K vocabulary size. Experiments were performed that allow pair-wise comparisons of acoustic models across different feature sets and acoustic units. We observed that for 84% of the phonemes, the average distance to all other acoustic units is increased in the Root-cepstrum domain compared to MFCC resulting in a sharp acoustic model set. Therefore, the ambiguity in the Root-cepstrum space is reduced. Large vocabulary noisy speech recognition experiments showed a 27.5% reduction in real–time processing factor (RTF) compared to MFCC features while improving overall recognition accuracy.
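As background to the comparison above, root-cepstral features are commonly described as replacing the logarithm in the cepstral pipeline with a power-law (root) compression. The sketch below illustrates only that difference. It is not the paper's front end: the 24 band energies are random placeholder values rather than mel filterbank outputs, the exponent 0.1 is an assumed value, and `dct_ii` and `cepstral_features` are helper names invented for this example.

```python
import numpy as np

def dct_ii(v, n_coeffs):
    """Unnormalised DCT-II of v, keeping the first n_coeffs coefficients."""
    n = len(v)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ v

def cepstral_features(band_energies, n_coeffs=13, root=None):
    """Log compression when root is None, otherwise energies ** root."""
    e = np.asarray(band_energies, dtype=float)
    compressed = np.log(e + 1e-12) if root is None else e ** root
    return dct_ii(compressed, n_coeffs)

rng = np.random.default_rng(2)
energies = rng.uniform(0.5, 5.0, size=24)   # placeholder band energies
louder = 10.0 * energies                    # same frame with a 10x gain

gamma = 0.1                                 # assumed root exponent
log_a = cepstral_features(energies)
log_b = cepstral_features(louder)
root_a = cepstral_features(energies, root=gamma)
root_b = cepstral_features(louder, root=gamma)

# A gain change shifts only coefficient 0 of the log cepstrum,
# but rescales every root-cepstral coefficient by 10 ** gamma.
print(np.allclose(log_a[1:], log_b[1:]), np.allclose(root_b, root_a * 10 ** gamma))
```

The two compressions therefore respond differently to level changes, which is one reason their feature spaces are not interchangeable; the sketch makes no claim about the noise immunity or decoding speed reported in the paper.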

34 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
82% related
Robustness (computer science)
94.7K papers, 1.6M citations
80% related
Feature (computer vision)
128.2K papers, 1.7M citations
79% related
Deep learning
79.8K papers, 2.1M citations
79% related
Support vector machine
73.6K papers, 1.7M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    86
2022    206
2021    60
2020    96
2019    135
2018    130