scispace - formally typeset
Topic

Cepstrum

About: Cepstrum is a research topic. Over its lifetime, 3,346 publications have been published on this topic, receiving 55,742 citations.


Papers
Proceedings Article
30 Nov 2003
TL;DR: These new dynamic features, derived from the modulation spectrum of the cepstral trajectories of the speech signal, significantly improve speech recognition performance under various noise conditions when compared directly with standard temporal derivative features and C-JRASTA PLP features.
Abstract: In this paper, we present new dynamic features derived from the modulation spectrum of the cepstral trajectories of the speech signal. Cepstral trajectories are projected onto a basis of sines and cosines, yielding the cepstral modulation frequency response of the speech signal. We show that the different sine and cosine basis vectors select different modulation frequencies, whereas the frequency responses of the delta and double-delta filters are centered only around 15 Hz. Therefore, projecting cepstral trajectories onto the basis of sines and cosines yields a more complementary and discriminative range of features. In this work, the cepstrum reconstructed from the lower cepstral modulation frequency components is used as the static feature. Experiments show that, in addition to providing an improvement in clean conditions, these new dynamic features significantly increase speech recognition performance in various noise conditions when compared directly with the standard temporal derivative features and C-JRASTA PLP features.

56 citations
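The abstract's contrast between delta filters and a sine/cosine basis can be checked numerically. The sketch below is an illustration, not the paper's code: it assumes a 100 Hz frame rate (10 ms shift), the common ±2-frame regression delta, and a hypothetical 25-frame analysis window, and the names `delta_filter`, `sincos_pair` and `peak_modulation_hz` are made up for this example. Each set of taps is treated as an FIR filter running along one cepstral trajectory, and the code reports the modulation frequency at which its magnitude response peaks. The cosine and sine at index k are packed into one complex exponential, because projecting onto both is equivalent to taking the k-th DFT coefficient of the window.

```python
import numpy as np

FRAME_RATE_HZ = 100.0  # assumption: 10 ms frame shift


def delta_filter(width: int = 2) -> np.ndarray:
    """Regression delta taps: d[t] = sum_n n * c[t+n] / (2 * sum_{n=1}^{width} n^2)."""
    n = np.arange(-width, width + 1)
    return n / (2.0 * np.sum(np.arange(1, width + 1) ** 2))


def sincos_pair(k: int, window: int) -> np.ndarray:
    """Cosine and sine basis vectors at index k, packed as cos + j*sin."""
    t = np.arange(window)
    return np.exp(2j * np.pi * k * t / window)


def peak_modulation_hz(taps: np.ndarray, frame_rate: float = FRAME_RATE_HZ,
                       n_fft: int = 4096) -> float:
    """Modulation frequency (Hz) at which |H(f)| of the FIR taps is largest."""
    response = np.abs(np.fft.fft(taps, n_fft))
    freqs = np.fft.fftfreq(n_fft, d=1.0 / frame_rate)
    keep = freqs >= 0  # consider non-negative modulation frequencies only
    return float(freqs[keep][np.argmax(response[keep])])


if __name__ == "__main__":
    print(f"delta filter peak: {peak_modulation_hz(delta_filter()):.1f} Hz")
    for k in (1, 2, 3):
        print(f"sine/cosine pair k={k}: {peak_modulation_hz(sincos_pair(k, 25)):.1f} Hz")
```

Under these assumptions the delta filter peaks near 13.8 Hz, in the neighbourhood of the 15 Hz quoted in the abstract. A double delta (the delta filter applied twice) has a squared response and therefore peaks at the same place. Each sine/cosine pair, by contrast, is tuned to its own modulation frequency of k × 4 Hz.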

Journal Article
O. Fujimura
TL;DR: Experiments on a computer-simulated channel vocoder tested whether pitch perturbation can be effectively simulated by partially replacing voiced excitation with random noise in appropriate frequency-time portions. The results show that partial devoicing of the high-frequency ranges definitely improves speech quality.
Abstract: Aperiodicity in voiced segments of speech may be ascribed to different causes. The magnitude of pitch perturbation differs across spectral ranges of the signal. To see whether pitch perturbation can be effectively simulated by partially replacing voiced excitation with random noise in appropriate frequency-time portions, experimental tests were made on a computer-simulated channel vocoder. The buzz-hiss decision was made separately for three different frequency portions of the signal. The cepstrum technique was used for pitch detection, and separate buzz-hiss switching decisions were made at the synthesizer for each frequency portion. The switching thresholds were controlled, and deliberately "devoiced" versions were compared with regular vocoded speech. The fundamental frequency was determined by the low-band cepstrum. The results show that partial devoicing of the high-frequency ranges definitely improves speech quality. Further, a comparatively large amount of devoicing is perceptually tolerable.

55 citations
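Cepstral pitch detection, as used in the vocoder study above, rests on one idea: a periodic excitation produces a log-magnitude spectrum that ripples at the harmonic spacing, so the inverse transform of that log spectrum (the real cepstrum) shows a peak at the pitch period. The sketch below is a generic full-band version, not the low-band variant the abstract mentions. The Hann window, the 50 to 400 Hz search range, the 1e-10 log floor and the function names are illustrative assumptions.

```python
import numpy as np


def real_cepstrum(frame: np.ndarray) -> np.ndarray:
    """Real cepstrum of a Hann-windowed frame: inverse DFT of log|DFT|."""
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)  # small floor avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))


def cepstral_pitch_hz(frame: np.ndarray, fs: float,
                      f0_min: float = 50.0, f0_max: float = 400.0) -> float:
    """Estimate F0 from the largest cepstral peak within the plausible pitch range."""
    cep = real_cepstrum(frame)
    q_lo = int(fs / f0_max)                             # shortest period, in samples
    q_hi = min(int(fs / f0_min), len(frame) // 2 - 1)   # longest period, in samples
    q_peak = q_lo + int(np.argmax(cep[q_lo:q_hi + 1]))  # quefrency of the peak
    return fs / q_peak


if __name__ == "__main__":
    fs = 8000.0
    frame = np.zeros(512)
    frame[::40] = 1.0  # stand-in for glottal pulses: impulse train with a 40-sample period
    print(f"estimated F0: {cepstral_pitch_hz(frame, fs):.1f} Hz")
```

For the synthetic 40-sample impulse train at 8 kHz, the cepstral peak should fall at a quefrency of about 40 samples, giving an estimate of roughly 200 Hz. Restricting the search to the pitch range keeps the low-quefrency region, which carries the spectral envelope, out of the decision.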

Journal Article
TL;DR: This article presents a hybrid approach to fault diagnosis of planetary bearings that uses an internal vibration sensor and novel signal processing strategies. An accelerometer is mounted internally on the planet carrier to address the variable transmission path, and the adverse effect of electromagnetic interference introduced into the signal by the slip ring is tackled by optimizing the spectral kurtosis (SK) technique for demodulation band selection.

55 citations
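Spectral kurtosis picks the demodulation band by asking which frequency band is most impulsive over time. The sketch below shows only the standard STFT-based estimator, SK(f) = ⟨|X(t,f)|⁴⟩ / ⟨|X(t,f)|²⟩² − 2; it does not reproduce the optimized SK procedure from the article. The window length, hop size and the synthetic "bearing impact" signal (Gaussian noise plus a decaying 3 kHz ring every 50 ms) are hypothetical choices made for illustration.

```python
import numpy as np


def spectral_kurtosis(x, fs, nperseg=128, hop=32):
    """STFT-based spectral kurtosis: SK(f) = <|X(t,f)|^4> / <|X(t,f)|^2>^2 - 2."""
    window = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // hop
    frames = np.stack([x[i * hop:i * hop + nperseg] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # |X(t, f)|^2, shape (frames, bins)
    sk = np.mean(power ** 2, axis=0) / np.mean(power, axis=0) ** 2 - 2.0
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, sk


def synthetic_fault_signal(fs=16000.0, carrier_hz=3000.0, period_s=0.05, seed=0):
    """Hypothetical test signal: Gaussian noise plus a decaying 'impact' ring every period_s."""
    rng = np.random.default_rng(seed)
    n = int(fs)                                   # one second of samples
    x = rng.normal(0.0, 1.0, n)
    t = np.arange(160) / fs                       # 10 ms ring
    ring = 10.0 * np.exp(-t / 0.002) * np.sin(2 * np.pi * carrier_hz * t)
    step = int(period_s * fs)
    for start in range(0, n - ring.size, step):
        x[start:start + ring.size] += ring
    return x


def most_impulsive_band_hz(x, fs):
    """Centre frequency of the STFT bin with the highest spectral kurtosis."""
    freqs, sk = spectral_kurtosis(x, fs)
    # DC and Nyquist STFT coefficients are real-valued, so Gaussian noise gives SK ~ 1 there; skip them
    return float(freqs[1:-1][np.argmax(sk[1:-1])])


if __name__ == "__main__":
    fs = 16000.0
    print(f"most impulsive band: {most_impulsive_band_hz(synthetic_fault_signal(fs), fs):.0f} Hz")
```

Stationary Gaussian noise gives SK close to zero in the complex-valued bins, while bins dominated by the repetitive impacts show large SK. In a fault-diagnosis pipeline, the band selected this way would then be demodulated (for example, envelope analysis) to reveal the fault repetition rate.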

Patent
22 Oct 1998
TL;DR: This patent describes a speech recognition method for recognizing input speech in a noisy environment using a plurality of clean speech models, each of which has a clean speech feature parameter S representing a cepstrum parameter of clean speech.
Abstract: A speech recognition method for recognizing input speech in a noisy environment by using a plurality of clean speech models is provided. Each clean speech model has a clean speech feature parameter S representing a cepstrum parameter of its clean speech. The method comprises the processes of: detecting a noise feature parameter N representing a cepstrum parameter of the noise in the noisy environment, immediately before the speech is input; detecting an input speech feature parameter X representing a cepstrum parameter of the input speech in the noisy environment; calculating a modified clean speech feature parameter Y according to the equation Y = k · S + (1 − k) · N (0 < k ≤ 1), where k is a predetermined value corresponding to the signal-to-noise ratio in the noisy environment; comparing the input speech feature parameter X with the modified clean speech feature parameter Y; and recognizing the input speech by repeatedly carrying out the calculating and comparing processes for each of the plurality of clean speech models.

54 citations
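The patent's adaptation step, Y = k · S + (1 − k) · N, can be illustrated with a small sketch. It makes two simplifying assumptions that the abstract does not specify. First, each clean model is reduced to a single cepstral vector. Second, the "comparing" step is taken to be a Euclidean distance. The mapping from SNR to k is not given either, so k is simply passed in. The names `adapt_model` and `recognize` are hypothetical.

```python
import numpy as np


def adapt_model(clean_s: np.ndarray, noise_n: np.ndarray, k: float) -> np.ndarray:
    """Modified clean speech parameter Y = k*S + (1 - k)*N, with 0 < k <= 1."""
    if not 0.0 < k <= 1.0:
        raise ValueError("k must satisfy 0 < k <= 1")
    return k * clean_s + (1.0 - k) * noise_n


def recognize(x: np.ndarray, clean_models: dict, noise_n: np.ndarray, k: float) -> str:
    """Return the label whose adapted cepstrum Y is closest (Euclidean, assumed) to X."""
    distances = {
        label: float(np.linalg.norm(x - adapt_model(s, noise_n, k)))
        for label, s in clean_models.items()
    }
    return min(distances, key=distances.get)


if __name__ == "__main__":
    models = {"yes": np.array([1.0, 0.5, -0.2]), "no": np.array([-0.8, 0.3, 0.6])}
    noise = np.array([0.2, 0.2, 0.2])    # cepstrum measured just before the utterance
    x = np.array([0.77, 0.39, -0.08])    # noisy input cepstrum (toy values)
    print(recognize(x, models, noise, k=0.7))
```

With k = 0.7 the adapted "yes" model becomes [0.76, 0.41, −0.08], which lies very close to the toy input, so the sketch prints `yes`. Smaller k shifts every model further toward the noise cepstrum, which mirrors the intent of tying k to a lower signal-to-noise ratio.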

01 Jan 2010
TL;DR: This study investigates the characterization of subband energy as a two-dimensional feature comprising Spectral Centroid Magnitude (SCM) and Spectral Centroid Frequency (SCF), and provides an SCF implementation that improves on the speaker recognition performance of both subband spectral centroid and FM features.
Abstract: Most conventional features used in speaker recognition are based on spectral envelope characterizations such as Mel-scale filterbank cepstrum coefficients (MFCC), Linear Prediction Cepstrum Coefficients (LPCC) and Perceptual Linear Prediction (PLP). The success of MFCCs has made them a de facto standard feature for speaker recognition. Alternative features that convey information other than the average subband energy have been proposed, such as frequency modulation (FM) and subband spectral centroid features. In this study, we investigate the characterization of subband energy as a two-dimensional feature comprising Spectral Centroid Magnitude (SCM) and Spectral Centroid Frequency (SCF). Empirical experiments carried out on the NIST 2001 and NIST 2006 databases using SCF, SCM and their fusion suggest that the combination of SCM and SCF is somewhat more accurate than conventional MFCCs, and that both fuse effectively with MFCCs. We also show that frame-averaged FM features are essentially centroid features, and we provide an SCF implementation that improves on the speaker recognition performance of both subband spectral centroid and FM features.

54 citations
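The two centroid features can be sketched for a single frame. The formulation below is an assumption, not taken from this page. SCF_b = Σ f·S(f) / Σ S(f) is the magnitude-weighted mean frequency of subband b. SCM_b = Σ f·S(f) / Σ f is a frequency-weighted mean magnitude. Both sums run over the bins of the band. The rectangular, equally spaced subbands are a simplification (the paper's filterbank is not described here), and `subband_centroids` is a hypothetical name.

```python
import numpy as np


def subband_centroids(frame: np.ndarray, fs: float, n_bands: int = 8):
    """Per-subband spectral centroid frequency (SCF) and magnitude (SCM) of one frame.

    Assumed formulation, with S(f) the magnitude spectrum and rectangular,
    equally spaced subbands (a simplification):
      SCF_b = sum f*S(f) / sum S(f)    over the bins of band b
      SCM_b = sum f*S(f) / sum f       over the bins of band b
    """
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    edges = np.linspace(0.0, fs / 2.0, n_bands + 1)
    scf, scm = np.zeros(n_bands), np.zeros(n_bands)
    for b in range(n_bands):
        in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        f, s = freqs[in_band], mag[in_band]
        # fall back to the band centre if the band is completely silent
        scf[b] = np.sum(f * s) / np.sum(s) if np.sum(s) > 0 else 0.5 * (edges[b] + edges[b + 1])
        scm[b] = np.sum(f * s) / np.sum(f)
    return scf, scm


if __name__ == "__main__":
    fs = 8000.0
    n = np.arange(512)
    tone = np.sin(2 * np.pi * 1250.0 * n / fs)  # pure tone inside the 1000-1500 Hz band
    scf, scm = subband_centroids(tone, fs)
    print("SCF (Hz):", np.round(scf, 1))
    print("SCM:", np.round(scm, 3))
```

For a pure tone, SCF in the band containing the tone sits at the tone's frequency, and SCM is largest in that band. SCF therefore tracks where energy lies inside each band, while SCM summarizes how much magnitude the band carries. This is the two-dimensional characterization of subband energy that the abstract describes.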


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Robustness (computer science): 94.7K papers, 1.6M citations, 80% related
Feature (computer vision): 128.2K papers, 1.7M citations, 79% related
Deep learning: 79.8K papers, 2.1M citations, 79% related
Support vector machine: 73.6K papers, 1.7M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  86
2022  206
2021  60
2020  96
2019  135
2018  130