Topic
Cepstrum
About: Cepstrum is a research topic. Over its lifetime, 3346 publications have been published within this topic, receiving 55742 citations.
Papers
30 Nov 2003
TL;DR: These new dynamic features derived from the modulation spectrum of the cepstral trajectories of the speech signal yield a significant increase in the speech recognition performance in various noise conditions when compared directly to the standard temporal derivative features and C-JRASTA PLP features.
Abstract: In this paper, we present new dynamic features derived from the modulation spectrum of the cepstral trajectories of the speech signal. Cepstral trajectories are projected onto a basis of sines and cosines, yielding the cepstral modulation frequency response of the speech signal. We show that the different sine and cosine basis vectors select different modulation frequencies, whereas the frequency responses of the delta and double-delta filters are centered only at 15 Hz. Therefore, projecting cepstral trajectories onto the basis of sines and cosines yields a more complementary and discriminative range of features. In this work, the cepstrum reconstructed from the lower cepstral modulation frequency components is used as the static feature. In experiments, it is shown that, as well as providing an improvement in clean conditions, these new dynamic features yield a significant increase in speech recognition performance in various noise conditions when compared directly to the standard temporal derivative features and C-JRASTA PLP features.
56 citations
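As a rough illustration of the projection described in the abstract above, the following sketch projects the trajectory of a single cepstral coefficient onto cosine/sine pairs. The function names, the 2/T normalisation, and the choice of one basis pair per integer number of cycles in the analysis window are assumptions made for illustration; they are not details taken from the paper.

```python
import math


def modulation_features(trajectory, num_pairs):
    """Project one cepstral coefficient's trajectory onto cosine/sine pairs.

    trajectory: values of a single cepstral coefficient over T frames.
    num_pairs:  number of basis pairs (k = 1 .. num_pairs cycles per window).
    Returns a list of (cosine_projection, sine_projection) tuples.
    """
    T = len(trajectory)
    feats = []
    for k in range(1, num_pairs + 1):
        cos_proj = sum(c * math.cos(2 * math.pi * k * t / T)
                       for t, c in enumerate(trajectory))
        sin_proj = sum(c * math.sin(2 * math.pi * k * t / T)
                       for t, c in enumerate(trajectory))
        # Assumed normalisation: a unit-amplitude cosine projects to 1.0.
        feats.append((2 * cos_proj / T, 2 * sin_proj / T))
    return feats


def modulation_frequency_hz(k, num_frames, frame_rate_hz):
    """Modulation frequency selected by basis pair k for a given window."""
    return k * frame_rate_hz / num_frames
```

Under these assumptions, each basis pair selects its own modulation frequency: with 16 frames at 100 frames per second, pair k = 2 corresponds to 12.5 Hz.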
TL;DR: Experimental tests on a computer-simulated channel vocoder examine whether pitch perturbation can be effectively simulated by partially replacing voiced excitation with random noise in appropriate frequency-time portions; the results show that partial devoicing of the high-frequency ranges definitely improves speech quality.
Abstract: Aperiodicity in voiced segments of speech may be ascribed to different causes. The magnitude of pitch perturbation is different in different spectral ranges of the signal. To see whether pitch perturbation can be effectively simulated by partially replacing voiced excitation by random noise, in appropriate frequency-time portions, experimental tests have been made on a computer-simulated channel vocoder. The buzz-hiss decision was made separately for three different frequency portions of the signal. The cepstrum technique was used for pitch detection, and separate buzz-hiss switching decisions were made at the synthesizer for each frequency portion. The switching thresholds were controlled, and deliberately "devoiced" versions were compared with regular vocoded speech. The fundamental frequency was determined by the lowband cepstrum. The result shows that partial devoicing of the high-frequency ranges definitely improves speech quality. Further, a comparatively large amount of devoicing is perceptually tolerable.
55 citations
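The abstract above uses the cepstrum technique for pitch detection. A minimal sketch of that general idea follows: take the log magnitude spectrum of a windowed frame, transform it back, and look for the strongest peak in the quefrency range of plausible pitch periods. This is a generic full-band version, not the lowband procedure of the paper; the function names, the Hamming window, the 60 to 400 Hz search range, and the naive DFT (used only to keep the example free of dependencies) are all assumptions.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) DFT; adequate for one short frame, use an FFT in practice."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    twiddle = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    out = [sum(v * twiddle[(k * t) % n] for t, v in enumerate(values))
           for k in range(n)]
    return [x / n for x in out] if inverse else out


def cepstrum_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame."""
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # Log magnitude spectrum; the small constant guards against log(0).
    log_mag = [math.log(abs(x) + 1e-12) for x in dft(windowed)]
    # Real cepstrum: inverse transform of the log magnitude spectrum.
    cepstrum = [c.real for c in dft(log_mag, inverse=True)]
    # Search only quefrencies (in samples) that match plausible pitch periods.
    q_lo = int(sample_rate / f_max)
    q_hi = min(int(sample_rate / f_min), n - 1)
    peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak
```

For example, a 400-sample frame at 8 kHz containing one pulse every 50 samples has its cepstral peak at quefrency 50, which corresponds to 160 Hz. A buzz-hiss (voicing) decision would additionally compare the peak height against a threshold, which this sketch leaves out.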
TL;DR: In this article, a hybrid approach for fault diagnosis of planetary bearings using an internal vibration sensor and novel signal processing strategies is presented. An accelerometer is mounted internally on the planet carrier to address the issue of the variable transmission path, and the adverse effect of electromagnetic interference in the signal due to the use of a slip ring is tackled by optimizing the spectral kurtosis (SK) technique for demodulation band selection.
55 citations
22 Oct 1998
TL;DR: In this paper, a speech recognition method for recognizing input speech in a noisy environment by using a plurality of clean speech models is provided, where each clean speech model has a clean speech feature parameter S representing a cepstrum parameter of the clean speech.
Abstract: A speech recognition method of recognizing an input speech in a noisy environment by using a plurality of clean speech models is provided. Each of the clean speech models has a clean speech feature parameter S representing a cepstrum parameter of its clean speech. The speech recognition method has the processes of: detecting a noise feature parameter N representing a cepstrum parameter of the noise in the noisy environment, immediately before the input speech is input; detecting an input speech feature parameter X representing a cepstrum parameter of the input speech in the noisy environment; calculating a modified clean speech feature parameter Y according to the following equation: Y = k · S + (1 − k) · N (0 < k ≤ 1), where k is a predetermined value corresponding to the signal-to-noise ratio in the noisy environment; comparing the input speech feature parameter X with the modified clean speech feature parameter Y; and recognizing the input speech by repeatedly carrying out the calculating process and the comparing process with respect to the plurality of clean speech models.
54 citations
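The compensation rule in the abstract above, Y = k · S + (1 − k) · N, followed by a comparison of X against each modified model, can be sketched as below. The abstract does not say how X and Y are compared or how k is derived from the signal-to-noise ratio, so the Euclidean distance, the dictionary of models, and passing k in directly are assumptions made for illustration, as are all function and parameter names.

```python
import math


def modified_model(clean, noise, k):
    """Return Y = k * S + (1 - k) * N, element by element, for 0 < k <= 1."""
    if not 0.0 < k <= 1.0:
        raise ValueError("k must satisfy 0 < k <= 1")
    return [k * s + (1.0 - k) * n for s, n in zip(clean, noise)]


def recognize(input_cep, noise_cep, clean_models, k):
    """Pick the clean model whose noise-adapted version is closest to X.

    input_cep:    cepstrum parameter X of the noisy input speech.
    noise_cep:    cepstrum parameter N measured just before the speech.
    clean_models: dict mapping a label to its clean cepstrum parameter S.
    k:            weight assumed to be chosen from the signal-to-noise ratio.
    """
    best_label, best_dist = None, math.inf
    for label, clean in clean_models.items():
        y = modified_model(clean, noise_cep, k)
        dist = math.dist(input_cep, y)  # assumed comparison: Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

With k = 1 the clean models are used unchanged; smaller values of k pull every model toward the measured noise before the comparison.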
01 Jan 2010
TL;DR: This study investigates the characterization of subband energy as a two dimensional feature, comprising Spectral Centroid Magnitude (SCM) and Spectral Centroid Frequency (SCF), and provides an SCF implementation that improves on the speaker recognition performance of both subband spectral centroid and FM features.
Abstract: Most conventional features used in speaker recognition are based on spectral envelope characterizations such as Mel-scale filterbank cepstrum coefficients (MFCC), Linear Prediction Cepstrum Coefficient (LPCC) and Perceptual Linear Prediction (PLP). The MFCC’s success has seen it become a de facto standard feature for speaker recognition. Alternative features that convey information other than the average subband energy have been proposed, such as frequency modulation (FM) and subband spectral centroid features. In this study, we investigate the characterization of subband energy as a two dimensional feature, comprising Spectral Centroid Magnitude (SCM) and Spectral Centroid Frequency (SCF). Empirical experiments carried out on the NIST 2001 and NIST 2006 databases using SCF, SCM and their fusion suggest that the combination of SCM and SCF is somewhat more accurate than conventional MFCC, and that both fuse effectively with MFCCs. We also show that frame-averaged FM features are essentially centroid features, and provide an SCF implementation that improves on the speaker recognition performance of both subband spectral centroid and FM features.
54 citations
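To make the two-dimensional subband feature in the abstract above more concrete, the sketch below computes a centroid frequency and a centroid magnitude per subband. The abstract gives no formulas, so the definitions used here are assumptions: SCF is taken as the magnitude-weighted mean frequency of the subband, and SCM as the frequency-weighted mean magnitude. The rectangular (non-overlapping) subbands and all names are likewise hypothetical.

```python
def subband_centroids(magnitudes, freqs, band_edges):
    """Return one (scf, scm) pair per subband.

    magnitudes: magnitude spectrum values, one per frequency bin.
    freqs:      centre frequency of each bin in Hz.
    band_edges: ascending edges; band i covers [edges[i], edges[i + 1]).
    """
    feats = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = [(f, m) for f, m in zip(freqs, magnitudes) if lo <= f < hi]
        mag_sum = sum(m for _, m in band)
        freq_sum = sum(f for f, _ in band)
        weighted = sum(f * m for f, m in band)
        # Assumed SCF: magnitude-weighted average frequency in the band.
        scf = weighted / mag_sum if mag_sum > 0 else (lo + hi) / 2.0
        # Assumed SCM: frequency-weighted average magnitude in the band.
        scm = weighted / freq_sum if freq_sum > 0 else 0.0
        feats.append((scf, scm))
    return feats
```

Under these definitions, SCF describes where the energy sits inside a subband and SCM how much of it there is, which is the kind of information the average subband energy alone does not separate.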