Topic

Cepstrum

About: Cepstrum is a research topic. Over its lifetime, 3,346 publications have been published on this topic, receiving 55,742 citations.


Papers
Journal Article
TL;DR: A novel register-array-based low-power FFT processor for Mel Frequency Cepstral Coefficient (MFCC) extraction is proposed; it reduces power consumption and is attractive for MFCC-based speech feature extraction.
Abstract: The Fast Fourier Transform (FFT) plays an important role in digital signal processing. High-performance FFT processors are widely used in applications such as speech processing, image processing, and communication systems. In this paper, we propose a novel register-array-based low-power FFT processor for Mel Frequency Cepstral Coefficient (MFCC) extraction. Compared with the designs in [9-12], this architecture consumes less power, which makes it attractive for MFCC-based speech feature extraction.

12 citations
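
As background for the entry above, here is a minimal sketch of the real cepstrum, the quantity that cepstral coefficients such as MFCC build on: the inverse Fourier transform of the log magnitude spectrum. It is not the processor architecture from the paper. The naive O(N^2) `dft` helper, the `eps` guard, and the synthetic echo signal are all assumptions chosen for illustration; an MFCC pipeline would additionally apply a mel filterbank and a DCT, which are omitted here.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = dft(frame)
    # eps (an assumption) guards against log(0) for empty spectral bins.
    log_mag = [math.log(abs(v) + eps) for v in spectrum]
    return [c.real for c in dft(log_mag, inverse=True)]

# Hypothetical test frame: a decaying pulse plus a circular echo delayed by 8 samples.
n = 64
delay = 8
base = [math.exp(-0.3 * t) for t in range(n)]
frame = [base[t] + 0.5 * base[(t - delay) % n] for t in range(n)]

ceps = real_cepstrum(frame)

# Skip the lowest quefrencies, which are dominated by the pulse shape itself.
peak = max(range(4, n // 2), key=lambda q: ceps[q])
print("echo quefrency estimate:", peak)
```

The echo appears as a peak at a quefrency equal to its delay, which is the classic reason cepstral analysis is used to separate a source from periodic or delayed components.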

Proceedings ArticleDOI
14 Apr 1983
TL;DR: The results of the simulation indicate that performance estimates from recognition experiments should be allowed wide error tolerances, and they illustrate the danger of trying too many features on the same database.
Abstract: Experiments are described in automatic, text-independent speaker recognition using three databases: good quality read speech, conversations over simulated telephone links, and conversations over real telephone links. A recognition system is evaluated on this material using a set of features which were believed to have some resistance to transmission degradations, namely, F0 statistics and statistics of low-order cepstrum coefficient variation. Performance is reasonable on the first two databases but poor on the telephone speech. A new set of features based on the frequencies of peaks in the short-term smoothed spectrum is found to perform better on the telephone speech, presumably because of its greater resistance to noise and nonlinear distortions. A computer simulation of the recognition experiments is described. The results of the simulation indicate that performance estimates from recognition experiments should be allowed wide error tolerances, and they illustrate the danger of trying too many features on the same database.

12 citations

Proceedings ArticleDOI
03 Apr 2018
TL;DR: Variational mode decomposition (VMD) is used to extract relevant information from the speech signal, and the VMD-based features outperform Mel-frequency cepstral coefficients (MFCC).
Abstract: This paper presents the analysis and classification of Parkinson's disease. When a person suffers from Parkinson's disease, the vocal folds and vocal tract are severely affected, and speech characteristics are therefore altered during phonation. In this paper, variational mode decomposition (VMD) is used to extract relevant information from the speech signal. VMD decomposes the speech signal into modes, or sub-signals. Various statistical features (mean, variance, skewness, and kurtosis), energy, and energy entropy are used for Parkinson's disease detection. In the experiments, the VMD-based features outperform Mel-frequency cepstral coefficients (MFCC). The proposed features achieve a classification accuracy of 96.29%.

12 citations
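
The per-mode features named in the abstract above can be sketched as follows. This assumes the modes have already been produced by some VMD implementation (not shown), and it assumes one common definition of energy entropy: the Shannon entropy of each mode's share of the total energy. The paper's exact definitions may differ; the function names and toy modes are hypothetical.

```python
import math

def mode_features(mode):
    """Mean, variance, skewness, kurtosis, and energy of one mode (list of floats)."""
    n = len(mode)
    mean = sum(mode) / n
    var = sum((v - mean) ** 2 for v in mode) / n  # population variance
    std = math.sqrt(var)
    if std > 0:
        skew = sum((v - mean) ** 3 for v in mode) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in mode) / (n * std ** 4)
    else:
        skew = kurt = 0.0  # constant mode: higher moments are undefined, report 0
    energy = sum(v * v for v in mode)
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "energy": energy}

def energy_entropy(modes):
    """Shannon entropy of the energy distribution across modes (assumed definition)."""
    energies = [sum(v * v for v in m) for m in modes]
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)

# Toy stand-ins for two VMD modes (hypothetical data, not real speech).
modes = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]]
features = [mode_features(m) for m in modes]
entropy = energy_entropy(modes)
```

A feature vector for a classifier would then concatenate the per-mode dictionaries with the single entropy value.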

Proceedings ArticleDOI
25 Oct 2020
TL;DR: In this paper, a flow-based neural vocoder (WaveGlow) pre-trained on a large amount of English and Hungarian speech data was used for articulatory-to-acoustic mapping.
Abstract: For articulatory-to-acoustic mapping using deep neural networks, typically spectral and excitation parameters of vocoders have been used as the training targets. However, vocoding often results in buzzy and muffled final speech quality. Therefore, in this paper on ultrasound-based articulatory-to-acoustic conversion, we use a flow-based neural vocoder (WaveGlow) pre-trained on a large amount of English and Hungarian speech data. The inputs of the convolutional neural network are ultrasound tongue images. The training target is the 80-dimensional mel-spectrogram, which results in a more finely detailed spectral representation than the previously used 25-dimensional Mel-Generalized Cepstrum. From the output of the ultrasound-to-mel-spectrogram prediction, WaveGlow inference results in synthesized speech. We compare the proposed WaveGlow-based system with a continuous vocoder which does not use a strict voiced/unvoiced decision when predicting F0. The results demonstrate that during the articulatory-to-acoustic mapping experiments, the WaveGlow neural vocoder produces significantly more natural synthesized speech than the baseline system. In addition, an advantage of WaveGlow is that F0 is included in the mel-spectrogram representation, so it is not necessary to predict the excitation separately.

12 citations

01 Apr 2015
TL;DR: An efficient method is suggested for fusing the vision sensor and the AHRS, using the amount of image blur as the criterion; experiments verify that the cepstrum-based blur estimation method performs better.
Abstract: Underwater robots generally perform tasks better than humans under underwater constraints such as high pressure and limited light. To diagnose structures properly in an underwater environment using a remotely operated vehicle (ROV), it is important that the vehicle autonomously maintain its own position and orientation so as to avoid additional control effort. In this paper, we propose an efficient method that assists ROV operation under various disturbances during the diagnosis of underwater structures. The conventional AHRS-based bearing estimation system did not work well because of incorrect measurements caused by the hard-iron effect when the robot approaches a ferromagnetic structure. To overcome this drawback, we propose a sensor fusion algorithm that combines the camera and the AHRS to estimate the pose of the ROV. However, image information in the underwater environment is often unreliable and blurred by turbidity or suspended solids. Thus, we suggest an efficient method for fusing the vision sensor and the AHRS that uses the amount of blur in the image as the criterion. To evaluate the amount of blur, we adopt two methods: one quantifies high-frequency components using power spectral density analysis of the 2D discrete Fourier transform of the image, and the other identifies the blur parameter based on cepstrum analysis. We evaluate the robustness of the visual odometry and of the blur estimation methods under changes in light and distance. The experiments verify that the blur estimation method based on cepstrum analysis performs better.

12 citations
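
The first of the two blur measures described in the abstract above, quantifying high-frequency content in the 2D power spectrum, might look roughly like the sketch below. It is not the authors' implementation: the `cutoff` value, the exclusion of the DC term, and the synthetic 8x8 edge image are assumptions for illustration, and the naive 2D DFT is only practical for tiny images. The cepstrum-based blur parameter identification is not shown.

```python
import cmath
import math

def dft2_power(img):
    """Power spectrum |F(u, v)|^2 of a small grayscale image via a naive 2D DFT."""
    rows, cols = len(img), len(img[0])
    power = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            s = 0j
            for y in range(rows):
                for x in range(cols):
                    s += img[y][x] * cmath.exp(-2j * math.pi * (u * y / rows + v * x / cols))
            power[u][v] = abs(s) ** 2
    return power

def high_freq_ratio(img, cutoff=0.25):
    """Share of non-DC spectral power at or above a normalized frequency cutoff."""
    rows, cols = len(img), len(img[0])
    power = dft2_power(img)
    high = total = 0.0
    for u in range(rows):
        for v in range(cols):
            if u == 0 and v == 0:
                continue  # ignore DC so overall brightness does not dominate
            fu = min(u, rows - u) / rows  # normalized frequency in [0, 0.5]
            fv = min(v, cols - v) / cols
            total += power[u][v]
            if max(fu, fv) >= cutoff:
                high += power[u][v]
    return high / total if total > 0 else 0.0

# Hypothetical test images: a sharp vertical edge and a horizontally blurred copy.
size = 8
sharp = [[1.0 if x >= size // 2 else 0.0 for x in range(size)] for _ in range(size)]
blurred = [[(row[(x - 1) % size] + row[x] + row[(x + 1) % size]) / 3 for x in range(size)]
           for row in sharp]

ratio_sharp = high_freq_ratio(sharp)
ratio_blurred = high_freq_ratio(blurred)
```

A lower ratio indicates a blurrier frame, so a fusion scheme of the kind described could rely less on the camera when the ratio falls below a chosen threshold.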


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
82% related
Robustness (computer science)
94.7K papers, 1.6M citations
80% related
Feature (computer vision)
128.2K papers, 1.7M citations
79% related
Deep learning
79.8K papers, 2.1M citations
79% related
Support vector machine
73.6K papers, 1.7M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    86
2022    206
2021    60
2020    96
2019    135
2018    130