Topic

Cepstrum

About: Cepstrum is a research topic. Over the lifetime, 3,346 publications have been published within this topic, receiving 55,742 citations.


Papers
Proceedings ArticleDOI
Mitchell McLaren, Victor Abrash, Martin Graciarena, Yun Lei, Jan Pesan
25 Aug 2013
TL;DR: It was found that robustness to compressed speech was marginally improved by exposing PLDA to noisy and reverberant speech, with little improvement from training PLDA on transcoded speech based on codecs mismatched to the evaluation conditions.
Abstract: The goal of this paper is to analyze the impact of codec-degraded speech on a state-of-the-art speaker recognition system and propose mitigation techniques. Several acoustic features are analyzed, including the standard Mel filterbank cepstral coefficients (MFCC), as well as the noise-robust medium duration modulation cepstrum (MDMC) and power normalized cepstral coefficients (PNCC), to determine whether robustness to noise generalizes to audio compression. Using a speaker recognition system based on i-vectors and probabilistic linear discriminant analysis (PLDA), we compared four PLDA training scenarios: the first involves training PLDA on clean data, the second adds noisy and reverberant speech, the third introduces transcoded data matched to the evaluation conditions, and the fourth uses codec-degraded speech mismatched to the evaluation conditions. We found that robustness to compressed speech was marginally improved by exposing PLDA to noisy and reverberant speech, with little improvement from transcoded speech in PLDA based on codecs mismatched to the evaluation conditions. Noise-robust features offered a degree of robustness to compressed speech, while more significant improvements occurred when PLDA had observed the codec matching the evaluation conditions. Finally, we tested i-vector fusion of the different features, which increased overall system performance but did not improve robustness to codec-degraded speech. Index Terms: speaker recognition, speech coding, codec degradation, speaker verification.
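As a rough illustration of the Mel filterbank cepstral front end referenced above, the sketch below extracts MFCCs with per-utterance cepstral mean and variance normalization. The sample rate, frame settings, and coefficient count are assumptions chosen for telephone-band speech, not the authors' exact configuration.

```python
# Minimal MFCC front-end sketch (assumed parameters; not the paper's exact setup).
# Requires: pip install librosa
import librosa

def mfcc_features(wav_path, sr=8000, n_mfcc=20):
    """Return mean/variance-normalized MFCCs, shape (n_mfcc, n_frames)."""
    y, sr = librosa.load(wav_path, sr=sr)            # resample to telephone rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=512, hop_length=80, win_length=200)
    # Per-utterance cepstral mean and variance normalization (CMVN)
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / \
           (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return mfcc
```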

19 citations

Proceedings ArticleDOI
22 May 2011
TL;DR: This paper proposes the use of acoustic voice source features extracted directly from the speech spectrum (or cepstrum) for cognitive load classification, along with pre- and post-processing techniques to improve the estimation of the cepstral peak prominence (CPP).
Abstract: Previous work in speech-based cognitive load classification has shown that the glottal source contains important information for cognitive load discrimination. However, the reliability of glottal flow features depends on the accuracy of the glottal flow estimation, which is a non-trivial process. In this paper, we propose the use of acoustic voice source features extracted directly from the speech spectrum (or cepstrum) for cognitive load classification. We also propose pre- and post-processing techniques to improve the estimation of the cepstral peak prominence (CPP). Three-class classification results on two databases showed CPP to be a promising cognitive load classification feature that outperforms glottal flow features. Score-level fusion of the CPP-based classification system with a formant frequency-based system yielded a final improved accuracy of 62.7%, suggesting that CPP contains useful voice source information that complements the information captured by vocal tract features.
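For context, cepstral peak prominence is commonly computed as the height of the dominant rahmonic peak above a regression line fitted to the cepstrum. The sketch below follows that generic (Hillenbrand-style) definition for a single frame; the pitch search range and the fitting span are assumptions, not the pre- and post-processing proposed in the paper.

```python
# Rough single-frame CPP sketch (generic definition; parameters are assumptions).
import numpy as np

def cepstral_peak_prominence(frame, sr, f0_min=60.0, f0_max=300.0):
    # Frame should span a few pitch periods, e.g. 40 ms or more.
    frame = frame * np.hamming(len(frame))
    spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(frame)) + 1e-12)
    cepstrum = np.fft.irfft(spectrum_db)              # real cepstrum of log spectrum
    quefrency = np.arange(len(cepstrum)) / sr          # lag in seconds

    # Find the rahmonic peak in the plausible pitch-period range
    lo, hi = int(sr / f0_max), int(sr / f0_min)
    peak_idx = lo + np.argmax(cepstrum[lo:hi])

    # Linear trend over the searched range; CPP = peak height above the trend
    coeffs = np.polyfit(quefrency[lo:hi], cepstrum[lo:hi], 1)
    return cepstrum[peak_idx] - np.polyval(coeffs, quefrency[peak_idx])
```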

19 citations

Journal ArticleDOI
TL;DR: The experimental results demonstrate that the proposed method surpasses most previous studies in terms of classification accuracy and establishes the applicability and efficacy of cepstrum-based features in classifying sEMG signals of hand movements.
Abstract: It is of great importance to effectively process and interpret surface electromyogram (sEMG) signals to actuate a robotic and prosthetic exoskeleton hand needed by hand amputees. In this paper, we propose a cepstrum analysis-based method for classification of basic hand movement sEMG signals. Cepstral analysis, a technique primarily used for analyzing acoustic and seismological signals, is exploited to extract features from time-domain sEMG signals by computing mel-frequency cepstral coefficients (MFCCs). The extracted feature vector consisting of MFCCs is then fed to a generalized regression neural network (GRNN) to classify basic hand movements. The proposed method has been tested on the sEMG for Basic Hand movements Data Set and achieved an average accuracy of 99.34% for the five individual subjects and an overall mean accuracy of 99.23% for the collective (mixed) dataset. The experimental results demonstrate that the proposed method surpasses most previous studies in terms of classification accuracy. The discrimination ability of the cepstral features exploited in this study is quantified using the Kruskal-Wallis statistical test. As evidenced by the experimental results, this study explores and establishes the applicability and efficacy of cepstrum-based features in classifying sEMG signals of hand movements. Owing to the non-iterative training nature of the artificial neural network adopted in the study, the proposed method does not demand much time to build the model in the training phase.
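A toy sketch of the MFCC-plus-GRNN idea described above is given below. The helper names, the MFCC/STFT settings, and the spread parameter are illustrative assumptions rather than the paper's configuration, and the GRNN is written as the usual non-iterative kernel-weighted voting scheme.

```python
# Sketch: mean MFCC vector per sEMG recording, classified by a simple GRNN
# (all parameter values are assumptions, not the paper's settings).
import numpy as np
import librosa

def emg_mfcc_vector(signal, fs=500, n_mfcc=13):
    """Mean MFCC vector for one sEMG recording, treated as a 1-D signal."""
    mfcc = librosa.feature.mfcc(y=signal.astype(float), sr=fs, n_mfcc=n_mfcc,
                                n_fft=128, hop_length=32, n_mels=20, fmax=fs / 2)
    return mfcc.mean(axis=1)

class GRNNClassifier:
    """Generalized regression NN: non-iterative, RBF-weighted class voting."""
    def __init__(self, spread=0.5):
        self.spread = spread

    def fit(self, X, y):
        self.X, self.y = np.asarray(X), np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict(self, X):
        d2 = ((np.asarray(X)[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * self.spread ** 2))        # pattern-layer weights
        votes = np.stack([(w * (self.y == c)).sum(1) for c in self.classes], axis=1)
        return self.classes[np.argmax(votes, axis=1)]
```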

19 citations

Proceedings ArticleDOI
07 May 1996
TL;DR: It was shown that both LPCC and MFCC are effective representations: for a smaller number of parameters, the LPCC representation performs better, but it is surpassed by MFCC when the analysis order is larger.
Abstract: A large number of parameters, including pitch, LPCC, ΔLPCC, PARCOR, MFCC, ΔMFCC, and the residual cepstrum (RCEP), were extracted from speech signals and their effectiveness for text-independent speaker identification was evaluated. In addition, the usefulness of two signal processing techniques, preemphasis and cepstral weighting, was also studied. A VQ-based speaker recognition method with codebooks fine-tuned by the LVQ algorithm was used. It was shown that both LPCC and MFCC are effective representations: for a smaller number of parameters, the LPCC representation performs better, but it is surpassed by MFCC when the analysis order is larger. Pitch is an independent parameter, so it can be used jointly with other spectral features. In an evaluation experiment, the correct identification rate for 112 male speakers with test utterances of less than one second reached 98.2%.
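The LPC-derived cepstrum (LPCC) mentioned above is conventionally obtained from the prediction coefficients by a short recursion rather than an explicit FFT. A minimal sketch follows; the analysis order, cepstral order, and sign convention handling are assumptions, not this paper's settings.

```python
# LPCC via the standard LPC-to-cepstrum recursion (Rabiner & Juang style).
import numpy as np
import librosa

def lpcc(frame, order=12, n_ceps=16):
    # librosa.lpc returns [1, -a_1, ..., -a_p] for the model 1 - sum_k a_k z^{-k},
    # so flip the sign to recover a_1 ... a_p.
    a = -librosa.lpc(frame.astype(float), order=order)[1:]
    c = np.zeros(n_ceps + 1)                     # c[1..n_ceps] are used
    for n in range(1, n_ceps + 1):
        acc = sum((k / n) * c[k] * a[n - k - 1]
                  for k in range(max(1, n - order), n))
        c[n] = (a[n - 1] if n <= order else 0.0) + acc
    return c[1:]
```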

19 citations

Patent
27 Mar 2018
TL;DR: In this paper, a voice enhancement method based on multiresolution auditory cepstral features and a deep convolutional neural network is proposed. It consists of three main steps: establishing new characteristic parameters, namely MR-GFCC, capable of distinguishing voice from noise; establishing a self-adaptive masking threshold based on the ideal ratio mask (IRM) and ideal binary mask (IBM) according to noise variations; and training a seven-layer network with the newly extracted characteristic parameters and their first/second derivatives as input and the self-adaptive masking threshold as output.
Abstract: The invention discloses a voice enhancement method based on multiresolution auditory cepstral features and a deep convolutional neural network. The method comprises the following steps: firstly, establishing new characteristic parameters, namely the multiresolution auditory cepstral coefficients (MR-GFCC), capable of distinguishing voice from noise; secondly, establishing a self-adaptive masking threshold based on the ideal ratio mask (IRM) and ideal binary mask (IBM) according to noise variations; then training an established seven-layer neural network, using the newly extracted characteristic parameters and their first/second derivatives as input and the self-adaptive masking threshold as output of the deep convolutional neural network (DCNN); and finally enhancing the noisy speech using the self-adaptive masking threshold estimated by the DCNN. The method makes full use of the working mechanism of the human ear: speech characteristic parameters simulating the human auditory physiological model are designed, so that not only is a relatively large amount of speech information retained, but the extraction process is also simple and feasible.
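To illustrate the masking idea the abstract relies on, the sketch below shows generic IRM/IBM training targets and how an estimated time-frequency mask is applied to noisy speech. The STFT settings and mask definitions are common textbook choices, not the patent's adaptive threshold or MR-GFCC front end.

```python
# Generic mask-based enhancement sketch (assumed STFT settings and mask definitions).
import numpy as np
import librosa

def training_targets(clean, noise, n_fft=512, hop=128):
    """Ideal ratio mask (IRM) and ideal binary mask (IBM) from a clean/noise pair."""
    S = np.abs(librosa.stft(clean, n_fft=n_fft, hop_length=hop)) ** 2
    N = np.abs(librosa.stft(noise, n_fft=n_fft, hop_length=hop)) ** 2
    irm = np.sqrt(S / (S + N + 1e-12))        # soft target in [0, 1]
    ibm = (S > N).astype(float)               # hard 0/1 target
    return irm, ibm

def apply_mask(noisy, mask, n_fft=512, hop=128):
    """Apply an estimated mask (e.g., a network's output) to the noisy STFT."""
    Y = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)
    return librosa.istft(Y * mask, hop_length=hop, length=len(noisy))
```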

19 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Robustness (computer science): 94.7K papers, 1.6M citations, 80% related
Feature (computer vision): 128.2K papers, 1.7M citations, 79% related
Deep learning: 79.8K papers, 2.1M citations, 79% related
Support vector machine: 73.6K papers, 1.7M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    86
2022    206
2021    60
2020    96
2019    135
2018    130