Mel-frequency cepstrum

About: Mel-frequency cepstrum is a research topic. Over its lifetime, 6,455 publications have appeared within this topic, receiving 92,772 citations. The topic is also known as: Mel Frequency Cepstral Coefficients.


Papers
Journal Article
TL;DR: This article narrates the historical and mathematical background that led to the invention of the term cepstrum and describes how the term has survived and has become part of the digital signal processing lexicon.
Abstract: The idea of the log spectrum or cepstral averaging has been useful in many applications such as audio processing, speech processing, speech recognition, and echo detection for the estimation and compensation of convolutional distortions. To suggest what prompted the invention of the term cepstrum, this article narrates the historical and mathematical background that led to its discovery. Computations on earlier simple echo representations showed that the resulting spectral representation belongs to neither the frequency nor the time domain. Bogert et al. (1963) chose to refer to it as the quefrency domain and later termed the spectrum of the log spectrum of a time waveform the cepstrum. The article also recounts the analysis of Al Oppenheim in relation to the cepstrum. It was in his theory for nonlinear signal processing, referred to as homomorphic systems, that the realization of the characteristic system of homomorphic convolution was reminiscent of the cepstrum. To retain both the relationship to the work of Bogert et al. and the distinction, the term complex cepstrum was eventually applied to the nonlinear mapping in homomorphic deconvolution. While most of the terms in the glossary have faded into the background, the term cepstrum has survived and has become part of the digital signal processing lexicon.
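The echo-detection use mentioned in the abstract can be illustrated with a short sketch. It computes the real cepstrum in one common modern form (inverse FFT of the log magnitude spectrum), which is an assumption here and not necessarily the exact definition used by Bogert et al. The signal, delay, and gain values are made up for illustration.

```python
import numpy as np

def real_cepstrum(x):
    """Inverse FFT of the log magnitude spectrum of x (one common definition)."""
    spectrum = np.fft.fft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_mag).real

# Synthetic test signal: white noise plus a delayed, attenuated copy of itself.
rng = np.random.default_rng(0)
n, delay, gain = 4096, 100, 0.5          # illustrative values only
s = rng.standard_normal(n)
x = s.copy()
x[delay:] += gain * s[:-delay]           # add the echo

c = real_cepstrum(x)
# Skip the lowest quefrencies, then look for the strongest peak.
start = 20
peak = int(np.argmax(c[start:n // 2])) + start
print("estimated echo delay (samples):", peak)
```

The echo multiplies the spectrum by a periodic ripple; taking the log turns that product into a sum, so the ripple shows up as an isolated peak at the quefrency equal to the echo delay.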

376 citations

Journal Article
TL;DR: Modulation spectral features are proposed for the automatic recognition of human affective information from speech and render a substantial improvement in recognition performance when used to augment prosodic features, which have been extensively used for emotion recognition.

359 citations

Journal Article
TL;DR: This correspondence presents an experimental evaluation of different features and channel compensation techniques for robust speaker identification, and it is shown that performance differences between the basic features are small, and the major gains are due to the channel compensation techniques.
Abstract: This correspondence presents an experimental evaluation of different features and channel compensation techniques for robust speaker identification. The goal is to keep all processing and classification steps constant and to vary only the features and compensations used to allow a controlled comparison. A general, maximum-likelihood classifier based on Gaussian mixture densities is used as the classifier, and experiments are conducted on the King speech database, a conversational, telephone-speech database. The features examined are mel-frequency and linear-frequency filterbank cepstral coefficients, linear prediction cepstral coefficients, and perceptual linear prediction (PLP) cepstral coefficients. The channel compensation techniques examined are cepstral mean removal, RASTA processing, and a quadratic trend removal technique. It is shown for this database that performance differences between the basic features are small, and the major gains are due to the channel compensation techniques. The best "across-the-divide" recognition accuracy of 92% is obtained for both high-order LPC features and band-limited filterbank features.
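Of the compensation techniques listed, cepstral mean removal is the simplest to sketch. A fixed channel is approximately an additive constant in the cepstral domain, so subtracting the per-coefficient mean over an utterance removes it. The example below is a minimal illustration on random stand-in data, not the paper's experimental setup; the array shapes and names are assumptions.

```python
import numpy as np

def cepstral_mean_removal(cepstra):
    """Subtract the per-coefficient mean over time.

    cepstra: array of shape (n_frames, n_coeffs).
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
clean = rng.standard_normal((200, 13))   # stand-in for clean cepstral frames
channel_offset = rng.standard_normal(13) # constant offset added by a fixed channel
observed = clean + channel_offset

compensated = cepstral_mean_removal(observed)
# After mean removal, the observed and clean features coincide.
print(np.allclose(compensated, cepstral_mean_removal(clean)))
```

Note that the utterance's own long-term mean is removed along with the channel offset, which is the usual trade-off of this technique.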

336 citations

01 Jan 2007
TL;DR: A comparative evaluation of the presented MFCC implementations is performed on the task of text-independent speaker verification, by means of the well-known 2001 NIST SRE (speaker recognition evaluation) one-speaker detection database.
Abstract: Making no claim of being exhaustive, a review of the most popular MFCC (Mel Frequency Cepstral Coefficients) implementations is made. These differ mainly in the particular approximation of the nonlinear pitch perception of humans, the filter bank design, and the compression of the filter bank output. Then, a comparative evaluation of the presented implementations is performed on the task of text-independent speaker verification, by means of the well-known 2001 NIST SRE (speaker recognition evaluation) one-speaker detection database.
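The three design choices named in the abstract (mel-scale approximation, filter bank design, output compression) can be seen in a compact sketch. The version below picks one option for each: the widely used 2595·log10(1 + f/700) mel formula, unit-peak triangular filters, and natural-log compression followed by an unnormalized DCT-II. These choices and all function names are assumptions for illustration and do not reproduce any specific implementation reviewed in the paper.

```python
import numpy as np

def hz_to_mel(f):
    # One common approximation of the mel scale.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_frame(frame, bank, n_coeffs=13):
    """MFCCs of a single frame: power spectrum, mel filtering, log, DCT-II."""
    n_fft = 2 * (bank.shape[1] - 1)
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    log_energy = np.log(bank @ power + 1e-10)   # log compression
    n = log_energy.size
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)  # unnormalized DCT-II
    return basis @ log_energy

sample_rate = 16000
bank = mel_filterbank(n_filters=26, n_fft=512, sample_rate=sample_rate)
t = np.arange(400) / sample_rate                 # one 25 ms frame
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), bank)
print(coeffs.shape)
```

Swapping `hz_to_mel`, the filter shape, or the `np.log` step for another variant is exactly the kind of difference between implementations that the review compares.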

333 citations

Journal Article
TL;DR: Experimental results show that the proposed Fourier parameter (FP) features are effective in identifying various emotional states in speech signals and improve the recognition rates over the methods using Mel frequency cepstral coefficient features.
Abstract: Recently, studies have been performed on harmony features for speech emotion recognition. It is found in our study that the first- and second-order differences of harmony features also play an important role in speech emotion recognition. Therefore, we propose a new Fourier parameter model using the perceptual content of voice quality and the first- and second-order differences for speaker-independent speech emotion recognition. Experimental results show that the proposed Fourier parameter (FP) features are effective in identifying various emotional states in speech signals. They improve the recognition rates over the methods using Mel frequency cepstral coefficient (MFCC) features by 16.2, 6.8 and 16.6 points on the German database (EMODB), Chinese language database (CASIA) and Chinese elderly emotion database (EESDB). In particular, when combining FP with MFCC, the recognition rates can be further improved on the aforementioned databases by 17.5, 10 and 10.5 points, respectively.
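The first- and second-order differences mentioned in the abstract can be sketched as frame-to-frame differences appended to the base features. The abstract does not give the paper's exact differencing scheme, so the simple backward difference below is an assumption, and the toy input is made up.

```python
import numpy as np

def add_differences(features):
    """Append first- and second-order frame differences.

    features: array of shape (n_frames, n_dims).
    Returns an array of shape (n_frames, 3 * n_dims).
    """
    # Prepending the first row keeps the frame count and makes the first difference zero.
    d1 = np.diff(features, axis=0, prepend=features[:1])
    d2 = np.diff(d1, axis=0, prepend=d1[:1])
    return np.concatenate([features, d1, d2], axis=1)

frames = np.arange(5, dtype=float)[:, None] ** 2   # toy feature track: 0, 1, 4, 9, 16
stacked = add_differences(frames)
print(stacked)
```

Each output row holds the original value, its first difference, and its second difference, so a classifier sees both the feature and how it is changing over time.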

328 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 88% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Wireless sensor network: 142K papers, 2.4M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    304
2022    772
2021    363
2020    423
2019    419
2018    431