Topic

Cepstrum

About: Cepstrum is a research topic. Over the lifetime of this topic, 3,346 publications have been published, receiving 55,742 citations.


Papers
Journal ArticleDOI
TL;DR: A system for automatically estimating the lowest three formants and the pitch period of voiced speech is presented, based on a digital computation of the cepstrum (defined as the inverse transform of the log magnitude of the z‐transform).
Abstract: A system for automatically estimating the lowest three formants and the pitch period of voiced speech is presented. The system is based on a digital computation of the cepstrum (defined as the inverse transform of the log magnitude of the z‐transform). The pitch period estimate and smoothed log magnitude are obtained from the cepstrum. Formants are estimated from the smoothed spectral envelope using constraints on formant frequency ranges and relative levels of spectral peaks at the formant frequencies. These constraints allow the detection of cases where two formants are too close together in frequency to be resolved in the initial spectral envelope. In these cases, a new spectral analysis algorithm (the chirp z‐transform algorithm) allows the efficient computation of a narrow‐band spectrum in which the formant resolution is enhanced. Formant and pitch period data obtained by the analysis system are used to control a digital formant synthesizer. Results, in the form of spectrograms, are presented to illu...

289 citations
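
The formant-and-pitch analyzer described above rests on the real cepstrum, the inverse transform of the log magnitude spectrum. A minimal Python sketch of that computation and of the cepstral-peak pitch-period estimate follows; the sampling rate, window, quefrency search range, and voicing threshold are illustrative assumptions rather than the paper's settings, and the FFT magnitude stands in for the z-transform evaluated on the unit circle.

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse transform of the log magnitude spectrum of one windowed frame."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small floor avoids log(0)
    return np.fft.irfft(log_mag)

def pitch_period(frame, fs=10_000, f_min=60.0, f_max=400.0):
    """Estimate the pitch period (seconds) from the cepstral peak.

    Searches quefrencies corresponding to f_min..f_max Hz; returns None
    when the peak is weak (frame treated as unvoiced). The threshold is
    an illustrative choice.
    """
    c = real_cepstrum(frame)
    q_lo, q_hi = int(fs / f_max), int(fs / f_min)
    peak = q_lo + int(np.argmax(c[q_lo:q_hi]))
    if c[peak] < 0.1:
        return None
    return peak / fs
```

The low-quefrency part of the same cepstrum, transformed back to the frequency domain, gives the smoothed log spectrum from which the formant peaks are then picked.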

Journal ArticleDOI
TL;DR: The learning vector quantization approach proved more reliable than the multilayer perceptron architecture, yielding 96% frame accuracy under similar working conditions.
Abstract: It is well known that vocal and voice diseases do not necessarily cause perceptible changes in the acoustic voice signal. Acoustic analysis is a useful tool for diagnosing voice diseases and complements methods based on direct observation of the vocal folds by laryngoscopy. This paper studies two neural-network-based classification approaches for the automatic detection of voice disorders. The structures studied are the multilayer perceptron and learning vector quantization, both fed with short-term vectors computed according to the well-known Mel-frequency cepstral coefficient parameterization. The paper shows that these architectures allow the detection of voice disorders, including glottic cancer, under highly reliable conditions. Within this context, the learning vector quantization approach proved more reliable than the multilayer perceptron architecture, yielding 96% frame accuracy under similar working conditions.

250 citations
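
As an illustration of the short-term MFCC front end that feeds both classifiers, the sketch below extracts one feature vector per frame with librosa; the window length, hop size, and number of coefficients are assumptions for illustration, not the paper's exact configuration.

```python
import librosa

def mfcc_frames(path, n_mfcc=13):
    """Return an array of shape (frames, n_mfcc): one MFCC vector per frame."""
    y, sr = librosa.load(path, sr=None)                     # keep native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=int(0.030 * sr),      # ~30 ms analysis window
                                hop_length=int(0.010 * sr)) # ~10 ms hop
    return mfcc.T
```

Each frame vector would then be labeled normal or pathological and classified by the multilayer perceptron or the learning vector quantization codebook; the 96% figure quoted above is per-frame accuracy.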

Journal ArticleDOI
TL;DR: In this paper, two approaches are proposed to enhance the entry event while retaining the impulse response, so that the two events can be clearly separated and an averaged estimate of the fault size obtained.

237 citations
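
The TL;DR above does not spell out the two enhancement approaches, so the sketch below only illustrates the generic cepstral operation such separation work builds on: low-pass liftering of the real cepstrum to split a log spectrum into a smooth envelope and the remaining fine structure. It is a background illustration under that assumption, not a reproduction of the paper's method.

```python
import numpy as np

def lifter_split(x, cutoff):
    """Split a signal's log-magnitude spectrum via the real cepstrum.

    Returns (envelope, detail): the log spectrum rebuilt from quefrencies
    below `cutoff` samples, and the residual fine structure.
    """
    log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-12)
    ceps = np.fft.irfft(log_mag)
    low = ceps.copy()
    low[cutoff:-cutoff] = 0.0                 # zero the high-quefrency band
    envelope = np.fft.rfft(low).real          # back to the log-spectral domain
    return envelope, log_mag - envelope
```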

Proceedings ArticleDOI
14 Apr 1991
TL;DR: Several algorithms are presented that increase the robustness of SPHINX, the CMU (Carnegie Mellon University) continuous-speech speaker-independent recognition system, by normalizing the acoustic space via minimization of the overall VQ distortion.
Abstract: Several algorithms are presented that increase the robustness of SPHINX, the CMU (Carnegie Mellon University) continuous-speech speaker-independent recognition system, by normalizing the acoustic space via minimization of the overall VQ distortion. The authors propose an affine transformation of the cepstrum in which a matrix multiplication performs frequency normalization and a vector addition attempts environment normalization. The algorithms for environment normalization are efficient and improve the recognition accuracy when the system is tested on a microphone other than the one on which it was trained. The frequency normalization algorithm applies a different warping of the frequency axis to each speaker and achieves a 10% decrease in error rate.

229 citations
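
A toy version of the affine cepstral transform described above: each cepstrum vector x is mapped to Ax + b, where the matrix A carries the frequency normalization and the bias b the environment normalization. The paper estimates both by minimizing overall VQ distortion; that estimation is not reproduced here, and the mean-difference bias shown is only an illustrative stand-in.

```python
import numpy as np

def affine_cepstrum(frames, A, b):
    """Apply y = A x + b to every cepstral vector (rows of `frames`)."""
    return frames @ A.T + b

def environment_bias(test_frames, train_frames):
    """Illustrative stand-in for environment normalization: shift the test
    cepstra so their mean matches the training data's mean."""
    return train_frames.mean(axis=0) - test_frames.mean(axis=0)

# Usage sketch with no frequency warping (identity A), bias only:
# A = np.eye(test.shape[1])
# test_normalized = affine_cepstrum(test, A, environment_bias(test, train))
```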

Proceedings ArticleDOI
22 May 2011
TL;DR: A novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable is presented, and initial results demonstrate phoneme recognition performance better than the current state of the art for methods based on Mel cepstrum coefficients.
Abstract: State-of-the-art speech recognition systems rely on preprocessed speech features such as Mel cepstrum or linear predictive coding coefficients that collapse high-dimensional speech sound waves into low-dimensional encodings. While these have been successfully applied in speech recognition systems, such low-dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher-dimensional encodings could both improve performance in recognition tasks and be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable, and we report initial results demonstrating phoneme recognition performance better than the current state of the art for methods based on Mel cepstrum coefficients.

223 citations
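
As a rough sketch of modeling raw waveform frames with an RBM, the code below implements a Gaussian-visible, binary-hidden RBM trained with one step of contrastive divergence (CD-1). The layer sizes, learning rate, and unit types are generic assumptions; the paper's novel type of hidden variable is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

class GaussianBernoulliRBM:
    """Gaussian visible units (unit variance), binary hidden units, CD-1 training."""

    def __init__(self, n_visible, n_hidden, lr=1e-3):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.lr = lr

    def hidden_probs(self, v):
        return 1.0 / (1.0 + np.exp(-(v @ self.W + self.b_h)))

    def cd1_update(self, v0):
        """One contrastive-divergence step on a batch of waveform frames (rows)."""
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = h0_sample @ self.W.T + self.b_v       # mean-field Gaussian reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# e.g. 256-sample frames, 512 hidden units:
# rbm = GaussianBernoulliRBM(256, 512)
# rbm.cd1_update(batch_of_frames)          # batch_of_frames: shape (batch, 256)
```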


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations (82% related)
Robustness (computer science): 94.7K papers, 1.6M citations (80% related)
Feature (computer vision): 128.2K papers, 1.7M citations (79% related)
Deep learning: 79.8K papers, 2.1M citations (79% related)
Support vector machine: 73.6K papers, 1.7M citations (78% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    86
2022    206
2021    60
2020    96
2019    135
2018    130