Topic
Cepstrum
About: Cepstrum is a research topic. Over its lifetime, 3,346 publications on this topic have received 55,742 citations.
Papers published on a yearly basis
Papers
TL;DR: A system for automatically estimating the lowest three formants and the pitch period of voiced speech is presented, based on a digital computation of the cepstrum (defined as the inverse transform of the log magnitude of the z‐transform).
Abstract: A system for automatically estimating the lowest three formants and the pitch period of voiced speech is presented. The system is based on a digital computation of the cepstrum (defined as the inverse transform of the log magnitude of the z‐transform). The pitch period estimate and smoothed log magnitude are obtained from the cepstrum. Formants are estimated from the smoothed spectral envelope using constraints on formant frequency ranges and relative levels of spectral peaks at the formant frequencies. These constraints allow the detection of cases where two formants are too close together in frequency to be resolved in the initial spectral envelope. In these cases, a new spectral analysis algorithm (the chirp z‐transform algorithm) allows the efficient computation of a narrow‐band spectrum in which the formant resolution is enhanced. Formant and pitch period data obtained by the analysis system are used to control a digital formant synthesizer. Results, in the form of spectrograms, are presented to illu...
289 citations
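The paper above rests on the defining property of the cepstrum: the pitch period of voiced speech appears as a peak at the corresponding quefrency. A minimal NumPy sketch of cepstral pitch estimation follows; it is an illustration under assumed parameters (sampling rate, pitch search range, synthetic test frame), not the paper's full formant-tracking system.

```python
import numpy as np

def real_cepstrum(frame):
    # Cepstrum: inverse transform of the log magnitude spectrum.
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # guard against log(0)
    return np.fft.ifft(log_mag).real

def estimate_pitch_period(frame, fs, fmin=60.0, fmax=400.0):
    # The voiced excitation shows up as a cepstral peak at the
    # quefrency (in samples) equal to the pitch period.
    c = real_cepstrum(frame)
    qmin = int(fs / fmax)          # smallest plausible period (samples)
    qmax = int(fs / fmin)          # largest plausible period (samples)
    peak = qmin + np.argmax(c[qmin:qmax])
    return peak / fs               # period in seconds

# Synthetic "voiced" frame: a 100 Hz pulse train passed through a
# damped resonance standing in for the vocal-tract filter.
fs = 8000
t = np.arange(1024)
excitation = np.zeros(1024)
excitation[::80] = 1.0             # 8000 / 80 = 100 Hz pitch
frame = np.convolve(excitation, np.exp(-0.01 * t) * np.cos(0.3 * t))[:1024]
period = estimate_pitch_period(frame, fs)  # expected near 0.01 s
```

The low-quefrency part of the same cepstrum (below `qmin`) carries the smoothed spectral envelope from which the paper estimates formants.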
TL;DR: The learning vector quantization approach proved more reliable than the multilayer perceptron architecture, yielding 96% frame accuracy under similar working conditions.
Abstract: It is well known that vocal and voice diseases do not necessarily cause perceptible changes in the acoustic voice signal. Acoustic analysis is a useful tool for diagnosing voice diseases, complementary to other methods based on direct observation of the vocal folds by laryngoscopy. This paper studies two neural-network-based classification approaches applied to the automatic detection of voice disorders. The structures studied are the multilayer perceptron and learning vector quantization, fed with short-term vectors computed according to the well-known Mel-frequency cepstral coefficient (MFCC) parameterization. The paper shows that these architectures allow the detection of voice disorders, including glottic cancer, under highly reliable conditions. Within this context, the learning vector quantization approach proved more reliable than the multilayer perceptron architecture, yielding 96% frame accuracy under similar working conditions.
250 citations
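The MFCC parameterization used above follows a standard pipeline: power spectrum, triangular filterbank on the mel scale, log, then a DCT (the cepstral step). A compact NumPy sketch is shown below; the filter count, coefficient count, and test signal are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs, n_filters=26, n_coeffs=13):
    # Power spectrum of a Hamming-windowed frame.
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Triangular filters equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    log_energy = np.log(fbank @ power + 1e-10)
    # DCT-II decorrelates the log filterbank energies (the cepstral step).
    n = np.arange(n_filters)
    return np.array([np.sum(log_energy * np.cos(np.pi * k * (n + 0.5) / n_filters))
                     for k in range(n_coeffs)])

fs = 16000
frame = np.sin(2 * np.pi * 440 * np.arange(512) / fs)  # toy 440 Hz frame
vec = mfcc(frame, fs)                                  # 13-dimensional feature
```

Frame-level vectors like `vec` are what feed the MLP and LVQ classifiers in the paper.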
TL;DR: In this paper, two approaches are proposed to enhance the entry event while preserving the impulse response, enabling a clear separation of the two events and an averaged estimate of the size of the fault.
237 citations
14 Apr 1991
TL;DR: Several algorithms are presented that increase the robustness of SPHINX, the CMU (Carnegie Mellon University) continuous-speech, speaker-independent recognition system, by normalizing the acoustic space via minimization of the overall VQ distortion.
Abstract: Several algorithms are presented that increase the robustness of SPHINX, the CMU (Carnegie Mellon University) continuous-speech, speaker-independent recognition system, by normalizing the acoustic space via minimization of the overall VQ distortion. The authors propose an affine transformation of the cepstrum in which a matrix multiplication performs frequency normalization and a vector addition attempts environment normalization. The algorithms for environment normalization are efficient and improve the recognition accuracy when the system is tested on a microphone other than the one on which it was trained. The frequency normalization algorithm applies a different warping of the frequency axis to each speaker and achieves a 10% decrease in error rate.
229 citations
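The affine cepstral transform above has the form y = A x + b per frame: A carries the speaker frequency warping and b the environment bias. A minimal NumPy sketch follows; the identity warp and the use of cepstral mean normalization (CMN) to estimate b are stand-in assumptions, not the paper's estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake sequence of 100 cepstral frames, 13 coefficients each,
# with a constant channel offset baked in via the nonzero mean.
cepstra = rng.normal(loc=2.0, scale=1.0, size=(100, 13))

A = np.eye(13)                 # identity: no frequency warping applied
b = -cepstra.mean(axis=0)      # environment bias: subtract long-term mean (CMN)
normalized = cepstra @ A.T + b # affine transform y_t = A x_t + b

# After CMN, the per-coefficient mean of the sequence is (near) zero,
# which removes a stationary convolutional channel in the log-spectral domain.
```

In the paper, A and b are instead chosen to minimize the overall VQ distortion against the training codebook.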
22 May 2011
TL;DR: A novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable is presented, and initial results demonstrate phoneme recognition performance better than the current state-of-the-art for methods based on Mel cepstrum coefficients.
Abstract: State of the art speech recognition systems rely on preprocessed speech features such as Mel cepstrum or linear predictive coding coefficients that collapse high dimensional speech sound waves into low dimensional encodings. While these have been successfully applied in speech recognition systems, such low dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher dimensional encodings could both improve performance in recognition tasks, and also be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable and we report initial results demonstrating phoneme recognition performance better than the current state-of-the-art for methods based on Mel cepstrum coefficients.
223 citations
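For orientation, the RBM training loop underlying approaches like the one above can be sketched with one step of contrastive divergence (CD-1). This is a generic Gaussian-visible, binary-hidden RBM on random stand-in data; the paper's novel hidden-variable type and its raw-waveform setup are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid = 64, 32
W = 0.01 * rng.normal(size=(n_vis, n_hid))  # weights
a = np.zeros(n_vis)                         # visible (Gaussian) biases
c = np.zeros(n_hid)                         # hidden (binary) biases
lr = 0.001

# Stand-in "waveform frames": unit-variance Gaussian vectors.
data = rng.normal(size=(500, n_vis))

for epoch in range(5):
    for v0 in data:
        h0 = sigmoid(v0 @ W + c)                       # hidden probabilities
        h0_s = (rng.random(n_hid) < h0).astype(float)  # sample hidden states
        v1 = h0_s @ W.T + a                            # Gaussian visible mean
        h1 = sigmoid(v1 @ W + c)                       # hidden probs, step 2
        # CD-1 update: positive phase minus negative phase.
        W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
        a += lr * (v0 - v1)
        c += lr * (h0 - h1)
```

The learned hidden activations play the role of the higher-dimensional encoding that the paper argues can outperform Mel cepstrum features.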