Topic
Cepstrum
About: Cepstrum is a research topic. Over the lifetime, 3346 publications have been published within this topic receiving 55742 citations.
Papers
••
TL;DR: Experiments showed that speech quality is significantly improved by the proposed mel-cepstrum-based quantization noise shaping method, which effectively masks the white noise introduced by the quantization typically used in neural-network-based speech waveform synthesis systems.
Abstract: This paper presents a mel-cepstrum-based quantization noise shaping method for improving the quality of synthetic speech generated by neural-network-based speech waveform synthesis systems. Since mel-cepstral coefficients closely match the characteristics of human auditory perception, the proposed method effectively masks the white noise introduced by the quantization typically used in neural-network-based speech waveform synthesis systems. The paper also describes a computationally efficient implementation of the proposed method using the structure of the mel-log spectrum approximation filter. Experiments using the WaveNet generative model, which is a state-of-the-art model for neural-network-based speech waveform synthesis, showed that speech quality is significantly improved by the proposed method.
21 citations
••
TL;DR: This new method, which avoids root finding, reduces the computer time significantly and imposes negligible overhead when compared with the approach of finding the LP cepstrum.
Abstract: In speaker recognition systems, the adaptive component weighted (ACW) cepstrum has been shown to be more robust than the conventional linear predictive (LP) cepstrum. The ACW cepstrum is derived from a pole-zero transfer function whose denominator is the pth-order LP polynomial A(z). The numerator is a (p-1)th-order polynomial that has so far been found as follows. The roots of A(z) are computed, and the corresponding residues obtained by a partial fraction expansion of 1/A(z) are set to unity. Therefore, the numerator is the sum of all the (p-1)th-order cofactors of A(z). We show that the numerator polynomial is merely the derivative of the denominator polynomial A(z). This greatly speeds up the computation of the numerator polynomial coefficients, since it involves only a simple scaling of the denominator polynomial coefficients. Root finding is completely eliminated. Since the denominator is guaranteed to be minimum phase and the numerator can be proven to be minimum phase, two separate recursions involving the polynomial coefficients establish the ACW cepstrum. This new method, which avoids root finding, reduces the computer time significantly and imposes negligible overhead when compared with the approach of finding the LP cepstrum.
21 citations
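The root-free computation described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and reads "the derivative" as the derivative of z^p A(z) with respect to z, which gives numerator coefficients b_k = (p - k) a_k. The function names `acw_numerator`, `poly_cepstrum`, and `acw_cepstrum` are hypothetical, and the recursion is the standard one for the cepstrum of a minimum-phase polynomial.

```python
def acw_numerator(a):
    """Numerator coefficients b_k = (p - k) * a_k for k = 0..p-1.

    a = [1, a_1, ..., a_p] holds A(z) = 1 + a_1 z^-1 + ... + a_p z^-p
    (assumed convention; the abstract does not spell it out).
    """
    p = len(a) - 1
    return [(p - k) * a[k] for k in range(p)]


def poly_cepstrum(b, n_ceps):
    """Cepstrum c_1..c_n of log B(z) for a minimum-phase polynomial in z^-1."""
    b = [x / b[0] for x in b]  # normalise so that b_0 = 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        # Standard recursion: c_n = b_n - (1/n) * sum_{k=1}^{n-1} k c_k b_{n-k}
        acc = sum(k * c[k] * b[n - k] for k in range(1, n) if n - k < len(b))
        c[n] = (b[n] if n < len(b) else 0.0) - acc / n
    return c[1:]


def acw_cepstrum(a, n_ceps):
    """Cepstrum of N(z)/A(z): one recursion for N(z), one for A(z)."""
    num = poly_cepstrum(acw_numerator(a), n_ceps)
    den = poly_cepstrum(a, n_ceps)
    return [cn - cd for cn, cd in zip(num, den)]


# Illustrative input with poles at 0.5 and -0.3: A(z) = 1 - 0.2 z^-1 - 0.15 z^-2.
a = [1.0, -0.2, -0.15]
print(acw_numerator(a))
print(acw_cepstrum(a, 4))
```

For this second-order example the scaled coefficients match the sum of the two first-order cofactors, (1 - 0.5 z^-1) + (1 + 0.3 z^-1), without any root finding. The constant gain of the numerator only affects c_0, which the sketch does not return.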
••
01 Dec 2016
TL;DR: Comparative analysis with the literature shows that the proposed algorithm outperforms previously reported approaches in recognition accuracy, and promising results are obtained with the chosen combination of features in the discrete emotion category as well as in the arousal dimension of the continuous emotion circumplex.
Abstract: Speech is one of the most popular modalities for emotion recognition. This work uses Mel and Bark scale dependent perceptual auditory features to recognize seven emotions from the Berlin speech corpus. A combination of Mel Frequency Cepstral Coefficients (MFCCs), Perceptual Linear Predictive Cepstrum (PLPC), Mel Frequency Perceptual Linear Predictive Cepstrum (MFPLPC), and Linear Predictive Coefficients (LPC) is chosen for the task. The role of perceptually based features is analyzed for effective Speech Emotion Recognition (SER). A search for a compact feature vector of perceptual features is carried out in the discrete and continuous emotion categories. A neural network classifier is employed for classification. Comparative analysis with the literature shows that the proposed algorithm outperforms previously reported approaches in recognition accuracy. Promising results are obtained with the chosen combination of features in the discrete emotion category as well as in the arousal dimension of the continuous emotion circumplex.
21 citations
••
TL;DR: A diagnosis apparatus discriminates a property of the observed tissue from the reflected ultrasonic wave by exploiting the fine structure of the tissue, specifically the interval between small reflecting bodies dispersed throughout the tissue.
Abstract: A diagnosis apparatus for discriminating a property of the tissue to be observed from the reflected ultrasonic wave uses the nature of the fine structure of the tissue, and more practically uses the interval between small reflecting bodies dispersed throughout the tissue as the parameter. The intervals fluctuate. Therefore, an average value and/or a degree of fluctuation is calculated and displayed. To obtain the average interval, either the cepstrum of the received signal or the autocorrelation of the received signal can be used.
21 citations
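As a rough illustration of how a cepstrum can reveal the spacing between reflectors, the sketch below estimates the delay between a pulse and its echo from the largest cepstral peak. It is not the apparatus described above. The signal, the pulse shape, the echo strength, and the function names `dft`, `real_cepstrum`, and `estimate_spacing` are all hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(x)
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(x)]
    # log_mag is real and even for real x, so a forward DFT scaled by 1/n
    # gives the same result as the inverse DFT.
    return [v.real / n for v in dft(log_mag)]


def estimate_spacing(x, min_quefrency):
    """Return the quefrency (in samples) of the largest cepstral peak."""
    cep = real_cepstrum(x)
    candidates = range(min_quefrency, len(x) // 2)
    return max(candidates, key=lambda q: cep[q])


# Hypothetical received signal: a short decaying pulse plus a weaker copy
# delayed by `delay` samples, standing in for two reflecting bodies.
n_samples, delay = 128, 20
pulse = [0.5 ** t for t in range(n_samples)]
signal = [pulse[t] + (0.6 * pulse[t - delay] if t >= delay else 0.0)
          for t in range(n_samples)]
print(estimate_spacing(signal, min_quefrency=5))
```

The `min_quefrency` argument skips the low-quefrency region, where the pulse shape itself dominates the cepstrum. With many reflectors at fluctuating intervals the peak would broaden, which is presumably why the abstract speaks of an average value and a degree of fluctuation.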
••
21 Apr 1997
TL;DR: Application to vowel and noisy telephone speech recognition tasks shows that the DFE method, used to optimize filter bank-based cepstral parameters, yields a more robust classifier through appropriate feature extraction.
Abstract: This paper investigates the realization of optimal filter bank-based cepstral parameters. The framework is the discriminative feature extraction (DFE) method, which iteratively estimates the filter-bank parameters according to the errors that the system makes. Various parameters of the filter bank, such as center frequency, bandwidth, and gain, are optimized using a string-level optimization scheme and a frame-level optimization scheme. Application to vowel and noisy telephone speech recognition tasks shows that the DFE method realizes a more robust classifier by appropriate feature extraction.
21 citations