scispace - formally typeset

Cepstrum

About: Cepstrum is a research topic. Over the lifetime, 3346 publications have been published within this topic receiving 55742 citations.


Papers
Posted Content
TL;DR: For ultrasound-based articulatory-to-acoustic mapping, a convolutional neural network is trained to predict 80-dimensional mel-spectrograms from ultrasound tongue images, and a flow-based neural vocoder (WaveGlow) pre-trained on a large amount of English and Hungarian speech data then synthesizes the speech.
Abstract: For articulatory-to-acoustic mapping using deep neural networks, typically spectral and excitation parameters of vocoders have been used as the training targets. However, vocoding often results in buzzy and muffled final speech quality. Therefore, in this paper on ultrasound-based articulatory-to-acoustic conversion, we use a flow-based neural vocoder (WaveGlow) pre-trained on a large amount of English and Hungarian speech data. The inputs of the convolutional neural network are ultrasound tongue images. The training target is the 80-dimensional mel-spectrogram, which results in a finer detailed spectral representation than the previously used 25-dimensional Mel-Generalized Cepstrum. From the output of the ultrasound-to-mel-spectrogram prediction, WaveGlow inference results in synthesized speech. We compare the proposed WaveGlow-based system with a continuous vocoder which does not use strict voiced/unvoiced decision when predicting F0. The results demonstrate that during the articulatory-to-acoustic mapping experiments, the WaveGlow neural vocoder produces significantly more natural synthesized speech than the baseline system. Besides, the advantage of WaveGlow is that F0 is included in the mel-spectrogram representation, and it is not necessary to predict the excitation separately.

11 citations
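The 80-dimensional log-mel-spectrogram training target described above can be sketched in plain numpy. The parameter values below (22.05 kHz sampling rate, 1024-point FFT, 256-sample hop, HTK mel formula) are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):          # rising slope
            fb[i, k] = (k - l) / (c - l)
        for k in range(c, r):          # falling slope
            fb[i, k] = (r - k) / (r - c)
    return fb

def mel_spectrogram(x, sr=22050, n_fft=1024, hop=256, n_mels=80):
    # Frame, window, FFT, power spectrum, mel warp, log compression
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mels = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mels + 1e-10)  # shape: (n_frames, n_mels)
```

Because the mel-spectrogram keeps the harmonic structure of the signal, F0 is implicit in the representation, which is why no separate excitation prediction is needed.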

Journal Article
TL;DR: Two feature extraction methods based on the differential power spectrum (DPS) and the differential cepstrum, originally used in speech signal processing and homomorphic signal processing, are introduced to the radar target recognition community.
Abstract: The problem of target recognition using high-resolution radar range profiles is discussed. Two feature extraction methods based on the differential power spectrum (DPS) and the differential cepstrum, originally used in the research areas of speech signal processing and homomorphic signal processing, are introduced to the radar target recognition community. Two differential-power-spectrum-based features are applied to target classification. A multi-layered feed-forward neural network with the SARPROP (simulated annealing resilient propagation) algorithm is selected as the classifier. The range profiles are obtained with the step-frequency technique and the two-dimensional backscatter distribution data of four different scaled aircraft models. Simulations are presented to evaluate the classification performance with the above features. The results show that the differential-power-spectrum-based feature is effective and robust for radar target recognition.

11 citations
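The two features named above have standard textbook definitions that can be sketched in numpy. This is a generic illustration, not the paper's implementation (its preprocessing of the range profiles is not specified here):

```python
import numpy as np

def dps_feature(profile):
    """Differential power spectrum: first difference of the power
    spectrum of a range profile. The power spectrum is invariant to
    circular shifts of the profile, which removes sensitivity to
    range alignment."""
    p = np.abs(np.fft.fft(profile)) ** 2
    return np.diff(p)

def differential_cepstrum(profile, eps=1e-12):
    """Real cepstrum (inverse FFT of the log magnitude spectrum)
    followed by a first difference along the quefrency axis."""
    spec = np.abs(np.fft.fft(profile))
    ceps = np.fft.ifft(np.log(spec + eps)).real
    return np.diff(ceps)
```

The shift invariance is the practical point for range profiles: two profiles of the same target that differ only in range offset yield the same DPS feature.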

Proceedings ArticleDOI
08 Dec 2009
TL;DR: It is found that the proposed watermarking scheme, compared with the single channel echo-hiding method, is significantly more robust against different types of attacks.
Abstract: In this paper, a novel non-blind audio watermarking scheme is proposed. The method makes use of both channels of stereo music signals. The signals are divided into segments, and watermarks are embedded by hiding echoes in these segments; the embedded echoes represent the watermark bits. In single-channel watermarking, echoes with two different delays are used: one encodes watermark bit "1" and the other encodes watermark bit "0". In the proposed scheme, an echo is embedded only when the energy level of the intended segment exceeds a given threshold, and echoes with a single delay value represent both binary bits: echoes embedded in one channel encode watermark bit "1" and echoes embedded in the other channel encode watermark bit "0". The embedded watermark is extracted by detecting the time delay of each embedded echo in the cepstrum domain. The robustness and performance of this audio watermarking scheme under different types of attacks are investigated. It is found that the proposed scheme is significantly more robust against different types of attacks than the single-channel echo-hiding method.

11 citations
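A minimal numpy sketch of the core mechanism the scheme builds on: embedding an echo and detecting its delay as a peak in the real cepstrum. The delay values, echo amplitude, and candidate set below are illustrative, not those of the paper (which also adds the energy threshold and stereo-channel logic):

```python
import numpy as np

def embed_echo(x, delay, alpha=0.5):
    """Add an attenuated, delayed copy of the signal:
    y[n] = x[n] + alpha * x[n - delay]."""
    y = x.copy()
    y[delay:] += alpha * x[:-delay]
    return y

def cepstrum_detect(y, candidate_delays, eps=1e-12):
    """The echo multiplies the spectrum by (1 + alpha*exp(-j*w*d)),
    whose log contributes a peak at quefrency d in the real cepstrum.
    Pick the candidate delay with the largest cepstral value."""
    spec = np.abs(np.fft.fft(y))
    ceps = np.fft.ifft(np.log(spec + eps)).real
    return max(candidate_delays, key=lambda d: ceps[d])
```

In a hypothetical stereo version, bit "1" would embed the echo in the left channel and bit "0" in the right, with this same detector run on each channel.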

Journal ArticleDOI
TL;DR: A characteristic feature of neutral delay-differential equations, namely that the delay of the neutral part can be detected in the cepstrum of the output signal, motivated this study of identifying acceleration feedback using cepstral analysis.

11 citations

Proceedings ArticleDOI
04 Oct 2004
TL;DR: This study compared classification of the stops /p/, /t/, and /k/ based on spectral moments with classification based on an equal number of Bark cepstrum coefficients; the best spectral-moments model (RMS amplitude plus all four bark-scaled moments at all four time intervals) reached 78.0% correct, while the best Bark cepstrum model reached 86.6%.
Abstract: Spectral moments analysis has been shown to be effective in deriving acoustic features for classifying voiceless stop release bursts [1], and is an analysis method that has commonly been cited in the clinical phonetics literature dealing with children's disordered speech. In this study, we compared the classification of stops /p/, /t/, and /k/ based on spectral moments with classification based on an equal number of Bark cepstrum coefficients. Utterance-initial /p/, /t/, and /k/ (1338 samples in all) were collected from a database of children's speech. Linear discriminant analysis (LDA) was used to classify the three stops based on four analysis frames from the initial 40 msec of each token. The best model based on spectral moments used RMS amplitude plus all four bark-scaled spectral moment features at all four time intervals and yielded 78.0% correct discrimination. The best model of similar rank based on Bark cepstrum features yielded 86.6% correct segment discrimination.

11 citations
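The four spectral moments used as features above can be sketched in numpy by treating the magnitude spectrum of one analysis frame as a probability distribution over frequency. The frame length, sampling rate, and windowing below are illustrative assumptions, not the study's settings:

```python
import numpy as np

def spectral_moments(frame, sr):
    """First four spectral moments of one analysis frame:
    centroid (Hz), standard deviation (Hz), skewness, kurtosis."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    p = spec / spec.sum()                 # normalize to a distribution
    m1 = np.sum(freqs * p)                # centroid
    var = np.sum((freqs - m1) ** 2 * p)   # spread about the centroid
    sd = np.sqrt(var)
    skew = np.sum((freqs - m1) ** 3 * p) / sd ** 3
    kurt = np.sum((freqs - m1) ** 4 * p) / sd ** 4
    return m1, sd, skew, kurt
```

Computing these four values for each of the four analysis frames, plus RMS amplitude, gives a feature vector of the kind the spectral-moments models above feed to the linear discriminant analysis.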


Network Information
Related Topics (5)

Feature extraction: 111.8K papers, 2.1M citations, 82% related
Robustness (computer science): 94.7K papers, 1.6M citations, 80% related
Feature (computer vision): 128.2K papers, 1.7M citations, 79% related
Deep learning: 79.8K papers, 2.1M citations, 79% related
Support vector machine: 73.6K papers, 1.7M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    86
2022    206
2021    60
2020    96
2019    135
2018    130