Topic

Cepstrum

About: Cepstrum is a research topic. Over the lifetime, 3346 publications have been published within this topic receiving 55742 citations.


Papers
01 Sep 2005
TL;DR: In this article, a cepstrum-based iterative true envelope estimator is proposed for pitch shifting with preservation of the spectral envelope in the phase vocoder, which can reduce the run time by a factor of 2.5-11.
Abstract: In this article the estimation of the spectral envelope of sound signals is addressed. The intended application for the developed algorithm is pitch shifting with preservation of the spectral envelope in the phase vocoder. As a first step the different existing envelope estimation algorithms are investigated and their specific properties discussed. As the most promising algorithm the cepstrum-based iterative true envelope estimator is selected. By means of controlled sub-sampling of the log amplitude spectrum and by means of a simple step size control for the iterative algorithm the run time of the algorithm can be decreased by a factor of 2.5-11. As a remedy for the ringing effects in the spectral envelope that are due to the rectangular filter used for spectral smoothing we propose the use of a Hamming window as smoothing filter. The resulting implementation of the algorithm has slightly increased computational complexity compared to the standard LPC algorithm but offers significantly improved control over the envelope characteristics. The application of the true envelope estimator in a pitch shifting application is investigated. The main problems for pitch shifting with envelope preservation in a phase vocoder are identified and a simple yet efficient remedy is proposed.
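The iterative true-envelope idea can be sketched in a few lines: repeatedly take the pointwise maximum of the log spectrum and the current envelope, then smooth by low-pass liftering the cepstrum. A minimal sketch, not the paper's implementation — the function name, cepstral order, and iteration count are illustrative, and the rectangular lifter shown is exactly the one the paper proposes replacing with a Hamming window:

```python
import numpy as np

def true_envelope(log_mag, order=40, n_iter=30):
    """Iterative cepstrum-based 'true envelope' of a log-magnitude half-spectrum.

    log_mag : log |X(f)| for the rfft bins, length n_fft//2 + 1
    order   : cepstral cutoff, i.e. quefrency bins kept by the lifter
    """
    n_fft = 2 * (len(log_mag) - 1)
    # rectangular low-pass lifter; the paper suggests a Hamming-shaped one
    # to suppress ringing in the smoothed envelope
    lifter = np.zeros(n_fft)
    lifter[:order] = 1.0
    lifter[n_fft - order + 1:] = 1.0
    target = log_mag.copy()
    env = log_mag.copy()
    for _ in range(n_iter):
        # the envelope must stay on or above the spectral peaks
        target = np.maximum(target, env)
        cep = np.fft.irfft(target, n_fft)     # real cepstrum of the target
        env = np.fft.rfft(cep * lifter).real  # cepstrally smoothed envelope
    return env
```

On a spectrum that is already smooth the loop converges immediately; on a harmonic spectrum the maximum step pulls the envelope up onto the peaks instead of averaging through them, which is what distinguishes the true envelope from plain cepstral smoothing.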

146 citations

Journal ArticleDOI
01 Aug 2000
TL;DR: This work inserts a digital watermark into the cepstral components of the audio signal using a technique analogous to spread spectrum communications, hiding a narrow band signal in a wideband channel.
Abstract: We propose a digital audio watermarking technique in the cepstrum domain. We insert a digital watermark into the cepstral components of the audio signal using a technique analogous to spread spectrum communications, hiding a narrow band signal in a wideband channel. In our method, we use pseudo-random sequences to watermark the audio signal. The watermark is then weighted in the cepstrum domain according to the distribution of cepstral coefficients and the frequency masking characteristics of the human auditory system. Watermark embedding minimizes the audibility of the watermark signal. The embedded watermark is robust to multiple watermarks, MPEG audio coding and additive noise.
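The spread-spectrum embedding described above can be sketched as follows. This is a bare-bones illustration, not the paper's method: the per-coefficient weighting derived from the cepstral distribution and frequency masking is replaced by a single hypothetical gain `alpha`, and the function names are invented for the example:

```python
import numpy as np

def embed_watermark(frame, key, alpha=0.05):
    """Add a key-seeded pseudo-random sequence to the frame's real cepstrum.

    alpha is a stand-in for the paper's psychoacoustic weighting.
    """
    n = len(frame)
    spec = np.fft.rfft(frame)
    mag, phase = np.abs(spec) + 1e-12, np.angle(spec)
    cep = np.fft.irfft(np.log(mag), n)                  # real cepstrum
    wm = np.random.default_rng(key).choice([-1.0, 1.0], n)
    mag_w = np.exp(np.fft.rfft(cep + alpha * wm).real)  # back to magnitude
    return np.fft.irfft(mag_w * np.exp(1j * phase), n)  # original phase kept

def detect_watermark(frame, key):
    """Correlate the frame's cepstrum with the key's sequence."""
    n = len(frame)
    cep = np.fft.irfft(np.log(np.abs(np.fft.rfft(frame)) + 1e-12), n)
    wm = np.random.default_rng(key).choice([-1.0, 1.0], n)
    return float(np.dot(cep[1:], wm[1:])) / n
```

Correlating with the correct key yields a clearly larger score than a wrong key, which is the narrow-band-in-wide-band detection principle of spread spectrum: the host cepstrum acts as noise that averages out against the pseudo-random sequence.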

143 citations

Dissertation
01 Jan 1993
TL;DR: In this article, the authors describe a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment, including the use of desk-top microphones and different training and testing conditions.
Abstract: This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of environmental variability are introduced by the use of desk-top microphones and different training and testing conditions: additive noise and spectral tilt introduced by linear filtering. An important attribute of the novel compensation algorithms described in this thesis is that they provide joint rather than independent compensation for these two types of degradation. Acoustical compensation is applied in our algorithms as an additive correction in the cepstral domain. This allows a high degree of integration within SPHINX, the Carnegie Mellon speech recognition system, which uses the cepstrum as its feature vector; therefore, these algorithms can be implemented very efficiently. Processing in many of these algorithms is based on instantaneous signal-to-noise ratio (SNR), as the appropriate compensation represents a form of noise suppression at low SNRs and spectral equalization at high SNRs. The compensation vectors for additive noise and spectral transformations are estimated by minimizing the differences between speech feature vectors obtained from a "standard" training corpus of speech and feature vectors that represent the current acoustical environment. In our work this is accomplished by minimizing the distortion of vector-quantized cepstra that are produced by the feature extraction module in SPHINX. In this dissertation we describe several algorithms, including SNR-Dependent Cepstral Normalization (SDCN) and Codeword-Dependent Cepstral Normalization (CDCN). With CDCN, the accuracy of SPHINX when trained on speech recorded with a close-talking microphone and tested on speech recorded with a desk-top microphone is essentially the same as that obtained when the system is trained and tested on speech from the desk-top microphone. An algorithm for frequency normalization has also been proposed, in which the parameter of the bilinear transformation that is used by the signal-processing stage to produce frequency warping is adjusted for each new speaker and acoustical environment. The optimum value of this parameter is again chosen to minimize the vector-quantization distortion between the standard environment and the current one. In preliminary studies, use of this frequency normalization produced a moderate additional decrease in the observed error rate.
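The additive-correction idea is easiest to see in its simplest relative, cepstral mean normalization: a fixed linear channel multiplies the spectrum, so it adds a constant vector in the cepstral domain, and subtracting the per-utterance mean cancels it exactly. A sketch of that principle only — it omits the SNR- and codeword-dependent estimation that SDCN and CDCN actually perform:

```python
import numpy as np

def cmn(cepstra):
    """Cepstral mean normalization: subtract the utterance-mean cepstrum.

    cepstra : (n_frames, n_coeffs) array of cepstral feature vectors
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Two recordings of the same speech made through different fixed channels differ by a constant cepstral offset per frame, which this normalization removes; SDCN/CDCN generalize the correction to depend on the instantaneous SNR or the VQ codeword, so it can also suppress additive noise.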

142 citations

Journal ArticleDOI
TL;DR: Digital data-processing problems such as the detection of multiple echoes, various methods of linearly filtering the complex cepstrum, the picket-fence phenomenon, minimum-maximum phase situations, and amplitude- versus phase-smoothing for the additive-noise case are examined empirically and, where possible, theoretically, and are discussed.
Abstract: A technique for decomposing a composite signal of unknown multiple wavelets overlapping in time is described. The computation algorithm incorporates the power cepstrum and complex cepstrum techniques. It has been found that the power cepstrum is most efficient in recognizing wavelet arrival times and amplitudes while the complex cepstrum is invaluable in estimating the form of the basic wavelet and its echoes, even if the latter are distorted. Digital data-processing problems such as the detection of multiple echoes, various methods of linear filtering the complex cepstrum, the picket-fence phenomenon, minimum-maximum phase situations, and amplitude- versus phase-smoothing for the additive-noise case are examined empirically and where possible theoretically, and are discussed. A similar investigation is performed for some of the preceding problems when the echo or echoes are distorted versions of the wavelet, thereby giving some insight into the complex problem of separating a composite signal composed of several additive stochastic processes. The threshold results are still empirical and the results should be extended to multi-dimensional data. Applications are the decomposition or resolution of signals (e.g., echoes) in radar and sonar, seismology, speech, brain waves, and neuroelectric spike data. Examples of results are presented for decomposition in the absence and presence of noise for specified signals. Results are tendered for the decomposition of pulse-type data appropriate to many systems and for the decomposition of brain waves evoked by visual stimulation.
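The power-cepstrum echo mechanism is easy to reproduce on synthetic data: an echo at delay d multiplies the power spectrum by |1 + a·e^(-jωd)|², the logarithm turns that product into an additive ripple with period 1/d, and the inverse transform concentrates the ripple into a peak at quefrency d. A minimal sketch (variable names and the low-quefrency cutoff of 50 samples are illustrative choices, not from the paper):

```python
import numpy as np

def power_cepstrum(x):
    """Inverse FFT of the log power spectrum."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) ** 2 + 1e-12)).real

rng = np.random.default_rng(0)
wavelet = rng.normal(size=4096)
delay, gain = 200, 0.5
composite = wavelet.copy()
composite[delay:] += gain * wavelet[:-delay]  # one attenuated echo

cep = power_cepstrum(composite)
# the echo appears as a cepstral peak at its arrival-time lag;
# skip the low quefrencies dominated by the wavelet itself
est = np.argmax(cep[50 : len(cep) // 2]) + 50
```

The peak height is roughly the echo amplitude `gain`, with rahmonics at multiples of the delay, which is why the abstract reports the power cepstrum as effective for recognizing wavelet arrival times and amplitudes.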

140 citations

Proceedings ArticleDOI
25 Aug 2013
TL;DR: This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space.
Abstract: This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigenspaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. DBNs have a deep architecture that automatically discovers abstractions to maximally express the original input features. If we train the DBNs using only the speech of an individual speaker, it can be considered that there is less phonological information and relatively more speaker individuality in the output features at the highest layer. Training the DBNs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NNs). The converted abstraction of the source speaker is then brought back to the cepstrum space using an inverse process of the DBNs of the target speaker. We conducted speaker voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method.

140 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
82% related
Robustness (computer science)
94.7K papers, 1.6M citations
80% related
Feature (computer vision)
128.2K papers, 1.7M citations
79% related
Deep learning
79.8K papers, 2.1M citations
79% related
Support vector machine
73.6K papers, 1.7M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023      86
2022     206
2021      60
2020      96
2019     135
2018     130