Topic

Cepstrum

About: Cepstrum is a research topic. Over the lifetime, 3346 publications have been published within this topic receiving 55742 citations.


Papers
01 Sep 2005
TL;DR: In this article, a cepstrum-based iterative true envelope estimator is proposed for pitch shifting with preservation of the spectral envelope in the phase vocoder, which can reduce the run time by a factor of 2.5-11.
Abstract: In this article the estimation of the spectral envelope of sound signals is addressed. The intended application for the developed algorithm is pitch shifting with preservation of the spectral envelope in the phase vocoder. As a first step the different existing envelope estimation algorithms are investigated and their specific properties discussed. As the most promising algorithm the cepstrum-based iterative true envelope estimator is selected. By means of controlled sub-sampling of the log amplitude spectrum and by means of a simple step size control for the iterative algorithm the run time of the algorithm can be decreased by a factor of 2.5-11. As a remedy for the ringing effects in the spectral envelope that are due to the rectangular filter used for spectral smoothing we propose the use of a Hamming window as smoothing filter. The resulting implementation of the algorithm has slightly increased computational complexity compared to the standard LPC algorithm but offers significantly improved control over the envelope characteristics. The application of the true envelope estimator in a pitch shifting application is investigated. The main problems for pitch shifting with envelope preservation in a phase vocoder are identified and a simple yet efficient remedy is proposed.
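The iterative true-envelope idea can be sketched in a few lines: repeatedly take the pointwise maximum of the log spectrum and the current envelope, then smooth by low-pass liftering the cepstrum. A minimal sketch, not the paper's implementation — the function name, cepstral order, and iteration count are illustrative, and the rectangular lifter shown is exactly the one the paper proposes replacing with a Hamming window:

```python
import numpy as np

def true_envelope(log_mag, order=40, n_iter=30):
    """Iterative cepstrum-based 'true envelope' of a log-magnitude half-spectrum.

    log_mag : log |X(f)| for the rfft bins, length n_fft//2 + 1
    order   : cepstral cutoff, i.e. quefrency bins kept by the lifter
    """
    n_fft = 2 * (len(log_mag) - 1)
    # rectangular low-pass lifter; the paper suggests a Hamming-shaped one
    # to suppress ringing in the smoothed envelope
    lifter = np.zeros(n_fft)
    lifter[:order] = 1.0
    lifter[n_fft - order + 1:] = 1.0
    target = log_mag.copy()
    env = log_mag.copy()
    for _ in range(n_iter):
        # the envelope must stay on or above the spectral peaks
        target = np.maximum(target, env)
        cep = np.fft.irfft(target, n_fft)     # real cepstrum of the target
        env = np.fft.rfft(cep * lifter).real  # cepstrally smoothed envelope
    return env
```

On a spectrum that is already smooth the loop converges immediately; on a harmonic spectrum the maximum step pulls the envelope up onto the peaks instead of averaging through them, which is what distinguishes the true envelope from plain cepstral smoothing.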

146 citations

Journal ArticleDOI
01 Aug 2000
TL;DR: This work inserts a digital watermark into the cepstral components of the audio signal using a technique analogous to spread spectrum communications, hiding a narrow band signal in a wideband channel.
Abstract: We propose a digital audio watermarking technique in the cepstrum domain. We insert a digital watermark into the cepstral components of the audio signal using a technique analogous to spread spectrum communications, hiding a narrow band signal in a wideband channel. In our method, we use pseudo-random sequences to watermark the audio signal. The watermark is then weighted in the cepstrum domain according to the distribution of cepstral coefficients and the frequency masking characteristics of the human auditory system. Watermark embedding minimizes the audibility of the watermark signal. The embedded watermark is robust to multiple watermarks, MPEG audio coding and additive noise.
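The spread-spectrum embedding described above can be sketched as follows. This is a bare-bones illustration, not the paper's method: the per-coefficient weighting derived from the cepstral distribution and frequency masking is replaced by a single hypothetical gain `alpha`, and the function names are invented for the example:

```python
import numpy as np

def embed_watermark(frame, key, alpha=0.05):
    """Add a key-seeded pseudo-random sequence to the frame's real cepstrum.

    alpha is a stand-in for the paper's psychoacoustic weighting.
    """
    n = len(frame)
    spec = np.fft.rfft(frame)
    mag, phase = np.abs(spec) + 1e-12, np.angle(spec)
    cep = np.fft.irfft(np.log(mag), n)                  # real cepstrum
    wm = np.random.default_rng(key).choice([-1.0, 1.0], n)
    mag_w = np.exp(np.fft.rfft(cep + alpha * wm).real)  # back to magnitude
    return np.fft.irfft(mag_w * np.exp(1j * phase), n)  # original phase kept

def detect_watermark(frame, key):
    """Correlate the frame's cepstrum with the key's sequence."""
    n = len(frame)
    cep = np.fft.irfft(np.log(np.abs(np.fft.rfft(frame)) + 1e-12), n)
    wm = np.random.default_rng(key).choice([-1.0, 1.0], n)
    return float(np.dot(cep[1:], wm[1:])) / n
```

Correlating with the correct key yields a clearly larger score than a wrong key, which is the narrow-band-in-wide-band detection principle of spread spectrum: the host cepstrum acts as noise that averages out against the pseudo-random sequence.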

143 citations

Dissertation
01 Jan 1993
TL;DR: In this article, the authors describe a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment, including the use of desk-top microphones and different training and testing conditions.
Abstract: This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of environmental variability are introduced by the use of desk-top microphones and different training and testing conditions: additive noise and spectral tilt introduced by linear filtering. An important attribute of the novel compensation algorithms described in this thesis is that they provide joint rather than independent compensation for these two types of degradation. Acoustical compensation is applied in our algorithms as an additive correction in the cepstral domain. This allows a high degree of integration within SPHINX, the Carnegie Mellon speech recognition system, which uses the cepstrum as its feature vector; therefore, these algorithms can be implemented very efficiently. Processing in many of these algorithms is based on instantaneous signal-to-noise ratio (SNR), as the appropriate compensation represents a form of noise suppression at low SNRs and spectral equalization at high SNRs. The compensation vectors for additive noise and spectral transformations are estimated by minimizing the differences between speech feature vectors obtained from a "standard" training corpus of speech and feature vectors that represent the current acoustical environment. In our work this is accomplished by minimizing the distortion of vector-quantized cepstra that are produced by the feature extraction module in SPHINX. In this dissertation we describe several algorithms, including SNR-Dependent Cepstral Normalization (SDCN) and Codeword-Dependent Cepstral Normalization (CDCN). With CDCN, the accuracy of SPHINX when trained on speech recorded with a close-talking microphone and tested on speech recorded with a desk-top microphone is essentially the same as that obtained when the system is trained and tested on speech from the desk-top microphone. An algorithm for frequency normalization has also been proposed, in which the parameter of the bilinear transformation that is used by the signal-processing stage to produce frequency warping is adjusted for each new speaker and acoustical environment. The optimum value of this parameter is again chosen to minimize the vector-quantization distortion between the standard environment and the current one. In preliminary studies, use of this frequency normalization produced a moderate additional decrease in the observed error rate.
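The additive-correction idea is easiest to see in its simplest relative, cepstral mean normalization: a fixed linear channel multiplies the spectrum, so it adds a constant vector in the cepstral domain, and subtracting the per-utterance mean cancels it exactly. A sketch of that principle only — it omits the SNR- and codeword-dependent estimation that SDCN and CDCN actually perform:

```python
import numpy as np

def cmn(cepstra):
    """Cepstral mean normalization: subtract the utterance-mean cepstrum.

    cepstra : (n_frames, n_coeffs) array of cepstral feature vectors
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Two recordings of the same speech made through different fixed channels differ by a constant cepstral offset per frame, which this normalization removes; SDCN/CDCN generalize the correction to depend on the instantaneous SNR or the VQ codeword, so it can also suppress additive noise.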

142 citations

Journal ArticleDOI
TL;DR: Digital data-processing problems such as the detection of multiple echoes, various methods of linearly filtering the complex cepstrum, the picket-fence phenomenon, minimum-maximum phase situations, and amplitude- versus phase-smoothing for the additive-noise case are examined empirically and, where possible, theoretically, and are discussed.
Abstract: A technique for decomposing a composite signal of unknown multiple wavelets overlapping in time is described. The computation algorithm incorporates the power cepstrum and complex cepstrum techniques. It has been found that the power cepstrum is most efficient in recognizing wavelet arrival times and amplitudes while the complex cepstrum is invaluable in estimating the form of the basic wavelet and its echoes, even if the latter are distorted. Digital data-processing problems such as the detection of multiple echoes, various methods of linear filtering the complex cepstrum, the picket-fence phenomenon, minimum-maximum phase situations, and amplitude- versus phase-smoothing for the additive-noise case are examined empirically and where possible theoretically, and are discussed. A similar investigation is performed for some of the preceding problems when the echo or echoes are distorted versions of the wavelet, thereby giving some insight into the complex problem of separating a composite signal composed of several additive stochastic processes. The threshold results are still empirical and the results should be extended to multi-dimensional data. Applications are the decomposition or resolution of signals (e.g., echoes) in radar and sonar, seismology, speech, brain waves, and neuroelectric spike data. Examples of results are presented for decomposition in the absence and presence of noise for specified signals. Results are tendered for the decomposition of pulse-type data appropriate to many systems and for the decomposition of brain waves evoked by visual stimulation.
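The power-cepstrum echo mechanism is easy to reproduce on synthetic data: an echo at delay d multiplies the power spectrum by |1 + a·e^(-jωd)|², the logarithm turns that product into an additive ripple with period 1/d, and the inverse transform concentrates the ripple into a peak at quefrency d. A minimal sketch (variable names and the low-quefrency cutoff of 50 samples are illustrative choices, not from the paper):

```python
import numpy as np

def power_cepstrum(x):
    """Inverse FFT of the log power spectrum."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) ** 2 + 1e-12)).real

rng = np.random.default_rng(0)
wavelet = rng.normal(size=4096)
delay, gain = 200, 0.5
composite = wavelet.copy()
composite[delay:] += gain * wavelet[:-delay]  # one attenuated echo

cep = power_cepstrum(composite)
# the echo appears as a cepstral peak at its arrival-time lag;
# skip the low quefrencies dominated by the wavelet itself
est = np.argmax(cep[50 : len(cep) // 2]) + 50
```

The peak height is roughly the echo amplitude `gain`, with rahmonics at multiples of the delay, which is why the abstract reports the power cepstrum as effective for recognizing wavelet arrival times and amplitudes.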

140 citations

Proceedings ArticleDOI
25 Aug 2013
TL;DR: This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space.
Abstract: This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigenspaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. DBNs have a deep architecture that automatically discovers abstractions to maximally express the original input features. If we train the DBNs using only the speech of an individual speaker, it can be considered that there is less phonological information and relatively more speaker individuality in the output features at the highest layer. Training the DBNs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NNs). The converted abstraction of the source speaker is then brought back to the cepstrum space using an inverse process of the DBNs of the target speaker. We conducted speaker voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method.

140 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
82% related
Robustness (computer science)
94.7K papers, 1.6M citations
80% related
Feature (computer vision)
128.2K papers, 1.7M citations
79% related
Deep learning
79.8K papers, 2.1M citations
79% related
Support vector machine
73.6K papers, 1.7M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023      86
2022     206
2021      60
2020      96
2019     135
2018     130