Topic

Cepstrum

About: Cepstrum is a research topic. Over its lifetime, 3,346 publications have been published within this topic, receiving 55,742 citations.


Papers
Journal Article
Tao Zhang1, Yangyang Shao1, Yaqin Wu1, Yanzhang Geng1, Long Fan1 
TL;DR: An overview of state-of-the-art speech endpoint detection algorithms in the time, frequency, and cepstrum domains is provided, along with a glance at the challenges for future research.

27 citations

Journal Article
TL;DR: The study demonstrates that the SIFT algorithm has the potential to be a reliable and robust method for the estimation of MTBS in the presence of a small signal-to-noise ratio, a large spacing variation between regular scatterers, and a large scattering strength ratio of diffuse scatterers to regular ones.
Abstract: Ultrasonic backscatter signals provide useful information relevant to bone tissue characterization. Trabecular bone microstructures have been considered as quasi-periodic tissues with a collection of regular and diffuse scatterers. This paper investigates the potential of a novel technique using a simplified inverse filter tracking (SIFT) algorithm to estimate mean trabecular bone spacing (MTBS) from ultrasonic backscatter signals. In contrast to other frequency-based methods, the SIFT algorithm is a time-based method and utilizes the amplitude and phase information of backscatter echoes, thus retaining the advantages of both the autocorrelation and the cepstral analysis techniques. The SIFT algorithm was applied to backscatter signals from simulations, phantoms, and bovine trabeculae in vitro. The estimated MTBS results were compared with those of the autoregressive (AR) cepstrum and quadratic transformation (QT). The SIFT estimates are better than the AR cepstrum estimates and are comparable with the QT values. The study demonstrates that the SIFT algorithm has the potential to be a reliable and robust method for the estimation of MTBS in the presence of a small signal-to-noise ratio, a large spacing variation between regular scatterers, and a large scattering strength ratio of diffuse scatterers to regular ones.
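
For orientation, the cepstral baseline that this abstract compares against can be illustrated with a plain real cepstrum. The sketch below is not the SIFT algorithm and not the AR cepstrum used in the paper; it only shows, on a synthetic signal, how regularly spaced echoes produce a cepstral peak at the echo lag. The pulse shape, echo strength, and all function names are assumptions made for illustration, and a naive DFT is used only to keep the example dependency-free (an FFT library would be the usual choice).

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(x)]
    return [c.real for c in dft(log_mag, inverse=True)]


def estimate_spacing(x, min_lag, max_lag):
    """Quefrency (in samples) of the largest cepstral value in [min_lag, max_lag]."""
    ceps = real_cepstrum(x)
    return max(range(min_lag, max_lag + 1), key=lambda q: ceps[q])


# Synthetic example (assumed values): a decaying pulse plus one echo,
# circularly delayed by 20 samples and scaled by 0.6.
n, true_delay = 256, 20
pulse = [0.5 ** t for t in range(n)]
signal = [pulse[t] + 0.6 * pulse[(t - true_delay) % n] for t in range(n)]

spacing_samples = estimate_spacing(signal, 5, 100)
```

Here `spacing_samples` recovers the 20-sample echo lag. Converting a lag to a physical spacing would additionally need the sampling rate and the speed of sound, neither of which is given in the abstract.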

26 citations

Proceedings Article
01 Nov 2019
TL;DR: A simple 2D-convolution multi-branch network architecture is presented for replay detection; it can model distortion in both the time and frequency domains, and performance can be further improved by combining magnitude-based and phase-based features.
Abstract: Automatic Speaker Verification (ASV) technology is vulnerable to various kinds of spoofing attacks, including speech synthesis, voice conversion, and replay. Among them, the replay attack is easy to implement, posing a more severe threat to ASV. The constant-Q cepstrum coefficient (CQCC) feature is effective for detecting replay attacks, but it only utilizes the magnitude of the constant-Q transform (CQT) and discards the phase information. Meanwhile, the commonly used Gaussian mixture model (GMM) cannot model the reverberation present in far-field recordings. In this paper, we incorporate the CQT and the modified group delay function (MGD) in order to utilize the phase of the CQT. Also, we present a simple 2D-convolution multi-branch network architecture for replay detection, which can model the distortion in both the time and frequency domains. The experiment shows that the proposed CQT-based MGD feature outperforms the traditional MGD feature, and performance can be further improved by combining magnitude-based and phase-based features. Our best fusion system achieves 0.0096 min-tDCF and 0.39% EER on the ASVspoof 2019 Physical Access evaluation set. Compared with the CQCC-GMM baseline system provided by the organizer, the min-tDCF is relatively reduced by 96.09% and the EER is relatively reduced by 96.46%. Our system was submitted to the ASVspoof 2019 Physical Access sub-challenge and won 1st place.
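
The "relatively reduced" figures in this abstract follow the usual relative-reduction formula, (baseline - proposed) / baseline. The short sketch below spells out that arithmetic and inverts it to show what baseline values the quoted percentages imply. The baseline numbers are back-computed here and are not stated in the abstract; the function names are hypothetical.

```python
def relative_reduction(baseline, proposed):
    """Relative reduction of a metric, as a percentage of the baseline."""
    return 100.0 * (baseline - proposed) / baseline


def implied_baseline(proposed, reduction_percent):
    """Invert the formula: baseline implied by a reported relative reduction."""
    return proposed / (1.0 - reduction_percent / 100.0)


# Values quoted in the abstract: 0.0096 min-tDCF (96.09% reduction)
# and 0.39% EER (96.46% reduction).
baseline_tdcf = implied_baseline(0.0096, 96.09)
baseline_eer = implied_baseline(0.39, 96.46)
```

Under this reading, the implied baseline is roughly 0.25 min-tDCF and roughly 11% EER.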

26 citations

01 Jan 2000
TL;DR: Some revealing aspects of human auditory perception are considered and the mel-scaled cepstrum algorithm is examined in order to draw some conclusions.
Abstract: The mel-scaled cepstrum is a signal representation scheme used in the analysis of speech signals. Due to its reported superior performance, especially under adverse conditions, it is becoming an increasingly popular choice as a feature extraction front end to spoken language systems. Having evolved over a period of more than fifty years, the mel-scaled cepstrum owes part of its heritage to the pattern recognition community and part to perceptual and acoustical research. It represents a good trade-off between computational efficiency and perceptual considerations. Unfortunately, maybe because of its hybrid nature, the literature tends to be vague on the implementation details of mel-scaled cepstrum algorithms. In this paper we clarify some of the issues regarding the algorithm and its implementation. Our investigation also serves to expose some fundamental flaws remaining in the established approach to speech signal feature extraction.
I. Introduction
The pre-processing and feature extraction stages of a pattern recognition system serve as an interface between the real world and a classifier operating on an idealised model of reality. Information that is discarded in this stage is forever lost; conversely, noise that is accepted will degrade the performance of the classifier stage, which is typically sensitive to complexity in the data. The signals that spoken language systems have to deal with are unique in the sense that they are generated by a biological system, for a biological system. Human speech is the evolutionary product of the vocal and auditory systems and not the other way around. The result shows a distinct lack of engineering common sense. As a matter of fact, psychophysical studies over the last number of decades tend to leave us with the uncomfortable feeling that the world perceived through our senses is rather different from the one that we measure with our instruments.
We will now consider some revealing aspects of human auditory perception and then examine the mel-scaled cepstrum algorithm in order to draw some conclusions.
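
Since the abstract notes that implementation details of the mel-scaled cepstrum are often left vague, a minimal sketch of one common variant may help fix ideas: window the frame, take the power spectrum, apply triangular filters spaced evenly on the mel scale, take logs, and apply a DCT-II. Everything here is an assumption rather than the paper's own recipe, including the mel formula (2595 log10(1 + f/700), one of several in use), the Hamming window, the filter count, and the function names. A naive DFT keeps the example dependency-free.

```python
import cmath
import math


def hz_to_mel(f):
    """One common mel-scale formula (an assumption; several variants exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    win = [s * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
           for t, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(win[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_cepstrum(frame, sample_rate, num_filters=20, num_coeffs=12):
    """Log mel filterbank energies followed by an unnormalised DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    return [
        sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / num_filters)
            for j in range(num_filters))
        for i in range(num_coeffs)
    ]


# Hypothetical input: a 32 ms frame of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mel_cepstrum(frame, sample_rate)
```

The choices that vary between implementations (mel formula, filter shape and normalisation, log floor, DCT scaling, whether coefficient 0 is kept) are exactly the kind of detail the abstract says the literature tends to gloss over.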

26 citations

Journal Article
01 Sep 2009
TL;DR: This paper proposes to use an evolution strategy to optimize the complementarity of two filter bank based feature extractors and shows that significant improvement can be obtained.
Abstract: Conventional automatic speaker verification systems are based on cepstral features such as the Mel-scale frequency cepstrum coefficient (MFCC) or the linear predictive cepstrum coefficient (LPCC). Recently published works showed that the use of complementary features can significantly improve system performance. In this paper, we propose to use an evolution strategy to optimize the complementarity of two filter bank based feature extractors. Experiments with a state-of-the-art speaker verification system show that significant improvement can be obtained. Compared to the standard MFCC, an equal error rate (EER) improvement of 11.48% and 21.56% was obtained on the 2005 NIST SRE and NTIMIT databases, respectively. Furthermore, the obtained filter banks highlight the importance of some specific spectral information for automatic speaker verification.
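
The equal error rate used as the metric in this abstract is the operating point at which the false-accept rate equals the false-reject rate. A minimal sketch of how it can be approximated from verification scores follows; the scores are made up for illustration, the function name is hypothetical, and real evaluations typically interpolate between thresholds rather than picking the closest one.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER: the error rate at the threshold where the
    false-accept and false-reject rates are closest to each other."""
    best_gap, eer = None, None
    for thr in sorted(set(genuine_scores) | set(impostor_scores)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < thr for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Made-up scores: higher means "more likely the claimed speaker".
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
```

With these toy scores, a threshold of 0.6 accepts one of four impostors and rejects one of four genuine trials, so the EER is 0.25.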

26 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Robustness (computer science): 94.7K papers, 1.6M citations, 80% related
Feature (computer vision): 128.2K papers, 1.7M citations, 79% related
Deep learning: 79.8K papers, 2.1M citations, 79% related
Support vector machine: 73.6K papers, 1.7M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    86
2022    206
2021    60
2020    96
2019    135
2018    130