Topic
Cepstrum
About: Cepstrum is a research topic. Over its lifetime, 3346 publications have been published on this topic, receiving 55742 citations.
Papers
••
TL;DR: An overview of the state of the art in time-domain, frequency-domain, and cepstrum-domain speech endpoint detection algorithms is provided, along with a look at the challenges for future research.
27 citations
••
TL;DR: The study demonstrates that the SIFT algorithm has the potential to be a reliable and robust method for the estimation of MTBS in the presence of a small signal-to-noise ratio, a large spacing variation between regular scatterers, and a large scattering strength ratio of diffuse scatterers to regular ones.
Abstract: Ultrasonic backscatter signals provide useful information relevant to bone tissue characterization. Trabecular bone microstructures have been considered as quasi-periodic tissues with a collection of regular and diffuse scatterers. This paper investigates the potential of a novel technique using a simplified inverse filter tracking (SIFT) algorithm to estimate mean trabecular bone spacing (MTBS) from ultrasonic backscatter signals. In contrast to other frequency-based methods, the SIFT algorithm is a time-based method and utilizes the amplitude and phase information of backscatter echoes, thus retaining the advantages of both the autocorrelation and the cepstral analysis techniques. The SIFT algorithm was applied to backscatter signals from simulations, phantoms, and bovine trabeculae in vitro. The estimated MTBS results were compared with those of the autoregressive (AR) cepstrum and quadratic transformation (QT). The SIFT estimates are better than the AR cepstrum estimates and are comparable with the QT values. The study demonstrates that the SIFT algorithm has the potential to be a reliable and robust method for the estimation of MTBS in the presence of a small signal-to-noise ratio, a large spacing variation between regular scatterers, and a large scattering strength ratio of diffuse scatterers to regular ones.
26 citations
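To illustrate the cepstral idea that the abstract above compares against, here is a minimal sketch of estimating an echo delay from the real cepstrum and converting it to a scatterer spacing. It is not the SIFT, AR cepstrum, or QT method from the paper. The synthetic signal, the function names, the 50 MHz sampling rate, and the 1540 m/s sound speed are all assumptions chosen for illustration, and the spacing relation (speed times delay divided by two) is the usual pulse-echo round-trip assumption.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_mag).real

def estimate_echo_delay(x, min_lag, max_lag):
    """Lag (in samples) of the largest cepstral peak in [min_lag, max_lag)."""
    c = real_cepstrum(x)
    return min_lag + int(np.argmax(c[min_lag:max_lag]))

def delay_to_spacing_mm(delay_samples, sample_rate_hz, sound_speed):
    """Round-trip delay to one-way spacing in millimetres (pulse-echo assumption)."""
    delay_seconds = delay_samples / sample_rate_hz
    return sound_speed * delay_seconds / 2.0 * 1e3

# Hypothetical test signal: a noise-like echo plus a delayed, attenuated copy,
# standing in for two regular scatterers a fixed distance apart.
rng = np.random.default_rng(0)
n, true_delay = 2048, 60
pulse = rng.standard_normal(n)
signal = pulse.copy()
signal[true_delay:] += 0.6 * pulse[:-true_delay]

delay = estimate_echo_delay(signal, min_lag=20, max_lag=n // 2)
# Assumed acquisition parameters, not taken from the paper.
spacing_mm = delay_to_spacing_mm(delay, sample_rate_hz=50e6, sound_speed=1540.0)
print(delay, round(spacing_mm, 3))
```

The lower lag bound skips the low-quefrency region, where the spectral envelope of the pulse dominates the cepstrum.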
••
01 Nov 2019
TL;DR: A simple 2D-convolution multi-branch network architecture for replay detection is presented, which can model distortion in both the time and frequency domains; performance can be further improved by combining magnitude-based and phase-based features.
Abstract: Automatic Speaker Verification (ASV) technology is vulnerable to various kinds of spoofing attacks, including speech synthesis, voice conversion, and replay. Among them, the replay attack is easy to implement, posing a more severe threat to ASV. The constant-Q cepstrum coefficient (CQCC) feature is effective for detecting replay attacks, but it only utilizes the magnitude of the constant-Q transform (CQT) and discards the phase information. Meanwhile, the commonly used Gaussian mixture model (GMM) cannot model the reverberation present in far-field recordings. In this paper, we incorporate the CQT and the modified group delay function (MGD) in order to utilize the phase of the CQT. Also, we present a simple 2D-convolution multi-branch network architecture for replay detection, which can model the distortion in both the time and frequency domains. The experiment shows that the proposed CQT-based MGD feature outperforms the traditional MGD feature, and that performance can be further improved by combining magnitude-based and phase-based features. Our best fusion system achieves 0.0096 min-tDCF and 0.39% EER on the ASVspoof 2019 Physical Access evaluation set. Compared with the CQCC-GMM baseline system provided by the organizer, the min-tDCF is relatively reduced by 96.09% and the EER is relatively reduced by 96.46%. Our system was submitted to the ASVspoof 2019 Physical Access sub-challenge and won 1st place.
26 citations
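The relative reductions quoted in the abstract above can be unpacked with a little arithmetic. The sketch below defines relative reduction and inverts it to recover the baseline values implied by the reported figures. The function names are hypothetical, and the implied baselines are derived from the rounded numbers in the abstract, not quoted from the paper.

```python
def relative_reduction(baseline, system):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - system) / baseline

def implied_baseline(system, reduction_percent):
    """Baseline value implied by a system score and its relative reduction."""
    return system / (1.0 - reduction_percent / 100.0)

# Figures reported in the abstract: 0.0096 min-tDCF (96.09% reduction)
# and 0.39% EER (96.46% reduction).
baseline_tdcf = implied_baseline(0.0096, 96.09)  # roughly 0.246
baseline_eer = implied_baseline(0.39, 96.46)     # roughly 11.0 (percent)

print(round(baseline_tdcf, 4), round(baseline_eer, 2))
```

Because the reported percentages are rounded, the recovered baselines are approximate.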
01 Jan 2000
TL;DR: Some revealing aspects of human auditory perception are considered and the mel-scaled cepstrum algorithm is examined in order to draw some conclusions.
Abstract: The mel-scaled cepstrum is a signal representation scheme used in the analysis of speech signals. Due to its reported superior performance, especially under adverse conditions, it is becoming an increasingly popular choice as a feature extraction front end to spoken language systems. Having evolved over a period of more than fifty years, the mel-scaled cepstrum owes part of its heritage to the pattern recognition community and part to perceptual and acoustical research. It represents a good trade-off between computational efficiency and perceptual considerations. Unfortunately, maybe because of its hybrid nature, the literature tends to be vague on the implementation details of mel-scaled cepstrum algorithms. In this paper we clarify some of the issues regarding the algorithm and its implementation. Our investigation also serves to expose some fundamental flaws remaining in the established approach to speech signal feature extraction.
I. Introduction
The pre-processing and feature extraction stages of a pattern recognition system serve as an interface between the real world and a classifier operating on an idealised model of reality. Information that is discarded in this stage is forever lost; conversely, noise that is accepted will degrade the performance of the classifier stage, which is typically sensitive to complexity in the data. The signals that spoken language systems have to deal with are unique in the sense that they are generated by a biological system, for a biological system. Human speech is the evolutionary product of the vocal and auditory systems and not the other way around. The result shows a distinct lack of engineering common sense. As a matter of fact, psychophysical studies over the last number of decades tend to leave us with the uncomfortable feeling that the world perceived through our senses is rather different from the one that we measure with our instruments.
We will now consider some revealing aspects of human auditory perception and then examine the mel-scaled cepstrum algorithm in order to draw some conclusions.
26 citations
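As the abstract above notes, implementation details of the mel-scaled cepstrum vary across the literature. The sketch below shows one common arrangement for a single frame: window, power spectrum, triangular mel filter bank, logarithm, then a DCT-II. It is not the paper's algorithm. The mel formula (2595 log10(1 + f/700)), the Hamming window, the filter count, and all function names are assumptions picked for illustration.

```python
import numpy as np

def hz_to_mel(f):
    """One widely used mel-scale formula (an assumption; others exist)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale."""
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = mel_to_hz(np.linspace(0.0, top_mel, n_filters + 2))
    bin_hz = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, bin_hz.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_cepstrum(frame, sample_rate, n_filters=24, n_coeffs=13):
    """Mel-scaled cepstral coefficients of one frame."""
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    energies = mel_filterbank(n_filters, frame.size, sample_rate) @ power
    log_energies = np.log(energies + 1e-12)  # offset avoids log(0)
    # Unnormalised DCT-II of the log filter-bank energies
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (m + 0.5) / n_filters)
    return dct_basis @ log_energies

# Hypothetical input: a 25 ms frame of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(400) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t)
coeffs = mel_cepstrum(frame, sample_rate)
print(coeffs.shape)
```

Choices such as filter normalisation, the frequency range covered, and the DCT scaling differ between implementations, so coefficients from two toolkits are generally not directly comparable.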
••
01 Sep 2009
TL;DR: This paper proposes to use an evolution strategy to optimize the complementarity of two filter bank based feature extractors and shows that significant improvement can be obtained.
Abstract: Conventional automatic speaker verification systems are based on cepstral features such as Mel-scale frequency cepstrum coefficients (MFCC) or linear predictive cepstrum coefficients (LPCC). Recently published work has shown that the use of complementary features can significantly improve system performance. In this paper, we propose to use an evolution strategy to optimize the complementarity of two filter bank based feature extractors. Experiments with a state-of-the-art speaker verification system show that significant improvement can be obtained. Compared to the standard MFCC, an equal error rate (EER) improvement of 11.48% and 21.56% was obtained on the 2005 NIST SRE and NTIMIT databases, respectively. Furthermore, the obtained filter banks highlight the importance of some specific spectral information for automatic speaker verification.
26 citations
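The equal error rate (EER) reported in the abstract above is the operating point where the false acceptance and false rejection rates coincide. The following is a minimal sketch of estimating it from two lists of verification scores by sweeping a threshold. The scores and the function name are made up for illustration, and the paper's own scoring protocol is not described here.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping the decision threshold over all scores."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False rejection: genuine trials scoring below the threshold
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: impostor trials scoring at or above the threshold
    far = np.array([(impostor >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Hypothetical scores: higher means "more likely the claimed speaker".
target_scores = [0.9, 0.8, 0.7, 0.6, 0.3]
impostor_scores = [0.5, 0.4, 0.35, 0.2, 0.1]
eer = equal_error_rate(target_scores, impostor_scores)
print(eer)
```

With few trials the two error curves may not cross exactly, which is why the sketch averages the two rates at the closest threshold.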