Topic

Cepstrum

About: Cepstrum is a research topic. Over the lifetime, 3346 publications have been published within this topic receiving 55742 citations.

Papers

Journal Article

An overview of speech endpoint detection algorithms

Tao Zhang¹, Yangyang Shao¹, Yaqin Wu¹, Yanzhang Geng¹, Long Fan¹

Tianjin University¹

01 Mar 2020-Applied Acoustics

TL;DR: Provides an overview of state-of-the-art speech endpoint detection algorithms in the time, frequency, and cepstrum domains, and briefly surveys the challenges for future research.

27 citations

Journal Article

Simplified inverse filter tracking algorithm for estimating the mean trabecular bone spacing

Kai Huang¹, Dean Ta¹, Weiqi Wang¹, Lawrence H. Le

Fudan University¹

15 Jul 2008-IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control

TL;DR: The study demonstrates that the SIFT algorithm has the potential to be a reliable and robust method for the estimation of MTBS in the presence of a small signal-to-noise ratio, a large spacing variation between regular scatterers, and a large scattering strength ratio of diffuse scatterers to regular ones.

Abstract: Ultrasonic backscatter signals provide useful information relevant to bone tissue characterization. Trabecular bone microstructures have been considered as quasi-periodic tissues with a collection of regular and diffuse scatterers. This paper investigates the potential of a novel technique using a simplified inverse filter tracking (SIFT) algorithm to estimate mean trabecular bone spacing (MTBS) from ultrasonic backscatter signals. In contrast to other frequency-based methods, the SIFT algorithm is a time-based method and utilizes the amplitude and phase information of backscatter echoes, thus retaining the advantages of both the autocorrelation and the cepstral analysis techniques. The SIFT algorithm was applied to backscatter signals from simulations, phantoms, and bovine trabeculae in vitro. The estimated MTBS results were compared with those of the autoregressive (AR) cepstrum and quadratic transformation (QT). The SIFT estimates are better than the AR cepstrum estimates and are comparable with the QT values. The study demonstrates that the SIFT algorithm has the potential to be a reliable and robust method for the estimation of MTBS in the presence of a small signal-to-noise ratio, a large spacing variation between regular scatterers, and a large scattering strength ratio of diffuse scatterers to regular ones.
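For context on why cepstral analysis can recover a regular scatterer spacing, here is a minimal sketch of the plain real cepstrum applied to a toy signal: a unit impulse followed by one weaker echo. This is not the SIFT algorithm and not the AR cepstrum from the paper; the signal, the function names (`dft`, `real_cepstrum`), and the parameter values are all hypothetical and chosen only for illustration. A naive O(N²) DFT keeps the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(x, floor=1e-12):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum.

    `floor` guards against log(0); it is an illustrative choice.
    """
    log_mag = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]


# Toy signal (assumed values): impulse at sample 0, echo at sample `delay`.
N, delay, gain = 64, 8, 0.5
signal = [0.0] * N
signal[0] = 1.0
signal[delay] = gain

cepstrum = real_cepstrum(signal)

# Search the first half, skipping index 0, for the strongest cepstral peak.
peak = max(range(1, N // 2), key=lambda n: cepstrum[n])
print("strongest cepstral peak at sample", peak)
```

In this toy case the strongest peak falls at the echo delay, which is the property spacing estimators based on the cepstrum rely on. Real backscatter signals add diffuse scattering and noise, which is the regime the abstract addresses.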

26 citations

Proceedings Article

Replay detection using CQT-based modified group delay feature and ResNeWt network in ASVspoof 2019

Xingliang Cheng¹, Mingxing Xu¹, Thomas Fang Zheng¹

Tsinghua University¹

01 Nov 2019

TL;DR: Presents a simple 2D-convolution multi-branch network architecture for replay detection that can model distortion in both the time and frequency domains; performance improves further when magnitude-based and phase-based features are combined.

Abstract: Automatic Speaker Verification (ASV) technology is vulnerable to various kinds of spoofing attacks, including speech synthesis, voice conversion, and replay. Among them, the replay attack is easy to implement, posing a more severe threat to ASV. The constant-Q cepstrum coefficient (CQCC) feature is effective for detecting replay attacks, but it only utilizes the magnitude of the constant-Q transform (CQT) and discards the phase information. Meanwhile, the commonly used Gaussian mixture model (GMM) cannot model the reverberation present in far-field recordings. In this paper, we combine the CQT and the modified group delay function (MGD) in order to utilize the phase of the CQT. We also present a simple 2D-convolution multi-branch network architecture for replay detection, which can model the distortion in both the time and frequency domains. The experiment shows that the proposed CQT-based MGD feature outperforms the traditional MGD feature, and that performance can be further improved by combining magnitude-based and phase-based features. Our best fusion system achieves 0.0096 min-tDCF and 0.39% EER on the ASVspoof 2019 Physical Access evaluation set. Compared with the CQCC-GMM baseline system provided by the organizer, the min-tDCF is relatively reduced by 96.09% and the EER is relatively reduced by 96.46%. Our system was submitted to the ASVspoof 2019 Physical Access sub-challenge and won 1st place.

26 citations

On The Mel-scaled Cepstrum

H. P. Combrinck¹, E. C. Botha

University of Pretoria¹

01 Jan 2000

TL;DR: Some revealing aspects of human auditory perception are considered, and the mel-scaled cepstrum algorithm is examined in order to draw some conclusions.

Abstract: The mel-scaled cepstrum is a signal representation scheme used in the analysis of speech signals. Due to its reported superior performance, especially under adverse conditions, it is becoming an increasingly popular choice as feature extraction front end to spoken language systems. Having evolved over a period of more than fifty years, the mel-scaled cepstrum owes part of its heritage to the pattern recognition community and part to perceptual and acoustical research. It represents a good trade-off between computational efficiency and perceptual considerations. Unfortunately, maybe because of its hybrid nature, the literature tends to be vague on the implementation details of mel-scaled cepstrum algorithms. In this paper we clarify some of the issues regarding the algorithm and its implementation. Our investigation also serves to expose some fundamental flaws remaining in the established approach to speech signal feature extraction.

I. Introduction

The pre-processing and feature extraction stages of a pattern recognition system serve as an interface between the real world and a classifier operating on an idealised model of reality. Information that is discarded in this stage is forever lost; conversely, noise that is accepted will degrade the performance of the classifier stage, which is typically sensitive to complexity in the data. The signals that spoken language systems have to deal with are unique in the sense that they are generated by a biological system, for a biological system. Human speech is the evolutionary product of the vocal and auditory systems and not the other way around. The result shows a distinct lack of engineering common sense. As a matter of fact, psychophysical studies over the last number of decades tend to leave us with the uncomfortable feeling that the world perceived through our senses is rather different from the one that we measure with our instruments. We will now consider some revealing aspects of human auditory perception and then examine the mel-scaled cepstrum algorithm in order to draw some conclusions.
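Since the abstract notes that the literature is vague on implementation details, the sketch below shows only one commonly used convention, not necessarily the one the paper examines. It assumes the mel mapping 2595·log10(1 + f/700), filter centres spaced evenly on that mel axis, and a DCT-II applied to log filter-bank energies. All function names and the 20-filter, 0 to 8000 Hz configuration are hypothetical, and the filter-bank energy computation itself is left out.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to mel (one common convention; others exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_low, f_high):
    """Filter centre frequencies (Hz) spaced evenly on the mel axis."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def cepstral_coefficients(log_energies, n_coeffs):
    """Unnormalised DCT-II of log filter-bank energies."""
    m = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (j + 0.5) / m)
            for j, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]


# Hypothetical configuration: 20 filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(20, 0.0, 8000.0)
print([round(c) for c in centers[:3]], "...", round(centers[-1]))

# A flat log spectrum puts all its energy into coefficient 0.
flat = cepstral_coefficients([2.0] * 20, 5)
print([round(c, 6) for c in flat])
```

The centres crowd together at low frequencies and spread out at high frequencies, which is the perceptual warping the mel scale is meant to approximate. Choices such as filter shape, normalisation, and the number of retained coefficients vary between implementations.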

26 citations

Journal Article

Optimizing feature complementarity by evolution strategy: Application to automatic speaker verification

C. Charbuillet¹, Bruno Gas¹, Mohamed Chetouani¹, Jean-Luc Zarader¹

Pierre-and-Marie-Curie University¹

01 Sep 2009

TL;DR: This paper proposes to use an evolution strategy to optimize the complementarity of two filter bank based feature extractors and shows that significant improvement can be obtained.

Abstract: Conventional automatic speaker verification systems are based on cepstral features such as the Mel-scale frequency cepstrum coefficient (MFCC) or the linear predictive cepstrum coefficient (LPCC). Recently published work has shown that the use of complementary features can significantly improve system performance. In this paper, we propose to use an evolution strategy to optimize the complementarity of two filter bank based feature extractors. Experiments with a state-of-the-art speaker verification system show that significant improvement can be obtained. Compared to the standard MFCC, an equal error rate (EER) improvement of 11.48% and 21.56% was obtained on the 2005 NIST SRE and NTIMIT databases, respectively. Furthermore, the obtained filter banks highlight the importance of some specific spectral information for automatic speaker verification.

26 citations

Network Information

Performance

Metrics

Papers: 3,645

Citations: 60,375

No. of papers in the topic in previous years
Year	Papers
2023	86
2022	206
2021	60
2020	96
2019	135
2018	130
