Proceedings ArticleDOI

Subband Analysis for Performance Improvement of Replay Attack Detection in Speaker Verification Systems

TL;DR: The study applies subband analysis to constant-Q cepstral coefficient (CQCC) and mel-frequency cepstral coefficient (MFCC) features to improve replay attack detection; the experimental results suggest that features extracted from the high-frequency band carry significant discriminatory information.
Abstract: Automatic speaker verification systems have been widely employed in a variety of commercial applications. However, advancements in the field of speech technology have equipped attackers with sophisticated techniques for circumventing speaker verification systems. The state-of-the-art countermeasures are fairly successful in detecting speech synthesis and voice conversion attacks. However, the problem of replay attack detection has not received much attention from researchers. In this study, we perform subband analysis on constant-Q cepstral coefficient (CQCC) and mel-frequency cepstral coefficient (MFCC) features to improve the performance of replay attack detection. We have performed experiments on the ASVspoof 2017 database, which consists of 3566 genuine and 15380 replay utterances. Our experimental results suggest that the features extracted from the high-frequency band carry significant discriminatory information for replay attack detection. In particular, our approach achieves an improvement of 36.33% over the baseline replay attack detection method in terms of equal error rate.
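The abstract reports its gain as a relative improvement in equal error rate (EER). As a rough illustration of how such a figure is typically obtained, the sketch below approximates the EER by sweeping a threshold over the observed scores and then computes a relative reduction. The function names, the "higher score means genuine" convention, and the toy scores are assumptions made for this example; they are not taken from the paper.

```python
def equal_error_rate(genuine_scores, replay_scores):
    """Approximate the EER (as a fraction) by sweeping thresholds over all scores.

    Assumed convention: a higher score means "more likely genuine".
    """
    thresholds = sorted(set(genuine_scores) | set(replay_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a genuine utterance scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a replay utterance scored at or above the threshold.
        far = sum(s >= t for s in replay_scores) / len(replay_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction over a baseline, in percent."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Toy scores invented purely for illustration (not data from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
replay = [0.1, 0.2, 0.3, 0.6]
toy_eer = equal_error_rate(genuine, replay)
```

A relative improvement is measured against the baseline EER, so it says how much of the baseline error was removed rather than how many percentage points the EER dropped.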
Citations
Proceedings ArticleDOI
30 Aug 2021
TL;DR: The dualband fusion anti-spoofing algorithm is proposed, which requires only two sub-systems but outperforms all but one primary system submitted to the logical access condition of the ASVspoof 2019 challenge.
Abstract: The current neural network based anti-spoofing systems have poor robustness. Their performance degrades further after voice activity detection (VAD) is performed, making them difficult to apply in practice. This work investigated the effect of silence at the beginning and end of speech, finding that differences in silence are part of the basis for the countermeasures’ judgements. The reason for the performance deterioration caused by VAD is also explored. The experimental results demonstrate that the neural network loses the information about silent segments after the VAD operation removes them. This can lead to more serious overfitting. In order to solve the overfitting problem, the work in this paper also analyzes the reasons for system overfitting from different frequency sub-bands. It is found that the high-frequency part of the feature is the main cause of system overfitting, while the low-frequency part is more robust but less accurate against known attacks. Therefore, we propose the dualband fusion anti-spoofing algorithm, which requires only two sub-systems but outperforms all but one primary system submitted to the logical access condition of the ASVspoof 2019 challenge. Our system has an EER of 3.50% even after VAD operations are performed, and can thus be put into practical application.

36 citations

Proceedings ArticleDOI
01 Nov 2020
TL;DR: In this article, the authors propose a joint subband modeling framework that employs n different sub-networks to learn subband specific features, which are later combined and passed to a classifier.
Abstract: Spectrograms - time-frequency representations of audio signals - have found widespread use in neural network-based spoofing detection. While deep models are trained on the fullband spectrum of the signal, we argue that not all frequency bands are useful for these tasks. In this paper, we systematically investigate the impact of different subbands and their importance on replay spoofing detection on two benchmark datasets: ASVspoof 2017 v2.0 and ASVspoof 2019 PA. We propose a joint subband modelling framework that employs n different sub-networks to learn subband specific features. These are later combined and passed to a classifier and the whole network weights are updated during training. Our findings on the ASVspoof 2017 dataset suggest that the most discriminative information appears to be in the first and the last 1 kHz frequency bands, and the joint model trained on these two subbands shows the best performance outperforming the baselines by a large margin. However, these findings do not generalise on the ASVspoof 2019 PA dataset. This suggests that the datasets available for training these models do not reflect real world replay conditions suggesting a need for careful design of datasets for training replay spoofing countermeasures.
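To make the "first and last 1 kHz" finding concrete, the sketch below shows one way to pick the FFT bins that fall in those two subbands of a single spectrogram frame. It covers only the band-slicing step, not the sub-networks or classifier of the proposed framework. The 16 kHz sampling rate, the 512-point FFT, and all function names are assumptions chosen for illustration.

```python
import math


def band_bin_range(low_hz, high_hz, sample_rate, n_fft):
    """Return (start, stop) indices of one-sided FFT bins lying in [low_hz, high_hz]."""
    hz_per_bin = sample_rate / n_fft
    start = math.ceil(low_hz / hz_per_bin)
    # Clamp to the Nyquist bin; stop is exclusive, hence the +1.
    stop = min(math.floor(high_hz / hz_per_bin), n_fft // 2) + 1
    return start, stop


def select_subbands(frame_spectrum, bands, sample_rate, n_fft):
    """Slice one spectrogram frame into the requested frequency subbands."""
    return [
        frame_spectrum[slice(*band_bin_range(lo, hi, sample_rate, n_fft))]
        for lo, hi in bands
    ]


# Assumed setup: 16 kHz audio, 512-point FFT, so 257 one-sided bins per frame.
SAMPLE_RATE, N_FFT = 16000, 512
frame = [float(i) for i in range(N_FFT // 2 + 1)]  # placeholder magnitudes
first_khz, last_khz = select_subbands(
    frame, [(0, 1000), (7000, 8000)], SAMPLE_RATE, N_FFT
)
```

In a joint model of the kind the abstract describes, each slice would feed its own sub-network; here the slices are simply returned as lists.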

10 citations

Posted Content
TL;DR: In this paper, a bank of very simple classifiers, each with a front-end tuned to the detection of different spoofing attacks, is combined at the score level through non-linear fusion.
Abstract: The threat of spoofing can pose a risk to the reliability of automatic speaker verification. Results from the bi-annual ASVspoof evaluations show that effective countermeasures demand front-ends designed specifically for the detection of spoofing artefacts. Given the diversity in spoofing attacks, ensemble methods are particularly effective. The work in this paper shows that a bank of very simple classifiers, each with a front-end tuned to the detection of different spoofing attacks and combined at the score level through non-linear fusion, can deliver better performance than more sophisticated ensemble solutions that rely upon complex neural network architectures. Our comparatively simple approach outperforms all but 2 of the 48 systems submitted to the logical access condition of the most recent ASVspoof 2019 challenge.

6 citations

Posted Content
TL;DR: The findings on the ASVspoof 2017 dataset suggest that the most discriminative information appears to be in the first and the last 1 kHz frequency bands, and the joint model trained on these two subbands shows the best performance outperforming the baselines by a large margin.
Abstract: Spectrograms - time-frequency representations of audio signals - have found widespread use in neural network-based spoofing detection. While deep models are trained on the fullband spectrum of the signal, we argue that not all frequency bands are useful for these tasks. In this paper, we systematically investigate the impact of different subbands and their importance on replay spoofing detection on two benchmark datasets: ASVspoof 2017 v2.0 and ASVspoof 2019 PA. We propose a joint subband modelling framework that employs n different sub-networks to learn subband specific features. These are later combined and passed to a classifier and the whole network weights are updated during training. Our findings on the ASVspoof 2017 dataset suggest that the most discriminative information appears to be in the first and the last 1 kHz frequency bands, and the joint model trained on these two subbands shows the best performance outperforming the baselines by a large margin. However, these findings do not generalise on the ASVspoof 2019 PA dataset. This suggests that the datasets available for training these models do not reflect real world replay conditions suggesting a need for careful design of datasets for training replay spoofing countermeasures.

5 citations


Cites background or methods from "Subband Analysis for Performance Im..."

  • ...In the context of spoofing detection, the most relevant studies include [19, 20, 21, 22, 23, 24]....

    [...]

  • ...Our current work is different from prior works [19, 20, 21, 22, 23] because most of them aim at handcrafting or learning features [24] based on the relevance of specific subbands for spoofing detection....

    [...]

  • ...The authors of [21] similarly performed subband analysis using CQCC and MFCC features for replay spoofing detection on the ASVspoof 2017 dataset, with similar findings to those in [20]....

    [...]

  • ...Our proposed work is different from prior works [19, 20, 21, 22, 23] because most of them aim at hand-crafting or learning features [24] based on the relevance of specific subbands for spoofing detection....

    [...]

Proceedings ArticleDOI
21 May 2021
TL;DR: In this article, the authors review the contributions of researchers to the detection of various spoofing attacks on automatic speaker verification (ASV) systems and present several approaches for detecting replay attacks.
Abstract: Automatic speaker verification (ASV) systems have gained popularity due to their application in the area of biometrics. The performance of an ASV system deteriorates under spoofing attacks. In this paper, the types of spoofing attacks are presented. The performance of an ASV system depends on the front-end processing, which includes the extraction of features. This paper presents the contributions of researchers to the detection of various spoofing attacks. The research community has noted that, among the various spoofing attacks that affect ASV systems, the replay attack is a challenging one. In this paper, recent approaches for the detection of replay attacks are presented.

1 citation

References
Journal ArticleDOI
TL;DR: In this article, a constant Q transform with a constant ratio of center frequency to resolution has been proposed to obtain a constant pattern in the frequency domain for sounds with harmonic frequency components.
Abstract: The frequencies that have been chosen to make up the scale of Western music are geometrically spaced. Thus the discrete Fourier transform (DFT), although extremely efficient in the fast Fourier transform implementation, yields components which do not map efficiently to musical frequencies. This is because the frequency components calculated with the DFT are separated by a constant frequency difference and with a constant resolution. A calculation similar to a discrete Fourier transform but with a constant ratio of center frequency to resolution has been made; this is a constant Q transform and is equivalent to a 1/24‐oct filter bank. Thus there are two frequency components for each musical note so that two adjacent notes in the musical scale played simultaneously can be resolved anywhere in the musical frequency range. This transform against log (frequency) to obtain a constant pattern in the frequency domain for sounds with harmonic frequency components has been plotted. This is compared to the conventio...
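The defining property described in this abstract, a constant ratio of center frequency to resolution with 1/24-octave spacing, can be written down in a few lines. The sketch below computes the geometrically spaced center frequencies, the resulting Q factor, and the per-bin analysis window length. The minimum frequency, the sampling rate, and the function names are assumptions for illustration only.

```python
import math

BINS_PER_OCTAVE = 24  # 1/24-octave spacing, as stated in the abstract


def constant_q_factor(bins_per_octave=BINS_PER_OCTAVE):
    """Q = f_k / (f_{k+1} - f_k) for geometrically spaced center frequencies."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def center_frequencies(f_min, n_bins, bins_per_octave=BINS_PER_OCTAVE):
    """Center frequencies f_k = f_min * 2^(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def window_length(f_k, sample_rate, q):
    """Samples needed to keep the same Q at frequency f_k: N_k = Q * fs / f_k."""
    return math.ceil(q * sample_rate / f_k)


# Assumed values: 55 Hz lowest bin, two octaves of bins, 16 kHz audio.
q = constant_q_factor()
freqs = center_frequencies(55.0, 2 * BINS_PER_OCTAVE + 1)
lengths = [window_length(f, 16000, q) for f in freqs]
```

Because the window length shrinks as the center frequency rises, low-frequency bins get fine frequency resolution and high-frequency bins get fine time resolution, in contrast to the fixed resolution of the DFT.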

890 citations


"Subband Analysis for Performance Im..." refers methods in this paper

  • ...CQCC features are computed using the constant-Q transform (CQT) [24], followed by uniform resampling and discrete cosine transform (DCT) [7]....

    [...]

Journal ArticleDOI
TL;DR: Unlocking the full potential of biometrics through inter-disciplinary research in the above areas will not only lead to widespread adoption of this promising technology, but will also result in wider user acceptance and societal impact.

541 citations


"Subband Analysis for Performance Im..." refers background in this paper

  • ...These attacks can be carried out at several points in the biometric processing pipeline [3]....

    [...]

Proceedings ArticleDOI
20 Aug 2017
TL;DR: ASVspoof 2017, the second in the series, focused on the development of replay attack countermeasures and indicates that the quest for countermeasures which are resilient in the face of variable replay attacks remains very much alive.
Abstract: The ASVspoof initiative was created to promote the development of countermeasures which aim to protect automatic speaker verification (ASV) from spoofing attacks. The first community-led, common evaluation held in 2015 focused on countermeasures for speech synthesis and voice conversion spoofing attacks. Arguably, however, it is replay attacks which pose the greatest threat. Such attacks involve the replay of recordings collected from enrolled speakers in order to provoke false alarms and can be mounted with greater ease using everyday consumer devices. ASVspoof 2017, the second in the series, hence focused on the development of replay attack countermeasures. This paper describes the database, protocols and initial findings. The evaluation entailed highly heterogeneous acoustic recording and replay conditions which increased the equal error rate (EER) of a baseline ASV system from 1.76% to 31.46%. Submissions were received from 49 research teams, 20 of which improved upon a baseline replay spoofing detector EER of 24.77%, in terms of replay/non-replay discrimination. While largely successful, the evaluation indicates that the quest for countermeasures which are resilient in the face of variable replay attacks remains very much alive.

435 citations

Journal ArticleDOI
TL;DR: A survey of past work and priority research directions for the future is provided, showing that future research should address the lack of standard datasets and the over-fitting of existing countermeasures to specific, known spoofing attacks.

433 citations


"Subband Analysis for Performance Im..." refers methods in this paper

  • ...Further, the logarithm of energies followed by DCT operation is performed to compute the cepstral coefficients....

    [...]

  • ...They can be circumvented using the four methods namely, voice conversion, speech synthesis, voice mimicry and audio replay [7]....

    [...]

  • ...CQCC features are computed using the constant-Q transform (CQT) [24], followed by uniform resampling and discrete cosine transform (DCT) [7]....

    [...]
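The step quoted in the excerpts above, taking the logarithm of band energies and applying a DCT to obtain cepstral coefficients, can be sketched as follows. This is a minimal illustration using an unnormalised DCT-II written out by hand; the small flooring constant, the number of coefficients kept, and the function names are assumptions, and the filterbank (or CQT with uniform resampling) that would produce the energies is left out.

```python
import math


def dct_ii(values):
    """Unnormalised DCT-II: X_k = sum_i x_i * cos(pi * k * (i + 0.5) / N)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_coefficients(band_energies, n_coeffs):
    """Log of the band energies followed by a DCT, truncated to n_coeffs."""
    # Floor with a tiny constant (an assumption) to avoid log(0) on silent bands.
    log_energies = [math.log(e + 1e-10) for e in band_energies]
    return dct_ii(log_energies)[:n_coeffs]


# Flat placeholder energies: all information ends up in coefficient 0.
flat = cepstral_coefficients([math.e] * 8, 4)
```

For a subband analysis, the same function would be applied only to the energies of the bands inside the frequency range of interest.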

01 Jan 2014
TL;DR: In this paper, the authors provide a survey of spoofing countermeasures for automatic speaker verification, highlighting the need for more effort in the future to ensure adequate protection against spoofing attacks.
Abstract: While biometric authentication has advanced significantly in recent years, evidence shows the technology can be susceptible to malicious spoofing attacks. The research community has responded with dedicated countermeasures which aim to detect and deflect such attacks. Even if the literature shows that they can be effective, the problem is far from being solved; biometric systems remain vulnerable to spoofing. Despite a growing momentum to develop spoofing countermeasures for automatic speaker verification, now that the technology has matured sufficiently to support mass deployment in an array of diverse applications, greater effort will be needed in the future to ensure adequate protection against spoofing. This article provides a survey of past work and identifies priority research directions for the future. We summarise previous studies involving impersonation, replay, speech synthesis and voice conversion spoofing attacks and more recent efforts to develop dedicated countermeasures. The survey shows that future research should address the lack of standard datasets and the over-fitting of existing countermeasures to specific, known spoofing attacks.

371 citations