Proceedings ArticleDOI
Audio Replay Attack Detection Using High-Frequency Features.
TLDR
This paper addresses a replay spoofing attack against a speaker recognition system by detecting that the analysed signal has passed through multiple analogue-to-digital conversions. The detection relies on modelling the subband spectrum and on proposed features derived from linear prediction analysis.
Abstract:
This paper presents our contribution to the ASVspoof 2017 Challenge. It addresses a replay spoofing attack against a speaker recognition system by detecting that the analysed signal has passed through multiple analogue-to-digital (AD) conversions. Specifically, we show that most of the cues that make it possible to detect replay attacks can be found in the high-frequency band of the replayed recordings. The described anti-spoofing countermeasures are based on (1) modelling the subband spectrum and (2) using the proposed features derived from linear prediction (LP) analysis. The investigated methods show a significant improvement over the baseline system of the ASVspoof 2017 Challenge: a relative equal error rate (EER) reduction of 70% was achieved on the development set and a reduction of 30% on the evaluation set.
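The two ingredients named in the abstract, a spectrum restricted to the high-frequency band and features from LP analysis, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 6 kHz cutoff, the frame sizes, and the function names `frame_signal`, `highband_log_spectrum`, and `lpc` are all assumptions chosen for illustration, and the LP routine is the textbook autocorrelation method with the Levinson-Durbin recursion.

```python
import numpy as np


def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def highband_log_spectrum(x, sr, f_lo=6000.0, frame_len=512, hop=256):
    """Mean log-power spectrum over the bins at or above f_lo (Hz).

    The 6 kHz cutoff is an illustrative assumption, not the paper's value.
    """
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    band = freqs >= f_lo
    # Average over frames; the small constant avoids log(0).
    return np.log(power[:, band] + 1e-12).mean(axis=0)


def lpc(x, order):
    """LP error-filter coefficients a[0..order] (a[0] = 1) and residual energy.

    Autocorrelation method solved with the Levinson-Durbin recursion.
    """
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

A classifier would then be trained on such per-utterance vectors (or on statistics of the LP residual); which back-end the paper uses is not stated in the abstract, so none is sketched here.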
Citations
Journal ArticleDOI
Advances in anti-spoofing: from the perspective of ASVspoof challenges
TL;DR: A literature review of ASV spoof detection is presented, covering novel acoustic feature representations, deep learning, and end-to-end systems, along with recent efforts to develop countermeasures for the spoof speech detection (SSD) task.
Proceedings ArticleDOI
Effectiveness of Speech Demodulation-Based Features for Replay Detection.
TL;DR: This paper explores speech demodulation-based features using Hilbert transform (HT) and Teager Energy Operator (TEO) for replay detection and proposes features, namely, HT-based Instantaneous Amplitude (IA) and Instantaneous Frequency (IF) Cosine Coefficients and Energy Separation Algorithm (ESA) based features.
Proceedings ArticleDOI
A Light Convolutional GRU-RNN Deep Feature Extractor for ASV Spoofing Detection.
TL;DR: This work proposes the use of a Light Convolutional Gated Recurrent Neural Network (LC-GRNN) as a deep feature extractor to robustly represent speech signals as utterance-level embeddings, which are later used by a back-end recognizer which performs the final genuine/spoofed classification.
Proceedings ArticleDOI
Modulation dynamic features for the detection of replay attacks
TL;DR: This paper proposes two novel features to capture the static and dynamic characteristics of the signal from the modulation spectrum, which complement short term spectral features for use in replay detection.
Proceedings ArticleDOI
Long Range Acoustic and Deep Features Perspective on ASVspoof 2019
TL;DR: A comprehensive analysis of the nature of different kinds of spoofing attacks and of system development is made, and the use of deep features that enhance the discriminative ability between genuine and spoofed speech is investigated.
References
Posted Content
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek G. Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Vanhoucke, Vijay K. Vasudevan, Fernanda B. Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Journal ArticleDOI
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
S. Davis, Paul Mermelstein
TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.
End to end speech recognition in English and Mandarin
Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony X. Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Y. Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu
TL;DR: It is shown that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages, and that it is competitive with the transcription of human workers when benchmarked on standard datasets.
Proceedings Article
Deep speech 2: end-to-end speech recognition in English and mandarin
Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, Jie Chen, Jingdong Chen, Zhijie Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Ke Ding, Niandong Du, Erich Elsen, Jesse Engel, Weiwei Fang, Linxi Fan, Christopher Fougner, Liang Gao, Caixia Gong, Awni Hannun, Tony X. Han, Lappi Vaino Johannes, Bing Jiang, Cai Ju, Billy Jun, Patrick LeGresley, Libby Lin, Junjie Liu, Yang Liu, Weigao Li, Xiangang Li, Dongpeng Ma, Sharan Narang, Andrew Y. Ng, Sherjil Ozair, Yiping Peng, Ryan Prenger, Sheng Qian, Zongfeng Quan, Jonathan Raiman, Vinay Rao, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Kavya Srinet, Anuroop Sriram, Haiyuan Tang, Liliang Tang, Chong Wang, Jidong Wang, Kaifu Wang, Yi Wang, Zhijian Wang, Zhiqian Wang, Shuang Wu, Likai Wei, Bo Xiao, Wen Xie, Yan Xie, Dani Yogatama, Bin Yuan, Jun Zhan, Zhenyao Zhu
TL;DR: In this article, an end-to-end deep learning approach was used to recognize either English or Mandarin Chinese speech, two vastly different languages, using HPC techniques that enable experiments that previously took weeks to run in days.
Journal ArticleDOI
Calculation of a constant Q spectral transform
TL;DR: In this article, a constant Q transform with a constant ratio of center frequency to resolution has been proposed to obtain a constant pattern in the frequency domain for sounds with harmonic frequency components.
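The "constant ratio of center frequency to resolution" in the last entry can be made concrete with a short sketch. With geometrically spaced centre frequencies f_k = f_min · 2^(k/B), the ratio Q = f_k / (f_{k+1} − f_k) = 1 / (2^(1/B) − 1) is the same for every bin, and each bin needs an analysis window of roughly Q · sr / f_k samples. The function name `constant_q_bins` and the example parameter values are assumptions for illustration only; the sketch computes the bin layout, not the transform itself.

```python
import numpy as np


def constant_q_bins(f_min, f_max, bins_per_octave, sr):
    """Centre frequencies, Q factor, and window lengths for a constant-Q analysis.

    Illustrative helper (hypothetical name); it lays out the bins only.
    """
    n_bins = int(np.floor(bins_per_octave * np.log2(f_max / f_min))) + 1
    k = np.arange(n_bins)
    # Geometric spacing: each octave holds `bins_per_octave` bins.
    centre = f_min * 2.0 ** (k / bins_per_octave)
    # Ratio of centre frequency to bin spacing, identical for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    # Lower frequencies need longer windows to keep Q constant.
    win_len = np.ceil(q * sr / centre).astype(int)
    return centre, q, win_len
```

For example, `constant_q_bins(55.0, 7040.0, 12, 16000)` lays out semitone-spaced bins from 55 Hz upward, with the longest window at the lowest bin.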