Proceedings Article
A study on data augmentation of reverberant speech for robust speech recognition
Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer, Sanjeev Khudanpur
- pp 5220-5224
TL;DR: It is found that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added, and the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario.
Abstract:
The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real RIRs can be difficult to acquire, and also the effect of adding point-source noises. We find that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added. Further, we show that the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario. We evaluate our approach on several LVCSR tasks that adequately represent both scenarios.
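The augmentation described in the abstract, reverberating speech with an RIR and mixing in point-source noise, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (reverberate, scale_noise_to_snr, augment), the SNR-based mixing rule, and the choice to give the noise its own RIR are all assumptions made for this example.

```python
import numpy as np


def reverberate(signal, rir):
    """Convolve a dry signal with a room impulse response, keeping the input length."""
    return np.convolve(signal, rir)[: len(signal)]


def scale_noise_to_snr(signal, noise, snr_db):
    """Return noise looped/trimmed to the signal length and scaled to the target SNR."""
    # Repeat the noise if it is shorter than the signal, then trim to match.
    reps = int(np.ceil(len(signal) / len(noise)))
    noise = np.tile(noise, reps)[: len(signal)]
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against all-zero noise
    # Choose the gain so that 10*log10(signal_power / scaled_noise_power) == snr_db.
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return gain * noise


def augment(speech, speech_rir, noise, noise_rir, snr_db):
    """Hypothetical helper: reverberant speech plus a reverberated point-source noise."""
    reverberant_speech = reverberate(speech, speech_rir)
    # Assumption: the noise source sits at its own position, hence its own RIR.
    point_source_noise = reverberate(noise, noise_rir)
    return reverberant_speech + scale_noise_to_snr(
        reverberant_speech, point_source_noise, snr_db
    )
```

In practice the RIRs would come from measured recordings or a room simulator, and the SNR would typically be drawn at random per utterance; both choices are left to the caller here.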
Citations
Proceedings Article
X-Vectors: Robust DNN Embeddings for Speaker Recognition
TL;DR: This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
Proceedings Article
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN based speaker verification
TL;DR: The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art TDNN based systems on the VoxCeleb test sets and the 2019 VoxCeleb Speaker Recognition Challenge.
Proceedings Article
ASVspoof 2019: Future horizons in spoofed and fake audio detection
Massimiliano Todisco, Xin Wang, Ville Vestman, Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee
TL;DR: The 2019 database, protocols and challenge results are described, and major findings which demonstrate the real progress made in protecting against the threat of spoofing and fake audio are outlined.
Proceedings Article
Self-attentive Speaker Embeddings for Text-independent Speaker Verification
TL;DR: The proposed self-attentive speaker embedding system is compared with a strong DNN embedding baseline on NIST SRE 2016, and the self-attentive embeddings are found to achieve superior performance.
Journal Article
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech
Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Héctor Delgado, Andreas Nautsch, Nicholas Evans, Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sébastien Le Maguer, Markus Becker, Fergus Henderson, Robert A. J. Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-François Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling
TL;DR: The ASVspoof challenge was created to foster research on anti-spoofing and to provide common platforms for the assessment and comparison of spoofing countermeasures; the first edition focused on replay spoofing attacks and countermeasures.
References
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
Speech recognition with deep recurrent neural networks
TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Proceedings Article
The Kaldi Speech Recognition Toolkit
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Kumar Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, Karel Vesely
TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Posted Content
Speech Recognition with Deep Recurrent Neural Networks
TL;DR: In this paper, deep recurrent neural networks (RNNs) are used to combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Journal Article
Image method for efficiently simulating small‐room acoustics
Jont B. Allen, David A. Berkley
TL;DR: Describes the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; convolving this response with any desired input signal simulates room reverberation of that signal.