Proceedings ArticleDOI
Reverberation robust acoustic modeling using i-vectors with time delay neural networks.
Vijayaditya Peddinti, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur
- pp 2440-2444
TLDR
iVectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing a 10% relative improvement in word error rate, and training time is reduced by subsampling the outputs at TDNN layers across time steps.
Abstract:
In reverberant environments there are long-term interactions between speech and corrupting sources. In this paper a time delay neural network (TDNN) architecture, capable of learning long-term temporal relationships and translation-invariant representations, is used for reverberation-robust acoustic modeling. Further, iVectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing a 10% relative improvement in word error rate. By subsampling the outputs at TDNN layers across time steps, training time is reduced. Using a parallel training algorithm, we show that the TDNN can be trained on ∼5500 hours of speech data in 3 days using up to 32 GPUs. The TDNN is shown to provide results competitive with state-of-the-art systems in the IARPA ASpIRE challenge, with 27.7% WER on the dev test set.
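The two ideas in the abstract, appending an iVector to every input frame and splicing only a few time offsets at the higher TDNN layers, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature and iVector dimensions, the layer width, the random weights, and the splice offsets in `layer_offsets` are all assumptions chosen for illustration, and `splice` and `tdnn_layer` are hypothetical helper names.

```python
import numpy as np

def splice(x, offsets):
    """Concatenate frames of x (shape T x D) taken at the given time offsets.

    Only centre frames whose full context lies inside the input are kept.
    """
    lo, hi = min(offsets), max(offsets)
    start, stop = max(0, -lo), x.shape[0] - max(0, hi)
    times = np.arange(start, stop)
    return np.concatenate([x[times + o] for o in offsets], axis=1)

def tdnn_layer(x, offsets, W, b):
    """One TDNN layer: splice in time, then affine transform and ReLU."""
    return np.maximum(splice(x, offsets) @ W + b, 0.0)

rng = np.random.default_rng(0)
T, feat_dim, ivec_dim, hidden = 30, 40, 100, 64  # assumed sizes

feats = rng.standard_normal((T, feat_dim))   # stand-in acoustic features
ivector = rng.standard_normal(ivec_dim)      # stand-in iVector

# Append the same iVector to every frame so the network sees
# speaker/environment information alongside the acoustics.
x0 = np.hstack([feats, np.tile(ivector, (T, 1))])

# Assumed offsets: full context at the first layer, then only two
# offsets per layer (subsampling) instead of every frame in the window.
layer_offsets = [(-2, -1, 0, 1, 2), (-1, 2), (-3, 3)]

out = x0
for offsets in layer_offsets:
    in_dim = out.shape[1] * len(offsets)
    W = rng.standard_normal((in_dim, hidden)) * 0.01
    b = np.zeros(hidden)
    out = tdnn_layer(out, offsets, W, b)

print(x0.shape, out.shape)
```

With subsampled offsets, the upper layers multiply two spliced frames rather than every frame in the window, which is the source of the computational saving; the temporal context still grows with depth because each layer widens the span of input frames it depends on.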
Citations
Journal ArticleDOI
An unsupervised deep domain adaptation approach for robust speech recognition
TL;DR: An unsupervised deep domain adaptation (DDA) approach to acoustic modeling is introduced in order to eliminate the training–testing mismatch that is common in real-world use of speech recognition.
Proceedings ArticleDOI
Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting.
Ming Sun, David Snyder, Yixin Gao, Varun K. Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Naga Prasad Vitaladevuni
TL;DR: This paper proposes to apply singular value decomposition (SVD) to further reduce TDNN complexity, and results show that the full-rank TDNN achieves a 19.7% DET AUC reduction compared to a similar-size deep neural network baseline.
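As a rough illustration of how SVD can shrink a layer, the sketch below factors a weight matrix into two thin matrices through a truncated SVD. The matrix sizes and the rank are assumptions, `low_rank_factors` is a hypothetical helper, and the cited paper's actual procedure (for example, which layers are factored and whether the model is retrained afterwards) is not described here.

```python
import numpy as np

def low_rank_factors(W, rank):
    """Return A (m x rank) and B (rank x n) such that A @ B approximates W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # scale the kept left singular vectors
    B = Vt[:rank]
    return A, B

rng = np.random.default_rng(0)
# Assumed example: a 256 x 128 weight matrix built to have rank 8,
# so the truncated factorization reproduces it almost exactly.
W = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 128))

A, B = low_rank_factors(W, rank=8)

params_full = W.size          # 256 * 128
params_low = A.size + B.size  # 8 * (256 + 128)
max_error = np.abs(A @ B - W).max()
print(params_full, params_low, max_error)
```

Replacing one m x n layer with two layers of sizes m x r and r x n cuts the parameter count from mn to r(m + n), which pays off whenever r is well below both m and n. Trained weight matrices are rarely exactly low-rank, so in practice the truncation introduces some approximation error.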
Proceedings ArticleDOI
JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs
TL;DR: This paper tackles reverberant speech recognition using 5500 hours of simulated reverberant data and a time-delay neural network (TDNN) architecture, which can model the long-term interactions between speech and corrupting sources in reverberant environments.
Proceedings ArticleDOI
Deep-FSMN for Large Vocabulary Continuous Speech Recognition
TL;DR: DFSMN introduces skip connections between memory blocks in adjacent layers, which enable information flow across layers and thus alleviate the vanishing-gradient problem when building very deep structures.
Proceedings ArticleDOI
Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning.
TL;DR: A multi-task architecture that jointly learns an accent classifier and a multi-accent acoustic model is proposed, and augmenting the speech input with accent information in the form of embeddings extracted by a separate network is also considered.
References
Proceedings Article
SRILM – An Extensible Language Modeling Toolkit
TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation are discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Journal ArticleDOI
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
S. Davis, Paul Mermelstein
TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.
Journal ArticleDOI
Front-End Factor Analysis for Speaker Verification
TL;DR: This work extends previous work by proposing a new speaker representation for speaker verification: a low-dimensional speaker- and channel-dependent space, defined using a simple factor analysis and named the total variability space because it models both speaker and channel variabilities.
Book
Phoneme recognition using time-delay neural networks
TL;DR: The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation.
Journal ArticleDOI
Phoneme recognition using time-delay neural networks
TL;DR: In this article, the authors presented a time-delay neural network (TDNN) approach to phoneme recognition, which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input