Reverberation robust acoustic modeling using i-vectors with time delay neural networks.

doi:10.21437/INTERSPEECH.2015-527

Proceedings Article

pp. 2440-2444

TLDR

iVectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing a 10% relative improvement in word error rate; subsampling the outputs at TDNN layers across time steps reduces training time.

Abstract:

In reverberant environments there are long-term interactions between speech and corrupting sources. In this paper a time delay neural network (TDNN) architecture, capable of learning long-term temporal relationships and translation-invariant representations, is used for reverberation-robust acoustic modeling. Further, iVectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing a 10% relative improvement in word error rate (WER). By subsampling the outputs at TDNN layers across time steps, training time is reduced. Using a parallel training algorithm, we show that the TDNN can be trained on approximately 5500 hours of speech data in 3 days using up to 32 GPUs. The TDNN is shown to provide results competitive with state-of-the-art systems in the IARPA ASpIRE challenge, with 27.7% WER on the dev test set.
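The two ideas in the abstract, appending an iVector to the network input and evaluating each TDNN layer at only a few time offsets, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature and iVector dimensions, the hidden size, the per-layer offsets, the ReLU nonlinearity, the use of a single iVector per utterance, and the helper names `splice`, `tdnn_layer`, and `forward` are all assumptions chosen for illustration.

```python
import numpy as np

def splice(frames, offsets):
    """Concatenate frames at t + offset for every offset, keeping only
    time steps t where all requested neighbours exist."""
    left = max(0, -min(offsets))
    right = max(0, max(offsets))
    times = np.arange(left, frames.shape[0] - right)
    return np.concatenate([frames[times + o] for o in offsets], axis=1)

def tdnn_layer(frames, offsets, weight, bias):
    """One TDNN layer: the same affine transform is applied at every
    time step to the spliced context, followed by a ReLU (assumed)."""
    return np.maximum(splice(frames, offsets) @ weight + bias, 0.0)

def forward(features, ivector, layer_offsets, params):
    """Append the iVector to every frame, then stack TDNN layers."""
    tiled = np.tile(ivector, (features.shape[0], 1))
    h = np.concatenate([features, tiled], axis=1)
    for offsets, (w, b) in zip(layer_offsets, params):
        h = tdnn_layer(h, offsets, w, b)
    return h

# Illustrative sizes and offsets (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
feat_dim, ivec_dim, hidden = 40, 100, 64
# The first layer sees a full window; higher layers use only two offsets,
# which is one way to subsample layer outputs across time.
layer_offsets = [(-2, -1, 0, 1, 2), (-1, 2), (-3, 3)]

params = []
in_dim = feat_dim + ivec_dim
for offsets in layer_offsets:
    w = rng.standard_normal((len(offsets) * in_dim, hidden)) * 0.01
    params.append((w, np.zeros(hidden)))
    in_dim = hidden

features = rng.standard_normal((50, feat_dim))   # 50 frames of features
ivector = rng.standard_normal(ivec_dim)          # one iVector, reused per frame
out = forward(features, ivector, layer_offsets, params)

# Total temporal context seen by each output frame.
left = sum(-min(o) for o in layer_offsets)
right = sum(max(o) for o in layer_offsets)
print(out.shape, left, right)
```

With these offsets each output frame depends on 6 frames of left context and 7 frames of right context, while the upper layers multiply only two spliced inputs per step instead of a full window, which is where the computational saving in this sketch comes from.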

Citations

Journal Article

An unsupervised deep domain adaptation approach for robust speech recognition

Sining Sun, +3 more

27 Sep 2017, Neurocomputing

TL;DR: An unsupervised deep domain adaptation (DDA) approach to acoustic modeling is introduced in order to eliminate the training–testing mismatch that is common in real-world use of speech recognition.

Proceedings Article

Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting.

Ming Sun, +8 more

TL;DR: This paper proposes to apply singular value decomposition (SVD) to further reduce TDNN complexity, and results show that the full-rank TDNN achieves a 19.7% DET AUC reduction compared to a similar-size deep neural network baseline.

Proceedings Article

JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs

Vijayaditya Peddinti, +5 more

TL;DR: This paper tackles reverberant speech recognition by training on 5500 hours of simulated reverberant data with a time-delay neural network (TDNN) architecture, which can model long-term interactions between speech and corrupting sources in reverberant environments.

Proceedings Article

Deep-FSMN for Large Vocabulary Continuous Speech Recognition

Shiliang Zhang, +3 more

TL;DR: DFSMN introduces skip connections between memory blocks in adjacent layers, which enable information to flow across layers and thus alleviate the vanishing-gradient problem when building very deep structures.

Proceedings Article

Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning.

Abhinav Jain, +2 more

TL;DR: A multi-task architecture that jointly learns an accent classifier and a multi-accent acoustic model is proposed, and augmenting the speech input with accent information, in the form of embeddings extracted by a separate network, is considered.

References

Proceedings Article

SRILM – An Extensible Language Modeling Toolkit

Andreas Stolcke

TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation are discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.

Journal Article

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis, +1 more

01 Aug 1980, IEEE Transactions on Acoustics, Speech, ...

TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.

Journal Article

Front-End Factor Analysis for Speaker Verification

Najim Dehak, +4 more

01 May 2011, IEEE Transactions on Audio, Speech, and ...

TL;DR: This work extends the authors' previous work by proposing a new speaker representation for speaker verification: a low-dimensional speaker- and channel-dependent space, defined using simple factor analysis and named the total variability space because it models both speaker and channel variabilities.

Book

Phoneme recognition using time-delay neural networks

Alex Waibel, +4 more

TL;DR: The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation.

Journal Article

Phoneme recognition using time-delay neural networks

Alex Waibel, +4 more

01 Mar 1989, IEEE Transactions on Acoustics, Speech, ...

TL;DR: In this article, the authors presented a time-delay neural network (TDNN) approach to phoneme recognition, which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input.
