A study on data augmentation of reverberant speech for robust speech recognition

doi:10.1109/ICASSP.2017.7953152

Proceedings Article

Tom Ko, +4 more

- pp 5220-5224

TLDR

It is found that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added, and that the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario.

Abstract:

The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real RIRs can be difficult to acquire, and also the effect of adding point-source noises. We find that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added. Further we show that the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario. We evaluate our approach on several LVCSR tasks which can adequately represent both scenarios.
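The augmentation described in the abstract amounts to reverberating clean speech with an RIR and then adding point-source noises, each convolved with its own RIR, at a chosen signal-to-noise ratio. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the helper names (`convolve`, `power`, `reverberate_and_add_noise`) are hypothetical, and so are the choices of measuring SNR against the reverberant speech, using one SNR for every noise source, and truncating the output to the length of the clean speech.

```python
import math


def convolve(x, h):
    """Full linear convolution of two sequences (pure Python, O(len(x) * len(h)))."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def power(x):
    """Mean power of a non-empty signal."""
    return sum(v * v for v in x) / len(x)


def reverberate_and_add_noise(speech, speech_rir, noises, snr_db):
    """Hypothetical helper: simulate a far-field recording.

    speech:     clean speech samples
    speech_rir: room impulse response from the talker to the microphone
    noises:     list of (noise_samples, noise_rir) pairs, one per point source
    snr_db:     target SNR of each noise source relative to the reverberant speech
    The output keeps the length of the clean speech (an assumption of this sketch).
    """
    n = len(speech)
    rev_speech = convolve(speech, speech_rir)[:n]
    speech_pow = power(rev_speech)
    out = list(rev_speech)
    for noise, rir in noises:
        # Each point-source noise is reverberated with its own RIR.
        rev_noise = convolve(noise, rir)[:n]
        rev_noise += [0.0] * (n - len(rev_noise))  # zero-pad short noises
        noise_pow = power(rev_noise)
        if noise_pow == 0.0:
            continue  # silent noise: nothing to add
        # Choose gain g so that 10*log10(speech_pow / (g**2 * noise_pow)) == snr_db.
        gain = math.sqrt(speech_pow / (noise_pow * 10 ** (snr_db / 10)))
        for i in range(n):
            out[i] += gain * rev_noise[i]
    return out


if __name__ == "__main__":
    # A unit impulse reverberated by a two-tap RIR, with no noise added.
    print(reverberate_and_add_noise([1.0, 0.0, 0.0, 0.0], [1.0, 0.5], [], snr_db=10))
    # prints: [1.0, 0.5, 0.0, 0.0]
```

The nested-loop convolution is only practical for toy inputs; a real augmentation pipeline would use an FFT-based convolution from a numerical library and real or simulated RIRs sampled at the speech sampling rate.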

Citations

Proceedings Article

X-Vectors: Robust DNN Embeddings for Speaker Recognition

David Snyder, +4 more

TL;DR: This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.

Proceedings Article

ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN based speaker verification

Brecht Desplanques, +2 more

TL;DR: The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art TDNN based systems on the VoxCeleb test sets and the 2019 VoxCeleb Speaker Recognition Challenge.

Proceedings Article

ASVspoof 2019: Future horizons in spoofed and fake audio detection

Massimiliano Todisco, +9 more

TL;DR: The 2019 database, protocols and challenge results are described, and major findings which demonstrate the real progress made in protecting against the threat of spoofing and fake audio are outlined.

Proceedings Article

Self-attentive Speaker Embeddings for Text-independent Speaker Verification

Yingke Zhu, +4 more

TL;DR: The proposed self-attentive speaker embedding system is compared with a strong DNN embedding baseline on NIST SRE 2016, and the self-attentive embeddings are found to achieve superior performance.

Journal Article

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

Xin Wang, +40 more

- 01 Nov 2020 -

Computer Speech & Language

TL;DR: The ASVspoof challenge was created to foster research on anti-spoofing and to provide common platforms for the assessment and comparison of spoofing countermeasures; the first edition (ASVspoof 2015) focused on text-to-speech synthesis and voice conversion attacks, and the second (ASVspoof 2017) on replay spoofing attacks and countermeasures.

References

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Proceedings Article

Speech recognition with deep recurrent neural networks

Alex Graves, +2 more

TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.

Proceedings Article

The Kaldi Speech Recognition Toolkit

Daniel Povey, +12 more

TL;DR: The design of Kaldi, a free, open-source toolkit for speech recognition research, is described; Kaldi provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.

Posted Content

Speech Recognition with Deep Recurrent Neural Networks

Alex Graves, +2 more

- 22 Mar 2013 -

arXiv: Neural and Evolutionary Computing

TL;DR: In this paper, deep recurrent neural networks (RNNs) are used to combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.

Journal Article

Image method for efficiently simulating small‐room acoustics

Jont B. Allen, +1 more

- 01 Nov 1976 -

Journal of the Acoustical Society of Ame...

TL;DR: Image techniques are described, in theory and in practice, for simulating the impulse response between two points in a small rectangular room; convolving this impulse response with any desired input signal simulates room reverberation of that signal.
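The image method summarized above places mirrored copies ("images") of the source across the room's walls and sums one delayed, attenuated impulse per image. The sketch below is a simplified pure-Python illustration, assuming a shoebox room, a single frequency-independent reflection coefficient `beta` shared by all six walls, a sound speed of 343 m/s, and delays rounded to whole samples; the function name `image_method_rir` and its defaults are hypothetical, and the refinements found in the original paper and in practical RIR generators are omitted.

```python
import itertools
import math


def image_method_rir(room, src, mic, beta, fs=16000, c=343.0, max_order=10, length=None):
    """Simplified image-source RIR for a rectangular (shoebox) room.

    room: (Lx, Ly, Lz) dimensions in metres; src, mic: (x, y, z) positions in metres.
    beta: one pressure reflection coefficient shared by all six walls (simplification).
    Arrivals are rounded to whole samples; fractional delays and filtering are omitted.
    max_order bounds the image index |n| on each axis, not the total number of reflections.
    """
    if length is None:
        length = int(0.25 * fs)  # hypothetical default: keep 250 ms of response
    h = [0.0] * length
    indices = range(-max_order, max_order + 1)
    for n in itertools.product(indices, repeat=3):
        for u in itertools.product((0, 1), repeat=3):
            # u[k] == 1 mirrors the source in axis k; n[k] shifts it by 2 * n[k] * L_k.
            image = [(1 - 2 * u[k]) * src[k] + 2 * n[k] * room[k] for k in range(3)]
            # Wall hits along axis k: |n_k - u_k| on one wall plus |n_k| on the opposite one.
            reflections = sum(abs(n[k] - u[k]) + abs(n[k]) for k in range(3))
            d = math.dist(image, mic)
            delay = round(d * fs / c)
            if delay < length:
                # Spherical spreading 1 / (4 * pi * d), attenuated once per wall reflection.
                h[delay] += beta ** reflections / (4 * math.pi * d)
    return h


if __name__ == "__main__":
    h = image_method_rir((5.0, 4.0, 3.0), (1.0, 1.0, 1.5), (2.0, 1.0, 1.5), beta=0.8, max_order=3)
    direct = round(1.0 * 16000 / 343.0)  # direct path: 1 m source-to-microphone distance
    print(direct, h[direct] > 0.0)
    # prints: 47 True
```

Setting `beta=0.0` leaves only the direct-path impulse, which is a quick sanity check; larger `beta` values produce a longer, denser reverberant tail.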

Related Papers (5)

The Kaldi Speech Recognition Toolkit

X-Vectors: Robust DNN Embeddings for Speaker Recognition

Librispeech: An ASR corpus based on public domain audio books

Front-End Factor Analysis for Speaker Verification

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition