Improving Reverberant Speech Training Using Diffuse Acoustic Simulation

doi:10.1109/ICASSP40776.2020.9052932

Open Access, Proceedings Article

- pp 6969-6973

TLDR

It is shown that, when trained on the synthetic data produced by the proposed simulation, the same neural networks improve significantly on real test sets, by 1.58% in far-field speech recognition and by 21% in keyword spotting, without fine-tuning on real impulse responses.

Abstract:

We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks. Our physically-based acoustic simulation method is capable of modeling occlusion, specular and diffuse reflections of sound in complicated acoustic environments, whereas the classical image method can only model specular reflections in simple room settings. We show that by using our synthetic training data, the same neural networks gain significant performance improvement on real test sets in far-field speech recognition by 1.58% and keyword spotting by 21%, without fine-tuning using real impulse responses.
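To make the baseline concrete, the following is a minimal Python sketch of how a synthetic impulse response can be used to turn clean speech into far-field training data. It is not the authors' simulator: it implements only a simplified, first-order version of the classical image method (direct path plus six specular wall reflections in a rectangular room), which is exactly the kind of specular-only model the abstract contrasts with. The function names, the uniform reflection coefficient `beta`, the room dimensions, and the use of random noise in place of a speech recording are all illustrative assumptions.

```python
import numpy as np


def first_order_image_rir(room, src, mic, beta=0.8, fs=16000, c=343.0, length=4096):
    """Direct path plus the six first-order image sources of a shoebox room.

    room, src, mic are (x, y, z) in metres; beta is an assumed uniform wall
    reflection coefficient. The source and microphone must not coincide.
    """
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    images = [(src, 1.0)]  # the real source, no wall attenuation
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = src.copy()
            img[axis] = 2.0 * wall - src[axis]  # mirror the source across this wall
            images.append((img, beta))
    rir = np.zeros(length)
    for pos, gain in images:
        dist = np.linalg.norm(pos - mic)
        n = int(round(dist / c * fs))  # propagation delay in samples
        if n < length:
            rir[n] += gain / (4.0 * np.pi * dist)  # spherical spreading loss
    return rir


def reverberate(speech, rir):
    """Convolve clean speech with an impulse response, keeping the input length."""
    return np.convolve(speech, rir)[: len(speech)]


# Hypothetical usage: one second of noise stands in for a clean utterance.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = first_order_image_rir(room=(5.0, 4.0, 3.0), src=(1.0, 1.0, 1.0), mic=(3.0, 2.0, 1.5))
far_field = reverberate(clean, rir)
```

A simulator that also models diffuse reflections and occlusion, as the abstract describes, would replace `first_order_image_rir` while the convolution step stays the same.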

Citations

Proceedings Article

Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks.

Zhenyu Tang, +3 more

TL;DR: A novel learning-based approach that estimates the direction of arrival (DOA) of a sound source using a convolutional recurrent neural network trained via regression on synthetic data with Cartesian labels, together with an improved method for generating that synthetic training data using state-of-the-art sound propagation algorithms that model both specular and diffuse reflections of sound.

Journal Article

Scene-Aware Audio Rendering via Deep Acoustic Analysis

Zhenyu Tang, +4 more

- 13 Feb 2020 -

IEEE Transactions on Visualization and C...

TL;DR: A new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models, based on deep neural networks.

Posted Content

Sound Synthesis, Propagation, and Rendering: A Survey.

Shiguang Liu, +1 more

- 11 Nov 2020 -

arXiv: Sound

TL;DR: This paper gives a broad overview of research on sound simulation in virtual reality, games, multimedia, and computer-aided design, and points to some future directions for the field.

Proceedings Article

Low-Frequency Compensated Synthetic Impulse Responses For Improved Far-Field Speech Recognition

Zhenyu Tang, +2 more

TL;DR: This paper proposes a method for generating low-frequency compensated synthetic impulse responses that improve the performance of far-field speech recognition systems trained on artificially augmented datasets, reducing the word error rate by up to 8.8%.

Proceedings Article

GWA: A Large High-Quality Acoustic Dataset for Audio Processing

Zhenyu Tang, +3 more

TL;DR: The Geometric-Wave Acoustic (GWA) dataset is presented, a large-scale audio dataset of about 2 million synthetic room impulse responses (IRs) and their corresponding detailed geometric and simulation configurations; it is the first dataset with accurate wave acoustic simulations in complex scenes.

References

Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Geoffrey E. Hinton, +10 more

- 18 Oct 2012 -

IEEE Signal Processing Magazine

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

Journal Article

Image method for efficiently simulating small‐room acoustics

Jont B. Allen, +1 more

- 01 Nov 1976 -

Journal of the Acoustical Society of Ame...

TL;DR: Discusses the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; the resulting impulse response, when convolved with any desired input signal, simulates room reverberation of that signal.

Journal Article

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

George E. Dahl, +3 more

- 01 Jan 2012 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture in which the DNN is trained to produce a distribution over senones (tied triphone states) as its output, and which can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.

Journal Article

The rendering equation

James T. Kajiya

TL;DR: An integral equation is presented that generalizes a variety of known rendering algorithms, along with a new form of variance reduction, called hierarchical sampling, which may be an efficient technique for a wide variety of Monte Carlo procedures.

Journal Article

Convolutional neural networks for speech recognition

Ossama Abdel-Hamid, +5 more

- 01 Oct 2014 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
