Open Access, Proceedings Article

Improving Reverberant Speech Training Using Diffuse Acoustic Simulation

TLDR
It is shown that, by training on synthetic data from the proposed geometric acoustic simulation, the same neural networks improve significantly on real test sets, by 1.58% in far-field speech recognition and by 21% in keyword spotting, without fine-tuning on real impulse responses.
Abstract
We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks. Our physically-based acoustic simulation method is capable of modeling occlusion, specular and diffuse reflections of sound in complicated acoustic environments, whereas the classical image method can only model specular reflections in simple room settings. We show that by using our synthetic training data, the same neural networks gain significant performance improvements on real test sets, by 1.58% in far-field speech recognition and by 21% in keyword spotting, without fine-tuning on real impulse responses.
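
Reverberant training data of the kind described above is commonly produced by convolving dry speech with a room impulse response. The following is a minimal sketch of that augmentation step only, under stated assumptions: `toy_impulse_response` is a hypothetical stand-in (a direct path plus an exponentially decaying noise tail), not the paper's geometric simulation, and the sine tone merely stands in for a clean speech recording.

```python
import numpy as np

def reverberate(clean, impulse_response):
    """Convolve a dry signal with a room impulse response (full convolution)."""
    clean = np.asarray(clean, dtype=np.float64)
    impulse_response = np.asarray(impulse_response, dtype=np.float64)
    return np.convolve(clean, impulse_response, mode="full")

def toy_impulse_response(sample_rate=16000, rt60=0.4, seed=0):
    """Hypothetical stand-in IR: direct sound plus a decaying noise tail.

    This is NOT the simulation method from the paper; it only supplies
    something impulse-response-shaped for the convolution to act on.
    """
    rng = np.random.default_rng(seed)
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    # Amplitude envelope that has dropped by 60 dB at t = rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    ir = 0.1 * rng.standard_normal(n) * envelope
    ir[0] = 1.0  # direct sound
    return ir

sample_rate = 16000
# One second of a 440 Hz tone, used here in place of a clean utterance.
clean = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)
ir = toy_impulse_response(sample_rate)
reverberant = reverberate(clean, ir)
```

In a real pipeline the toy response would be replaced by simulated or measured impulse responses, and the output would typically be trimmed or normalized before being used for training.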

Citations
Proceedings Article

Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks.

TL;DR: A novel learning-based approach to estimate the direction-of-arrival (DOA) of a sound source using a convolutional recurrent neural network trained via regression on synthetic data and Cartesian labels, and an improved method to generate synthetic data to train the neural network using state-of-the-art sound propagation algorithms that model specular as well as diffuse reflections of sound.
Journal Article

Scene-Aware Audio Rendering via Deep Acoustic Analysis

TL;DR: A new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models, based on deep neural networks.
Posted Content

Sound Synthesis, Propagation, and Rendering: A Survey.

TL;DR: This paper gives a broad overview of research works on sound simulation in virtual reality, games, multimedia, computer-aided design, and points to some future directions of this field.
Proceedings Article

Low-Frequency Compensated Synthetic Impulse Responses For Improved Far-Field Speech Recognition

TL;DR: This paper proposes a method for generating low-frequency compensated synthetic impulse responses that improve the performance of far-field speech recognition systems trained on artificially augmented datasets, reducing the word error rate by up to 8.8%.
Proceedings Article

GWA: A Large High-Quality Acoustic Dataset for Audio Processing

TL;DR: The Geometric-Wave Acoustic (GWA) dataset is presented, a large-scale audio dataset of about 2 million synthetic room impulse responses (IRs) and their corresponding detailed geometric and simulation configurations; it is the first dataset with accurate wave acoustic simulations in complex scenes.
References
Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Journal Article

Image method for efficiently simulating small‐room acoustics

TL;DR: Discusses the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; the resulting impulse response, when convolved with any desired input signal, simulates room reverberation of that signal.
Journal Article

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output, and that can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Journal Article

The rendering equation

TL;DR: An integral equation is presented which generalizes a variety of known rendering algorithms, along with a new form of variance reduction, called hierarchical sampling, which may be an efficient technique for a wide variety of Monte Carlo procedures.
Journal Article

Convolutional neural networks for speech recognition

TL;DR: It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.