Open Access Journal Article

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

TLDR
This paper proposes a convolutional recurrent neural network (CRNN) for polyphonic sound event detection, compares it with CNN, RNN, and other established methods, and reports a considerable improvement on four different datasets consisting of everyday sound events.
Abstract
Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher-level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context in audio signals. CNNs and RNNs as classifiers have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural network (CRNN) and apply it to a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement on four different datasets consisting of everyday sound events.
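
The data flow described in the abstract (convolutional feature extraction, then a recurrent layer for temporal context, then per-frame detection of possibly overlapping events) can be illustrated with a minimal pure-Python sketch. This is not the authors' architecture: the layer sizes, the hand-picked kernels, the random untrained weights, the plain tanh recurrence, and the sigmoid-plus-threshold output are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
import random


def conv1d_freq(frame, kernel):
    """Valid 1-D cross-correlation along the frequency axis, followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(frame[i + j] * kernel[j] for j in range(k)))
            for i in range(len(frame) - k + 1)]


def max_pool(values, size):
    """Non-overlapping max pooling along frequency; time resolution is untouched."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def crnn_forward(spectrogram, kernels, pool, w_in, w_rec, w_out, threshold=0.5):
    """Return one binary activity vector per frame (1 = event class active)."""
    hidden = [0.0] * len(w_rec)
    decisions = []
    for frame in spectrogram:
        # Convolutional stage: local spectral features, pooled over frequency.
        feats = []
        for kernel in kernels:
            feats.extend(max_pool(conv1d_freq(frame, kernel), pool))
        # Recurrent stage: the hidden state carries context from earlier frames.
        hidden = [math.tanh(sum(w * f for w, f in zip(w_in[h], feats)) +
                            sum(w * s for w, s in zip(w_rec[h], hidden)))
                  for h in range(len(w_rec))]
        # Output stage: one independent sigmoid per class, so events may overlap.
        probs = [1.0 / (1.0 + math.exp(-sum(w * s for w, s in zip(row, hidden))))
                 for row in w_out]
        decisions.append([1 if p >= threshold else 0 for p in probs])
    return decisions


def random_matrix(rows, cols, rng):
    """Hypothetical helper: untrained weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy dimensions (assumed, not taken from the paper).
rng = random.Random(0)
n_frames, n_bins, n_hidden, n_classes, pool = 6, 8, 4, 3, 2
kernels = [[0.5, 0.0, -0.5], [0.25, 0.5, 0.25]]
n_feats = len(kernels) * ((n_bins - 3 + 1) // pool)  # 2 kernels * 3 pooled bins

spectrogram = random_matrix(n_frames, n_bins, rng)  # stand-in for real features
w_in = random_matrix(n_hidden, n_feats, rng)
w_rec = random_matrix(n_hidden, n_hidden, rng)
w_out = random_matrix(n_classes, n_hidden, rng)

activity = crnn_forward(spectrogram, kernels, pool, w_in, w_rec, w_out)
for t, row in enumerate(activity):
    print(t, row)
```

Each output row is a multi-label decision for one frame, which is what makes the task polyphonic: several classes can be active in the same frame. Pooling only along frequency keeps one decision per input frame; a trained system would learn the kernels and weights from data instead of drawing them at random.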


Citations
Journal Article

Deep Learning for Audio Signal Processing

TL;DR: Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross fertilization between areas.

DCASE 2017 challenge setup: tasks, datasets and baseline system

TL;DR: This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
Journal Article

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

TL;DR: The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structures, robust to unseen DOA values, reverberation, and low SNR scenarios.
Journal Article

Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge

TL;DR: The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
Journal Article

Classifying environmental sounds using image recognition networks

TL;DR: This paper considers the classification accuracy for different image representations (Spectrogram, MFCC, and CRP) of environmental sounds, and evaluates the accuracy for environmental sounds in three publicly available datasets, using two well-known convolutional deep neural networks for image recognition.