Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

doi:10.1109/TASLP.2017.2690575

Journal article (open access)

Emre Cakir and four co-authors

IEEE Transactions on Audio, Speech, and ..., Vol. 25, Iss. 6, pp. 1291-1303, 1 June 2017

TLDR

This paper proposes a convolutional recurrent neural network (CRNN) for polyphonic sound event detection, compares it with CNN, RNN, and other established methods, and observes a considerable improvement on four different datasets of everyday sound events.
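In polyphonic detection several event classes can be active in the same frame, so the output is multi-label: each class gets its own activity decision per frame instead of a single softmax choice. The following is a minimal sketch, not the paper's post-processing, assuming frame-wise class probabilities such as sigmoid outputs. The helper `frames_to_events`, the frame hop `hop_s`, and the 0.5 threshold are hypothetical choices for illustration. Each class is thresholded independently and its active runs become (class, onset, offset) events that are allowed to overlap.

```python
def frames_to_events(probs, class_names, hop_s, threshold=0.5):
    """Turn frame-wise class probabilities into (class, onset_s, offset_s) events.

    probs: one list of per-class probabilities per frame (e.g. sigmoid outputs).
    hop_s: frame hop in seconds (hypothetical parameter).
    Each class is thresholded independently, so events of different classes may overlap.
    """
    events = []
    n_frames = len(probs)
    for c, name in enumerate(class_names):
        onset = None
        for t in range(n_frames + 1):  # the extra step closes an event still active at the end
            active = t < n_frames and probs[t][c] >= threshold
            if active and onset is None:
                onset = t
            elif not active and onset is not None:
                events.append((name, round(onset * hop_s, 3), round(t * hop_s, 3)))
                onset = None
    return sorted(events, key=lambda e: (e[1], e[0]))


probs = [
    [0.9, 0.1],
    [0.8, 0.7],  # both classes active in this frame: polyphony
    [0.2, 0.9],
    [0.1, 0.3],
]
print(frames_to_events(probs, ["speech", "car"], hop_s=0.5))
# → [('speech', 0.0, 1.0), ('car', 0.5, 1.5)]
```

The two events overlap between 0.5 s and 1.0 s, which a single-label (softmax) output could not represent.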

Abstract:

Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher-level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context in audio signals. CNNs and RNNs as classifiers have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural network (CRNN) and apply it to a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement on four different datasets consisting of everyday sound events.
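The combination described above can be sketched in NumPy, assuming a log mel-band spectrogram input of shape (frames, bands). The pipeline runs convolution and ReLU layers with max pooling along frequency only, so every frame keeps its own output, then a GRU over the resulting frame sequence for longer-term context, then an independent sigmoid per class. The layer sizes, pooling factors, the choice of a GRU, and the helper names (`conv2d_same`, `max_pool_freq`, `gru`, `init_params`, `crnn_forward`) are illustrative assumptions, not the configuration reported in the paper. The weights are random and untrained.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def conv2d_same(x, w, b):
    """Zero-padded 'same' 2-D cross-correlation.
    x: (C_in, T, F), w: (C_out, C_in, kT, kF) with odd kernel sizes, b: (C_out,)."""
    c_out, _, kt, kf = w.shape
    _, T, F = x.shape
    xp = np.pad(x, ((0, 0), (kt // 2, kt // 2), (kf // 2, kf // 2)))
    y = np.empty((c_out, T, F))
    for t in range(T):
        for f in range(F):
            patch = xp[:, t:t + kt, f:f + kf]  # (C_in, kT, kF)
            y[:, t, f] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return y


def max_pool_freq(x, k):
    """Max pooling over non-overlapping groups of k frequency bins; time axis untouched."""
    c, T, F = x.shape
    n = F // k  # leftover bins beyond n * k are dropped
    return x[:, :, :n * k].reshape(c, T, n, k).max(axis=3)


def gru(xs, p):
    """Standard GRU over xs: (T, D); returns every hidden state, shape (T, H)."""
    h = np.zeros(p["Uz"].shape[0])
    states = []
    for x in xs:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])  # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])  # reset gate
        h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
        h = (1.0 - z) * h + z * h_cand
        states.append(h)
    return np.stack(states)


def init_params(n_bands, n_classes, rng, channels=(8, 8), pools=(5, 2), hidden=16):
    """Random, untrained weights; all sizes here are illustrative assumptions."""
    conv, c_in, f = [], 1, n_bands
    for c_out, k in zip(channels, pools):
        conv.append((rng.normal(0, 0.1, (c_out, c_in, 3, 3)), np.zeros(c_out), k))
        c_in, f = c_out, f // k
    d = c_in * f  # per-frame feature size after the conv stack
    g = {}
    for gate in "zrh":
        g["W" + gate] = rng.normal(0, 0.1, (hidden, d))
        g["U" + gate] = rng.normal(0, 0.1, (hidden, hidden))
        g["b" + gate] = np.zeros(hidden)
    return {"conv": conv, "gru": g,
            "W_out": rng.normal(0, 0.1, (hidden, n_classes)),
            "b_out": np.zeros(n_classes)}


def crnn_forward(spec, params):
    """spec: (T, n_bands) log mel-band energies -> (T, n_classes) probabilities."""
    x = spec[np.newaxis]  # (1, T, F): a single input channel
    for w, b, k in params["conv"]:
        x = np.maximum(conv2d_same(x, w, b), 0.0)  # convolution + ReLU
        x = max_pool_freq(x, k)  # shrink frequency, keep every frame
    c, T, f = x.shape
    feats = x.transpose(1, 0, 2).reshape(T, c * f)  # one feature vector per frame
    hs = gru(feats, params["gru"])  # temporal context across frames
    return sigmoid(hs @ params["W_out"] + params["b_out"])  # one sigmoid per class


rng = np.random.default_rng(0)
spec = rng.normal(size=(50, 40))  # stand-in input: 50 frames x 40 mel bands
params = init_params(n_bands=40, n_classes=6, rng=rng)
probs = crnn_forward(spec, params)
print(probs.shape)  # (50, 6)
```

Pooling only along frequency is what lets the recurrent layer receive one feature vector per input frame; pooling in time as well would coarsen the temporal resolution of the detections.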

Citations

Deep Learning for Audio Signal Processing

Hendrik Purwins and five co-authors

IEEE Journal of Selected Topics in Signa..., 1 April 2019

TL;DR: Speech, music, and environmental sound processing are considered side by side in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas.

DCASE 2017 challenge setup: tasks, datasets and baseline system

Annamaria Mesaros and seven co-authors

TL;DR: This paper presents the setup of the DCASE 2017 challenge tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

Sharath Adavanne and three co-authors

IEEE Journal of Selected Topics in Signa..., 1 March 2019

TL;DR: The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic, applicable to any array structure, and robust to unseen DOA values, reverberation, and low-SNR scenarios.

Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge

Annamaria Mesaros and six co-authors

IEEE Transactions on Audio, Speech, and ..., 1 February 2018

TL;DR: The challenge outcome shows deep learning emerging as the most popular classification method, replacing the traditional approaches based on Gaussian mixture models and support vector machines.

Classifying environmental sounds using image recognition networks

Venkatesh Boddapati and three co-authors

TL;DR: This paper compares the classification accuracy of different image representations of environmental sounds (spectrogram, MFCC, and CRP) on three publicly available datasets, using two well-known convolutional neural networks for image recognition.