Journal ArticleDOI

Acoustic event recognition using cochleagram image and convolutional neural networks

01 May 2019-Applied Acoustics (Elsevier)-Vol. 148, pp 62-66
TL;DR: This work evaluates the performance of four time-frequency representations for use with CNNs and proposes the use of a cochleagram image whose frequency components are based on the frequency selectivity of the human cochlea.
About: This article was published in Applied Acoustics on 2019-05-01 and has received 39 citations to date. The article focuses on the topics: Spectrogram & Convolutional neural network.
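The cochleagram described in the TL;DR uses frequency channels that follow the cochlea's frequency selectivity, commonly modeled with the ERB (equivalent rectangular bandwidth) scale. A minimal sketch of such a representation is below; an FFT-bin grouping stands in for true gammatone filtering, and the function names are illustrative, not the authors' implementation:

```python
import numpy as np

def erb_center_freqs(n_channels, f_low=50.0, f_high=8000.0):
    """Center frequencies spaced evenly on the ERB-rate scale,
    mimicking the frequency selectivity of the human cochlea."""
    def hz_to_erb(f):
        return 21.4 * np.log10(1.0 + 0.00437 * f)
    def erb_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) / 0.00437
    erbs = np.linspace(hz_to_erb(f_low), hz_to_erb(f_high), n_channels)
    return erb_to_hz(erbs)

def cochleagram(signal, sr, n_channels=64, frame_len=1024, hop=512):
    """Crude cochleagram: per-frame band energies in ERB-spaced channels.
    FFT bins are grouped by nearest channel instead of gammatone filtering."""
    cfs = erb_center_freqs(n_channels)
    n_frames = 1 + (len(signal) - frame_len) // hop
    out = np.zeros((n_channels, n_frames))
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    # assign each FFT bin to the nearest ERB channel
    bin_to_chan = np.abs(freqs[:, None] - cfs[None, :]).argmin(axis=1)
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * np.hanning(frame_len)
        power = np.abs(np.fft.rfft(frame)) ** 2
        for c in range(n_channels):
            out[c, t] = power[bin_to_chan == c].sum()
    return out
```

The resulting channel-by-frame matrix can be treated as an image and fed to a CNN, which is the pipeline the article evaluates.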
Citations
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, sampling methods, and combining models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: In this paper, a Hindi-English code-mixed dataset, MaSaC, was developed for sarcasm detection and humor classification in conversational dialog, which to the authors' knowledge is the first dataset of its kind.
Abstract: Sarcasm detection and humor classification are inherently subtle problems, primarily due to their dependence on contextual and non-verbal information. Furthermore, existing studies on these two topics are scarce in non-English languages such as Hindi, owing to the unavailability of high-quality annotated datasets. In this work, we make two major contributions addressing these limitations: (1) we develop MaSaC, a Hindi-English code-mixed dataset for multi-modal sarcasm detection and humor classification in conversational dialog, which to our knowledge is the first dataset of its kind; (2) we propose MSH-COMICS, a novel attention-rich neural architecture for utterance classification. We learn efficient utterance representations using a hierarchical attention mechanism that attends to a small portion of the input sentence at a time. Further, we incorporate a dialog-level contextual attention mechanism to leverage the dialog history for multi-modal classification. We perform extensive experiments for both tasks, varying the multi-modal inputs and the submodules of MSH-COMICS, and conduct a comparative analysis against existing approaches. We observe that MSH-COMICS outperforms existing models by more than 1 F1-score point on sarcasm detection and 10 F1-score points on humor classification. We diagnose our model and perform a thorough analysis of the results to understand its strengths and pitfalls.
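The hierarchical attention described above attends to a small portion of the input sentence at a time. A toy sketch of locally windowed scaled dot-product attention conveys the idea; this is an illustrative stand-in under assumed names, not the MSH-COMICS architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(queries, keys, values, window=3):
    """Scaled dot-product attention where each query position may only
    attend to keys within +/- `window` positions -- a simple stand-in
    for attending to a small portion of the sentence at a time."""
    n, d = queries.shape
    scores = queries @ keys.T / np.sqrt(d)
    # mask out positions outside the local window
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -1e9
    return softmax(scores, axis=-1) @ values
```

With the window spanning the whole sequence this reduces to ordinary full attention; shrinking the window restricts each token's context, which is the trade-off the hierarchical mechanism exploits.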

26 citations

Journal ArticleDOI
TL;DR: In this paper, a novel feature called multi-resolution modulation-filtered cochleagram (MMCG) was proposed for predicting valence and arousal values of emotional primitives.

18 citations

Journal ArticleDOI
TL;DR: The experimental results illustrate that the deep learning-based detection method achieved a high overall accuracy, precision, recall, and F1-score of 96.8% in the detection phase, indicating that the proposed technique could serve as a precise method for detecting pecking activity levels in turkeys.

16 citations

Proceedings ArticleDOI
01 Jul 2020
TL;DR: This work extends the conventional approach to epileptic seizure detection, which uses raw power spectra of EEG signals and convolutional neural networks (CNNs), by computing the frequency characteristics of multi-channel EEG signals with the wavelet transform.
Abstract: Feature extraction and selection from EEG signals have proven useful for detecting epileptic seizure segments. However, these traditional methods have more recently been surpassed by deep learning techniques, which forgo the need for complex feature engineering. This work extends the conventional approach of epileptic seizure detection using raw power spectra of EEG signals and convolutional neural networks (CNNs). The proposed technique uses the wavelet transform to compute the frequency characteristics of multi-channel EEG signals. The EEG signals are divided into 2-second epochs, and the frequency spectrum up to a cutoff frequency of 45 Hz is computed. This multi-channel raw spectral data forms the input to a one-dimensional CNN (1-D CNN). Spectral data from the current, previous, and next epochs are used to predict the label of the current epoch. The performance of the technique is evaluated on a dataset of EEG signals from 24 cases. The proposed method achieves an accuracy of 97.25% in detecting epileptic seizure segments, showing that multi-channel EEG wavelet power spectra and a 1-D CNN are useful for detecting epileptic seizures.
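The preprocessing pipeline above (2-second epochs, spectra up to 45 Hz, neighbouring-epoch context) can be sketched as follows. An FFT power spectrum stands in for the paper's wavelet-based spectrum, and the function names are illustrative:

```python
import numpy as np

def eeg_epochs_to_spectra(eeg, sr, epoch_sec=2.0, f_cut=45.0):
    """Split multi-channel EEG (channels x samples) into fixed-length
    epochs and compute each epoch's power spectrum up to f_cut Hz.
    Returns an array of shape (n_epochs, channels, n_freq_bins)."""
    n_ch, n_samp = eeg.shape
    epoch_len = int(epoch_sec * sr)
    n_epochs = n_samp // epoch_len
    freqs = np.fft.rfftfreq(epoch_len, 1.0 / sr)
    keep = freqs <= f_cut                      # cutoff at 45 Hz
    spectra = []
    for e in range(n_epochs):
        seg = eeg[:, e * epoch_len:(e + 1) * epoch_len]
        spectra.append(np.abs(np.fft.rfft(seg, axis=1))[:, keep] ** 2)
    return np.stack(spectra)

def with_context(spectra):
    """Concatenate previous, current, and next epoch spectra along the
    channel axis, since the paper predicts each epoch's label from its
    neighbours; edge epochs reuse themselves as padding."""
    prev = np.concatenate([spectra[:1], spectra[:-1]])
    nxt = np.concatenate([spectra[1:], spectra[-1:]])
    return np.concatenate([prev, spectra, nxt], axis=1)
```

The stacked output would then be flattened per epoch and fed to a 1-D CNN classifier.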

15 citations


Cites methods from "Acoustic event recognition using cochleagram image and convolutional neural networks"

  • ...Originally developed as an image classification method [9], it has been successfully applied to audio signal classification tasks [10]....


References
Proceedings Article
03 Dec 2012
TL;DR: A large deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers ending in a 1000-way softmax, achieved state-of-the-art performance on ImageNet classification, as discussed by the authors.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
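The top-1 and top-5 error rates quoted above measure whether the true label appears among the k highest-scoring classes. A minimal sketch of the metric (the function name is illustrative):

```python
import numpy as np

def topk_error(logits, labels, k):
    """Fraction of examples whose true label is NOT among the k
    highest-scoring classes -- the top-1 / top-5 error metric
    reported for ImageNet classifiers."""
    topk = np.argsort(logits, axis=1)[:, -k:]     # k best classes per row
    hit = (topk == labels[:, None]).any(axis=1)   # true label among them?
    return 1.0 - hit.mean()
```

For example, with per-example scores `[0, 1, ..., 9]` over ten classes, only label 9 counts as a top-1 hit while labels 5 through 9 all count as top-5 hits.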

73,978 citations

Journal ArticleDOI
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
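The abstract's core idea, nonlinearly mapping inputs into a high-dimensional feature space where a linear decision surface suffices, can be illustrated with the classic XOR problem and an explicit degree-2 polynomial map (a toy sketch, not the paper's construction):

```python
import numpy as np

def poly2_map(x):
    """Explicit degree-2 polynomial feature map for 2-D input,
    chosen so that phi(x) . phi(y) = (x . y + 1)^2."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2])

# XOR: not linearly separable in the 2-D input space, but the mapped
# points are split by the linear surface w . phi(x) = 0, where w
# picks out the cross term x1*x2.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], float)
y = np.array([1, -1, -1, 1])      # sign of x1 * x2
w = np.zeros(6); w[5] = 1.0       # weight on the sqrt(2)*x1*x2 feature
pred = np.sign(np.array([w @ poly2_map(x) for x in X]))
```

The kernel identity means the high-dimensional dot product can be computed without ever forming the features explicitly, which is what makes the construction practical.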

37,861 citations

Book
Christopher M. Bishop
17 Aug 2006
TL;DR: Probability distributions, linear models for regression and classification, neural networks, graphical models, mixture models and EM, sampling methods, continuous latent variables, and sequential data are studied.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

22,840 citations

Proceedings Article
21 Jun 2010
TL;DR: Binary stochastic hidden units in restricted Boltzmann machines are generalized to noisy rectified linear units, which learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
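The construction described above, replacing one binary unit with many tied copies at progressively more negative biases, can be checked numerically: the summed sigmoids closely track softplus, which in turn behaves like the rectified linear unit max(0, x). A small sketch under these assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stepped_sigmoid_sum(x, n_copies=100):
    """Sum of n tied-weight binary units with biases shifted by
    -0.5, -1.5, -2.5, ... -- the 'stepped sigmoid unit' construction
    from the abstract."""
    offsets = np.arange(n_copies) + 0.5
    return sigmoid(x[..., None] - offsets).sum(axis=-1)

def softplus(x):
    """Smooth approximation to max(0, x)."""
    return np.log1p(np.exp(x))

# The stepped-sigmoid sum is closely approximated by softplus(x),
# which for large |x| behaves like the rectified linear unit.
x = np.linspace(-5, 5, 11)
gap = np.abs(stepped_sigmoid_sum(x) - softplus(x)).max()
```

This is why a rectified linear unit (with added noise during training) serves as an efficient approximation to the infinite stack of binary units.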

14,799 citations
