Journal ArticleDOI

Acoustic event recognition using cochleagram image and convolutional neural networks

01 May 2019-Applied Acoustics (Elsevier)-Vol. 148, pp 62-66
TL;DR: This work evaluates the performance of four time-frequency representations for use with CNNs and proposes the use of a cochleagram image whose frequency components are based on the frequency selectivity of the human cochlea.
About: This article was published in Applied Acoustics on 2019-05-01 and has received 39 citations to date. The article focuses on the topics: Spectrogram & Convolutional neural network.
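The cochleagram described in the TL;DR uses frequency channels that follow the cochlea's frequency selectivity, commonly modeled with the ERB (equivalent rectangular bandwidth) scale. A minimal sketch of such a representation is below; an FFT-bin grouping stands in for true gammatone filtering, and the function names are illustrative, not the authors' implementation:

```python
import numpy as np

def erb_center_freqs(n_channels, f_low=50.0, f_high=8000.0):
    """Center frequencies spaced evenly on the ERB-rate scale,
    mimicking the frequency selectivity of the human cochlea."""
    def hz_to_erb(f):
        return 21.4 * np.log10(1.0 + 0.00437 * f)
    def erb_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) / 0.00437
    erbs = np.linspace(hz_to_erb(f_low), hz_to_erb(f_high), n_channels)
    return erb_to_hz(erbs)

def cochleagram(signal, sr, n_channels=64, frame_len=1024, hop=512):
    """Crude cochleagram: per-frame band energies in ERB-spaced channels.
    FFT bins are grouped by nearest channel instead of gammatone filtering."""
    cfs = erb_center_freqs(n_channels)
    n_frames = 1 + (len(signal) - frame_len) // hop
    out = np.zeros((n_channels, n_frames))
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    # assign each FFT bin to the nearest ERB channel
    bin_to_chan = np.abs(freqs[:, None] - cfs[None, :]).argmin(axis=1)
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * np.hanning(frame_len)
        power = np.abs(np.fft.rfft(frame)) ** 2
        for c in range(n_channels):
            out[c, t] = power[bin_to_chan == c].sum()
    return out
```

The resulting channel-by-frame matrix can be treated as an image and fed to a CNN, which is the pipeline the article evaluates.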
Citations
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, sampling methods, and combining models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: In this paper, a Hindi-English code-mixed dataset, MaSaC, was developed for sarcasm detection and humor classification in conversational dialog, which to the authors' knowledge is the first dataset of its kind.
Abstract: Sarcasm detection and humor classification are inherently subtle problems, primarily due to their dependence on contextual and non-verbal information. Furthermore, existing studies on these two topics are scarce in non-English languages such as Hindi, owing to the unavailability of high-quality annotated datasets. In this work, we make two major contributions addressing these limitations: (1) we develop MaSaC, a Hindi-English code-mixed dataset for multi-modal sarcasm detection and humor classification in conversational dialog, which to our knowledge is the first dataset of its kind; (2) we propose MSH-COMICS, a novel attention-rich neural architecture for utterance classification. We learn efficient utterance representations using a hierarchical attention mechanism that attends to a small portion of the input sentence at a time. Further, we incorporate a dialog-level contextual attention mechanism to leverage the dialog history for multi-modal classification. We perform extensive experiments for both tasks, varying the multi-modal inputs and the submodules of MSH-COMICS, and conduct a comparative analysis against existing approaches. We observe that MSH-COMICS outperforms existing models by more than 1 F1-score point on sarcasm detection and 10 F1-score points on humor classification. We diagnose our model and perform a thorough analysis of the results to understand its strengths and pitfalls.
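The hierarchical attention described above attends to a small portion of the input sentence at a time. A toy sketch of locally windowed scaled dot-product attention conveys the idea; this is an illustrative stand-in under assumed names, not the MSH-COMICS architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(queries, keys, values, window=3):
    """Scaled dot-product attention where each query position may only
    attend to keys within +/- `window` positions -- a simple stand-in
    for attending to a small portion of the sentence at a time."""
    n, d = queries.shape
    scores = queries @ keys.T / np.sqrt(d)
    # mask out positions outside the local window
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -1e9
    return softmax(scores, axis=-1) @ values
```

With the window spanning the whole sequence this reduces to ordinary full attention; shrinking the window restricts each token's context, which is the trade-off the hierarchical mechanism exploits.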

26 citations

Journal ArticleDOI
TL;DR: In this paper, a novel feature called multi-resolution modulation-filtered cochleagram (MMCG) was proposed for predicting valence and arousal values of emotional primitives.

18 citations

Journal ArticleDOI
TL;DR: The experimental results illustrate that the deep learning-based detection method achieved a high overall accuracy, precision, recall, and F1-score of 96.8% in the detection phase, indicating that the proposed technique could serve as a precise method for detecting pecking activity levels in turkeys.

16 citations

Proceedings ArticleDOI
01 Jul 2020
TL;DR: This work extends the conventional approach to epileptic seizure detection, which uses raw power spectra of EEG signals and convolutional neural networks (CNNs), by computing the frequency characteristics of multi-channel EEG signals with the wavelet transform.
Abstract: Feature extraction and selection from EEG signals have proven useful for detecting epileptic seizure segments. However, these traditional methods have more recently been surpassed by deep learning techniques, which forgo the need for complex feature engineering. This work extends the conventional approach of epileptic seizure detection using raw power spectra of EEG signals and convolutional neural networks (CNNs). The proposed technique uses the wavelet transform to compute the frequency characteristics of multi-channel EEG signals. The EEG signals are divided into 2-second epochs, and the frequency spectrum up to a cutoff frequency of 45 Hz is computed. This multi-channel raw spectral data forms the input to a one-dimensional CNN (1-D CNN). Spectral data from the current, previous, and next epochs are used to predict the label of the current epoch. The performance of the technique is evaluated on a dataset of EEG signals from 24 cases. The proposed method achieves an accuracy of 97.25% in detecting epileptic seizure segments, showing that multi-channel EEG wavelet power spectra and a 1-D CNN are useful for detecting epileptic seizures.
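The preprocessing pipeline above (2-second epochs, spectra up to 45 Hz, neighbouring-epoch context) can be sketched as follows. An FFT power spectrum stands in for the paper's wavelet-based spectrum, and the function names are illustrative:

```python
import numpy as np

def eeg_epochs_to_spectra(eeg, sr, epoch_sec=2.0, f_cut=45.0):
    """Split multi-channel EEG (channels x samples) into fixed-length
    epochs and compute each epoch's power spectrum up to f_cut Hz.
    Returns an array of shape (n_epochs, channels, n_freq_bins)."""
    n_ch, n_samp = eeg.shape
    epoch_len = int(epoch_sec * sr)
    n_epochs = n_samp // epoch_len
    freqs = np.fft.rfftfreq(epoch_len, 1.0 / sr)
    keep = freqs <= f_cut                      # cutoff at 45 Hz
    spectra = []
    for e in range(n_epochs):
        seg = eeg[:, e * epoch_len:(e + 1) * epoch_len]
        spectra.append(np.abs(np.fft.rfft(seg, axis=1))[:, keep] ** 2)
    return np.stack(spectra)

def with_context(spectra):
    """Concatenate previous, current, and next epoch spectra along the
    channel axis, since the paper predicts each epoch's label from its
    neighbours; edge epochs reuse themselves as padding."""
    prev = np.concatenate([spectra[:1], spectra[:-1]])
    nxt = np.concatenate([spectra[1:], spectra[-1:]])
    return np.concatenate([prev, spectra, nxt], axis=1)
```

The stacked output would then be flattened per epoch and fed to a 1-D CNN classifier.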

15 citations


Cites methods from "Acoustic event recognition using cochleagram image and convolutional neural networks"

  • ...Originally developed as an image classification method [9], it has been successfully applied to audio signal classification tasks [10]....


References
Proceedings Article
03 Dec 2012
TL;DR: A large deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers ending in a 1000-way softmax, achieved state-of-the-art performance on ImageNet classification, as discussed by the authors.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
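The top-1 and top-5 error rates quoted above measure whether the true label appears among the k highest-scoring classes. A minimal sketch of the metric (the function name is illustrative):

```python
import numpy as np

def topk_error(logits, labels, k):
    """Fraction of examples whose true label is NOT among the k
    highest-scoring classes -- the top-1 / top-5 error metric
    reported for ImageNet classifiers."""
    topk = np.argsort(logits, axis=1)[:, -k:]     # k best classes per row
    hit = (topk == labels[:, None]).any(axis=1)   # true label among them?
    return 1.0 - hit.mean()
```

For example, with per-example scores `[0, 1, ..., 9]` over ten classes, only label 9 counts as a top-1 hit while labels 5 through 9 all count as top-5 hits.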

73,978 citations

Journal ArticleDOI
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
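The abstract's core idea, nonlinearly mapping inputs into a high-dimensional feature space where a linear decision surface suffices, can be illustrated with the classic XOR problem and an explicit degree-2 polynomial map (a toy sketch, not the paper's construction):

```python
import numpy as np

def poly2_map(x):
    """Explicit degree-2 polynomial feature map for 2-D input,
    chosen so that phi(x) . phi(y) = (x . y + 1)^2."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2])

# XOR: not linearly separable in the 2-D input space, but the mapped
# points are split by the linear surface w . phi(x) = 0, where w
# picks out the cross term x1*x2.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], float)
y = np.array([1, -1, -1, 1])      # sign of x1 * x2
w = np.zeros(6); w[5] = 1.0       # weight on the sqrt(2)*x1*x2 feature
pred = np.sign(np.array([w @ poly2_map(x) for x in X]))
```

The kernel identity means the high-dimensional dot product can be computed without ever forming the features explicitly, which is what makes the construction practical.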

37,861 citations

Book
Christopher M. Bishop
17 Aug 2006
TL;DR: Probability distributions, linear models for regression and classification, neural networks, graphical models, mixture models and EM, sampling methods, continuous latent variables, and sequential data are studied.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

22,840 citations

Proceedings Article
21 Jun 2010
TL;DR: Binary stochastic hidden units in restricted Boltzmann machines are generalized to noisy rectified linear units, which learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
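The construction described above, replacing one binary unit with many tied copies at progressively more negative biases, can be checked numerically: the summed sigmoids closely track softplus, which in turn behaves like the rectified linear unit max(0, x). A small sketch under these assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stepped_sigmoid_sum(x, n_copies=100):
    """Sum of n tied-weight binary units with biases shifted by
    -0.5, -1.5, -2.5, ... -- the 'stepped sigmoid unit' construction
    from the abstract."""
    offsets = np.arange(n_copies) + 0.5
    return sigmoid(x[..., None] - offsets).sum(axis=-1)

def softplus(x):
    """Smooth approximation to max(0, x)."""
    return np.log1p(np.exp(x))

# The stepped-sigmoid sum is closely approximated by softplus(x),
# which for large |x| behaves like the rectified linear unit.
x = np.linspace(-5, 5, 11)
gap = np.abs(stepped_sigmoid_sum(x) - softplus(x)).max()
```

This is why a rectified linear unit (with added noise during training) serves as an efficient approximation to the infinite stack of binary units.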

14,799 citations
