Proceedings ArticleDOI

Acoustic Scene Classification in Hearing aid using Deep Learning

TLDR
A simple sound classification system that can automatically switch between hearing aid algorithms based on the acoustic scene, achieving high precision with only three to five seconds of audio per scene.
Abstract
Different audio environments require different hearing aid settings to deliver high-quality speech, and manually tuning those settings can be irritating for the user. Hearing aids can instead be provided with settings that are tuned automatically to the audio environment. In this paper we present a simple sound classification system that can automatically switch between hearing aid algorithms based on the acoustic scene. Features (MFCC, Mel-spectrogram, Chroma, Spectral contrast, and Tonnetz) are extracted from several hours of audio across five classes, "music," "noise," "speech with noise," "silence," and "clean speech," for training and testing the network; these features are then classified by a convolutional neural network. We show that the system achieves high precision with only three to five seconds of audio per scene. The algorithm is efficient, has a small memory footprint, and can be implemented in a digital hearing aid.
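The abstract names the features but not the extraction code. As a rough numpy-only sketch of the first two features (log-Mel spectrogram and MFCC), assuming a 16 kHz sample rate, 1024-sample frames, and a 512-sample hop, none of which are stated in the paper:

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Slice a 1-D signal into overlapping frames."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def mel_filterbank(n_mels=40, n_fft=1024, sr=16000):
    """Triangular mel filters mapping FFT bins to mel bands."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:
            fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def log_mel_and_mfcc(x, sr=16000, n_mfcc=13):
    """Log-mel spectrogram, then MFCCs via a DCT-II over the mel bands."""
    frames = frame_signal(x) * np.hanning(1024)
    power = np.abs(np.fft.rfft(frames, n=1024)) ** 2      # power spectrum per frame
    mel = power @ mel_filterbank(sr=sr).T                 # mel-band energies
    log_mel = np.log(mel + 1e-10)
    n_mels = log_mel.shape[1]
    dct = np.cos(np.pi / n_mels
                 * (np.arange(n_mels) + 0.5)[None, :]
                 * np.arange(n_mfcc)[:, None])
    return log_mel, log_mel @ dct.T

# A synthetic 3-second "scene": a 440 Hz tone plus light noise.
sr = 16000
t = np.arange(3 * sr) / sr
x = np.sin(2 * np.pi * 440 * t) \
    + 0.1 * np.random.default_rng(0).standard_normal(len(t))
log_mel, mfcc = log_mel_and_mfcc(x, sr)
print(log_mel.shape, mfcc.shape)  # one feature row per frame
```

In practice a library such as librosa provides all five listed features (`mfcc`, `melspectrogram`, `chroma_stft`, `spectral_contrast`, `tonnetz`) ready-made; stacking them per frame yields the input matrix for the CNN.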


Citations
Journal ArticleDOI

Deep Learning in Diverse Intelligent Sensor Based Systems

TL;DR: Deep learning has become a predominant method for solving data-analysis problems in virtually all fields of science and engineering. The increasing complexity and large volume of data collected by diverse sensor systems have spurred the development of deep learning methods and have fundamentally transformed the way data are acquired, processed, analyzed, and interpreted.
Book ChapterDOI

Late Fusion of Convolutional Neural Network with Wavelet-Based Ensemble Classifier for Acoustic Scene Classification

TL;DR: In this paper, the authors propose a late fusion of multiple models combining a CNN with wavelet-based ensemble classifiers, which achieved an accuracy of 79.43% on TUT 2017 Challenge Task 1.
Journal ArticleDOI

New Avenues in Audio Intelligence: Towards Holistic Real-life Audio Understanding.

TL;DR: In this paper, computer audition (i.e., intelligent audio) is noted to have made great strides in recent years; however, it is still far from achieving holistic hearing abilities, which more appropriately mimic human-like und...
Journal ArticleDOI

Determining Ratio of Prunable Channels in MobileNet by Sparsity for Acoustic Scene Classification

TL;DR: In this article, the authors propose a method that determines the ratio of pruned channels using simple linear regression models on the Sparsity of Channels (SC) in the convolutional layers.

Exploring the Effects of Channel Sparsity on Neural Network Pruning for Acoustic Scene Classification

TL;DR: In this paper, the internal weights of convolutional neural networks that will undergo pruning are used to define a novel metric, Weight Skewness (WS), for quantifying the sparsity of channels.
References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings ArticleDOI

CNN architectures for large-scale audio classification

TL;DR: In this paper, the authors used various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels.
Proceedings ArticleDOI

A Dataset and Taxonomy for Urban Sound Research

TL;DR: A taxonomy of urban sounds and a new dataset, UrbanSound, containing 27 hours of audio with 18.5 hours of annotated sound event occurrences across 10 sound classes are presented.
Proceedings ArticleDOI

Convolutional Neural Networks for Small-Footprint Keyword Spotting

TL;DR: This work explores using Convolutional Neural Networks for a small-footprint keyword spotting task and finds that the CNN architectures offer between a 27-44% relative improvement in false reject rate compared to a DNN, while fitting into the constraints of each application.