Author

K. M. M. Prabhu

Bio: K. M. M. Prabhu is an academic researcher from the Indian Institute of Technology Madras. The author has contributed to research on topics including Sparse approximation and Audio signal processing, has an h-index of 2, and has co-authored 2 publications receiving 15 citations.

Papers
Proceedings ArticleDOI
25 Feb 2013
TL;DR: Describes algorithms that classify environmental sounds using signal sub-band energy, sparse representations obtained by matching pursuit, and a Gaussian mixture model, with the aim of providing contextual information to devices such as hearing aids for optimum performance.
Abstract: In this paper, we describe algorithms to classify environmental sounds with the aim of providing contextual information to devices such as hearing aids for optimum performance. We use signal sub-band energy to construct a signal-dependent dictionary and matching pursuit algorithms to obtain a sparse representation of a signal. The coefficients of the sparse vector are used as weights to compute weighted features. These features, along with mel-frequency cepstral coefficients (MFCC), are used as feature vectors for classification. Experimental results show that the proposed method gives an accuracy as high as 95.6% while classifying 14 categories of environmental sound using a Gaussian mixture model (GMM).
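
As a rough illustration of the pipeline described above (greedy matching pursuit over a dictionary, the sparse coefficients used as features, and per-class GMM scoring), a minimal sketch is given below. The DCT dictionary, frame length and toy data are assumptions made for illustration; they stand in for the paper's signal-dependent, sub-band-energy-based dictionary and real audio frames.

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.mixture import GaussianMixture

def matching_pursuit(x, D, n_atoms=10):
    """Greedy matching pursuit: sparse coefficients of x over dictionary D (unit-norm columns)."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual              # correlation of the residual with every atom
        k = np.argmax(np.abs(corr))        # pick the best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]      # remove its contribution from the residual
    return coeffs

# Toy data: random 256-sample frames standing in for audio, 3 sound classes.
rng = np.random.default_rng(0)
frame_len, n_frames = 256, 60
D = dct(np.eye(frame_len), norm='ortho')   # orthonormal DCT atoms as a stand-in dictionary
X = rng.standard_normal((n_frames, frame_len))
y = np.repeat(np.arange(3), 20)

features = np.array([matching_pursuit(x, D) for x in X])

# One GMM per class; a frame is assigned to the class with the highest log-likelihood.
gmms = {c: GaussianMixture(n_components=2, covariance_type='diag',
                           random_state=0).fit(features[y == c]) for c in np.unique(y)}
scores = np.column_stack([gmms[c].score_samples(features) for c in sorted(gmms)])
print("training accuracy (toy data):", np.mean(np.argmax(scores, axis=1) == y))
```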

14 citations

Proceedings ArticleDOI
14 Nov 2013
TL;DR: This work investigates whether statistics obtained by decomposing sounds using a set of filter-banks and computing the moments of the filter responses, along with their correlation values, can be used as features for classifying unvoiced sounds.
Abstract: Unvoiced phonemes have a significant presence in spoken English. These phonemes are hard to classify due to their weak energy and lack of periodicity. Sound textures, such as the sound made by a flowing stream of water or falling droplets of rain, have aperiodic properties in the temporal domain similar to unvoiced phonemes, yet these sounds are easily differentiated by the human ear. Recent studies on sound texture analysis and synthesis have shown that the human auditory system perceives sound textures using simple statistics. These statistics are obtained by decomposing sounds using a set of filter-banks and computing the moments of the filter responses, along with their correlation values. In this work we investigate whether the above-mentioned statistics, which are easy to extract, can also be used as features for classifying unvoiced sounds. To incorporate the moments and correlation values as features, a framework containing multiple classifiers is proposed. Experiments conducted on the TIMIT dataset gave an accuracy on par with the latest reported in the literature, at a lower computational cost.
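
The statistics described (moments of filter-bank responses together with band-to-band correlations) can be sketched roughly as below. The Butterworth filter-bank, Hilbert-envelope extraction and band edges are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from scipy.stats import skew, kurtosis

def filterbank_stats(x, fs, bands):
    """Per-band envelope moments plus pairwise envelope correlations as one feature vector."""
    envelopes = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfiltfilt(sos, x)                    # zero-phase band-pass filtering
        envelopes.append(np.abs(hilbert(band)))       # amplitude envelope of the band
    E = np.array(envelopes)                           # shape: (n_bands, n_samples)
    moments = np.concatenate([E.mean(axis=1), E.var(axis=1),
                              skew(E, axis=1), kurtosis(E, axis=1)])
    corr = np.corrcoef(E)                             # band-to-band envelope correlations
    upper = corr[np.triu_indices(len(bands), k=1)]    # keep each band pair once
    return np.concatenate([moments, upper])

fs = 16000
x = np.random.default_rng(1).standard_normal(fs)      # 1 s of noise standing in for an unvoiced segment
bands = [(200, 500), (500, 1000), (1000, 2000), (2000, 4000)]
print(filterbank_stats(x, fs, bands).shape)            # 4 bands * 4 moments + 6 correlations -> (22,)
```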

4 citations


Cited by
Proceedings ArticleDOI
01 Oct 2013
TL;DR: This survey offers a qualitative and elucidatory review of recent developments in environmental sound recognition (ESR), in three parts: i) basic environmental sound processing schemes, ii) stationary ESR techniques and iii) non-stationary ESR techniques.
Abstract: Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years. Research on ESR has significantly increased in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to the signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this survey, we offer a qualitative and elucidatory review of recent developments. It includes three parts: i) basic environmental sound processing schemes, ii) stationary ESR techniques and iii) non-stationary ESR techniques. Finally, concluding remarks and future research and development trends in the ESR field are given.

134 citations

Journal ArticleDOI
14 Dec 2014
TL;DR: This survey offers a qualitative and elucidatory review of recent developments in environmental sound recognition (ESR), in four parts: basic environmental sound-processing schemes, stationary ESR techniques, non-stationary ESR techniques, and a performance comparison of selected methods.
Abstract: Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years. Research on ESR has significantly increased in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to the signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this survey, we offer a qualitative and elucidatory review of recent developments. It includes four parts: (i) basic environmental sound-processing schemes, (ii) stationary ESR techniques, (iii) non-stationary ESR techniques, and (iv) a performance comparison of selected methods. Finally, concluding remarks and future research and development trends in the ESR field are given.

77 citations

Proceedings ArticleDOI
01 May 2014
TL;DR: An approach to location classification is presented that does not require explicit information about the place, in contrast with systems such as a Quick Response (QR) code or Radio Frequency Identification (RFID) tag.
Abstract: In this paper, we present an approach to location classification that does not require explicit information about the place, in contrast with systems such as a Quick Response (QR) code or Radio Frequency Identification (RFID) tag. Our approach uses an "audio fingerprint" of the environmental background sounds of the place. We propose a fingerprint consisting of a set of 62 audio features drawn from temporal, frequency and statistical features. To form an audio fingerprint, a feature extraction process was performed; we apply this process to 70 environmental sounds from 14 different places to form each audio fingerprint. To demonstrate the effectiveness of the set of features, we evaluate it with two different classifiers: Random Forest and Support Vector Machine. Our results indicate that using this set of features allows us to classify a place with an accuracy of 84.28% for Random Forest and 91.42% for Support Vector Machine.
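
The paper's exact 62-feature fingerprint is not reproduced here, but the evaluation setup (a fixed-length feature vector fed to Random Forest and Support Vector Machine classifiers) can be sketched with scikit-learn as below; the small feature extractor and the random toy clips are placeholders for the real fingerprint and recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def audio_fingerprint(x, fs=16000):
    """A small stand-in feature vector mixing temporal, spectral and statistical descriptors."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)    # spectral centroid
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2            # zero-crossing rate
    return np.array([x.mean(), x.std(), x.min(), x.max(),
                     centroid, zcr, spectrum.mean(), spectrum.std()])

rng = np.random.default_rng(2)
X = np.array([audio_fingerprint(rng.standard_normal(16000)) for _ in range(140)])
y = np.repeat(np.arange(14), 10)                               # 14 hypothetical places, 10 clips each

for name, clf in [("Random Forest", RandomForestClassifier(n_estimators=200, random_state=0)),
                  ("SVM (RBF)", SVC(kernel='rbf', gamma='scale'))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy on toy data = {acc:.3f}")
```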

12 citations

Proceedings ArticleDOI
06 Mar 2014
TL;DR: This paper deals with prototype modeling for environmental sound recognition and reports a higher recognition accuracy (98.9%) than the existing method.
Abstract: Environmental sound recognition is an audio scene identification process in which a person's location is found by analyzing the background sound. This paper deals with prototype modeling for environmental sound recognition. Sound recognition involves the collection of audio data, extraction of important features, clustering of similar features and their classification. Mel-frequency cepstral coefficients are extracted as features and clustered by a Gaussian mixture model, which is a probabilistic model. A neural network classifier is used to classify the features and to identify the environmental audio scene. The implementation is done in MATLAB. Five major environmental sounds are considered: car, office, restaurant, street and subway. The proposed method achieves an accuracy of 98.9%, better than the already existing method.
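
A minimal sketch of the described pipeline (MFCC extraction, GMM-based clustering, and a neural-network classifier) is given below using librosa and scikit-learn rather than MATLAB; the frame settings, GMM order, network topology and random toy clips are assumptions, not the paper's configuration.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

def mfcc_frames(signal, sr):
    """13 MFCCs per frame, one frame per row."""
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T

rng = np.random.default_rng(3)
sr = 16000
classes = ["car", "office", "restaurant", "street", "subway"]
clips = [rng.standard_normal(sr) for _ in range(50)]            # stand-ins for 1 s recordings
labels = np.repeat(np.arange(len(classes)), 10)

frames = [mfcc_frames(c, sr) for c in clips]

# GMM as a soft clustering front end: each clip is summarised by its average
# posterior over the mixture components (a bag-of-clusters descriptor).
gmm = GaussianMixture(n_components=8, covariance_type='diag', random_state=0)
gmm.fit(np.vstack(frames))
X = np.array([gmm.predict_proba(f).mean(axis=0) for f in frames])

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, labels)
print("training accuracy (toy data):", clf.score(X, labels))
```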

9 citations

Proceedings ArticleDOI
19 Apr 2015
TL;DR: Experimental results show that exploiting the tensor representation makes it possible to characterize distinctive transient TF atoms, yielding an average accuracy improvement of 9.7% and 12.5% compared with matching pursuit (MP) and MFCC features.
Abstract: This paper describes a method to extract time-frequency (TF) audio features by tensor-based sparse approximation for sound effects classification. In the proposed method, the observed data is encoded as a higher-order tensor and discriminative features are extracted in the spectrotemporal domain. First, audio signals are represented by a joint time-frequency-duration tensor based on sparse approximation; then tensor factorization is applied to calculate feature vectors. The three arrays of the proposed tensor represent the frequency, time and duration of transient TF atoms, respectively. Experimental results show that exploiting the tensor representation makes it possible to characterize distinctive transient TF atoms, yielding an average accuracy improvement of 9.7% and 12.5% compared with matching pursuit (MP) and MFCC features.
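
The sparse-approximation front end is not reproduced here, but the core idea (factorizing a frequency x time x duration tensor and using the resulting factors as features) can be illustrated with a CP/PARAFAC decomposition via TensorLy on a toy tensor; the tensor contents, bin counts and rank are placeholders, not the paper's settings.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(4)
# Toy tensor: energies of transient TF atoms binned by frequency (20), time (30) and duration (8).
T = tl.tensor(rng.random((20, 30, 8)))

rank = 5
weights, factors = parafac(T, rank=rank, n_iter_max=200)   # CP/PARAFAC factorization
freq_factor, time_factor, dur_factor = factors             # one factor matrix per tensor mode

# One simple clip-level descriptor: concatenate per-component loadings from each mode.
feature_vector = np.concatenate([f.mean(axis=0) for f in factors])
print(feature_vector.shape)                                 # 3 modes * rank 5 -> (15,)
```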

7 citations