Showing papers by "Goutam Saha" published in 2008


Journal Article
TL;DR: This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone when combined with MFCC via a parallel implementation of speaker models, and outperforms baseline MFCC significantly.
Abstract: A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves the distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as a classifier for various model orders. Keywords: Complementary Information, Filter Bank, GMM, IMFCC, MFCC, Speaker Identification, Speaker Recognition.
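The abstract does not spell out how the complementary filter bank is built or how the parallel models are combined. As a minimal sketch, assuming the complementary structure is obtained by mirroring the mel filter centers about the middle of the band and assuming the two models are fused by a weighted sum of per-speaker log-likelihoods, the idea could look like this. The function names, the filter count, and the fusion weight are all hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale mapping (assumed here; the abstract gives no formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(n_filters, f_max_hz):
    """Center frequencies (Hz) equally spaced on the mel scale over 0..f_max_hz."""
    step = hz_to_mel(f_max_hz) / (n_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]


def inverted_mel_centers(n_filters, f_max_hz):
    """Mirror the mel centers so that filters are dense in the high band (assumption)."""
    return sorted(f_max_hz - c for c in mel_centers(n_filters, f_max_hz))


def fuse_scores(mfcc_scores, imfcc_scores, weight=0.5):
    """Weighted sum of per-speaker log-likelihoods; returns the best-scoring speaker."""
    fused = {
        spk: weight * mfcc_scores[spk] + (1.0 - weight) * imfcc_scores[spk]
        for spk in mfcc_scores
    }
    return max(fused, key=fused.get)
```

With this construction the mel centers are closely spaced at low frequencies, while the mirrored centers are closely spaced at high frequencies, which is the complementary behaviour the abstract describes.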

103 citations


Journal ArticleDOI
TL;DR: This work extensively utilizes biomedical domain features for reduction of time and computational complexities and is more accurate than other approaches without auxiliary signal.
Abstract: The first step towards detection of valvular heart diseases from the heart sound signal (phonocardiogram) is segmentation. A segmentation algorithm provides the locations of the first and second heart sounds, which in turn helps to locate and analyse the murmur. Established phonocardiogram-based segmentation methods use an electrocardiographic (ECG) signal as a continuous auxiliary input in a complex instrumentation setup. This paper proposes an automatic segmentation method that does not require any such auxiliary signal. Compared to other approaches without an auxiliary signal, this work extensively utilizes biomedical domain features to reduce time and computational complexity, and is more accurate. The performance of the algorithm is evaluated for nine commonly occurring pathological cases and normal heart sound across various sampling frequencies, recording environments and age groups of subjects. The proposed algorithm yields an overall accuracy of 97.47% and is compared with two competing techniques. In...

45 citations


Journal ArticleDOI
TL;DR: A novel method to extract robust features for automatic classification of heart sounds based on Empirical Mode Decomposition (EMD) is presented, and it is found that the EMD-based feature extraction always performs better than the benchmark wavelet-based feature extraction technique.
Abstract: A novel method is presented to extract robust features for automatic classification of heart sounds based on Empirical Mode Decomposition (EMD). The work decomposes segmented heart sound cycles with EMD to generate certain intrinsic mode functions (IMFs). It is seen that the first IMF contains mostly high frequency noise, the second and third IMFs carry the higher frequency components of the signal of interest, and the residue contains its low frequency components. A twenty-five dimensional feature vector is generated from the average energy of the segmented IMFs and residue, which serves as input to the classifier models. Two different classifiers, Artificial Neural Network (ANN) and Grow and Learn (GAL) network, are used to show the performance of the proposed feature extraction technique. Experiments are conducted on 104 different recordings of heart sound comprising normal and 12 different pathological cases against three different additive background noises: white Gaussian, hospital and body noise. It is found that the EMD-based feature extraction always performs better than the benchmark wavelet-based feature extraction technique.
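The abstract states that the features are average energies of segmented IMFs and residue, but not how a cycle is segmented or how the twenty-five dimensions are arranged. The sketch below only illustrates the average-energy step, assuming the components have already been produced by some EMD routine and assuming hypothetical segment boundaries given as sample indices.

```python
def segment_energy_features(components, boundaries):
    """Average energy of each segment of each component.

    components: list of equal-length sample lists (IMFs and residue),
                assumed to come from an EMD routine not shown here.
    boundaries: increasing sample indices, e.g. [0, 2, 4] defines the
                segments [0, 2) and [2, 4). Hypothetical layout.
    """
    features = []
    for comp in components:
        for start, end in zip(boundaries[:-1], boundaries[1:]):
            segment = comp[start:end]
            # Mean of squared samples is the average energy of the segment.
            features.append(sum(x * x for x in segment) / len(segment))
    return features


# Two toy components, two segments each, giving a 4-dimensional vector.
toy = segment_energy_features([[1, 1, 2, 2], [0, 0, 3, 3]], [0, 2, 4])
```

The length of the resulting vector is the number of components times the number of segments, so the segment and component counts would have to be chosen to match the dimensionality reported in the paper.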

33 citations


Journal Article
TL;DR: An optimization technique that involves Singular Value Decomposition (SVD) and QR factorization with column pivoting (QRcp) methodology to optimize empirically chosen over-parameterized ANN structure is presented.
Abstract: Artificial Neural Network (ANN) has been extensively used for classification of heart sounds for its discriminative training ability and easy implementation. However, it suffers from overparameterization if the number of nodes is not chosen properly. In such cases, when the dataset has redundancy within it, the ANN is trained along with this redundant information, which results in poor validation. Also, a larger network means more computational expense, resulting in more hardware and time related cost. Therefore, an optimum design of the neural network is needed towards real-time detection of pathological patterns, if any, from the heart sound signal. The aims of this work are to (i) select a set of input features that are effective for identification of heart sound signals and (ii) make an optimum selection of nodes in the hidden layer for a more effective ANN structure. Here, we present an optimization technique that involves Singular Value Decomposition (SVD) and QR factorization with column pivoting (QRcp) methodology to optimize an empirically chosen over-parameterized ANN structure. Input nodes present in the ANN structure are optimized by SVD followed by QRcp, while only SVD is required to prune undesirable hidden nodes. The result is presented for classifying 12 common pathological cases and normal heart sound. Keywords: ANN, Classification of heart diseases, murmurs, optimization, Phonocardiogram, QRcp, SVD.
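To illustrate how column pivoting can rank input features, the sketch below implements only the pivot-selection step of QRcp with a greedy Gram-Schmidt procedure in plain Python. It is a simplification and not the paper's procedure: the number of features to keep, `k`, is assumed to have been decided beforehand (for example by inspecting singular values), and pivoting is applied directly to the feature columns. The function name and tolerance are hypothetical.

```python
import math


def qrcp_column_order(columns, k, tol=1e-12):
    """Return indices of up to k columns chosen by greedy column pivoting.

    columns: list of feature columns, each a list of samples.
    k: number of columns to keep (assumed to come from an SVD rank estimate).
    """
    residual = [list(col) for col in columns]  # working copies
    remaining = list(range(len(columns)))
    chosen = []
    for _ in range(k):
        # Pivot: the remaining column with the largest residual energy.
        best = max(remaining, key=lambda j: sum(x * x for x in residual[j]))
        norm_sq = sum(x * x for x in residual[best])
        if norm_sq <= tol:
            break  # everything left is (numerically) redundant
        chosen.append(best)
        remaining.remove(best)
        q = [x / math.sqrt(norm_sq) for x in residual[best]]
        # Remove the chosen direction from every remaining column.
        for j in remaining:
            proj = sum(a * b for a, b in zip(q, residual[j]))
            residual[j] = [a - proj * b for a, b in zip(residual[j], q)]
    return chosen


# Column 0 is a scaled copy of column 1, so it is never selected.
picked = qrcp_column_order([[1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 0, 0.5]], 3)
```

A column that is a linear combination of already chosen columns ends up with zero residual and is skipped, which is how redundant inputs get pruned in this kind of subset selection.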

10 citations


Journal ArticleDOI
TL;DR: A new method is introduced that models the inter-scale dependency between the coefficients and their parents with a Circularly Symmetric Probability Density Function related to the family of Spherically Invariant Random Processes (SIRPs) in the Log Gabor Wavelet (LGW) domain; the corresponding joint shrinkage estimators are derived using Maximum a Posteriori (MAP) estimation theory.

5 citations


Journal Article
TL;DR: A recently proposed efficient audio compression scheme is used to develop three important applications in the context of Music Information Retrieval for the effective manipulation of large music databases, namely automatic music recommendation (AMR), digital rights management (DRM) and audio fingerprinting for song identification.
Abstract: Rapid progress in audio compression technology has contributed to the explosive growth of music available in digital form today. In a reversal of ideas, this work makes use of a recently proposed efficient audio compression scheme to develop three important applications in the context of Music Information Retrieval (MIR) for the effective manipulation of large music databases, namely automatic music recommendation (AMR), digital rights management (DRM) and audio fingerprinting for song identification. The performance of these three applications has been evaluated with respect to a database of songs collected from a diverse set of genres. Keywords: Audio compression, Music Information Retrieval, Digital Rights Management, Audio Fingerprinting.

3 citations


Journal Article
TL;DR: This work presents a fusion of Log Gabor Wavelet and Maximum a Posteriori estimator as a speech enhancement tool for acoustical background noise reduction and shows a higher improvement in Segmental Signal-to-Noise Ratio (S-SNR) and lower Log-Spectral Distortion (LSD) in two different noisy environments compared to other estimators.
Abstract: This work presents a fusion of Log Gabor Wavelet (LGW) and Maximum a Posteriori (MAP) estimator as a speech enhancement tool for acoustical background noise reduction. The probability density function (pdf) of the speech spectral amplitude is approximated by a Generalized Laplacian Distribution (GLD). Compared to earlier estimators, the proposed method estimates the underlying statistical model more accurately by appropriately choosing the model parameters of the GLD. Experimental results show that the proposed estimator yields a higher improvement in Segmental Signal-to-Noise Ratio (S-SNR) and lower Log-Spectral Distortion (LSD) in two different noisy environments compared to other estimators. Keywords: Speech Enhancement, Generalized Laplacian Distribution, Log Gabor Wavelet, Bayesian MAP Marginal Estimator.
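The abstract reports S-SNR without defining it. A commonly used definition averages the per-frame SNR in decibels over non-overlapping frames; the sketch below follows that definition. The frame length, the non-overlapping framing, and the choice to skip frames with zero signal or zero error energy are assumptions, not details taken from the paper.

```python
import math


def segmental_snr_db(clean, enhanced, frame_len):
    """Mean per-frame SNR in dB between a clean and an enhanced signal."""
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        # Skip degenerate frames where the ratio is undefined (assumption).
        if signal_energy == 0.0 or error_energy == 0.0:
            continue
        frame_snrs.append(10.0 * math.log10(signal_energy / error_energy))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)


# Each frame has an error amplitude of 0.1 on a unit signal: about 20 dB.
example = segmental_snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 1.1, 1.1], 2)
```

An improvement in S-SNR would then be the difference between this value computed on the enhanced signal and the same value computed on the noisy input.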

2 citations