Showing papers by "Goutam Saha" published in 2008

Open Access

Journal Article

Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

Sandipan Chakroborty, Anindya Roy, Goutam Saha

24 Nov 2008-World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering

TL;DR: This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone when combined with MFCC via a parallel implementation of speaker models, and outperforms baseline MFCC significantly.

Abstract: A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders. Keywords: Complementary Information, Filter Bank, GMM, IMFCC, MFCC, Speaker Identification, Speaker Recognition.
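
The idea of a complementary (flipped) filter bank can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a common textbook mel formula, triangular filters, and a simple mirror of the bank along both the filter and frequency axes. The function names, the FFT size and the sample rate are all hypothetical choices.

```python
import math

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, equally spaced in mel, expressed in FFT bins
    edges = [mel_to_hz(top_mel * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def flipped_filter_bank(bank):
    """Mirror the bank so narrow filters sit at high frequencies."""
    return [row[::-1] for row in bank[::-1]]

bank = mel_filter_bank(20, 512, 8000)
flipped = flipped_filter_bank(bank)
```

In the mel bank the narrow, densely packed filters cover the low frequencies; after flipping, that fine resolution moves to the high-frequency end, which is where the abstract says the complementary cues lie.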

103 citations

Journal Article

A robust heart sound segmentation algorithm for commonly occurring heart valve diseases.

Samit Ari¹, Prashant Kumar¹, Goutam Saha¹

Indian Institute of Technology Kharagpur¹

01 Jan 2008-Journal of Medical Engineering & Technology

TL;DR: This work extensively utilizes biomedical domain features for reduction of time and computational complexities and is more accurate than other approaches without auxiliary signal.

Abstract: The first step towards detection of valvular heart diseases from the heart sound signal (phonocardiogram) is segmentation. A segmentation algorithm provides the location of the first and second heart sounds, which in turn helps to locate and analyse the murmur. Established phonocardiogram-based segmentation methods use an electrocardiographic (ECG) signal as a continuous auxiliary input in a complex instrumentation setup. This paper proposes an automatic segmentation method that does not require any such auxiliary signal. Compared to other approaches without an auxiliary signal, this work extensively utilizes biomedical domain features to reduce time and computational complexity, and is more accurate. The performance of the algorithm is evaluated for nine commonly occurring pathological cases and normal heart sound across various sampling frequencies, recording environments and age groups of subjects. The proposed algorithm yields an overall accuracy of 97.47% and is compared with two competing techniques. In...

45 citations

Journal Article

Classification of heart sounds using empirical mode decomposition based features

Samit Ari¹, Goutam Saha¹

Indian Institute of Technology Kharagpur¹

14 Jul 2008-International Journal of Medical Engineering and Informatics

TL;DR: A novel method to extract robust features for automatic classification of heart sounds based on Empirical Mode Decomposition (EMD) is presented, and it is found that the EMD-based feature extraction always performs better than the benchmark wavelet-based feature extraction technique.

Abstract: A novel method is presented to extract robust features for automatic classification of heart sounds based on Empirical Mode Decomposition (EMD). The work decomposes segmented heart sound cycles with EMD to generate certain intrinsic mode functions (IMFs). It is seen that the first IMF contains mostly high-frequency noise, the second and third IMFs carry the higher frequency components of the signal of interest, and the residue contains its low-frequency components. A twenty-five-dimensional feature vector is generated from the average energy of the segmented IMFs and residue, and serves as input to the classifier models. Two different classifiers, Artificial Neural Network (ANN) and Grow and Learn (GAL) network, are used to show the performance of the proposed feature extraction technique. Experiments are conducted on 104 different recordings of heart sound comprising normal and 12 different pathological cases against three different additive background noises: white Gaussian, hospital and body noise. It is found that the EMD-based feature extraction always performs better than the benchmark wavelet-based feature extraction technique.
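
The step from decomposed components to an energy feature vector might look like the sketch below. It assumes the IMFs and residue have already been computed by some EMD routine (not shown), and that each component is cut into equal-length segments. The abstract does not say how the 25 dimensions are divided, so the segment count and the function names here are hypothetical.

```python
def segment_energies(component, n_segments):
    """Average energy (mean squared amplitude) of equal-length segments."""
    seg_len = len(component) // n_segments
    if seg_len == 0:
        raise ValueError("component is shorter than n_segments")
    energies = []
    for s in range(n_segments):
        start = s * seg_len
        # the last segment absorbs any leftover samples
        end = len(component) if s == n_segments - 1 else start + seg_len
        seg = component[start:end]
        energies.append(sum(x * x for x in seg) / len(seg))
    return energies

def energy_feature_vector(components, n_segments):
    """Concatenate per-segment energies of every component (IMFs + residue)."""
    features = []
    for comp in components:
        features.extend(segment_energies(comp, n_segments))
    return features

# Toy stand-ins for two decomposed components (not real EMD output)
toy_components = [[1.0] * 10, [2.0] * 10]
toy_features = energy_feature_vector(toy_components, 5)
```

The resulting vector has one entry per (component, segment) pair, so its length is the number of components times the number of segments.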

33 citations

Journal Article

In Search of an SVD and QRcp Based Optimization Technique of ANN for Automatic Classification of Abnormal Heart Sounds

Samit Ari, Goutam Saha

26 Jan 2008-World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering

TL;DR: An optimization technique that involves Singular Value Decomposition (SVD) and QR factorization with column pivoting (QRcp) methodology to optimize empirically chosen over-parameterized ANN structure is presented.

Abstract: Artificial Neural Network (ANN) has been extensively used for classification of heart sounds for its discriminative training ability and easy implementation. However, it suffers from overparameterization if the number of nodes is not chosen properly. In such cases, when the dataset has redundancy within it, the ANN is trained along with this redundant information, which results in poor validation. A larger network also means more computational expense, resulting in higher hardware and time-related cost. Therefore, an optimum design of the neural network is needed for real-time detection of pathological patterns, if any, from the heart sound signal. The aims of this work are to (i) select a set of input features that are effective for identification of heart sound signals and (ii) ensure an optimum selection of nodes in the hidden layer for a more effective ANN structure. Here, we present an optimization technique that involves Singular Value Decomposition (SVD) and QR factorization with column pivoting (QRcp) to optimize an empirically chosen over-parameterized ANN structure. Input nodes present in the ANN structure are optimized by SVD followed by QRcp, while only SVD is required to prune undesirable hidden nodes. The result is presented for classifying 12 common pathological cases and normal heart sound. Keywords: ANN, Classification of heart diseases, murmurs, optimization, Phonocardiogram, QRcp, SVD.
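
The column-pivoting step behind QRcp can be sketched as a greedy selection: repeatedly pick the candidate column with the largest remaining norm, then remove its direction from all the others. The version below is a simplification and not the paper's procedure. It pivots directly on the feature columns, and it takes the number of columns to keep, `k`, as given, whereas the abstract indicates an SVD step precedes QRcp (for example to judge how many columns carry independent information). The function name is hypothetical.

```python
import math

def qrcp_select(columns, k):
    """Return indices of up to k columns chosen by QR-style column pivoting.

    columns: list of equal-length lists, one per candidate input feature.
    """
    residual = [list(c) for c in columns]   # working copies, deflated below
    remaining = list(range(len(columns)))
    chosen = []
    for _ in range(min(k, len(columns))):
        # pivot on the remaining column with the largest residual norm
        pivot = max(remaining, key=lambda j: sum(v * v for v in residual[j]))
        norm = math.sqrt(sum(v * v for v in residual[pivot]))
        if norm < 1e-12:
            break  # the rest is numerically dependent on the chosen columns
        q = [v / norm for v in residual[pivot]]
        chosen.append(pivot)
        remaining.remove(pivot)
        # remove the chosen direction from every remaining column
        for j in remaining:
            proj = sum(a * b for a, b in zip(q, residual[j]))
            residual[j] = [b - proj * a for a, b in zip(q, residual[j])]
    return chosen

# Column 0 duplicates the direction of column 1, so it is never selected
toy = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
selected = qrcp_select(toy, 2)
```

Redundant inputs end up with a near-zero residual once a column spanning the same direction has been chosen, which is why they fall to the end of the pivot order and can be pruned.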

10 citations

Journal Article

Speech enhancement by joint statistical characterization in the Log Gabor Wavelet domain

Suman Senapati¹, Sandipan Chakroborty¹, Goutam Saha¹

Indian Institute of Technology Kharagpur¹

01 Jun 2008-Speech Communication

TL;DR: A new method is introduced that models the inter-scale dependency between the coefficients and their parents with a Circularly Symmetric Probability Density Function related to the family of Spherically Invariant Random Processes (SIRPs) in the Log Gabor Wavelet (LGW) domain; the corresponding joint shrinkage estimators are derived by Maximum a Posteriori (MAP) estimation theory.

5 citations

Journal Article

Application of a Novel Audio Compression Scheme in Automatic Music Recommendation, Digital Rights Management and Audio Fingerprinting

Anindya Roy, Goutam Saha

28 Dec 2008-World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering

TL;DR: A recently proposed efficient audio compression scheme is used to develop three important applications in the context of Music Information Retrieval for the effective manipulation of large music databases, namely automatic music recommendation (AMR), digital rights management (DRM) and audio finger-printing for song identification.

Abstract: Rapid progress in audio compression technology has contributed to the explosive growth of music available in digital form today. In a reversal of ideas, this work makes use of a recently proposed efficient audio compression scheme to develop three important applications in the context of Music Information Retrieval (MIR) for the effective manipulation of large music databases, namely automatic music recommendation (AMR), digital rights management (DRM) and audio fingerprinting for song identification. The performance of these three applications has been evaluated with respect to a database of songs collected from a diverse set of genres. Keywords: Audio compression, Music Information Retrieval, Digital Rights Management, Audio Fingerprinting.

3 citations

Journal Article

Speech Enhancement by Marginal Statistical Characterization in the Log Gabor Wavelet Domain

Suman Senapati, Goutam Saha

27 Nov 2008-World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering

TL;DR: This work presents a fusion of Log Gabor Wavelet and Maximum a Posteriori estimator as a speech enhancement tool for acoustical background noise reduction and shows a higher improvement in Segmental Signal-to-Noise Ratio (S-SNR) and lower Log-Spectral Distortion (LSD) in two different noisy environments compared to other estimators.

Abstract: This work presents a fusion of Log Gabor Wavelet (LGW) and Maximum a Posteriori (MAP) estimator as a speech enhancement tool for acoustical background noise reduction. The probability density function (pdf) of the speech spectral amplitude is approximated by a Generalized Laplacian Distribution (GLD). Compared to earlier estimators, the proposed method estimates the underlying statistical model more accurately by appropriately choosing the model parameters of the GLD. Experimental results show that the proposed estimator yields a higher improvement in Segmental Signal-to-Noise Ratio (S-SNR) and lower Log-Spectral Distortion (LSD) in two different noisy environments compared to other estimators. Keywords: Speech Enhancement, Generalized Laplacian Distribution, Log Gabor Wavelet, Bayesian MAP Marginal Estimator.
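
For readers unfamiliar with the S-SNR metric named above, a common textbook form averages the per-frame SNR in decibels over the signal. The sketch below follows that general definition. The frame length and the clamping limits are conventional choices assumed here, not values taken from the paper, and the function name is hypothetical.

```python
import math

def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB between a clean and an enhanced signal."""
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    total = 0.0
    for m in range(n_frames):
        s = clean[m * frame_len:(m + 1) * frame_len]
        e = enhanced[m * frame_len:(m + 1) * frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        if error_energy == 0.0:
            snr = hi_db      # perfect reconstruction: use the upper limit
        elif signal_energy == 0.0:
            snr = lo_db      # silent frame with error: use the lower limit
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        # clamp so a few extreme frames do not dominate the average
        total += min(max(snr, lo_db), hi_db)
    return total / n_frames

# Toy signals: a constant 10% amplitude error gives roughly 20 dB per frame
score = segmental_snr([1.0] * 8, [0.9] * 8, frame_len=4)
```

An improvement in S-SNR is then the difference between this score for the enhanced output and the same score computed for the noisy input.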

2 citations