Other affiliations: Indian Institutes of Technology
Bio: Goutam Saha is an academic researcher from Indian Institute of Technology Kharagpur. The author has contributed to research in topics: Speaker recognition & Mel-frequency cepstrum. The author has an h-index of 24 and has co-authored 73 publications receiving 1996 citations. Previous affiliations of Goutam Saha include the Indian Institutes of Technology.
01 May 2012-Speech Communication
TL;DR: A class of linear transformation techniques based on block-wise transformation of the MFLE, which effectively decorrelate the filter bank log energies and also capture speech information efficiently, is studied.
Abstract: The standard Mel-frequency cepstral coefficient (MFCC) computation technique uses the discrete cosine transform (DCT) to decorrelate the log energies of the filter bank outputs. The use of the DCT is reasonable here because the covariance matrix of the Mel filter bank log energies (MFLE) is comparable to that of a highly correlated first-order Markov (Markov-I) process. This full-band MFCC computation, in which each filter bank output contributes to all coefficients, has two main disadvantages. First, the covariance matrix of the log energies does not exactly follow the Markov-I property. Second, full-band MFCC features are severely degraded when the speech signal is corrupted by narrow-band channel noise, even though a few filter bank outputs may remain unaffected. In this work, we study a class of linear transformation techniques based on block-wise transformation of the MFLE, which effectively decorrelate the filter bank log energies and also capture speech information efficiently. A thorough study of the block-based transformation approach is carried out by investigating a new partitioning technique that highlights its associated advantages. This article also reports a novel feature extraction scheme that captures information complementary to the wide-band information, which otherwise remains undetected by standard MFCC and the proposed block transform (BT) techniques. The proposed features are evaluated on NIST SRE databases using a Gaussian mixture model-universal background model (GMM-UBM) based speaker recognition system. We obtained significant performance improvement over baseline features for both matched and mismatched conditions, and for both standard and narrow-band noises. The proposed method achieves significant performance improvement in the presence of narrow-band noise when combined with a missing-feature-theory-based score computation scheme.
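The contrast between full-band and block-wise transformation of the MFLE can be illustrated with a small NumPy sketch. It assumes an orthonormal DCT-II inside each block, a synthetic Markov-I covariance (rho = 0.9) standing in for real MFLE statistics, and a hypothetical split of 20 filters into two equal blocks; it does not reproduce the partitioning technique proposed in the paper.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    c[0, :] = np.sqrt(1.0 / n)
    return c

def block_transform_matrix(block_sizes):
    """Block-diagonal transform: an independent DCT inside each block of filters."""
    n = sum(block_sizes)
    t = np.zeros((n, n))
    start = 0
    for size in block_sizes:
        t[start:start + size, start:start + size] = dct_matrix(size)
        start += size
    return t

def markov1_covariance(n, rho):
    """Covariance of a first-order Markov (Markov-I) process: rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def off_diagonal_ratio(cov):
    """Fraction of squared covariance mass lying off the main diagonal."""
    total = np.sum(cov ** 2)
    return (total - np.sum(np.diag(cov) ** 2)) / total

n_filters = 20
r = markov1_covariance(n_filters, rho=0.9)
full = dct_matrix(n_filters)                 # full-band DCT, as in standard MFCC
blocks = block_transform_matrix([10, 10])    # hypothetical two-block partition

print(off_diagonal_ratio(r))                       # raw log energies
print(off_diagonal_ratio(full @ r @ full.T))       # after full-band DCT
print(off_diagonal_ratio(blocks @ r @ blocks.T))   # after block transform

# Corrupt a single filter output (index 3), as narrow-band noise might.
delta = np.zeros(n_filters)
delta[3] = 1.0
print(np.flatnonzero(np.abs(full @ delta) > 1e-12))    # every full-band coefficient changes
print(np.flatnonzero(np.abs(blocks @ delta) > 1e-12))  # only coefficients of the first block
```

The first three ratios show how much covariance mass remains off the diagonal; the last two lines show the localization property that motivates block transforms: one corrupted filter output affects only the coefficients of its own block.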
TL;DR: The extraction of fetal electrocardiogram (ECG) from the composite maternal ECG signal obtained from the abdominal lead is discussed, and the proposed method employs singular value decomposition (SVD) and analysis based on the singular value ratio (SVR) spectrum.
Abstract: The extraction of the fetal electrocardiogram (ECG) from the composite maternal ECG signal obtained from the abdominal lead is discussed. The proposed method employs singular value decomposition (SVD) and analysis based on the singular value ratio (SVR) spectrum. The maternal ECG (M-ECG) and fetal ECG (F-ECG) components are identified in terms of the SV-decomposed modes of appropriately configured data matrices, and the M-ECG is eliminated and the F-ECG determined through selective separation of the SV-decomposed components. The unique feature of the method is that only one composite maternal ECG signal is required to determine the F-ECG component. The method is numerically robust and computationally efficient.
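The SVR idea can be sketched in NumPy on synthetic data: when the data matrix is configured with a row length equal to the signal's period, the rows become nearly identical, the matrix is close to rank 1, and the ratio s1/s2 peaks. Assumptions not taken from the paper: rows are consecutive non-overlapping segments, the SVR is s1/s2, the M-ECG is removed by subtracting only the dominant (rank-1) SVD mode, and Gaussian pulse trains stand in for real M-ECG and F-ECG beats. All function names are hypothetical.

```python
import numpy as np

def configure(signal, row_len):
    """Stack consecutive, non-overlapping segments of length row_len as matrix rows."""
    n_rows = len(signal) // row_len
    return signal[: n_rows * row_len].reshape(n_rows, row_len)

def svr_spectrum(signal, row_lengths):
    """Singular value ratio s1/s2 of the configured matrix for each candidate row length."""
    ratios = []
    for n in row_lengths:
        s = np.linalg.svd(configure(signal, n), compute_uv=False)
        ratios.append(s[0] / s[1])
    return np.array(ratios)

def remove_dominant_mode(signal, period):
    """Subtract the rank-1 SVD mode of the matrix configured at `period`.

    The output covers the first (len(signal) // period) * period samples.
    """
    x = configure(signal, period)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (x - s[0] * np.outer(u[:, 0], vt[0])).ravel()

def pulse_train(length, period, offset, width=2.0):
    """Synthetic beats: Gaussian pulses every `period` samples starting at `offset`."""
    t = np.arange(length)[:, None]
    centers = np.arange(offset, length, period)[None, :]
    return np.exp(-0.5 * ((t - centers) / width) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
length = 2000
maternal = pulse_train(length, period=50, offset=25)
fetal = 0.3 * pulse_train(length, period=33, offset=10)
abdominal = maternal + fetal + 0.01 * rng.standard_normal(length)

candidates = np.arange(30, 71)
svr = svr_spectrum(abdominal, candidates)
maternal_period = candidates[np.argmax(svr)]
fetal_estimate = remove_dominant_mode(abdominal, maternal_period)
print(maternal_period)
print(np.corrcoef(fetal_estimate, fetal[: len(fetal_estimate)])[0, 1])
```

Subtracting a single mode is the simplest form of selective separation; with real recordings, beat-to-beat variation in the M-ECG would likely spread it over more than one mode.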
TL;DR: A technique is proposed to improve the performance of the Least Square Support Vector Machine (LSSVM) in classifying normal and abnormal heart sounds from a wavelet-based feature set, by modifying the Lagrange multipliers and hence the weight vector.
Abstract: Auscultation, the technique of listening to heart sounds with a stethoscope, can be used as a primary detection system for diagnosing heart valve disorders. The phonocardiogram, a digital recording of heart sounds, is becoming increasingly popular as it is relatively inexpensive. In this paper, a technique to improve the performance of the Least Square Support Vector Machine (LSSVM) is proposed for classification of normal and abnormal heart sounds using a wavelet-based feature set. In the proposed technique, the Lagrange multiplier is modified based on the Least Mean Square (LMS) algorithm, which in turn modifies the weight vector to reduce the classification error. The basic idea is to enlarge the separating boundary surface so that the separability between the clusters is increased. The updated weight vector is used at the time of testing. The performance of the proposed system is evaluated on 64 different recordings of heart sounds comprising normal and five different pathological cases. The proposed technique is found to classify heart sounds with higher recognition accuracy than competing techniques.
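For reference, the baseline being modified is the standard LS-SVM classifier (Suykens' formulation), in which training reduces to one linear system in the bias b and the Lagrange multipliers alpha. The sketch below assumes an RBF kernel with hyperparameters gamma = 10 and sigma = 1 and uses synthetic 2-D feature vectors in place of wavelet features; it does not implement the paper's LMS-based update of the multipliers.

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dist = (np.sum(a ** 2, axis=1)[:, None]
               + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def lssvm_train(x, y, gamma=10.0, sigma=1.0):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1] for b and alpha."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(x, x, sigma)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = y
    system[1:, 0] = y
    system[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]

def lssvm_predict(x_train, y_train, b, alpha, x_new, sigma=1.0):
    """Decision rule: sign(sum_k alpha_k * y_k * K(x, x_k) + b)."""
    return np.sign(rbf_kernel(x_new, x_train, sigma) @ (alpha * y_train) + b)

rng = np.random.default_rng(1)
# Hypothetical 2-D feature vectors standing in for wavelet features of heart sounds.
normal = rng.normal(loc=-2.0, scale=0.5, size=(20, 2))
abnormal = rng.normal(loc=2.0, scale=0.5, size=(20, 2))
x = np.vstack([normal, abnormal])
y = np.concatenate([-np.ones(20), np.ones(20)])

b, alpha = lssvm_train(x, y)
print(lssvm_predict(x, y, b, alpha, np.array([[-2.0, -2.0], [2.0, 2.0]])))
```

The proposed LMS step would adjust `alpha` (and hence the weight vector) after this initial solve; since the abstract does not give its update rule, it is left out here.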
TL;DR: It is found that the newly investigated features are more robust than existing features and show better recognition accuracy even in low signal-to-noise ratios (SNRs).
Abstract: Lung sounds convey useful information related to pulmonary pathology. In this paper, short-term spectral characteristics of lung sounds are studied to characterize the lung sounds for the identification of associated diseases. Motivated by the success of cepstral features in speech signal classification, we evaluate five different cepstral features to recognize three types of lung sounds: normal, wheeze and crackle. Subsequently, for fast and efficient classification, we propose a new feature set computed from the statistical properties of cepstral coefficients. Experiments are conducted on a dataset of 30 subjects using an artificial neural network (ANN) as the classifier. Results show that the statistical features extracted from the mel-frequency cepstral coefficients (MFCCs) of lung sounds outperform commonly used wavelet-based features as well as standard cepstral coefficients, including MFCCs. Further, we experimentally optimize different control parameters of the proposed feature extraction algorithm. Finally, we evaluate the features for noisy lung sound recognition. We have found that our newly investigated features are more robust than existing features and show better recognition accuracy even at low signal-to-noise ratios (SNRs). Highlights: A new feature for computer-based lung sound classification is proposed. Proposed features utilize statistical properties of conventional cepstral features. Proposed features outperform wavelet-based features. The computational time is reduced compared to baseline cepstral features.
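The abstract does not list which statistics are computed, so the sketch below assumes four common ones (mean, standard deviation, skewness and kurtosis) per cepstral coefficient. What it illustrates is the change of shape: a recording with any number of frames collapses into one fixed-length vector, which is one plausible source of the reduced computational time at the classifier. The function name is hypothetical.

```python
import numpy as np

def statistical_features(cepstra):
    """Collapse a (frames x coefficients) cepstral matrix into one fixed-length vector.

    Per coefficient: mean, standard deviation, skewness and kurtosis
    (an assumed choice of statistics).
    """
    mean = cepstra.mean(axis=0)
    centered = cepstra - mean
    std = cepstra.std(axis=0)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero for constant coefficients
    skewness = (centered ** 3).mean(axis=0) / safe_std ** 3
    kurtosis = (centered ** 4).mean(axis=0) / safe_std ** 4
    return np.concatenate([mean, std, skewness, kurtosis])

rng = np.random.default_rng(0)
short_clip = rng.standard_normal((120, 13))   # 120 frames x 13 MFCCs (synthetic)
long_clip = rng.standard_normal((480, 13))    # 480 frames x 13 MFCCs (synthetic)
print(statistical_features(short_clip).shape, statistical_features(long_clip).shape)  # (52,) (52,)
```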
24 Nov 2008-World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering
TL;DR: This paper proposes a new feature set based on a complementary filter bank structure that improves the distinguishability of speaker-specific cues in the higher frequency zone; combined with MFCC via a parallel implementation of speaker models, it significantly outperforms baseline MFCC.
Abstract: A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure that improves the distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features, which are difficult to extract, the proposed feature set adds little computational burden to the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set significantly outperforms baseline MFCC. This is validated by experiments on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders. Keywords: Complementary Information, Filter Bank, GMM, IMFCC, MFCC, Speaker Identification, Speaker Recognition.
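One way to build a complementary filter bank of this kind is to mirror the mel filter bank about the midpoint of the analysis band, so that the closely spaced narrow filters move to high frequencies; inverted-MFCC (IMFCC) filter banks are commonly described this way. The sketch assumes the 2595 * log10(1 + f/700) mel formula, 20 triangular filters, a 512-point FFT and 8 kHz sampling, and is not claimed to match the paper's exact filter definitions. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filter_bank(edges_hz, n_fft, sample_rate):
    """Triangular filters whose (lower, center, upper) edges are consecutive entries of edges_hz."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    n_filters = len(edges_hz) - 2
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Filters equally spaced on the mel scale: narrow at low frequencies."""
    mels = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    return triangular_filter_bank(mel_to_hz(mels), n_fft, sample_rate)

def inverted_mel_filter_bank(n_filters, n_fft, sample_rate):
    """Mirror the mel bank in frequency (and filter order): narrow at high frequencies."""
    return np.flip(mel_filter_bank(n_filters, n_fft, sample_rate), axis=(0, 1))

mel = mel_filter_bank(20, 512, 8000)
inverted = inverted_mel_filter_bank(20, 512, 8000)
print(np.argmax(mel, axis=1))       # center bins crowd together at low frequencies
print(np.argmax(inverted, axis=1))  # center bins crowd together at high frequencies
```

Because the two banks concentrate their resolution at opposite ends of the band, features from each can be modeled separately and their scores fused, in the spirit of the parallel speaker models described above.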
01 Jan 2006
TL;DR: This book covers probability distributions and linear models for regression and classification, along with further machine learning topics, and closes with a discussion of combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.
01 May 1981
TL;DR: This book discusses detecting influential observations and outliers, detecting and assessing collinearity, and applications and remedies.
Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.
01 Jan 1985-Electronics and Power
TL;DR: In this paper, the performance of wavelet decomposition-based de-noising and wavelet filter-based de-noising methods is compared on signals from mechanical defects; the comparison reveals that the wavelet filter is more suitable and reliable for detecting a weak signature of mechanical impulse-like defect signals, whereas wavelet decomposition-based de-noising performs better on smooth signal detection.
Abstract: De-noising and extraction of the weak signature are crucial to fault prognostics, where features are often very weak and masked by noise. The wavelet transform has been widely used in signal de-noising due to its extraordinary time-frequency representation capability. In this paper, the performance of wavelet decomposition-based de-noising and wavelet filter-based de-noising methods is compared on signals from mechanical defects. The comparison reveals that the wavelet filter is more suitable and reliable for detecting a weak signature of mechanical impulse-like defect signals, whereas the wavelet decomposition de-noising method can achieve satisfactory results on smooth signal detection. To select optimal parameters for the wavelet filter, a two-step optimization process is proposed. Minimal Shannon entropy is used to optimize the Morlet wavelet shape factor. A periodicity detection method based on singular value decomposition (SVD) is used to choose the appropriate scale for the wavelet transform. The signal de-noising results from both simulated signals and experimental data are presented, and both support the proposed method.
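The first step of the two-step optimization, choosing the Morlet shape factor by minimal Shannon entropy, can be sketched as a grid search. Assumptions: a real Morlet-type kernel exp(-beta^2 t^2 / 2) * cos(pi t) sampled at t = k / scale, entropy defined over the normalized energy of the filtered signal, a scale fixed by hand (in the paper it comes from the SVD-based periodicity step), and a synthetic impulse-like test signal. The function names are hypothetical.

```python
import numpy as np

def morlet_kernel(beta, scale, half_width=64):
    """Real Morlet-type kernel exp(-beta^2 t^2 / 2) * cos(pi t), with t = k / scale."""
    t = np.arange(-half_width, half_width + 1) / scale
    return np.exp(-((beta * t) ** 2) / 2.0) * np.cos(np.pi * t)

def shannon_entropy(coeffs):
    """Entropy of the normalized energy distribution p_i = c_i^2 / sum(c^2)."""
    energy = np.asarray(coeffs, dtype=float) ** 2
    p = energy / energy.sum()
    p = p[p > 0]  # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log(p)))

def best_shape_factor(signal, betas, scale):
    """Return the beta whose filtered output has minimal Shannon entropy, plus all entropies."""
    entropies = np.array([
        shannon_entropy(np.convolve(signal, morlet_kernel(b, scale), mode="same"))
        for b in betas
    ])
    return betas[int(np.argmin(entropies))], entropies

rng = np.random.default_rng(0)
n = np.arange(2000)
# Synthetic weak impulses: decaying oscillations every 250 samples, buried in noise.
impulses = np.zeros(2000)
for start in range(100, 2000, 250):
    k = n[start:start + 40] - start
    impulses[start:start + 40] = np.exp(-k / 8.0) * np.cos(np.pi * k / 4.0)
signal = 0.5 * impulses + 0.3 * rng.standard_normal(2000)

betas = np.linspace(0.2, 2.0, 10)
beta, entropies = best_shape_factor(signal, betas, scale=4.0)
filtered = np.convolve(signal, morlet_kernel(beta, 4.0), mode="same")
print(beta)
```

Low entropy means the filtered energy is concentrated in a few samples, which is what an impulse-like fault signature looks like; the beta grid and kernel sampling here are illustrative choices.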
28 Jun 2007-Biological Rhythm Research
TL;DR: Various procedures used in the analysis of circadian rhythms at the populational, organismal, cellular and molecular levels are reviewed.
Abstract: This article reviews various procedures used in the analysis of circadian rhythms at the populational, organismal, cellular and molecular levels. The procedures range from visual inspection of time plots and actograms to several mathematical methods of time series analysis. Computational steps are described in some detail, and additional bibliographic resources and computer programs are listed.