Showing papers by "Goutam Saha" published in 2012


Proceedings Article
01 Dec 2012
TL;DR: Speech-signal-based frequency cepstral coefficients (SFCC) are introduced to the speaker recognition domain, and a combination of the MFCC and SFCC filter banks is proposed for text-independent speaker identification.
Abstract: Over the past decade, the mel-frequency cepstral coefficient (MFCC) has been the most popular feature extraction method in automatic speaker recognition. In robust speaker recognition systems, however, it performs well under white noise contamination but less well under other noise types. We introduce speech-signal-based frequency cepstral coefficients (SFCC) to the speaker recognition domain. In this method, the frequency warping function is derived directly from the speech signal itself by considering equal-area portions of the logarithm of the ensemble-average short-time power spectrum of the entire speech corpus. The resulting warping function closely resembles the frequency scales obtained through psychoacoustic experiments, known as the mel scale and the Bark scale. We propose combining the filter banks of both MFCC and SFCC for text-independent speaker identification. Speaker identification experiments are performed on the POLY-COST database. The proposed technique performs better than single-stream MFCC- or SFCC-based features for robust speaker identification.
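
The equal-area idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of log(1 + P) to keep the weights non-negative, and the linear interpolation inside a bin are all assumptions made for illustration.

```python
import math

def log_average_spectrum(frame_power_spectra):
    """Average per-frame power spectra over the corpus, then take a log."""
    num_frames = len(frame_power_spectra)
    num_bins = len(frame_power_spectra[0])
    averaged = [sum(frame[b] for frame in frame_power_spectra) / num_frames
                for b in range(num_bins)]
    # Assumption: log(1 + P) keeps every weight non-negative.
    return [math.log1p(p) for p in averaged]

def equal_area_edges(weights, num_bands):
    """Return num_bands + 1 bin positions that split the area under
    `weights` (one non-negative value per frequency bin) into equal parts."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have positive total area")
    target = total / num_bands
    edges = [0.0]
    cumulative = 0.0
    next_target = target
    for i, height in enumerate(weights):
        # Bin i spans [i, i + 1); place every edge that falls inside it.
        while len(edges) < num_bands and cumulative + height >= next_target:
            edges.append(i + (next_target - cumulative) / height)
            next_target += target
        cumulative += height
    edges.append(float(len(weights)))
    return edges
```

Under these assumptions, bands come out narrower where the log spectrum carries more area, which is how a data-driven warping could end up resembling mel-like scales for speech.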

18 citations


Proceedings Article
01 Dec 2012
TL;DR: A method based on a complexity measuring theorem that can give a reliable diagnosis of LS in an automated environment is proposed; it could be very useful in developing assistive devices for medical professionals.
Abstract: Lung sound (LS) contains information about the status of the lungs. Medical practitioners listen to these sounds with a stethoscope and interpret them. This procedure, known as auscultation, depends entirely on the physician's experience and knowledge, so there is a risk of misinterpretation due to the human factor involved. In this paper, we propose a method based on a complexity measuring theorem that can give a reliable diagnosis of LS in an automated environment. The developed algorithm detects lung conditions by calculating the sample entropy of the frequency spectrum. The results are evaluated through statistical analysis and corroborated by a pulmonologist. The technique could be very useful in developing assistive devices for medical professionals.
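
For readers unfamiliar with sample entropy, the following is a minimal sketch of the standard SampEn(m, r) definition. The input is assumed to be a list of spectrum magnitudes computed elsewhere; the default values of `m` and `r`, and the choice of an absolute tolerance, are illustrative assumptions and not parameters taken from the paper.

```python
import math

def sample_entropy(series, m=2, r=0.2):
    """SampEn(m, r): negative log of the ratio between the number of
    template matches of length m + 1 and of length m (self-matches excluded)."""
    n = len(series)

    def count_matches(length):
        count = 0
        # The same n - m starting points are used for both template lengths.
        for i in range(n - m):
            for j in range(i + 1, n - m):
                # Chebyshev distance between the two templates.
                distance = max(abs(series[i + k] - series[j + k])
                               for k in range(length))
                if distance <= r:
                    count += 1
        return count

    matches_m = count_matches(m)
    matches_m_plus_1 = count_matches(m + 1)
    if matches_m == 0 or matches_m_plus_1 == 0:
        return float("inf")  # undefined in theory; reported here as infinity
    return -math.log(matches_m_plus_1 / matches_m)
```

A perfectly regular sequence gives a value of zero, and less predictable sequences give larger values, which is the property a classifier of lung conditions could exploit.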

3 citations


Proceedings Article
01 Dec 2012
TL;DR: A system that takes a speech file as input and identifies the language family to which it belongs is discussed, and the system is used to investigate the influence of the Dravidian family on the Indo-European family.
Abstract: India is a vast country with a large number of languages. Some of these languages descend from a single mother language, forming a language family. The major official languages of India fall under two language families, namely Indo-European and Dravidian. In this paper, we discuss a system that takes a speech file as input and identifies the language family to which it belongs. We also use this system to investigate the influence of the Dravidian family on the Indo-European family. The system uses a combination of Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Coefficients (SDC) as language-specific features. SDC is currently the most popular feature for language identification; it captures temporal information of speech over a broad range of time. A Gaussian Mixture Model based approach is used to model the language families, where the distribution of the feature vectors of a class is approximated by a sum of Gaussians. The results give interesting insights into certain Indian languages and into the applicability of machine learning in this domain.
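
How SDC captures temporal information over a broad span can be sketched as follows. This follows the common N-d-P-k formulation of shifted delta coefficients; the parameter defaults and the function name are illustrative assumptions, since the abstract does not state the configuration used.

```python
def shifted_delta(cepstra, d=1, p=3, k=7):
    """Stack k delta blocks, each computed p frames further ahead.

    cepstra: list of frames, each a list of N cepstral coefficients.
    d: delta spread, p: shift between blocks, k: number of blocks.
    Only frames whose blocks all lie inside the utterance are returned.
    """
    num_frames = len(cepstra)
    last = num_frames - ((k - 1) * p + d)
    features = []
    for t in range(d, last):
        vector = []
        for i in range(k):
            ahead = cepstra[t + i * p + d]
            behind = cepstra[t + i * p - d]
            # Delta at shifted position t + i*p.
            vector.extend(a - b for a, b in zip(ahead, behind))
        features.append(vector)
    return features
```

Each output vector has N * k values and spans roughly (k - 1) * p + 2 * d frames of context, which is why SDC sees far more of the signal than a single frame of MFCC.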

3 citations


Proceedings Article
03 Apr 2012
TL;DR: The loss due to the GSM-AMR codec is shown to be very significant for speaker verification compared to undecoded speech; packet loss and bit rate may degrade the quality of speech, but their effect on detecting the speaker's identity is not significant.
Abstract: Automatic Speaker Verification (ASV) is a challenging task over mobile/IP-based systems, as coding introduces some loss in system performance. This paper reports on work in progress to examine the impact of the GSM-AMR codec used in mobile communication at its various bit rates and of the G729 codec for VoIP, along with different kinds of noise and packet loss scenarios for the speech signal. PURE YOHO database has been used for the evaluation of this task. The respective encoders and decoders are used back to back on wideband clean microphone speech to simulate the real-life situation. Performance is evaluated by measuring the Equal Error Rate (EER). It has been shown that the loss due to the GSM-AMR codec is very significant for speaker verification compared to undecoded speech. Packet loss and bit rate may degrade the quality of speech, but their effect on detecting the speaker's identity is not significant.
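
The EER is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it could be estimated from verification scores is given below; the function name and the simple threshold sweep are assumptions, not the evaluation code used in the paper.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep every observed score as a threshold and return the error
    rate at the point where false acceptance and false rejection are closest."""
    best_gap = None
    eer = None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # Impostor trials accepted at this threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # Genuine trials rejected at this threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer
```

Comparing the EER obtained on coded speech with the EER on the original speech gives a single number for the loss introduced by a codec.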

3 citations


Proceedings Article
03 Apr 2012
TL;DR: A modification to the baseline BIC segmentation scheme is proposed, which makes use of local search information to reduce the overall complexity of the segmentation procedure.
Abstract: Segmentation is typically the most computationally expensive step in the majority of speaker diarization systems. The Bayesian Information Criterion (BIC) is a very widely adopted method for segmenting audio data. While BIC gives fairly good segmentation performance, it suffers from enormous computational complexity. Moreover, BIC-based diarization systems encounter their worst-case complexity when there is no change point in the input audio stream at all. Many audio streams contain fairly large segments separated by very few change points. In such cases, BIC segmentation becomes impractical because of its complexity. In this paper, we propose a modification to the baseline BIC segmentation scheme that uses local search information to reduce the overall complexity of the segmentation procedure. The method has been tested on several audio streams from broadcast news, and the diarization runtime was reduced by a factor of 3.45, with marginally better segmentation performance.
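
To show where the cost of the baseline comes from, here is a minimal sketch of baseline BIC change detection. It is simplified to one-dimensional features with a single Gaussian per segment, and it does not reproduce the local-search modification proposed in the paper, which the abstract does not detail. The function names, the `margin` parameter, and the variance floor are assumptions.

```python
import math

def variance(values):
    mean = sum(values) / len(values)
    # The floor avoids log(0) on constant segments (an assumption).
    return max(sum((v - mean) ** 2 for v in values) / len(values), 1e-10)

def delta_bic(window, split, penalty_weight=1.0):
    """Delta BIC for splitting `window` at index `split` (1-D features).
    A positive value favours two separate Gaussians, i.e. a change point."""
    n = len(window)
    left, right = window[:split], window[split:]
    # For dimension d = 1 the penalty 0.5 * (d + d(d+1)/2) * log n is log n.
    penalty = penalty_weight * math.log(n)
    return (0.5 * n * math.log(variance(window))
            - 0.5 * len(left) * math.log(variance(left))
            - 0.5 * len(right) * math.log(variance(right))
            - penalty)

def best_change_point(window, margin=2, penalty_weight=1.0):
    """Evaluate every candidate split; return the best one, or None."""
    candidates = range(margin, len(window) - margin + 1)
    score, split = max((delta_bic(window, s, penalty_weight), s)
                       for s in candidates)
    return split if score > 0 else None
```

Every candidate split requires statistics over the whole window, and a window with no change point never terminates early, which matches the worst case described in the abstract.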

Proceedings Article
03 Apr 2012
TL;DR: A pattern recognition system is proposed to study the neural activity profile of the male Evans rat performing a delayed non-match to sample (DNMS) task; it takes into account the practical constraints during acquisition of the neural signals.
Abstract: The hippocampus, a region of the brain, is considered essential for memory and navigation. It is functionally and structurally quite similar among mammals. As invasive studies in humans are not advised and those in laboratory animals give equivalent results, this region has been the subject of much research over the past three decades. Most of this research has focused on the structural aspects of the region. This paper proposes a pattern recognition system to study the neural activity profile of the male Evans rat performing a delayed non-match to sample (DNMS) task. The system takes into account the practical constraints during acquisition of the neural signals. The best result in this study shows prediction errors of 5.2%, 5.3%, and 5.5% for 1 s, 2 s, and 3 s ahead prediction, respectively.

Proceedings Article
03 Apr 2012
TL;DR: A distance-based method is used to mitigate the effects of outliers, and fusion techniques are incorporated to improve the recognition accuracy of the speaker recognition system while also improving the outlier detection rate with respect to the baseline feature set.
Abstract: Outliers in real-time speaker recognition can be viewed as a disturbing element and as one of the reasons for degraded recognition accuracy. In speaker space, outliers may be considered non-intrinsic speaker information in a clean environment or noise information in a noisy environment. Detecting outliers therefore purifies the speaker space, leaving the most speaker-specific feature vectors in both clean and noisy environments. There are several methodologies for detecting outliers; in this paper we use a distance-based method to mitigate their effects and incorporate fusion techniques to improve the recognition accuracy of the speaker recognition system. Distances are taken from the Minkowski family up to third order, along with the Mahalanobis distance, which is a probabilistic distance. In the fusion methodology, we use a GMM as a single classifier with complementary feature sets, MFCC and IMFCC, and fuse the MFCC and IMFCC scores with an equal-weight method. This method not only improves the recognition accuracy but also improves the detection rate of outliers with respect to the baseline feature set.
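
A minimal sketch of distance-based outlier removal and equal-weight score fusion follows. The abstract does not state the reference point or the rejection rule, so measuring distance to the centroid and rejecting vectors beyond the mean distance plus `num_std` standard deviations are assumptions, as are all function names. The Mahalanobis variant is omitted.

```python
def minkowski(u, v, order):
    """Minkowski distance; order 1, 2, 3 cover the family up to third order."""
    return sum(abs(a - b) ** order for a, b in zip(u, v)) ** (1.0 / order)

def remove_outliers(vectors, order=2, num_std=2.0):
    """Drop feature vectors that lie unusually far from the centroid."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    distances = [minkowski(v, centroid, order) for v in vectors]
    mean = sum(distances) / len(distances)
    std = (sum((d - mean) ** 2 for d in distances) / len(distances)) ** 0.5
    # Assumed rejection rule: keep vectors within mean + num_std * std.
    limit = mean + num_std * std
    return [v for v, d in zip(vectors, distances) if d <= limit]

def fuse_equal_weight(mfcc_scores, imfcc_scores):
    """Equal-weight fusion of per-speaker scores from the two feature sets."""
    return [0.5 * a + 0.5 * b for a, b in zip(mfcc_scores, imfcc_scores)]
```

In such a setup the filtered vectors would train or score the GMM for each feature set, and the identified speaker would be the one with the highest fused score.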