
Showing papers by "Goutam Saha published in 2009"


Proceedings ArticleDOI
01 Dec 2009
TL;DR: The proposed method applies the DCT in a distributed manner and improves performance over the baseline MFCC-based SI system for various numbers of filters in the filterbank.
Abstract: Feature extraction is one of the most significant stages in the development of a speaker identification (SI) system. Most SI systems use mel-frequency cepstral coefficients (MFCCs) as parameters for representing the speech signal in a compact form. MFCCs are extracted through spectral weighting by a bank of overlapping triangular filters followed by a de-correlation process. Conventionally, the Discrete Cosine Transform (DCT-II) is used for de-correlation. In this paper, we propose a better de-correlation algorithm for MFCC. In the traditional method, the DCT is applied coarsely to all the filterbank energies; in the proposed technique, the DCT is applied in a distributed manner. Experimental results on two publicly available databases, each consisting of more than 130 speakers, show that the proposed method improves performance over the baseline MFCC-based SI system for various numbers of filters in the filterbank. Index Terms: Feature Extraction, Speaker Identification, Discrete Cosine Transform, De-Correlation, Gaussian Mixture Model.
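The abstract only details the conventional pipeline: log mel filterbank energies de-correlated with a single DCT-II. As a point of reference, here is a minimal sketch of that baseline; the distributed-DCT variant is not specified in the abstract and is therefore not shown. The function name mfcc_from_filterbank and the num_ceps parameter are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.fftpack import dct

def mfcc_from_filterbank(filterbank_energies, num_ceps=13):
    """Baseline MFCC computation from mel filterbank energies.

    filterbank_energies: array of shape (num_frames, num_filters),
    the output of the overlapping triangular filterbank applied to
    each frame's power spectrum.
    """
    # Log compression of the filterbank energies (small offset avoids log(0))
    log_energies = np.log(filterbank_energies + 1e-10)
    # Conventional de-correlation: one DCT-II across all filterbank channels
    cepstra = dct(log_energies, type=2, axis=1, norm='ortho')
    # Keep the first num_ceps coefficients as the feature vector
    return cepstra[:, :num_ceps]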

20 citations


Proceedings ArticleDOI
01 Dec 2009
TL;DR: Different distance measure techniques for selecting frames that exploit the redundancies between consecutive frames are proposed, aiming not only to reduce the number of frames for feature extraction but also to keep the recognition accuracy reasonably high by selecting suitable frames containing speaker-specific information.
Abstract: The total recognition time as well as the memory requirement in speaker recognition is mainly governed by the number of speakers, the number of frame vectors in the test sequence, and the feature dimensionality. Adjacent frame vectors can be similar in the feature space because of the slow movement of the articulators. Hence, efficient frame selection techniques that pick non-redundant frames at the preprocessing stage can be very effective in real-time applications of such a recognition system. In pre-quantization (PQ) we select a new sequence of frames Y from the original frames X such that the length of Y is less than that of X. In this paper we propose different distance measure techniques for selecting frames that exploit the redundancies between consecutive frames. The aim is not only to reduce the number of frames for feature extraction but also to maintain reasonably high recognition accuracy by selecting suitable frames containing speaker-specific information. The techniques are evaluated on two different telephone speech databases, POLYCOST and KING.
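The abstract does not name the specific distance measures used. As one plausible reading, the sketch below selects frames by Euclidean distance to the most recently retained frame, dropping frames that fall below a redundancy threshold. The function name prequantize_by_distance and the threshold parameter are assumptions made for illustration only.

import numpy as np

def prequantize_by_distance(frames, threshold):
    """Pre-quantization sketch: build a reduced sequence Y from the original
    frames X by keeping a frame only if it is sufficiently far (in Euclidean
    distance) from the last frame already kept."""
    kept = [frames[0]]
    for frame in frames[1:]:
        # Consecutive frames that barely move in feature space are redundant
        if np.linalg.norm(frame - kept[-1]) > threshold:
            kept.append(frame)
    return np.array(kept)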

5 citations


Proceedings ArticleDOI
01 Nov 2009
TL;DR: A number of distance measure techniques for frame selection that exploit the redundancies of consecutive frames are analyzed, and efficient techniques based on the probability density function are proposed that not only reduce the number of frames before feature extraction but also increase the recognition accuracy.
Abstract: The amount of speaker-specific information in a speech signal varies from frame to frame depending on the spoken text and environmental conditions. Frame selection at the preprocessing stage can therefore be an added advantage. In pre-quantization (PQ) we select a new sequence of frames Y from the original frames X such that the length of Y is less than that of X. In this paper, we first analyze a number of distance measure techniques for frame selection that exploit the redundancies of consecutive frames. We then propose efficient techniques based on the probability density function (PDF) that not only reduce the number of frames before feature extraction but also increase the recognition accuracy. The proposed methods are evaluated on two different databases, POLYCOST (telephone speech) and YOHO (microphone speech), and are shown to provide significant improvement in speaker recognition performance.
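The abstract does not state how the probability density function is used for selection. Purely as a sketch of one possible PDF-based rule, the code below fits a single Gaussian to the frame vectors of an utterance, scores each frame by its squared Mahalanobis distance from the mean, and keeps the least typical frames. The function name, the keep_fraction parameter, and the single-Gaussian assumption are all illustrative and not from the paper.

import numpy as np

def prequantize_by_pdf(frames, keep_fraction=0.5):
    """Illustrative PDF-based pre-quantization: model the frames of an
    utterance with a single Gaussian, score each frame by its squared
    Mahalanobis distance from the mean (monotone in the negative
    log-likelihood), and retain the least typical frames."""
    mean = frames.mean(axis=0)
    # Regularize the covariance so it is always invertible
    cov = np.cov(frames, rowvar=False) + 1e-6 * np.eye(frames.shape[1])
    inv_cov = np.linalg.inv(cov)
    diffs = frames - mean
    scores = np.einsum('nd,dk,nk->n', diffs, inv_cov, diffs)
    num_keep = max(1, int(keep_fraction * len(frames)))
    # Highest scores = lowest likelihood under the fitted Gaussian;
    # sort the chosen indices to preserve temporal order
    idx = np.sort(np.argsort(scores)[-num_keep:])
    return frames[idx]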

4 citations