Open Access Journal Article

Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

Sandipan Chakroborty, +2 more
24 Nov 2008, Vol. 2, Iss. 11, pp. 2554-2561
TLDR
This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone when combined with MFCC via a parallel implementation of speaker models, and outperforms baseline MFCC significantly.
Abstract
A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves the distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features, which are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders.

Keywords: Complementary Information, Filter Bank, GMM, IMFCC, MFCC, Speaker Identification, Speaker Recognition.
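The idea of a complementary filter bank can be illustrated with a minimal sketch. The abstract does not give the filter bank formula or the fusion rule, so everything below is an assumption made for illustration: the flipped bank is taken to be the mel bank with both the filter order and the frequency axis reversed, and the two parallel model sets are fused by a weighted sum of per-speaker scores. The function names, the common `2595 * log10(1 + f / 700)` mel mapping, and the equal default weight are hypothetical choices, not the authors' stated method.

```python
import math


def hz_to_mel(f_hz):
    # Common mel mapping; assumed here, not taken from the abstract.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale.

    Returns n_filters rows, each holding n_fft // 2 + 1 weights.
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, expressed as fractional FFT bin positions.
    edges = [
        mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
        for i in range(n_filters + 2)
    ]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = edges[i - 1], edges[i], edges[i + 1]
        row = []
        for k in range(n_bins):
            if left < k <= centre:
                row.append((k - left) / (centre - left))    # rising slope
            elif centre < k < right:
                row.append((right - k) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def flipped_filter_bank(bank):
    # Assumed reading of "flipped": reverse the filter order and the
    # frequency axis, so the narrow filters sit at high frequencies.
    return [row[::-1] for row in reversed(bank)]


def fuse_scores(mfcc_scores, flipped_scores, weight=0.5):
    # Hypothetical fusion rule: weighted sum of per-speaker scores
    # (e.g. GMM log-likelihoods) from the two parallel model sets.
    return {
        spk: weight * mfcc_scores[spk] + (1.0 - weight) * flipped_scores[spk]
        for spk in mfcc_scores
    }


bank = mel_filter_bank(n_filters=20, n_fft=512, sample_rate=8000)
flipped = flipped_filter_bank(bank)
fused = fuse_scores({"a": -10.0, "b": -12.0}, {"a": -11.0, "b": -8.0})
best_speaker = max(fused, key=fused.get)
```

In this sketch the mel bank has its narrowest filters at low frequencies, while the flipped bank has them at high frequencies, which is the sense in which the two feature streams are complementary. In a closed-set identification setting, the speaker with the highest fused score would be selected.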


Citations
Journal Article

Constant Q cepstral coefficients

TL;DR: An approach that combines speech signal analysis using the constant Q transform with traditional cepstral processing. Results show that the CQCC configuration is sensitive to the general form of spoofing attack and use-case scenario, which suggests that the past single-system pursuit of generalised spoofing detection may need rethinking.
Proceedings Article

A Comparison of Features for Synthetic Speech Detection

TL;DR: Comparative results indicate that features representing spectral information in the high-frequency region, dynamic information of speech, and detailed information related to subband characteristics are considerably more useful for the synthetic speech detection task.
Journal Article

Lung sound classification using cepstral-based statistical features

TL;DR: It is found that the newly investigated features are more robust than existing features and show better recognition accuracy even in low signal-to-noise ratios (SNRs).
Proceedings Article

Experimental Analysis of Features for Replay Attack Detection - Results on the ASVspoof 2017 Challenge.

TL;DR: An experimental comparison of different features for the detection of replay spoofing attacks in Automatic Speaker Verification systems is presented and some general conclusions regarding feature extraction for replay attack detection are provided.