Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

Journal Article (Open Access)

Sandipan Chakroborty, +2 more

24 Nov 2008

World Academy of Science, Engineering an...

Vol. 2, Iss. 11, pp. 2554-2561

TL;DR

This paper proposes a new set of features based on a complementary filter bank structure that makes speaker-specific cues in the higher frequency zone more distinguishable. When combined with MFCC via a parallel implementation of speaker models, the new features significantly outperform the baseline MFCC.
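To make the idea of a "complementary" or "flipped" filter bank concrete, here is a minimal sketch assuming one common reading: the mel filter bank (narrow, densely packed filters at low frequencies) is reversed along both the filter index and the frequency axis, so the narrow filters land in the high-frequency region, and cepstra are then computed the usual way (log filter-bank energies followed by a DCT-II). The sample rate (16 kHz), 20 filters, 512-point FFT, 25 ms frames with a 10 ms hop, and all function names (`mel_filter_bank`, `flipped_filter_bank`, `cepstra`, etc.) are illustrative assumptions, not the paper's published configuration or code.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale: narrow and dense at low frequencies."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling edge, peak of 1 at `center`
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def flipped_filter_bank(fb):
    """Reverse the filter order and the frequency axis, moving the narrow filters to high frequencies."""
    return fb[::-1, ::-1]

def power_spectrum(signal, n_fft=512, frame_len=400, hop=160):
    """Hamming-windowed frames (25 ms / 10 ms at 16 kHz) -> periodogram, shape (frames, n_fft//2 + 1)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft

def cepstra(spec, fb, n_coeffs=13):
    """Log filter-bank energies followed by an (unnormalised) DCT-II."""
    log_energies = np.log(spec @ fb.T + 1e-10)             # (frames, n_filters)
    n = fb.shape[0]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # (n_coeffs, n_filters)
    return log_energies @ dct_basis.T                      # (frames, n_coeffs)

sr = 16000
x = np.random.default_rng(0).standard_normal(sr)  # 1 s of white noise standing in for speech
spec = power_spectrum(x)
mel_fb = mel_filter_bank(20, 512, sr)
inv_fb = flipped_filter_bank(mel_fb)
mfcc = cepstra(spec, mel_fb)    # MFCC-style features
imfcc = cepstra(spec, inv_fb)   # features from the flipped (inverted) filter bank
print(mfcc.shape, imfcc.shape)  # (98, 13) (98, 13)
```

The only difference between the two feature streams is the filter bank passed to `cepstra`, which is consistent with the claim that the extra features add little computational burden.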

Abstract:

A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme that gives a generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, because of the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure that improves the distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features, which are difficult to extract, the proposed feature set adds little computational burden during extraction. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set significantly outperforms the baseline MFCC. This is validated by experiments on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders.

Keywords: Complementary Information, Filter Bank, GMM, IMFCC, MFCC, Speaker Identification, Speaker Recognition.
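The abstract describes closed-set identification with GMM classifiers and a "parallel implementation of speaker models", one model set per feature stream. The sketch below shows one plausible way to wire that up: each enrolled speaker has a diagonal-covariance GMM for MFCC and another for the flipped-filter-bank features, each model scores the test utterance by its average frame log-likelihood, and the two scores are fused by a weighted sum before picking the best speaker. The equal-weight linear fusion (`alpha=0.5`), the toy model parameters, and the function names are assumptions for illustration; in practice each GMM would be trained with EM on enrollment data.

```python
import numpy as np

def gmm_avg_log_likelihood(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (T, D) under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    diff = X[:, None, :] - means[None, :, :]                                 # (T, M, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))   # (T, M)
    log_weighted = np.log(weights) + log_gauss                               # (T, M)
    peak = log_weighted.max(axis=1, keepdims=True)                           # log-sum-exp over mixtures
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def identify(X_mfcc, X_imfcc, mfcc_models, imfcc_models, alpha=0.5):
    """Closed-set identification with two parallel model sets and a weighted sum of their scores."""
    scores = [alpha * gmm_avg_log_likelihood(X_mfcc, *m_mod)
              + (1.0 - alpha) * gmm_avg_log_likelihood(X_imfcc, *i_mod)
              for m_mod, i_mod in zip(mfcc_models, imfcc_models)]
    return int(np.argmax(scores)), scores

# Toy demo with made-up model parameters (real models would come from EM training).
rng = np.random.default_rng(1)
D, M, S = 4, 2, 3  # feature dimension, mixtures per GMM, enrolled speakers

def toy_gmm(center):
    return (np.full(M, 1.0 / M),
            center + 0.1 * rng.standard_normal((M, D)),
            np.ones((M, D)))

mfcc_models = [toy_gmm(float(s)) for s in range(S)]    # speaker s: MFCC stream centred at +s
imfcc_models = [toy_gmm(-float(s)) for s in range(S)]  # speaker s: IMFCC stream centred at -s
X_mfcc = rng.standard_normal((200, D)) + 2.0           # test utterance drawn near speaker 2
X_imfcc = rng.standard_normal((200, D)) - 2.0
best, scores = identify(X_mfcc, X_imfcc, mfcc_models, imfcc_models)
print(best)  # 2
```

Keeping the two model sets separate and fusing only their scores lets each stream use its own model order, which is one reason a parallel combination can be preferable to simply concatenating the feature vectors.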

Citations

Journal Article

Constant Q cepstral coefficients

Massimiliano Todisco, +2 more

01 Sep 2017

Computer Speech & Language

TL;DR: An approach that combines speech signal analysis using the constant Q transform with traditional cepstral processing. Results show that the CQCC configuration is sensitive to the general form of spoofing attack and to the use-case scenario, which suggests that the past single-system pursuit of generalised spoofing detection may need rethinking.

Proceedings Article

A Comparison of Features for Synthetic Speech Detection

Md. Sahidullah, +2 more

TL;DR: Comparative results indicate that features representing spectral information in the high-frequency region, dynamic information of speech, and detailed information related to subband characteristics are considerably more useful for the synthetic speech detection task.

Journal Article

Lung sound classification using cepstral-based statistical features

Nandini Sengupta, +2 more

01 Aug 2016

Computers in Biology and Medicine

TL;DR: It is found that the newly investigated features are more robust than existing features and show better recognition accuracy even in low signal-to-noise ratios (SNRs).

Proceedings Article

Experimental Analysis of Features for Replay Attack Detection - Results on the ASVspoof 2017 Challenge

Roberto Font, +2 more

TL;DR: An experimental comparison of different features for the detection of replay spoofing attacks in Automatic Speaker Verification systems is presented and some general conclusions regarding feature extraction for replay attack detection are provided.

References

Book

Fundamentals of speech recognition

Lawrence R. Rabiner, +1 more

TL;DR: A textbook covering the fundamentals of speech recognition, including speech signal analysis, pattern-comparison techniques, and hidden Markov models.

Journal Article

On combining classifiers

Josef Kittler, +3 more

01 Mar 1998

IEEE Transactions on Pattern Analysis an...

TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.
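The fixed combination rules that come out of this framework (product, sum, max, min, median, and majority vote) can be sketched in a few lines. This is a simplified illustration, not the paper's full formulation: it assumes equal class priors, under which the general product and sum rules (which contain prior-probability terms) reduce to comparing the product or sum of the classifiers' posteriors. The function name `combine` and the example posteriors are hypothetical.

```python
import numpy as np

def combine(posteriors, rule="sum"):
    """Fuse R classifiers' posteriors, shape (R, K), into one class index (equal priors assumed)."""
    P = np.asarray(posteriors, dtype=float)
    if rule == "product":
        scores = np.sum(np.log(P + 1e-300), axis=0)  # log of the product, for numerical stability
    elif rule == "sum":
        scores = P.sum(axis=0)
    elif rule == "max":
        scores = P.max(axis=0)
    elif rule == "min":
        scores = P.min(axis=0)
    elif rule == "median":
        scores = np.median(P, axis=0)
    elif rule == "vote":
        scores = np.bincount(P.argmax(axis=1), minlength=P.shape[1])  # one vote per classifier
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(np.argmax(scores))

P = [[0.50, 0.40, 0.10],   # classifier 1 (e.g. an MFCC-based model), 3 classes
     [0.10, 0.45, 0.45]]   # classifier 2 (e.g. an IMFCC-based model)
for rule in ("sum", "product", "max", "min"):
    print(rule, combine(P, rule))
# prints: sum 1, product 1, max 0, min 1 (one per line)
```

The example shows that the rules can disagree: the max rule follows classifier 1's single confident vote, while the sum and product rules favour the class that both classifiers rate reasonably well.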

Journal Article

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis, +1 more

01 Aug 1980

IEEE Transactions on Acoustics, Speech, ...

TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.

Journal Article

Robust text-independent speaker identification using Gaussian mixture speaker models

Douglas A. Reynolds, +1 more

01 Jan 1995

IEEE Transactions on Speech and Audio Pr...

TL;DR: The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity, and the GMM is shown to outperform the other speaker modeling techniques on an identical 16-speaker telephone speech task.

Book

Speech communication : human and machine

Douglas O'Shaughnessy

TL;DR: A textbook on speech communication in humans and machines, covering speech production and perception as well as speech analysis, coding, synthesis, and recognition.

Related Papers (5)

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis, +1 more

01 Aug 1980

IEEE Transactions on Acoustics, Speech, ...

Robust text-independent speaker identification using Gaussian mixture speaker models

Douglas A. Reynolds, +1 more

01 Jan 1995

IEEE Transactions on Speech and Audio Pr...

Speaker recognition: a tutorial

J.P. Campbell, Jr.

Combining evidence from residual phase and MFCC features for speaker recognition

K.S.R. Murty, +1 more

01 Jan 2006

IEEE Signal Processing Letters

An overview of text-independent speaker recognition: From features to supervectors

Tomi Kinnunen, +1 more

01 Jan 2010

Speech Communication

Additional citing paper

Overview of BTAS 2016 speaker anti-spoofing competition