Front-End Factor Analysis for Speaker Verification

doi:10.1109/TASL.2010.2064307

Journal Article

Najim Dehak, +4 more

- 01 May 2011 -

IEEE Transactions on Audio, Speech, and ...

- Vol. 19, Iss: 4, pp 788-798

TL;DR: Extending the authors' previous work on a new speaker representation for speaker verification, this paper defines a low-dimensional speaker- and channel-dependent space using a simple factor analysis; it is named the total variability space because it models both speaker and channel variabilities.

Abstract:

This paper presents an extension of our previous work which proposes a new speaker representation for speaker verification. In this modeling, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities. Two speaker verification systems are proposed which use this new representation. The first system is a support vector machine-based system that uses the cosine kernel to estimate the similarity between the input data. The second system directly uses the cosine similarity as the final decision score. We tested three channel compensation techniques in the total variability space: within-class covariance normalization (WCCN), linear discriminant analysis (LDA), and nuisance attribute projection (NAP). We found that the best results are obtained when LDA is followed by WCCN. We achieved an equal error rate (EER) of 1.12% and MinDCF of 0.0094 using the cosine distance scoring on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation dataset. We also obtained 4% absolute EER improvement for both-gender trials on the 10 s-10 s condition compared to the classical joint factor analysis scoring.
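The second system described in the abstract, which uses cosine similarity directly as the decision score, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `project`, `cosine_score`, and `accept`, the `projection` argument, and the threshold are all hypothetical. The optional projection matrix only stands in for a channel-compensation transform such as LDA followed by WCCN; estimating those transforms from training data is not shown here.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def project(matrix: Matrix, vec: Vector) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def cosine_score(w_target: Vector, w_test: Vector,
                 projection: Optional[Matrix] = None) -> float:
    """Cosine similarity between two total variability vectors.

    `projection` is an assumed, already-trained channel-compensation
    matrix (for example an LDA-then-WCCN transform); it is applied to
    both vectors before scoring when provided.
    """
    if projection is not None:
        w_target = project(projection, w_target)
        w_test = project(projection, w_test)
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm = (math.sqrt(sum(a * a for a in w_target))
            * math.sqrt(sum(b * b for b in w_test)))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def accept(score: float, threshold: float) -> bool:
    """Accept the trial when the score reaches the (assumed) threshold."""
    return score >= threshold
```

Because the score depends only on the angle between the two vectors, no separate target-speaker model has to be trained at enrollment time in this sketch; the decision reduces to comparing `cosine_score(w_target, w_test, projection)` against a threshold tuned on development data.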

Citations

Proceedings Article

X-Vectors: Robust DNN Embeddings for Speaker Recognition

David Snyder, +4 more

TL;DR: This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.

Proceedings Article

VoxCeleb2: Deep Speaker Recognition.

Joon Son Chung, +2 more

TL;DR: In this article, a large-scale audio-visual speaker recognition dataset, VoxCeleb2, is presented, which contains over a million utterances from over 6,000 speakers.

Proceedings Article

Analysis of i-vector Length Normalization in Speaker Recognition Systems.

Daniel Garcia-Romero, +1 more

TL;DR: The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization, which allows the use of probabilistic models with Gaussian assumptions that yield performance equivalent to that of more complicated systems based on heavy-tailed assumptions.

Proceedings Article

A time delay neural network architecture for efficient modeling of long temporal contexts.

Vijayaditya Peddinti, +2 more

TL;DR: This paper proposes a time delay neural network architecture which models long term temporal dependencies with training times comparable to standard feed-forward DNNs and uses sub-sampling to reduce computation during training.

Proceedings Article

Deep neural networks for small footprint text-dependent speaker verification

Ehsan Variani, +4 more

TL;DR: Experimental results show that the DNN-based speaker verification system achieves good performance compared to a popular i-vector system on a small footprint text-dependent speaker verification task, is more robust to additive noise, and outperforms the i-vector system at low false-rejection operating points.

References

Journal Article

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds, +2 more

- 01 Jan 2000 -

Digital Signal Processing

TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

The Nature of Statistical Learning

Vladimir Vapnik

Journal Article

Joint Factor Analysis Versus Eigenchannels in Speaker Recognition

Patrick Kenny, +3 more

- 01 May 2007 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: It is shown how the two approaches to the problem of session variability in Gaussian mixture model (GMM)-based speaker verification, eigenchannels and joint factor analysis, can be implemented using essentially the same software at all stages except for the enrollment of target speakers.

Journal Article

A Study of Interspeaker Variability in Speaker Verification

Patrick Kenny, +4 more

- 01 Jul 2008 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: It is shown that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the cross-channel condition, it is capable of performing at least as well as fusions of multiple systems of other types.

Proceedings Article

SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation

William M. Campbell, +3 more

TL;DR: A support vector machine kernel is constructed using the GMM supervector, and similarities based on this kernel are shown between SVM nuisance attribute projection (NAP) and recent results in latent factor analysis.

Related Papers (5)

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds, +2 more

- 01 Jan 2000 -

Digital Signal Processing


Probabilistic Linear Discriminant Analysis for Inferences About Identity

X-Vectors: Robust DNN Embeddings for Speaker Recognition

The Kaldi Speech Recognition Toolkit

An overview of text-independent speaker recognition: From features to supervectors