Efficient text-independent speaker verification with structural Gaussian mixture models and neural network

doi:10.1109/TSA.2003.815822

Journal Article

Bing Xiang, +1 more

26 Aug 2003, IEEE Transactions on Speech and Audio Pr..., Vol. 11, Iss. 5, pp. 447-456

Abstract:

We present an integrated system that combines structural Gaussian mixture models (SGMMs) with a neural network to achieve both computational efficiency and high accuracy in text-independent speaker verification. A structural background model (SBM) is first constructed by hierarchically clustering all Gaussian mixture components in a universal background model (UBM), which partitions the acoustic space into multiple regions at different levels of resolution. For each target speaker, an SGMM is generated from the SBM through multilevel maximum a posteriori (MAP) adaptation. During testing, only a small subset of the Gaussian mixture components is scored for each feature vector, which significantly reduces the computational cost. The scores obtained in the different layers of the tree-structured models are then combined by a neural network to reach the final decision. Different configurations are compared in experiments on the telephony speech data used in the NIST speaker verification evaluation. The results show that computation can be reduced by a factor of 17 together with a 5% relative reduction in equal error rate (EER) compared with the baseline. The SGMM-SBM also shows advantages over the recently proposed hash GMM, including higher speed and better verification performance.
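The idea of scoring only a small subset of Gaussian components per feature vector can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-level tree built with plain k-means on the component means (a stand-in for the paper's hierarchical clustering), selects the parent node by Euclidean distance, and omits MAP adaptation and the neural-network score combination entirely. All function names and parameters are hypothetical.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of x under each diagonal-covariance Gaussian (one row each)."""
    diff = x - means
    return -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)
                   + np.sum(diff * diff / variances, axis=1))

def build_two_level_tree(means, n_clusters, n_iter=10, seed=0):
    """Group leaf components under parent nodes using k-means on the means.
    Assumption: k-means replaces the paper's hierarchical clustering."""
    rng = np.random.default_rng(seed)
    centers = means[rng.choice(len(means), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((means[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centers[k] = means[assign == k].mean(axis=0)
    # Keep only parent nodes that own at least one leaf component.
    children = [np.flatnonzero(assign == k) for k in range(n_clusters)]
    keep = [k for k in range(n_clusters) if len(children[k]) > 0]
    return centers[keep], [children[k] for k in keep]

def score_frame_tree(x, weights, means, variances, centers, children):
    """Approximate log p(x) from the leaves under the closest parent node only.
    Returns the score and the number of nodes that were evaluated."""
    best = int(((centers - x) ** 2).sum(axis=1).argmin())
    idx = children[best]
    ll = np.log(weights[idx]) + log_gauss_diag(x, means[idx], variances[idx])
    return np.logaddexp.reduce(ll), len(centers) + len(idx)

def score_frame_full(x, weights, means, variances):
    """Baseline: evaluate every mixture component."""
    ll = np.log(weights) + log_gauss_diag(x, means, variances)
    return np.logaddexp.reduce(ll), len(means)

# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(1)
M, D = 64, 4
means = rng.normal(size=(M, D)) * 3.0
variances = np.ones((M, D))
weights = np.full(M, 1.0 / M)
centers, children = build_two_level_tree(means, n_clusters=8)
x = rng.normal(size=D)

tree_ll, n_tree = score_frame_tree(x, weights, means, variances, centers, children)
full_ll, n_full = score_frame_full(x, weights, means, variances)
print(n_tree, "of", n_full, "node evaluations; log-likelihoods:", tree_ll, full_ll)
```

Because the tree score sums over a subset of the mixture terms, it is a lower bound on the full score. The saving comes from evaluating the parent nodes plus one cluster of leaves instead of every component.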
