Journal ArticleDOI

Support vector machines using GMM supervectors for speaker verification

TL;DR: This work examines the idea of using the GMM supervector in a support vector machine (SVM) classifier and proposes two new SVM kernels based on distance metrics between GMM models; these kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task.
Abstract: Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The standard training method for GMM models is to use MAP adaptation of the means of the mixture components based on speech from a target speaker. Recent methods in compensation for speaker and channel variability have proposed the idea of stacking the means of the GMM model to form a GMM mean supervector. We examine the idea of using the GMM supervector in a support vector machine (SVM) classifier. We propose two new SVM kernels based on distance metrics between GMM models. We show that these SVM kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task.
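
As a concrete illustration of the supervector idea, here is a minimal sketch (not the authors' code; names and shapes are assumptions) that stacks MAP-adapted means into a supervector and evaluates a linear kernel between two utterances, scaling each mean by its mixture weight and covariance in the spirit of the paper's KL-based kernel:

```python
# Minimal sketch (not the authors' code): build a GMM mean supervector and
# evaluate a linear GSV-style kernel between two utterances. Assumed inputs
# for a diagonal-covariance UBM with C components in F dimensions:
#   w   -- mixture weights, shape (C,)
#   var -- diagonal covariances, shape (C, F)
#   MAP-adapted means per utterance, shape (C, F)
import numpy as np

def supervector(means, w, var):
    """Scale each adapted mean by sqrt(w_i) / sqrt(var_i), then stack."""
    scaled = np.sqrt(w)[:, None] * means / np.sqrt(var)
    return scaled.ravel()                      # supervector of length C * F

def gsv_kernel(means_a, means_b, w, var):
    """Linear kernel: inner product of the two scaled supervectors."""
    return float(supervector(means_a, w, var) @ supervector(means_b, w, var))

# Toy usage: a 4-component, 3-dimensional UBM
rng = np.random.default_rng(0)
w, var = np.full(4, 0.25), np.ones((4, 3))
mu_a, mu_b = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
print(gsv_kernel(mu_a, mu_b, w, var))
```

Because the kernel is linear in the scaled supervectors, a trained target model can be collapsed into a single vector, so scoring reduces to one inner product per test utterance.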
Citations
Journal ArticleDOI
TL;DR: This paper starts with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling, and elaborates advanced computational techniques to address robustness and session variability.

1,433 citations


Cites background or methods from "Support vector machines using GMM supervectors for speaker verification"

  • ...Given the demonstrated excellent performance of the JFA compensation and Gaussian supervector SVMs [38], it seems appropriate to ask how they compare with each other, and whether they could be combined?...


  • ...With SVMs, normalizing the dynamic ranges of the supervector elements is also crucial since SVMs are not scale invariant [232]....


  • ...In [38] the authors derive the Gaussian supervector (GSV) kernel by bounding the Kullback-Leibler (KL) divergence measure between GMMs.... (a sketch of this bound follows the list)


  • ...Currently SVM is one of the most robust classifiers in speaker verification, and it has also been successfully combined with GMM to increase accuracy [36, 38]....


  • ...Since the universal background model (UBM) is included as a part in most speaker recognition systems, it provides a natural way to create supervectors [38, 52, 132]....

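The third quote above points to the kernel derivation in [38]; a sketch in assumed notation: adapted GMMs g_a and g_b share the UBM weights w_i and covariances Σ_i and differ only in their means, the KL divergence is bounded by a weighted distance between those means, and expanding the quadratic form yields a linear kernel.

```latex
\begin{align}
D(g_a \,\|\, g_b) \;&\le\; \frac{1}{2}\sum_{i=1}^{C} w_i\,
    (\mu_i^a - \mu_i^b)^\top \Sigma_i^{-1} (\mu_i^a - \mu_i^b), \\
K(\mathrm{utt}_a, \mathrm{utt}_b) \;&=\; \sum_{i=1}^{C}
    \bigl(\sqrt{w_i}\,\Sigma_i^{-1/2} \mu_i^a\bigr)^\top
    \bigl(\sqrt{w_i}\,\Sigma_i^{-1/2} \mu_i^b\bigr).
\end{align}
```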

Proceedings ArticleDOI
04 May 2014
TL;DR: A novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic speech recognition (ASR) to produce frame alignments.
Abstract: We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic speech recognition (ASR). Specifically, the DNN replaces the standard Gaussian mixture model (GMM) to produce frame alignments. The use of an ASR-DNN system in the speaker recognition pipeline is attractive as it integrates the information from speech content directly into the statistics, allowing the standard backends to remain unchanged. Improvements from the proposed framework over a state-of-the-art system are 30% relative at the equal error rate when evaluated on the telephone conditions from the 2012 NIST speaker recognition evaluation (SRE). The proposed framework is a successful way to efficiently leverage transcribed data for speaker recognition, thus opening up a wide spectrum of research directions.
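
A minimal sketch of the statistics computation the abstract describes (shapes and names assumed, not the paper's code): the zeroth- and first-order Baum-Welch statistics feeding the i-vector extractor are accumulated exactly as with a UBM, except that the per-frame alignments come from the ASR DNN's senone posteriors:

```python
# Minimal sketch (shapes and names assumed): Baum-Welch statistics for an
# i-vector extractor, with frame alignments gamma taken from an ASR DNN's
# senone posteriors instead of UBM component posteriors.
import numpy as np

def sufficient_stats(features, gamma):
    """features: (T, F) acoustic frames; gamma: (T, C) per-frame posteriors."""
    N = gamma.sum(axis=0)     # zeroth order: soft frame count per senone, (C,)
    F = gamma.T @ features    # first order: posterior-weighted sums, (C, F)
    return N, F

# Toy usage: 100 frames, 20-dim features, 50 senones
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 20))
logits = rng.normal(size=(100, 50))
gamma = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # row softmax
N, F = sufficient_stats(feats, gamma)
print(N.shape, F.shape)       # (50,) (50, 20)
```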

631 citations

Proceedings ArticleDOI
14 May 2006
TL;DR: A support vector machine kernel is constructed using the GMM supervector, and similarities based on this kernel are shown between the method of SVM nuisance attribute projection (NAP) and recent results in latent factor analysis.
Abstract: Gaussian mixture models with universal backgrounds (UBMs) have become the standard method for speaker recognition. Typically, a speaker model is constructed by MAP adaptation of the means of the UBM. A GMM supervector is constructed by stacking the means of the adapted mixture components. A recent discovery is that latent factor analysis of this GMM supervector is an effective method for variability compensation. We consider this GMM supervector in the context of support vector machines. We construct a support vector machine kernel using the GMM supervector. We show similarities based on this kernel between the method of SVM nuisance attribute projection (NAP) and the recent results in latent factor analysis. Experiments on a NIST SRE 2005 corpus demonstrate the effectiveness of the new technique.
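
A minimal sketch of the nuisance attribute projection mentioned in the abstract, in its common form s → (I − UU^T)s; the subspace training shown here (top eigenvectors of the within-speaker scatter) is an assumption standing in for the paper's exact recipe:

```python
# Minimal sketch of nuisance attribute projection (NAP) on supervectors.
# The training recipe (top eigenvectors of within-speaker scatter) is a
# common formulation assumed for illustration, not necessarily the paper's
# exact procedure. S: (n, dim) supervectors; labels: (n,) speaker ids.
import numpy as np

def train_nap(S, labels, k):
    """Estimate a rank-k nuisance subspace U from within-speaker variation."""
    # Center each speaker's supervectors to isolate session variability
    D = np.vstack([S[labels == s] - S[labels == s].mean(axis=0)
                   for s in np.unique(labels)])
    # Top-k right singular vectors of D span the dominant nuisance directions
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k].T                            # U: (dim, k), orthonormal columns

def apply_nap(U, s):
    """Project out the nuisance subspace: s -> (I - U U^T) s."""
    return s - U @ (U.T @ s)

# Toy usage: 6 utterances from 2 speakers in a 10-dim supervector space
rng = np.random.default_rng(0)
S = rng.normal(size=(6, 10))
labels = np.array([0, 0, 0, 1, 1, 1])
U = train_nap(S, labels, k=2)
print(apply_nap(U, S[0]).shape)                # (10,)
```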

625 citations


Cites methods from "Support vector machines using GMM supervectors for speaker verification"

  • ...Second, for a nonlinear kernel [10], SVM NAP uses a nonlinear expanded version of the GMM supervector....


Journal ArticleDOI
TL;DR: A review of speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems, concluding with a comparative study of human versus machine performance.
Abstract: Identifying a person by his or her voice is an important human trait most take for granted in natural human-to-human interaction/communication. Speaking to someone over the telephone usually begins by identifying who is speaking and, at least in cases of familiar speakers, a subjective verification by the listener that the identity is correct and the conversation can proceed. Automatic speaker-recognition systems have emerged as an important means of verifying identity in many e-commerce applications as well as in general business interactions, forensics, and law enforcement. Human experts trained in forensic speaker recognition can perform this task even better by examining a set of acoustic, prosodic, and linguistic characteristics of speech in a general approach referred to as structured listening. Techniques in forensic speaker recognition have been developed for many years by forensic speech scientists and linguists to help reduce any potential bias or preconceived understanding as to the validity of an unknown audio sample and a reference template from a potential suspect. Experienced researchers in signal processing and machine learning continue to develop automatic algorithms to effectively perform speaker recognition, with ever-improving performance, to the point where automatic systems start to perform on par with human listeners. In this article, we review the literature on speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems. We discuss different aspects of automatic systems, including voice-activity detection (VAD), features, speaker models, standard evaluation data sets, and performance metrics. Human speaker recognition is discussed in two parts: the first part involves forensic speaker-recognition methods, and the second illustrates how a naïve listener performs this task from a neuroscience perspective. We conclude this review with a comparative study of human versus machine speaker recognition and attempt to point out strengths and weaknesses of each.

554 citations

Proceedings ArticleDOI
26 Sep 2010
TL;DR: It is shown how speakers and participants' emotions can be automatically detected by means of classifiers running locally on off-the-shelf mobile phones, and how speaking and interactions can be correlated with activity and location measures.
Abstract: Today's mobile phones represent a rich and powerful computing platform, given their sensing, processing and communication capabilities. Phones are also part of the everyday life of billions of people, and therefore represent an exceptionally suitable tool for conducting social and psychological experiments in an unobtrusive way. We present a mobile sensing platform whose key characteristics include the ability of sensing individual emotions as well as activities, verbal and proximity interactions among members of social groups. Moreover, the system is programmable by means of a declarative language that can be used to express adaptive rules to improve power saving. We evaluate a system prototype on Nokia Symbian phones by means of several small-scale experiments aimed at testing performance in terms of accuracy and power consumption. Finally, we present the results of a real deployment where we study participants' emotions and interactions. We cross-validate our measurements with the results obtained through questionnaires filled in by the users, and with the results presented in social psychological studies using traditional methods. In particular, we show how speakers and participants' emotions can be automatically detected by means of classifiers running locally on off-the-shelf mobile phones, and how speaking and interactions can be correlated with activity and location measures.

504 citations


Cites methods from "Support vector machines using GMM supervectors for speaker verification"


  • ...Alternative SVM-based schemes, including the popular GMMsupervector [7] and MLLR [28] kernel classifiers, were not considered as they are generally suitable for binary classification tasks....


References
Book
01 Jan 1973

14,545 citations

Journal ArticleDOI
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations


"Support vector machines using GMM s..." refers methods in this paper

  • ...For GMM MAP training, we adapt only the means with a relevance factor of 16 [1].... (the update rule is sketched after these quotes)


  • ...The standard approach to this problem is to model the speaker using an adapted Gaussian mixture model (GMM) [1]....


  • ...Given a speaker utterance, GMM UBM training is performed by MAP adaptation [1] of the means ....

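The MAP update quoted above has a standard closed form from [1]; a sketch in assumed notation, with n_i the soft count of frames assigned to mixture i, E_i(x) the posterior-weighted mean of those frames, and r = 16 the relevance factor:

```latex
\begin{align}
\alpha_i &= \frac{n_i}{n_i + r}, &
\hat{\mu}_i &= \alpha_i\, E_i(x) + (1 - \alpha_i)\, \mu_i^{\mathrm{UBM}} .
\end{align}
```

Mixtures with little adaptation data (small n_i) stay close to the UBM mean, which is what keeps the stacked means comparable across utterances.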

Book
12 Aug 2008
TL;DR: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications and provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature.
Abstract: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications. The authors present the basic ideas of SVMs together with the latest developments and current research questions in a unified style. They identify three reasons for the success of SVMs: their ability to learn well with only a very small number of free parameters, their robustness against several types of model violations and outliers, and their computational efficiency compared to several other methods. Since their appearance in the early nineties, support vector machines and related kernel-based methods have been successfully applied in diverse fields of application such as bioinformatics, fraud detection, construction of insurance tariffs, direct marketing, and data and text mining. As a consequence, SVMs now play an important role in statistical machine learning and are used not only by statisticians, mathematicians, and computer scientists, but also by engineers and data analysts. The book provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature. The book can thus serve as both a basis for graduate courses and an introduction for statisticians, mathematicians, and computer scientists. It further provides a valuable reference for researchers working in the field. The book covers all important topics concerning support vector machines such as: loss functions and their role in the learning process; reproducing kernel Hilbert spaces and their properties; a thorough statistical analysis that uses both traditional uniform bounds and more advanced localized techniques based on Rademacher averages and Talagrand's inequality; a detailed treatment of classification and regression; a detailed robustness analysis; and a description of some of the most recent implementation techniques. To make the book self-contained, an extensive appendix is added which provides the reader with the necessary background from statistics, probability theory, functional analysis, convex analysis, and topology.

4,664 citations


"Support vector machines using GMM s..." refers background or methods in this paper

  • ...An SVM [5] is a two-class classifier constructed from sums of a kernel function K(·, ·),...


  • ...Note that since it is linear, it satisfies the Mercer condition [5]....


  • ...Since each of the terms in the sum in (11) is a kernel, and the sum of kernels is also a kernel, then (11) is also a kernel, see [5].... (a one-line justification follows these quotes)

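The closure property invoked in the last quote has a one-line justification (standard argument, notation assumed): the Gram matrices G_1 and G_2 of two kernels are each positive semidefinite, and

```latex
g^\top (G_1 + G_2)\, g \;=\; g^\top G_1\, g \;+\; g^\top G_2\, g \;\ge\; 0
\qquad \text{for all } g \in \mathbb{R}^n ,
```

so G_1 + G_2 is positive semidefinite and the sum is again a valid kernel.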

01 Jan 2000
TL;DR: Research report version of the SVMTorch article below (EPFL-REPORT-82604, IDIAP RR 00-17).
Abstract: URL: http://publications.idiap.ch/downloads/reports/2000/rr00-17.pdf

904 citations

Journal ArticleDOI
TL;DR: SVMTorch, a decomposition algorithm for training support vector machines on large-scale regression problems, similar to SVM-Light for classification, yields significant time improvements over comparable solvers and comes with a convergence proof.
Abstract: Support Vector Machines (SVMs) for regression problems are trained by solving a quadratic optimization problem which needs on the order of l^2 memory and time resources to solve, where l is the number of training examples. In this paper, we propose a decomposition algorithm, SVMTorch (available at http://www.idiap.ch/learning/SVMTorch.html ), which is similar to SVM-Light proposed by Joachims (1999) for classification problems, but adapted to regression problems. With this algorithm, one can now efficiently solve large-scale regression problems (more than 20,000 examples). Comparisons with Nodelib, another publicly available SVM algorithm for large-scale regression problems from Flake and Lawrence (2000), yielded significant time improvements. Finally, based on a recent paper from Lin (2000), we show that a convergence proof exists for our algorithm.
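
For context, the quadratic program in question is the standard ε-SVR dual (textbook form, notation assumed; not necessarily the solver's exact parameterization). The dense l × l matrix Q with Q_{ij} = K(x_i, x_j) is what incurs the order-l^2 memory cost, and decomposition methods sidestep it by optimizing over small working sets of variables at a time:

```latex
\begin{align}
\max_{\alpha,\,\alpha^*}\;\;
  & -\tfrac{1}{2}\,(\alpha - \alpha^*)^\top Q\,(\alpha - \alpha^*)
    \;-\; \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*)
    \;+\; \sum_{i=1}^{l} y_i\,(\alpha_i - \alpha_i^*) \\
\text{subject to}\;\;
  & \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0,
    \qquad 0 \le \alpha_i,\, \alpha_i^* \le C .
\end{align}
```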

829 citations


"Support vector machines using GMM s..." refers methods in this paper

  • ...Both kernels in (8) and (12) were implemented using SVMTorch as an SVM trainer [7]....


  • ...The vectors are support vectors and obtained from the training set by an optimization process [7]....
