Journal ArticleDOI

Support vector machines using GMM supervectors for speaker verification

TL;DR: This work examines the idea of using the GMM supervector in a support vector machine (SVM) classifier and proposes two new SVM kernels based on distance metrics between GMM models; these kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task.
Abstract: Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The standard training method for GMM models is to use MAP adaptation of the means of the mixture components based on speech from a target speaker. Recent methods in compensation for speaker and channel variability have proposed the idea of stacking the means of the GMM model to form a GMM mean supervector. We examine the idea of using the GMM supervector in a support vector machine (SVM) classifier. We propose two new SVM kernels based on distance metrics between GMM models. We show that these SVM kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task.
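
As a concrete illustration of the supervector idea, here is a minimal sketch (not the authors' code; names and shapes are assumptions) that stacks MAP-adapted means into a supervector and evaluates a linear kernel between two utterances, scaling each mean by its mixture weight and covariance in the spirit of the paper's KL-based kernel:

```python
# Minimal sketch (not the authors' code): build a GMM mean supervector and
# evaluate a linear GSV-style kernel between two utterances. Assumed inputs
# for a diagonal-covariance UBM with C components in F dimensions:
#   w   -- mixture weights, shape (C,)
#   var -- diagonal covariances, shape (C, F)
#   MAP-adapted means per utterance, shape (C, F)
import numpy as np

def supervector(means, w, var):
    """Scale each adapted mean by sqrt(w_i) / sqrt(var_i), then stack."""
    scaled = np.sqrt(w)[:, None] * means / np.sqrt(var)
    return scaled.ravel()                      # supervector of length C * F

def gsv_kernel(means_a, means_b, w, var):
    """Linear kernel: inner product of the two scaled supervectors."""
    return float(supervector(means_a, w, var) @ supervector(means_b, w, var))

# Toy usage: a 4-component, 3-dimensional UBM
rng = np.random.default_rng(0)
w, var = np.full(4, 0.25), np.ones((4, 3))
mu_a, mu_b = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
print(gsv_kernel(mu_a, mu_b, w, var))
```

Because the kernel is linear in the scaled supervectors, a trained target model can be collapsed into a single vector, so scoring reduces to one inner product per test utterance.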
Citations
Journal ArticleDOI
TL;DR: This paper starts with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling, and elaborates advanced computational techniques to address robustness and session variability.

1,433 citations


Cites background or methods from "Support vector machines using GMM supervectors for speaker verification"

  • ...Given the demonstrated excellent performance of the JFA compensation and Gaussian supervector SVMs [38], it seems appropriate to ask how they compare with each other, and whether they could be combined?...


  • ...With SVMs, normalizing the dynamic ranges of the supervector elements is also crucial since SVMs are not scale invariant [232]....


  • ...In [38] the authors derive the Gaussian supervector (GSV) kernel by bounding the Kullback-Leibler (KL) divergence measure between GMMs.... (a sketch of this bound follows the list)


  • ...Currently SVM is one of the most robust classifiers in speaker verification, and it has also been successfully combined with GMM to increase accuracy [36, 38]....


  • ...Since the universal background model (UBM) is included as a part in most speaker recognition systems, it provides a natural way to create supervectors [38, 52, 132]....

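The third quote above points to the kernel derivation in [38]; a sketch in assumed notation: adapted GMMs g_a and g_b share the UBM weights w_i and covariances Σ_i and differ only in their means, the KL divergence is bounded by a weighted distance between those means, and expanding the quadratic form yields a linear kernel.

```latex
\begin{align}
D(g_a \,\|\, g_b) \;&\le\; \frac{1}{2}\sum_{i=1}^{C} w_i\,
    (\mu_i^a - \mu_i^b)^\top \Sigma_i^{-1} (\mu_i^a - \mu_i^b), \\
K(\mathrm{utt}_a, \mathrm{utt}_b) \;&=\; \sum_{i=1}^{C}
    \bigl(\sqrt{w_i}\,\Sigma_i^{-1/2} \mu_i^a\bigr)^\top
    \bigl(\sqrt{w_i}\,\Sigma_i^{-1/2} \mu_i^b\bigr).
\end{align}
```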

Proceedings ArticleDOI
04 May 2014
TL;DR: A novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic speech recognition (ASR) to produce frame alignments.
Abstract: We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic speech recognition (ASR). Specifically, the DNN replaces the standard Gaussian mixture model (GMM) to produce frame alignments. The use of an ASR-DNN system in the speaker recognition pipeline is attractive as it integrates the information from speech content directly into the statistics, allowing the standard backends to remain unchanged. Improvements from the proposed framework over a state-of-the-art system are 30% relative at the equal error rate when evaluated on the telephone conditions from the 2012 NIST speaker recognition evaluation (SRE). The proposed framework is a successful way to efficiently leverage transcribed data for speaker recognition, thus opening up a wide spectrum of research directions.
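
A minimal sketch of the statistics computation the abstract describes (shapes and names assumed, not the paper's code): the zeroth- and first-order Baum-Welch statistics feeding the i-vector extractor are accumulated exactly as with a UBM, except that the per-frame alignments come from the ASR DNN's senone posteriors:

```python
# Minimal sketch (shapes and names assumed): Baum-Welch statistics for an
# i-vector extractor, with frame alignments gamma taken from an ASR DNN's
# senone posteriors instead of UBM component posteriors.
import numpy as np

def sufficient_stats(features, gamma):
    """features: (T, F) acoustic frames; gamma: (T, C) per-frame posteriors."""
    N = gamma.sum(axis=0)     # zeroth order: soft frame count per senone, (C,)
    F = gamma.T @ features    # first order: posterior-weighted sums, (C, F)
    return N, F

# Toy usage: 100 frames, 20-dim features, 50 senones
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 20))
logits = rng.normal(size=(100, 50))
gamma = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # row softmax
N, F = sufficient_stats(feats, gamma)
print(N.shape, F.shape)       # (50,) (50, 20)
```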

631 citations

Proceedings ArticleDOI
14 May 2006
TL;DR: A support vector machine kernel is constructed using the GMM supervector, and similarities based on this kernel are shown between the method of SVM nuisance attribute projection (NAP) and recent results in latent factor analysis.
Abstract: Gaussian mixture models with universal backgrounds (UBMs) have become the standard method for speaker recognition. Typically, a speaker model is constructed by MAP adaptation of the means of the UBM. A GMM supervector is constructed by stacking the means of the adapted mixture components. A recent discovery is that latent factor analysis of this GMM supervector is an effective method for variability compensation. We consider this GMM supervector in the context of support vector machines. We construct a support vector machine kernel using the GMM supervector. We show similarities based on this kernel between the method of SVM nuisance attribute projection (NAP) and the recent results in latent factor analysis. Experiments on a NIST SRE 2005 corpus demonstrate the effectiveness of the new technique.
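
A minimal sketch of the nuisance attribute projection mentioned in the abstract, in its common form s → (I − UU^T)s; the subspace training shown here (top eigenvectors of the within-speaker scatter) is an assumption standing in for the paper's exact recipe:

```python
# Minimal sketch of nuisance attribute projection (NAP) on supervectors.
# The training recipe (top eigenvectors of within-speaker scatter) is a
# common formulation assumed for illustration, not necessarily the paper's
# exact procedure. S: (n, dim) supervectors; labels: (n,) speaker ids.
import numpy as np

def train_nap(S, labels, k):
    """Estimate a rank-k nuisance subspace U from within-speaker variation."""
    # Center each speaker's supervectors to isolate session variability
    D = np.vstack([S[labels == s] - S[labels == s].mean(axis=0)
                   for s in np.unique(labels)])
    # Top-k right singular vectors of D span the dominant nuisance directions
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k].T                            # U: (dim, k), orthonormal columns

def apply_nap(U, s):
    """Project out the nuisance subspace: s -> (I - U U^T) s."""
    return s - U @ (U.T @ s)

# Toy usage: 6 utterances from 2 speakers in a 10-dim supervector space
rng = np.random.default_rng(0)
S = rng.normal(size=(6, 10))
labels = np.array([0, 0, 0, 1, 1, 1])
U = train_nap(S, labels, k=2)
print(apply_nap(U, S[0]).shape)                # (10,)
```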

625 citations


Cites methods from "Support vector machines using GMM supervectors for speaker verification"

  • ...Second, for a nonlinear kernel [10], SVM NAP uses a nonlinear expanded version of the GMM supervector....


Journal ArticleDOI
TL;DR: A review of speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems, concluding with a comparative study of human versus machine performance.
Abstract: Identifying a person by his or her voice is an important human trait most take for granted in natural human-to-human interaction/communication. Speaking to someone over the telephone usually begins by identifying who is speaking and, at least in cases of familiar speakers, a subjective verification by the listener that the identity is correct and the conversation can proceed. Automatic speaker-recognition systems have emerged as an important means of verifying identity in many e-commerce applications as well as in general business interactions, forensics, and law enforcement. Human experts trained in forensic speaker recognition can perform this task even better by examining a set of acoustic, prosodic, and linguistic characteristics of speech in a general approach referred to as structured listening. Techniques in forensic speaker recognition have been developed for many years by forensic speech scientists and linguists to help reduce any potential bias or preconceived understanding as to the validity of an unknown audio sample and a reference template from a potential suspect. Experienced researchers in signal processing and machine learning continue to develop automatic algorithms to effectively perform speaker recognition, with ever-improving performance, to the point where automatic systems start to perform on par with human listeners. In this article, we review the literature on speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems. We discuss different aspects of automatic systems, including voice-activity detection (VAD), features, speaker models, standard evaluation data sets, and performance metrics. Human speaker recognition is discussed in two parts: the first part involves forensic speaker-recognition methods, and the second illustrates how a naïve listener performs this task from a neuroscience perspective. We conclude this review with a comparative study of human versus machine speaker recognition and attempt to point out strengths and weaknesses of each.

554 citations

Proceedings ArticleDOI
26 Sep 2010
TL;DR: It is shown how speakers and participants' emotions can be automatically detected by means of classifiers running locally on off-the-shelf mobile phones, and how speaking and interactions can be correlated with activity and location measures.
Abstract: Today's mobile phones represent a rich and powerful computing platform, given their sensing, processing and communication capabilities. Phones are also part of the everyday life of billions of people, and therefore represent an exceptionally suitable tool for conducting social and psychological experiments in an unobtrusive way. We present a mobile sensing platform whose key characteristics include the ability of sensing individual emotions as well as activities, verbal and proximity interactions among members of social groups. Moreover, the system is programmable by means of a declarative language that can be used to express adaptive rules to improve power saving. We evaluate a system prototype on Nokia Symbian phones by means of several small-scale experiments aimed at testing performance in terms of accuracy and power consumption. Finally, we present the results of a real deployment where we study participants' emotions and interactions. We cross-validate our measurements with the results obtained through questionnaires filled in by the users, and with the results presented in social psychological studies using traditional methods. In particular, we show how speakers and participants' emotions can be automatically detected by means of classifiers running locally on off-the-shelf mobile phones, and how speaking and interactions can be correlated with activity and location measures.

504 citations


Cites methods from "Support vector machines using GMM supervectors for speaker verification"


  • ...Alternative SVM-based schemes, including the popular GMMsupervector [7] and MLLR [28] kernel classifiers, were not considered as they are generally suitable for binary classification tasks....


References
Book
01 Jan 1973

14,545 citations

Journal ArticleDOI
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations


"Support vector machines using GMM s..." refers methods in this paper

  • ...For GMM MAP training, we adapt only the means with a relevance factor of 16 [1].... (the update rule is sketched after these quotes)


  • ...The standard approach to this problem is to model the speaker using an adapted Gaussian mixture model (GMM) [1]....


  • ...Given a speaker utterance, GMM UBM training is performed by MAP adaptation [1] of the means ....

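The MAP update quoted above has a standard closed form from [1]; a sketch in assumed notation, with n_i the soft count of frames assigned to mixture i, E_i(x) the posterior-weighted mean of those frames, and r = 16 the relevance factor:

```latex
\begin{align}
\alpha_i &= \frac{n_i}{n_i + r}, &
\hat{\mu}_i &= \alpha_i\, E_i(x) + (1 - \alpha_i)\, \mu_i^{\mathrm{UBM}} .
\end{align}
```

Mixtures with little adaptation data (small n_i) stay close to the UBM mean, which is what keeps the stacked means comparable across utterances.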

Book
12 Aug 2008
TL;DR: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications and provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature.
Abstract: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications. The authors present the basic ideas of SVMs together with the latest developments and current research questions in a unified style. They identify three reasons for the success of SVMs: their ability to learn well with only a very small number of free parameters, their robustness against several types of model violations and outliers, and their computational efficiency compared to several other methods. Since their appearance in the early nineties, support vector machines and related kernel-based methods have been successfully applied in diverse fields of application such as bioinformatics, fraud detection, construction of insurance tariffs, direct marketing, and data and text mining. As a consequence, SVMs now play an important role in statistical machine learning and are used not only by statisticians, mathematicians, and computer scientists, but also by engineers and data analysts. The book provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature. The book can thus serve as both a basis for graduate courses and an introduction for statisticians, mathematicians, and computer scientists. It further provides a valuable reference for researchers working in the field. The book covers all important topics concerning support vector machines such as: loss functions and their role in the learning process; reproducing kernel Hilbert spaces and their properties; a thorough statistical analysis that uses both traditional uniform bounds and more advanced localized techniques based on Rademacher averages and Talagrand's inequality; a detailed treatment of classification and regression; a detailed robustness analysis; and a description of some of the most recent implementation techniques. To make the book self-contained, an extensive appendix is added which provides the reader with the necessary background from statistics, probability theory, functional analysis, convex analysis, and topology.

4,664 citations


"Support vector machines using GMM s..." refers background or methods in this paper

  • ...An SVM [5] is a two-class classifier constructed from sums of a kernel function K(·, ·),...


  • ...Note that since it is linear, it satisfies the Mercer condition [5]....


  • ...Since each of the terms in the sum in (11) is a kernel, and the sum of kernels is also a kernel, then (11) is also a kernel, see [5].... (a one-line justification follows these quotes)

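The closure property invoked in the last quote has a one-line justification (standard argument, notation assumed): the Gram matrices G_1 and G_2 of two kernels are each positive semidefinite, and

```latex
g^\top (G_1 + G_2)\, g \;=\; g^\top G_1\, g \;+\; g^\top G_2\, g \;\ge\; 0
\qquad \text{for all } g \in \mathbb{R}^n ,
```

so G_1 + G_2 is positive semidefinite and the sum is again a valid kernel.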

01 Jan 2000
TL;DR: Research report version of the SVMTorch article below (EPFL-REPORT-82604, IDIAP RR 00-17).
Abstract: URL: http://publications.idiap.ch/downloads/reports/2000/rr00-17.pdf

904 citations

Journal ArticleDOI
TL;DR: SVMTorch, a decomposition algorithm for training support vector machines on large-scale regression problems, similar to SVM-Light for classification, yields significant time improvements over comparable solvers and comes with a convergence proof.
Abstract: Support Vector Machines (SVMs) for regression problems are trained by solving a quadratic optimization problem which needs on the order of l^2 memory and time resources to solve, where l is the number of training examples. In this paper, we propose a decomposition algorithm, SVMTorch (available at http://www.idiap.ch/learning/SVMTorch.html ), which is similar to SVM-Light proposed by Joachims (1999) for classification problems, but adapted to regression problems. With this algorithm, one can now efficiently solve large-scale regression problems (more than 20,000 examples). Comparisons with Nodelib, another publicly available SVM algorithm for large-scale regression problems from Flake and Lawrence (2000), yielded significant time improvements. Finally, based on a recent paper from Lin (2000), we show that a convergence proof exists for our algorithm.
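
For context, the quadratic program in question is the standard ε-SVR dual (textbook form, notation assumed; not necessarily the solver's exact parameterization). The dense l × l matrix Q with Q_{ij} = K(x_i, x_j) is what incurs the order-l^2 memory cost, and decomposition methods sidestep it by optimizing over small working sets of variables at a time:

```latex
\begin{align}
\max_{\alpha,\,\alpha^*}\;\;
  & -\tfrac{1}{2}\,(\alpha - \alpha^*)^\top Q\,(\alpha - \alpha^*)
    \;-\; \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*)
    \;+\; \sum_{i=1}^{l} y_i\,(\alpha_i - \alpha_i^*) \\
\text{subject to}\;\;
  & \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0,
    \qquad 0 \le \alpha_i,\, \alpha_i^* \le C .
\end{align}
```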

829 citations


"Support vector machines using GMM s..." refers methods in this paper

  • ...Both kernels in (8) and (12) were implemented using SVMTorch as an SVM trainer [7]....


  • ...The vectors are support vectors and obtained from the training set by an optimization process [7]....
