Author

Douglas E. Sturim

Bio: Douglas E. Sturim is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics including speaker recognition and NIST evaluations. The author has an h-index of 19 and has co-authored 37 publications receiving 2,667 citations. Previous affiliations of Douglas E. Sturim include Brown University.

Topics: Speaker recognition, NIST, Speaker diarisation, Mixture model, Word error rate

Papers

Journal Article

Support vector machines using GMM supervectors for speaker verification

William M. Campbell¹, Douglas E. Sturim¹, Douglas A. Reynolds¹

Massachusetts Institute of Technology¹

10 Apr 2006 - IEEE Signal Processing Letters

TL;DR: This work examines the idea of using the GMM supervector in a support vector machine (SVM) classifier and proposes two new SVM kernels based on distance metrics between GMM models that produce excellent classification accuracy in a NIST speaker recognition evaluation task.

Abstract: Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The standard training method for GMM models is to use MAP adaptation of the means of the mixture components based on speech from a target speaker. Recent methods in compensation for speaker and channel variability have proposed the idea of stacking the means of the GMM model to form a GMM mean supervector. We examine the idea of using the GMM supervector in a support vector machine (SVM) classifier. We propose two new SVM kernels based on distance metrics between GMM models. We show that these SVM kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task.

1,081 citations
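
As a rough illustration of the GMM mean supervector mentioned in the abstract above, the following sketch adapts the means of a background GMM to one speaker's frames and stacks them into a single vector. It is not the authors' implementation: the relevance-factor form of MAP adaptation, the diagonal covariances, the default `relevance=16.0`, and the function names `map_adapt_means` and `supervector` are all assumptions made for this example. The SVM kernels proposed in the paper are not reproduced here.

```python
import numpy as np

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt GMM means to a speaker's frames (assumed relevance-factor form).

    frames:    (T, D) feature vectors from the target speaker
    weights:   (C,)   mixture weights of the background GMM
    means:     (C, D) background means
    variances: (C, D) diagonal covariances (an assumption of this sketch)
    """
    diff = frames[:, None, :] - means[None, :, :]                  # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                         # (T, C)
    log_post -= log_post.max(axis=1, keepdims=True)                # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                        # responsibilities

    n = post.sum(axis=0)                                           # soft counts, (C,)
    data_mean = (post.T @ frames) / np.maximum(n, 1e-10)[:, None]  # (C, D)
    alpha = (n / (n + relevance))[:, None]                         # data vs. prior weight
    return alpha * data_mean + (1.0 - alpha) * means

def supervector(adapted_means):
    """Stack the C adapted means of dimension D into one vector of length C*D."""
    return adapted_means.reshape(-1)
```

With a large relevance factor the adapted means stay close to the background means; with a small one they move toward the speaker's data.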

Proceedings Article

SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation

William M. Campbell¹, Douglas E. Sturim¹, Douglas A. Reynolds¹, Alex Solomonoff¹

Massachusetts Institute of Technology¹

14 May 2006

TL;DR: A support vector machine kernel is constructed using the GMM supervector, and this kernel is used to show similarities between the method of SVM nuisance attribute projection (NAP) and recent results in latent factor analysis.

Abstract: Gaussian mixture models with universal backgrounds (UBMs) have become the standard method for speaker recognition. Typically, a speaker model is constructed by MAP adaptation of the means of the UBM. A GMM supervector is constructed by stacking the means of the adapted mixture components. A recent discovery is that latent factor analysis of this GMM supervector is an effective method for variability compensation. We consider this GMM supervector in the context of support vector machines. We construct a support vector machine kernel using the GMM supervector. We show similarities based on this kernel between the method of SVM nuisance attribute projection (NAP) and the recent results in latent factor analysis. Experiments on a NIST SRE 2005 corpus demonstrate the effectiveness of the new technique.

625 citations

Proceedings Article

Speaker adaptive cohort selection for Tnorm in text-independent speaker verification

Douglas E. Sturim¹, D.A. Reynolds¹

Massachusetts Institute of Technology¹

18 Mar 2005

TL;DR: An extension to the widely used score normalization technique of test normalization (Tnorm) for text-independent speaker verification that offers advantages over the standard Tnorm by adjusting the speaker set to the target model is presented.

Abstract: We discuss an extension to the widely used score normalization technique of test normalization (Tnorm) for text-independent speaker verification. A new method of speaker adaptive-Tnorm that offers advantages over the standard Tnorm by adjusting the speaker set to the target model is presented. Examples of this improvement using the 2004 NIST SRE data are also presented.

112 citations
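
To make the test normalization (Tnorm) idea from the abstract above concrete, here is a minimal sketch of standard Tnorm: the raw target score for a test utterance is shifted and scaled by the mean and standard deviation of the scores that the same utterance obtains against a cohort of other speaker models. The abstract does not say how the speaker-adaptive variant chooses its cohort, so `select_cohort` below is purely hypothetical: it assumes a precomputed similarity between the target model and each cohort model and keeps the `k` most similar ones.

```python
import numpy as np

def tnorm(raw_score, cohort_scores):
    """Standard Tnorm: normalize a score by the cohort score statistics."""
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    mu = cohort_scores.mean()
    sigma = cohort_scores.std()
    return (raw_score - mu) / max(sigma, 1e-12)   # guard against zero spread

def select_cohort(target_similarity, k):
    """Hypothetical cohort selection: indices of the k cohort models
    most similar to the target model (the similarity measure is assumed)."""
    order = np.argsort(np.asarray(target_similarity, dtype=float))[::-1]
    return order[:k]
```

In use, one would score the test utterance against only the selected cohort models and pass those scores to `tnorm`.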

Proceedings Article

Speaker indexing in large audio databases using anchor models

Douglas E. Sturim¹, D.A. Reynolds¹, Elliot Singer¹, Joseph P. Campbell

Massachusetts Institute of Technology¹

07 May 2001

TL;DR: The anchor modeling algorithm is refined by pruning the number of models needed and it is shown that its computational efficiency lends itself to speaker indexing for searching large audio databases for desired speakers.

Abstract: Introduces the technique of anchor modeling in the applications of speaker detection and speaker indexing. The anchor modeling algorithm is refined by pruning the number of models needed. The system is applied to the speaker detection problem where its performance is shown to fall short of the state-of-the-art Gaussian mixture model with universal background model (GMM-UBM) system. However, it is further shown that its computational efficiency lends itself to speaker indexing for searching large audio databases for desired speakers. Here, excessive computation may prohibit the use of the GMM-UBM recognition system. Finally, the paper presents a method for cascading anchor model and GMM-UBM detectors for speaker indexing. This approach benefits from the efficiency of anchor modeling and high accuracy of GMM-UBM recognition.

103 citations

Proceedings Article

Tracking multiple talkers using microphone-array measurements

Douglas E. Sturim¹, M.S. Brandstein, Harvey F. Silverman

Brown University¹

21 Apr 1997

TL;DR: A method is presented for tracking the positional estimates of multiple talkers in the operating region of an acoustic microphone array, using a time-delay-based localization algorithm and a Kalman filter derived from a set of potential source motion models.

Abstract: A method for tracking the positional estimates of multiple talkers in the operating region of an acoustic microphone array is presented. Initial talker location estimates are provided by a time-delay-based localization algorithm. These raw estimates are spatially smoothed by a Kalman filter derived from a set of potential source motion models. Data association techniques based on the estimate clusterings and source trajectories are incorporated to match location observations with individual talkers. Experimental results are presented for array recorded data using multiple talkers in a variety of scenarios.

97 citations

Cited by

Journal Article

Front-End Factor Analysis for Speaker Verification

Najim Dehak¹, Patrick Kenny, Réda Dehak², Pierre Dumouchel, Pierre Ouellet

Massachusetts Institute of Technology¹, École Pour l'Informatique et les Techniques Avancées²

01 May 2011 - IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An extension of previous work that proposes a new speaker representation for speaker verification: a low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis and is named the total variability space because it models both speaker and channel variabilities.

Abstract: This paper presents an extension of our previous work which proposes a new speaker representation for speaker verification. In this modeling, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities. Two speaker verification systems are proposed which use this new representation. The first system is a support vector machine-based system that uses the cosine kernel to estimate the similarity between the input data. The second system directly uses the cosine similarity as the final decision score. We tested three channel compensation techniques in the total variability space, which are within-class covariance normalization (WCCN), linear discriminant analysis (LDA), and nuisance attribute projection (NAP). We found that the best results are obtained when LDA is followed by WCCN. We achieved an equal error rate (EER) of 1.12% and MinDCF of 0.0094 using the cosine distance scoring on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation dataset. We also obtained 4% absolute EER improvement for both-gender trials on the 10 s-10 s condition compared to the classical joint factor analysis scoring.

3,526 citations
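
The second system in the abstract above uses cosine similarity directly as the decision score. A minimal sketch of that scoring step follows, assuming the two low-dimensional vectors (one for the enrolled speaker, one for the test utterance) have already been extracted and channel-compensated; the function names and the default `threshold` value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cosine_score(w_target, w_test):
    """Cosine similarity between an enrolled-speaker vector and a test vector."""
    w_target = np.asarray(w_target, dtype=float)
    w_test = np.asarray(w_test, dtype=float)
    denom = np.linalg.norm(w_target) * np.linalg.norm(w_test)
    return float(w_target @ w_test / denom)

def accept(w_target, w_test, threshold=0.5):
    """Accept the identity claim when the score reaches a tuned threshold.
    The default threshold here is arbitrary and would be set on held-out data."""
    return cosine_score(w_target, w_test) >= threshold
```

Because the score depends only on the angle between the vectors, their magnitudes do not affect the decision.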

Proceedings Article

X-Vectors: Robust DNN Embeddings for Speaker Recognition

David Snyder¹, Daniel Garcia-Romero¹, Gregory Sell¹, Daniel Povey¹, Sanjeev Khudanpur¹

Johns Hopkins University¹

15 Apr 2018

TL;DR: This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.

Abstract: In this paper, we use data augmentation to improve performance of deep neural network (DNN) embeddings for speaker recognition. The DNN, which is trained to discriminate between speakers, maps variable-length utterances to fixed-dimensional embeddings that we call x-vectors. Prior studies have found that embeddings leverage large-scale training datasets better than i-vectors. However, it can be challenging to collect substantial quantities of labeled data for training. We use data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness. The x-vectors are compared with i-vector baselines on Speakers in the Wild and NIST SRE 2016 Cantonese. We find that while augmentation is beneficial in the PLDA classifier, it is not helpful in the i-vector extractor. However, the x-vector DNN effectively exploits data augmentation, due to its supervised training. As a result, the x-vectors achieve superior performance on the evaluation datasets.

2,300 citations

Journal Article

An overview of text-independent speaker recognition: From features to supervectors

Tomi Kinnunen¹, Haizhou Li²

University of Eastern Finland¹, Institute for Infocomm Research Singapore²

01 Jan 2010 - Speech Communication

TL;DR: This paper starts with the fundamentals of automatic speaker recognition, covering feature extraction and speaker modeling, and then elaborates on advanced computational techniques that address robustness and session variability.

1,433 citations

Journal Article

Support vector machines using GMM supervectors for speaker verification

William M. Campbell¹, Douglas E. Sturim¹, Douglas A. Reynolds¹

Massachusetts Institute of Technology¹

10 Apr 2006 - IEEE Signal Processing Letters

1,081 citations

Journal Article

Biometrics: a tool for information security

Anil K. Jain¹, Arun Ross², Sharathchandra U. Pankanti³

Michigan State University¹, West Virginia University², IBM³

01 Nov 2006 - IEEE Transactions on Information Forensics and Security

TL;DR: An overview of biometrics is provided and some of the salient research issues that need to be addressed for making biometric technology an effective tool for providing information security are discussed.

Abstract: Establishing identity is becoming critical in our vastly interconnected society. Questions such as "Is she really who she claims to be?," "Is this person authorized to use this facility?," or "Is he in the watchlist posted by the government?" are routinely being posed in a variety of scenarios ranging from issuing a driver's license to gaining entry into a country. The need for reliable user authentication techniques has increased in the wake of heightened concerns about security and rapid advancements in networking, communication, and mobility. Biometrics, described as the science of recognizing an individual based on his or her physical or behavioral traits, is beginning to gain acceptance as a legitimate method for determining an individual's identity. Biometric systems have now been deployed in various commercial, civilian, and forensic applications as a means of establishing identity. In this paper, we provide an overview of biometrics and discuss some of the salient research issues that need to be addressed for making biometric technology an effective tool for providing information security. The primary contribution of this overview includes: 1) examining applications where biometrics can solve issues pertaining to information security; 2) enumerating the fundamental challenges encountered by biometric systems in real-world applications; and 3) discussing solutions to address the problems of scalability and security in large-scale authentication systems.

1,067 citations
