Author

Robert B. Dunn

Other affiliations: Alcatel-Lucent

Bio: Robert B. Dunn is an academic researcher at the Massachusetts Institute of Technology. The author has contributed to research on speaker recognition and speech coding. The author has an h-index of 15 and has co-authored 30 publications receiving 5,320 citations. Previous affiliations of Robert B. Dunn include Alcatel-Lucent.

Topics: Speaker recognition, Speech coding, Speaker diarisation, NIST, Signal ...

Papers

Journal Article•DOI•

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds¹, Thomas F. Quatieri¹, Robert B. Dunn¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2000-Digital Signal Processing

TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations
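
The GMM-UBM recipe summarized above can be illustrated with a small sketch: a speaker model is derived from a universal background model (UBM) by relevance-MAP adaptation of the component means, and a test utterance is scored with the average per-frame log-likelihood ratio between the adapted model and the UBM. This is a minimal pure-Python sketch assuming diagonal covariances and mean-only adaptation; the class and function names, the default relevance factor, and the toy one-dimensional data are illustrative assumptions, not the Lincoln Laboratory implementation.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


class DiagGMM:
    """Gaussian mixture model with diagonal covariance matrices (illustrative)."""

    def __init__(self, weights, means, variances):
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def component_log_probs(self, x: Frame) -> List[float]:
        """log(w_k) + log N(x; mu_k, diag(var_k)) for every component k."""
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_n = sum(
                -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                for xi, m, v in zip(x, mu, var)
            )
            out.append(math.log(w) + log_n)
        return out

    def log_likelihood(self, x: Frame) -> float:
        """log p(x), computed with log-sum-exp over components."""
        lp = self.component_log_probs(x)
        top = max(lp)
        return top + math.log(sum(math.exp(v - top) for v in lp))


def map_adapt_means(ubm: DiagGMM, frames: List[Frame], relevance: float = 16.0) -> DiagGMM:
    """Relevance-MAP adaptation of the UBM means; weights and variances are copied."""
    K, D = len(ubm.weights), len(ubm.means[0])
    counts = [0.0] * K                    # soft frame counts n_k
    sums = [[0.0] * D for _ in range(K)]  # first-order statistics per component
    for x in frames:
        lp = ubm.component_log_probs(x)
        top = max(lp)
        post = [math.exp(v - top) for v in lp]
        total = sum(post)
        for k in range(K):
            gamma = post[k] / total       # posterior probability of component k
            counts[k] += gamma
            for d in range(D):
                sums[k][d] += gamma * x[d]
    new_means = []
    for k in range(K):
        alpha = counts[k] / (counts[k] + relevance)  # data-dependent adaptation weight
        ex = [s / counts[k] for s in sums[k]] if counts[k] > 0 else ubm.means[k]
        new_means.append([alpha * e + (1 - alpha) * m for e, m in zip(ex, ubm.means[k])])
    return DiagGMM(ubm.weights, new_means, ubm.variances)


def llr_score(target: DiagGMM, ubm: DiagGMM, frames: List[Frame]) -> float:
    """Average per-frame log p(x | target) - log p(x | UBM)."""
    return sum(target.log_likelihood(x) - ubm.log_likelihood(x) for x in frames) / len(frames)


# Toy 1-D UBM with two components; enrollment data sits near the right component.
ubm = DiagGMM([0.5, 0.5], [[-2.0], [2.0]], [[1.0], [1.0]])
speaker = map_adapt_means(ubm, [[2.5], [2.7], [2.4], [2.6]], relevance=4.0)
print(llr_score(speaker, ubm, [[2.5], [2.6]]))    # positive: frames match the adapted component
print(llr_score(speaker, ubm, [[-2.0], [-2.1]]))  # near zero: region the enrollment data never touched
```

Because only components that "see" enrollment data move away from the UBM, frames in untouched regions contribute almost nothing to the ratio, which is one reason this scheme scores robustly with little enrollment speech.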

Patent•

Scalable and embedded codec for speech and audio signals

Joseph Gerard Aguilar¹, David A. Campana¹, Juin-Hwey Chen¹, Robert B. Dunn¹, Robert J. McAulay¹, Xiaoquin Sun¹, Wei Wang¹, Craig Robert Watkins¹, Robert W. Zopf¹•Institutions (1)

Alcatel-Lucent¹

10 Aug 2007

TL;DR: A system and method for processing audio and speech signals is disclosed that provides compatibility over a range of communication devices operating at different sampling frequencies and/or bit rates.

Abstract: A system and method for processing audio and speech signals is disclosed that provides compatibility over a range of communication devices operating at different sampling frequencies and/or bit rates. The analyzer of the system divides the input signal into different portions, at least one of which carries information sufficient to provide intelligible reconstruction of the input signal. The analyzer also encodes separate information about other portions of the signal in an embedded manner, so that a smooth transition can be achieved from low bit-rate to high bit-rate applications. Accordingly, communication devices operating at different sampling rates and/or bit rates can extract corresponding information from the output bit stream of the analyzer. In the present invention, embedded information generally relates to separate parameters of the input signal, or to additional resolution in the transmission of original signal parameters. Non-linear techniques for enhancing the overall performance of the system are also disclosed. Also disclosed is a novel method of improving the quantization of signal parameters. In a specific embodiment, the input signal is processed in two or more modes depending on the state of the signal in a frame. When the signal is determined to be in a transition state, the encoder provides phase information about N sinusoids, which the decoder uses to improve the quality of the output signal at low bit rates.

219 citations
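
The embedded bit-stream idea in this abstract (a core portion sufficient for intelligible reconstruction, plus separately encoded portions that higher-rate devices can use) can be pictured with a toy container format. This sketch is hypothetical: the one-byte length prefix, the layer contents, and the function names are assumptions made for illustration and are not the bit-stream format described in the patent.

```python
from typing import List


def pack_embedded_frame(layers: List[bytes]) -> bytes:
    """Concatenate layers in order of importance, each with a 1-byte length prefix.

    layers[0] is the hypothetical core layer (enough for intelligible speech);
    later layers add bandwidth or parameter resolution.
    """
    out = bytearray()
    for layer in layers:
        if len(layer) > 255:
            raise ValueError("layer too long for a 1-byte length prefix")
        out.append(len(layer))
        out += layer
    return bytes(out)


def truncate_to_budget(frame: bytes, max_bytes: int) -> bytes:
    """Drop whole trailing layers so the frame fits a lower-rate channel."""
    pos = keep = 0
    while pos < len(frame):
        end = pos + 1 + frame[pos]
        if end > max_bytes:
            break
        keep = pos = end
    return frame[:keep]


def unpack_layers(frame: bytes, max_layers: int) -> List[bytes]:
    """Read at most max_layers layers; a low-rate decoder simply asks for fewer."""
    layers: List[bytes] = []
    pos = 0
    while pos < len(frame) and len(layers) < max_layers:
        size = frame[pos]
        layers.append(frame[pos + 1 : pos + 1 + size])
        pos += 1 + size
    return layers


frame = pack_embedded_frame([b"core", b"wideband", b"fine"])
print(unpack_layers(frame, 1))                           # [b'core']
print(unpack_layers(truncate_to_budget(frame, 14), 10))  # [b'core', b'wideband']
```

Because layers are ordered by importance, a lower-rate device or channel simply stops reading early; no re-encoding is needed to serve it.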

Proceedings Article•

Fusing High- and Low-Level Features for Speaker Recognition

Joseph P. Campbell, Douglas A. Reynolds, Robert B. Dunn

01 Jan 2003

TL;DR: It is shown how novel features and classifiers provide complementary information and can be fused together to drive down the equal error rate on the 2001 NIST Extended Data Task to 0.2%—a 71% relative reduction in error over the previous state of the art.

Abstract: The area of automatic speaker recognition has been dominated by systems using only short-term, low-level acoustic information, such as cepstral features. While these systems have produced low error rates, they ignore higher levels of information beyond low-level acoustics that convey speaker information. Recently published works have demonstrated that such high-level information can be used successfully in automatic speaker recognition systems by improving accuracy and potentially increasing robustness. Wide-ranging high-level feature-based approaches using pronunciation models, prosodic dynamics, pitch gestures, phone streams, and conversational interactions were explored and developed under the SuperSID project at the 2002 JHU CLSP Summer Workshop (WS2002): http://www.clsp.jhu.edu/ws2002/groups/supersid/. In this paper, we show how these novel features and classifiers provide complementary information and can be fused together to drive down the equal error rate on the 2001 NIST Extended Data Task to 0.2%—a 71% relative reduction in error over the previous state of the art.

104 citations
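
The equal error rate (EER) quoted above is the operating point where the miss rate equals the false-alarm rate, and fusion combines the scores of several systems into one score per trial. The sketch below assumes a simple weighted-sum (linear) fusion with hand-picked weights and estimates the EER by sweeping the threshold over the observed scores; the paper's actual fusion method and trained weights are not reproduced here, and all scores are made up.

```python
from typing import List, Sequence


def fuse(trials: Sequence[Sequence[float]], weights: Sequence[float], bias: float = 0.0) -> List[float]:
    """Linear fusion: one weighted sum of the per-system scores for each trial."""
    return [bias + sum(w * s for w, s in zip(weights, scores)) for scores in trials]


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER by sweeping the decision threshold over all observed scores.

    A trial is accepted when score >= threshold. Returns the mean of the miss and
    false-alarm rates at the threshold where the two rates are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        p_miss = sum(s < t for s in target) / len(target)
        p_fa = sum(s >= t for s in nontarget) / len(nontarget)
        if abs(p_miss - p_fa) < best_gap:
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
    return eer


# Made-up scores from two systems: [system 1, system 2] per trial.
target_trials = [[2.0, 0.1], [0.2, 1.5], [1.8, 1.2]]
nontarget_trials = [[0.5, -1.0], [-1.0, 0.4], [0.3, 0.2]]
for i in range(2):
    print(f"system {i + 1} EER:",
          equal_error_rate([t[i] for t in target_trials], [n[i] for n in nontarget_trials]))
w = [0.5, 0.5]
print("fused EER:", equal_error_rate(fuse(target_trials, w), fuse(nontarget_trials, w)))
```

Here each system alone has an EER of 1/3, but they err on different trials, so the equally weighted sum separates the classes completely (EER 0.0). The toy only illustrates why complementary systems fuse well; it says nothing about the magnitudes reported in the paper.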

Proceedings Article•DOI•

Speaker verification using text-constrained Gaussian Mixture Models

Douglas E. Sturim¹, D.A. Reynolds¹, Robert B. Dunn¹, Thomas F. Quatieri¹•Institutions (1)

Massachusetts Institute of Technology¹

13 May 2002

TL;DR: An approach to close the gap between text-dependent and text-independent speaker verification performance is presented; results on the 2001 NIST extended data task show that it can produce an equal error rate below 1%.

Abstract: In this paper we present an approach to close the gap between text-dependent and text-independent speaker verification performance. Text-constrained GMM-UBM systems are created using word segmentations produced by an LVCSR system on conversational speech, allowing the system to focus on speaker differences over a constrained set of acoustic units. Results on the 2001 NIST extended data task show this approach can be used to produce an equal error rate of < 1%.

91 citations
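
Text-constrained scoring restricts the comparison to frames that fall inside selected words of an automatic transcript. As a hypothetical illustration, assuming per-frame log-likelihood ratios have already been computed and a recognizer has supplied word boundaries in frames, the utterance score can be averaged over only the constrained frames. The abstract describes building the GMM-UBM systems themselves from these word segmentations; that training step is omitted here, and the word list, data layout, and function names are illustrative.

```python
from typing import List, Optional, Sequence, Set, Tuple

# (word, start_frame, end_frame_exclusive), e.g. from a recognizer's word alignment
WordSegment = Tuple[str, int, int]


def constrained_frame_indices(segments: Sequence[WordSegment], vocabulary: Set[str]) -> List[int]:
    """Indices of frames that fall inside words from the constrained vocabulary."""
    indices: List[int] = []
    for word, start, end in segments:
        if word in vocabulary:
            indices.extend(range(start, end))
    return indices


def text_constrained_score(frame_llrs: Sequence[float],
                           segments: Sequence[WordSegment],
                           vocabulary: Set[str]) -> Optional[float]:
    """Average frame LLR over constrained words only; None if none occur."""
    idx = constrained_frame_indices(segments, vocabulary)
    if not idx:
        return None  # the utterance contains no words from the constrained set
    return sum(frame_llrs[i] for i in idx) / len(idx)


# Hypothetical frame scores and alignment; only frames inside "yeah" are used.
llrs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
segments = [("yeah", 0, 2), ("the", 2, 3), ("yeah", 4, 6)]
print(text_constrained_score(llrs, segments, {"yeah"}))  # 2.5
```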

Journal Article•DOI•

Approaches to Speaker Detection and Tracking in Conversational Speech

Robert B. Dunn¹, Douglas A. Reynolds¹, Thomas F. Quatieri¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2000-Digital Signal Processing

TL;DR: Two approaches to detecting and tracking speakers in multispeaker audio are described; they use an adapted Gaussian mixture model, universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine, together with an external segmentation algorithm based on blind clustering.

70 citations
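
Speaker tracking with an external segmentation can be sketched as follows, assuming a blind-clustering step has already grouped the audio into speaker-homogeneous clusters of frame spans and the GMM-UBM system has produced a log-likelihood ratio (LLR) per frame. Each cluster is scored by its average frame LLR, the best cluster score serves as the detection score, and clusters above a threshold are reported as the target speaker's track. The threshold, data layout, and function names are assumptions for illustration, not the paper's exact procedure.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start_frame, end_frame_exclusive)


def cluster_score(frame_llrs: Sequence[float], spans: Sequence[Span]) -> float:
    """Average frame LLR over all frames assigned to one cluster."""
    values = [llr for start, end in spans for llr in frame_llrs[start:end]]
    return sum(values) / len(values)


def detect_and_track(frame_llrs: Sequence[float],
                     clusters: Sequence[Sequence[Span]],
                     threshold: float) -> Tuple[float, List[Span]]:
    """Return (detection score, spans attributed to the target speaker)."""
    scores = [cluster_score(frame_llrs, spans) for spans in clusters]
    track: List[Span] = []
    for spans, score in zip(clusters, scores):
        if score > threshold:
            track.extend(spans)
    return max(scores), sorted(track)


# Hypothetical frame LLRs; cluster 0 (two spans) looks like the target speaker.
llrs = [1.0, 1.0, -1.0, -1.0, 2.0, 2.0]
clusters = [[(0, 2), (4, 6)], [(2, 4)]]
print(detect_and_track(llrs, clusters, threshold=0.0))  # (1.5, [(0, 2), (4, 6)])
```

Scoring whole clusters rather than single frames pools evidence across every region the clustering attributes to one speaker, at the cost of depending on the clustering being correct.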

Cited by

Journal Article•DOI•

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds¹, Thomas F. Quatieri¹, Robert B. Dunn¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2000-Digital Signal Processing

4,673 citations

Journal Article•DOI•

Front-End Factor Analysis for Speaker Verification

Najim Dehak¹, Patrick Kenny, Réda Dehak², Pierre Dumouchel, Pierre Ouellet•Institutions (2)

Massachusetts Institute of Technology¹, École Pour l'Informatique et les Techniques Avancées²

01 May 2011-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An extension of previous work on a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space, named the total variability space because it models both speaker and channel variabilities, is defined using a simple factor analysis.

Abstract: This paper presents an extension of our previous work which proposes a new speaker representation for speaker verification. In this modeling, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities. Two speaker verification systems are proposed which use this new representation. The first system is a support vector machine-based system that uses the cosine kernel to estimate the similarity between the input data. The second system directly uses the cosine similarity as the final decision score. We tested three channel compensation techniques in the total variability space, namely within-class covariance normalization (WCCN), linear discriminant analysis (LDA), and nuisance attribute projection (NAP). We found that the best results are obtained when LDA is followed by WCCN. We achieved an equal error rate (EER) of 1.12% and MinDCF of 0.0094 using the cosine distance scoring on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation dataset. We also obtained 4% absolute EER improvement for both-gender trials on the 10 s-10 s condition compared to the classical joint factor analysis scoring.

3,526 citations
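
Cosine distance scoring, the decision score of the second system in this abstract, compares two total-variability vectors by the cosine of the angle between them, optionally after a channel-compensation projection. The sketch assumes a projection matrix A has already been estimated (for example by LDA followed by WCCN) and is applied as A·w; estimating A is not shown, and the toy matrix and vectors are illustrative.

```python
import math
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def project(A: Matrix, w: Sequence[float]) -> List[float]:
    """Matrix-vector product A @ w using plain lists."""
    return [sum(a * x for a, x in zip(row, w)) for row in A]


def cosine_score(w_target: Sequence[float], w_test: Sequence[float],
                 A: Optional[Matrix] = None) -> float:
    """Cosine similarity between two vectors, after an optional projection A."""
    if A is not None:
        w_target, w_test = project(A, w_target), project(A, w_test)
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm = math.sqrt(sum(a * a for a in w_target)) * math.sqrt(sum(b * b for b in w_test))
    return dot / norm


# Toy example: the second dimension stands in for channel variability,
# and the hypothetical projection A simply discards it.
A = [[1.0, 0.0], [0.0, 0.0]]
print(cosine_score([1.0, 1.0], [1.0, -1.0]))     # 0.0 without compensation
print(cosine_score([1.0, 1.0], [1.0, -1.0], A))  # 1.0 once the channel axis is removed
```

A trial would then be accepted when the score exceeds a threshold tuned on development data; because the score needs no target-model training, enrollment reduces to storing one vector per speaker.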

Book•

Distributed Systems: Principles and Paradigms

Andrew S. Tanenbaum, Maarten van Steen¹•Institutions (1)

VU University Amsterdam¹

01 Jan 2001

TL;DR: Intended for use in a senior/graduate level distributed systems course or by professionals, this text systematically shows how distributed systems are designed and implemented in real systems.

Abstract: From the Publisher: Andrew Tanenbaum and Maarten van Steen cover the principles, advanced concepts, and technologies of distributed systems in detail, including communication, replication, fault tolerance, and security. Intended for use in a senior/graduate level distributed systems course or by professionals, this text systematically shows how distributed systems are designed and implemented in real systems. Written in the superb writing style of other Tanenbaum books, the material also features unique accessibility and a wide variety of real-world examples and case studies, such as NFS v4, CORBA, DOM, Jini, and the World Wide Web. FEATURES: Detailed coverage of seven key principles. An introductory chapter followed by a chapter devoted to each key principle: communication, processes, naming, synchronization, consistency and replication, fault tolerance, and security, including unique comprehensive coverage of middleware models. Four chapters devoted to state-of-the-art real-world examples of middleware. Covers object-based systems, document-based systems, distributed file systems, and coordination-based systems including CORBA, DCOM, Globe, NFS v4, Coda, the World Wide Web, and Jini. Excellent coverage of timely, advanced, distributed systems topics: security, payment systems, recent Internet and Web protocols, scalability, and caching and replication. NEW: The Prentice Hall Companion Website for this book contains PowerPoint slides, figures in various file formats, and other teaching aids, and a link to the author's Web site.

2,011 citations

Journal Article•DOI•

Survey on speech emotion recognition: Features, classification schemes, and databases

Moataz M. H. El Ayadi¹, Mohamed S. Kamel², Fakhri Karray²•Institutions (2)

Cairo University¹, University of Waterloo²

01 Mar 2011-Pattern Recognition

TL;DR: A survey of speech emotion classification that addresses three important aspects of the design of a speech emotion recognition system: the choice of suitable features for speech representation, the design of an appropriate classification scheme, and the proper preparation of an emotional speech database for evaluating system performance.

1,735 citations

Journal Article•DOI•

An overview of text-independent speaker recognition: From features to supervectors

Tomi Kinnunen¹, Haizhou Li²•Institutions (2)

University of Eastern Finland¹, Institute for Infocomm Research Singapore²

01 Jan 2010-Speech Communication

TL;DR: This paper starts with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling, and then elaborates on advanced computational techniques to address robustness and session variability.

1,433 citations
