Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.

Papers published on a yearly basis

[Chart: number of papers per year, 1969 to 2023]

Papers

Proceedings Article

Speaker Recognition for Multi-speaker Conversations Using X-vectors

David Snyder¹, Daniel Garcia-Romero¹, Gregory Sell¹, Alan V. McCree¹, Daniel Povey¹, Sanjeev Khudanpur¹

Johns Hopkins University¹

12 May 2019

TL;DR: It is found that diarization substantially reduces error rate when there are multiple speakers, while maintaining excellent performance on single-speaker recordings.

Abstract: Recently, deep neural networks that map utterances to fixed-dimensional embeddings have emerged as the state-of-the-art in speaker recognition. Our prior work introduced x-vectors, an embedding that is very effective for both speaker recognition and diarization. This paper combines our previous work and applies it to the problem of speaker recognition on multi-speaker conversations. We measure performance on Speakers in the Wild and report what we believe are the best published error rates on this dataset. Moreover, we find that diarization substantially reduces error rate when there are multiple speakers, while maintaining excellent performance on single-speaker recordings. Finally, we introduce an easily implemented method to remove the domain-sensitive threshold typically used in the clustering stage of a diarization system. The proposed method is more robust to domain shifts, and achieves similar results to those obtained using a well-tuned threshold.

280 citations

Proceedings Article

Continuous hidden Markov modeling for speaker-independent word spotting

J.R. Rohlicek, W. Russell, Salim Roukos, H. Gish

23 May 1989

TL;DR: A word-spotting system using Gaussian hidden Markov models is presented and it is observed that performance can be greatly affected by the choice of features used, the covariance structure of the Gaussian models, and transformations based on energy and feature distributions.

Abstract: A word-spotting system using Gaussian hidden Markov models is presented. Several aspects of this problem are investigated. Specifically, results are reported on the use of various signal processing and feature transformation techniques. The authors have observed that performance can be greatly affected by the choice of features used, the covariance structure of the Gaussian models, and transformations based on energy and feature distributions. Due to the open-set nature of the problem, the specific techniques for modeling out-of-vocabulary speech and the choice of scoring metric can have a significant effect on performance.

280 citations

Patent

Adjustable resource based speech recognition system

Ian M. Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj

20 Nov 2006-Journal of the Acoustical Society of America

TL;DR: A real-time speech recognition system distributes processing across a client and server to recognize a spoken query by a user, and the partitioning of responsibility for speech recognition operations can be done on a client-by-client or connection-by-connection basis.

Abstract: A real-time speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. Both the client and server can dedicate a variable number of processing resources for performing speech recognition functions. The partitioning of responsibility for speech recognition operations can be done on a client-by-client or connection-by-connection basis.

279 citations

Journal Article

Speech recognition: A model and a program for research

Morris Halle¹, Kenneth N. Stevens¹

Massachusetts Institute of Technology¹

01 Feb 1962-IEEE Transactions on Information Theory

TL;DR: A speech recognition model is proposed in which the transformation from an input speech signal into a sequence of phonemes is carried out largely through an active or feedback process.

Abstract: A speech recognition model is proposed in which the transformation from an input speech signal into a sequence of phonemes is carried out largely through an active or feedback process. In this process, patterns are generated internally in the analyzer according to an adaptable sequence of instructions until a best match with the input signal is obtained. Details of the process are given, and the areas where further research is needed are indicated.

278 citations

Journal Article

On the use of instantaneous and transitional spectral information in speaker recognition

F.K. Soong¹, Aaron E. Rosenberg¹

Bell Labs¹

01 Jun 1988-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The experimental results show that the instantaneous and transitional representations are relatively uncorrelated, thus providing complementary information for speaker recognition, and simple transmission channel variations are shown to affect both the instantaneous spectral representations and the corresponding recognition performance significantly.

Abstract: The use of instantaneous and transitional spectral representations of spoken utterances for speaker recognition is investigated. Linear-predictive-coding (LPC)-derived cepstral coefficients are used to represent instantaneous spectral information, and best linear fits of each cepstral coefficient over a specified time window are used to represent transitional information. An evaluation has been carried out using a database of isolated digit utterances over dialed-up telephone lines by 10 talkers. Two vector quantization (VQ) codebooks, instantaneous and transitional, were constructed from each speaker's training utterances. The experimental results show that the instantaneous and transitional representations are relatively uncorrelated, thus providing complementary information for speaker recognition. A rectangular window of approximately 100 ms duration provides an effective estimate of the transitional spectral features for speaker recognition. Also, simple transmission channel variations are shown to affect both the instantaneous spectral representations and the corresponding recognition performance significantly, while the transitional representations and performance are relatively resistant.
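
Illustration (not from the paper): the "best linear fit of each cepstral coefficient over a time window" mentioned in the abstract can be sketched as the slope of a least-squares line fitted over a sliding rectangular window. The function name `transitional_features`, the `half_width` parameter, the edge-padding strategy, and the idea that 11 frames at an assumed 10 ms frame shift roughly approximate a 100 ms window are all assumptions made for this sketch; the paper's exact procedure may differ.

```python
import numpy as np

def transitional_features(cepstra: np.ndarray, half_width: int = 5) -> np.ndarray:
    """Slope of the least-squares line fitted to each cepstral coefficient
    over a sliding window of 2 * half_width + 1 frames.

    cepstra: array of shape (num_frames, num_coeffs).
    Edge frames are handled by repeating the first/last frame (an assumption).
    """
    # Frame offsets relative to the window centre: -K, ..., 0, ..., +K.
    k = np.arange(-half_width, half_width + 1)
    denom = np.sum(k ** 2)

    # Pad along the time axis so every frame has a full window around it.
    padded = np.pad(cepstra, ((half_width, half_width), (0, 0)), mode="edge")
    num_frames = cepstra.shape[0]

    # For a symmetric window, the least-squares slope is sum(k * c[t + k]) / sum(k^2).
    slopes = np.zeros_like(cepstra, dtype=float)
    for offset, weight in enumerate(k):
        slopes += weight * padded[offset:offset + num_frames]
    return slopes / denom

# Hypothetical usage: 200 frames of 12 cepstral coefficients (random stand-in data).
rng = np.random.default_rng(0)
delta = transitional_features(rng.standard_normal((200, 12)), half_width=5)
```

Under these assumptions, the instantaneous features are the rows of `cepstra` and the transitional features are the rows of the returned array, so each representation could feed its own VQ codebook.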

278 citations

Network Information

Performance

Metrics

Papers: 15,632

Citations: 337,766

No. of papers in the topic in previous years
Year	Papers
2023	165
2022	468
2021	283
2020	475
2019	484
2018	420
