Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Proceedings ArticleDOI
01 Nov 2020
TL;DR: This paper presents a magnitude estimation network, combined with a modified ResNet x-vector system, that generates embeddings whose inner product produces calibrated scores with increased discrimination, yielding discrimination and calibration gains at multiple operating points.
Abstract: We present a magnitude estimation network that is combined with a modified ResNet x-vector system to generate embeddings whose inner product is able to produce calibrated scores with increased discrimination. A three-step training procedure is used. First, the network is trained using short segments and a multi-class cross-entropy loss with angular margin softmax. During the second step, only a reduced subset of the DNN parameters is refined using full-length recordings. Finally, the magnitude estimation network is trained using a binary cross-entropy loss over pairs of target and non-target trials. The resulting system is evaluated on four widely used benchmarks and provides significant discrimination and calibration gains at multiple operating points.
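The core idea is that each embedding is a unit-norm direction scaled by a learned magnitude, so a plain inner product between two embeddings doubles as the verification score. A minimal sketch of that scoring path, assuming a PyTorch setup; the backbone module, layer sizes, and Softplus head are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class MagnitudeEmbedder(nn.Module):
    """Scale a unit-norm speaker embedding by a learned magnitude."""

    def __init__(self, backbone: nn.Module, emb_dim: int = 256):
        super().__init__()
        self.backbone = backbone  # hypothetical ResNet x-vector extractor
        # Small head predicting a positive scalar magnitude per embedding
        # (layer sizes are illustrative, not the paper's).
        self.mag_net = nn.Sequential(
            nn.Linear(emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Softplus(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        e = self.backbone(feats)                        # (batch, emb_dim)
        direction = nn.functional.normalize(e, dim=-1)  # unit-norm direction
        magnitude = self.mag_net(e)                     # (batch, 1), positive
        return magnitude * direction                    # scaled embedding

def trial_score(emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
    # The plain inner product of two scaled embeddings serves directly
    # as the (ideally calibrated) verification score for a trial.
    return (emb_a * emb_b).sum(dim=-1)
```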

66 citations

Proceedings ArticleDOI
14 Apr 1983
TL;DR: A new technique for text-independent speaker recognition is proposed that uses a statistical model of the speaker's vector-quantized speech; it retains text-independent properties while allowing considerably shorter test utterances than comparable speaker recognition systems.
Abstract: A new technique for text-independent speaker recognition is proposed which uses a statistical model of the speaker's vector-quantized speech. The technique retains text-independent properties while allowing considerably shorter test utterances than comparable speaker recognition systems. The frequently occurring vectors, or characters, form a model of multiple points in the n-dimensional speech space instead of the usual single-point models. Speaker recognition depends on the statistical distribution of the distances between the speech frames from the unknown speaker and the closest points in the model. Models were generated with 100 seconds of conversational training speech for each of 11 male speakers. The system was able to identify the 11 speakers with 96%, 87%, and 79% accuracy from sections of unknown speech of durations of 10, 5, and 3 seconds, respectively. Accurate recognition was also obtained even when there were variations in the channels over which the training and testing data were obtained. A real-time demonstration system has been implemented, including both training and recognition processes.
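The speaker model here is a codebook of multiple points rather than a single mean, and recognition rests on the distances from unknown frames to their nearest codewords. A minimal sketch of that scheme, assuming numpy feature frames and using k-means as an illustrative stand-in for the paper's vector quantizer:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(frames: np.ndarray, size: int = 64) -> np.ndarray:
    """Model a speaker as multiple points (codewords) in feature space."""
    km = KMeans(n_clusters=size, n_init=4, random_state=0).fit(frames)
    return km.cluster_centers_

def mean_quantization_distance(frames: np.ndarray, codebook: np.ndarray) -> float:
    """Mean distance from each test frame to its closest codeword."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

def identify(test_frames: np.ndarray, codebooks: dict) -> str:
    """Pick the speaker whose codebook lies closest to the unknown speech."""
    return min(codebooks,
               key=lambda spk: mean_quantization_distance(test_frames, codebooks[spk]))
```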

66 citations

Proceedings ArticleDOI
01 Dec 2007
TL;DR: In contrast to the common belief that "there is no data like more data", it was found possible to select a highly informative subset of data that produces recognition performance comparable to that of a system using a much larger amount of data.
Abstract: This paper presents a strategy for efficiently selecting informative data from large corpora of transcribed speech. We propose to choose data uniformly according to the distribution of some target speech unit (phoneme, word, character, etc.). In our experiments, in contrast to the common belief that "there is no data like more data", we found it possible to select a highly informative subset of data that produces recognition performance comparable to a system that makes use of a much larger amount of data. At the same time, our selection process is efficient and fast.
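One way to read the selection criterion is as a greedy balancing act: repeatedly add the utterance whose target units are currently least represented, so the selected subset's unit distribution approaches uniform. A minimal sketch of that reading; the function name and the exact greedy score are assumptions, not the paper's algorithm:

```python
from collections import Counter

def select_uniform(utterances, budget):
    """utterances: list of (utt_id, unit_sequence); returns chosen utt_ids."""
    counts = Counter()            # running counts of each target unit
    remaining = dict(utterances)
    chosen = []
    for _ in range(min(budget, len(remaining))):
        # Greedily pick the utterance whose units are least covered so far,
        # pushing the selected set's unit distribution toward uniform.
        uid = min(remaining,
                  key=lambda u: sum(counts[p] for p in remaining[u])
                                / max(len(remaining[u]), 1))
        counts.update(remaining.pop(uid))
        chosen.append(uid)
    return chosen

# Example: select_uniform([("u1", ["ah", "t"]), ("u2", ["ah", "s", "t"])], 1)
```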

66 citations

Journal ArticleDOI
TL;DR: A multiple-expert biometric person identification system combines information from three experts (audio, visual speech, and face) in an automatic, unsupervised manner, adapting to the local performance and output reliability of each of the three experts.
Abstract: Information about person identity is multimodal. Yet most person-recognition systems limit themselves to only a single modality, such as facial appearance. With a view to exploiting the complementary nature of different modes of information and increasing pattern-recognition robustness to test-signal degradation, we developed a multiple-expert biometric person identification system that combines information from three experts: audio, visual speech, and face. The system uses multimodal fusion in an automatic, unsupervised manner, adapting to the local performance (at the transaction level) and output reliability of each of the three experts. The expert weightings are chosen automatically such that the reliability measure of the combined scores is maximized. To test system robustness to train/test mismatch, we used a broad range of acoustic babble noise and JPEG compression to degrade the audio and visual signals, respectively. Identification experiments were carried out on a 248-subject subset of the XM2VTS database. The multimodal expert system outperformed each of the single experts in all comparisons. At the severe audio and visual mismatch levels tested, the audio, mouth, face, and tri-expert fusion accuracies were 16.1%, 48%, 75%, and 89.9%, respectively, representing a relative improvement of 19.9% over the best-performing expert.
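Since the weightings are chosen so that a reliability measure of the combined scores is maximized, the mechanics can be sketched as a search over the weight simplex. The margin between the top two identity scores below is an illustrative stand-in for the paper's reliability measure:

```python
import itertools
import numpy as np

def reliability(scores: np.ndarray) -> float:
    """Higher when one identity clearly dominates the ranked scores."""
    top2 = np.sort(scores)[-2:]
    return float(top2[1] - top2[0])

def fuse(expert_scores, steps=10):
    """Search the weight simplex for the combination whose fused
    per-identity scores have maximal reliability."""
    # Z-normalize each expert's scores so they are comparable before fusion.
    normed = [(s - s.mean()) / (s.std() + 1e-9) for s in expert_scores]
    grid = np.linspace(0.0, 1.0, steps + 1)
    best, best_rel = None, -np.inf
    for w in itertools.product(grid, repeat=len(normed)):
        if not np.isclose(sum(w), 1.0):
            continue  # keep only weightings on the simplex
        combined = sum(wi * si for wi, si in zip(w, normed))
        r = reliability(combined)
        if r > best_rel:
            best_rel, best = r, combined
    return best

# identity = int(np.argmax(fuse([audio_scores, lip_scores, face_scores])))
```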

66 citations

Proceedings Article
01 Jan 2002
TL;DR: This paper describes alternative methods for performing speaker identification that utilize domain-dependent automatic speech recognition (ASR) to provide a phonetic segmentation of the test utterance.
Abstract: Traditional text-independent speaker recognition systems are based on Gaussian Mixture Models (GMMs) trained globally over all speech from a given speaker. In this paper, we describe alternative methods for performing speaker identification that utilize domain-dependent automatic speech recognition (ASR) to provide a phonetic segmentation of the test utterance. When evaluated on YOHO, several of these approaches were able to outperform previously published results on the speaker ID task. On a more difficult conversational speech task, we were able to use a combination of classifiers to reduce identification error rates on single test utterances. Over multiple utterances, the ASR-dependent approaches performed significantly better than the ASR-independent methods. Using an approach we call speaker-adaptive modeling for speaker identification, we were able to reduce speaker identification error rates by 39% over a baseline GMM approach when observing five test utterances from a speaker.
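The baseline being improved on trains one global GMM per speaker and identifies by likelihood. A minimal sketch of that baseline; the component count and diagonal covariances are illustrative choices, and the paper's ASR-dependent variants would further condition the models on the phonetic segmentation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_gmms(train_frames: dict, n_components: int = 32) -> dict:
    """train_frames: speaker -> (num_frames, feat_dim) array of features."""
    return {spk: GaussianMixture(n_components=n_components,
                                 covariance_type="diag",
                                 random_state=0).fit(x)
            for spk, x in train_frames.items()}

def identify(test_frames: np.ndarray, gmms: dict) -> str:
    """Pick the speaker whose GMM gives the highest average per-frame
    log-likelihood on the test utterance."""
    return max(gmms, key=lambda spk: gmms[spk].score(test_frames))
```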

66 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (82% related)
Feature extraction: 111.8K papers, 2.1M citations (81% related)
Signal processing: 73.4K papers, 983.5K citations (81% related)
Decoding methods: 65.7K papers, 900K citations (79% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420