scispace - formally typeset
Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Proceedings ArticleDOI
17 May 2004
TL;DR: It is shown how a joint factor analysis of inter-speaker and intra-speaker variability in a training database which contains multiple recordings for each speaker can be used to construct likelihood ratio statistics for speaker verification which take account of intra-speaker variation and channel variation in a principled way.
Abstract: We show how a joint factor analysis of inter-speaker and intra-speaker variability in a training database which contains multiple recordings for each speaker can be used to construct likelihood ratio statistics for speaker verification which take account of intra-speaker variation and channel variation in a principled way. We report the results of experiments on the NIST 2001 cellular one speaker detection task carried out by applying this type of factor analysis to Switchboard Cellular Part I. The evaluation data for this task is contained in Switchboard Cellular Part I so these results cannot be taken at face value but they indicate that the factor analysis model can perform extremely well if it is perfectly estimated.

79 citations

Journal ArticleDOI
TL;DR: This paper presents a scalable derivation of PLDA which is theoretically equivalent to the previous nonscalable solution and thus obviates the need for a variational approximation.
Abstract: In this paper, we present a scalable and exact solution for probabilistic linear discriminant analysis (PLDA). PLDA is a probabilistic model that has been shown to provide state-of-the-art performance for both face and speaker recognition. However, it has one major drawback: At training time estimating the latent variables requires the inversion and storage of a matrix whose size grows quadratically with the number of samples for the identity (class). To date, two approaches have been taken to deal with this problem, to 1) use an exact solution that calculates this large matrix and is obviously not scalable with the number of samples or 2) derive a variational approximation to the problem. We present a scalable derivation which is theoretically equivalent to the previous nonscalable solution and thus obviates the need for a variational approximation. Experimentally, we demonstrate the efficacy of our approach in two ways. First, on labeled faces in the wild, we illustrate the equivalence of our scalable implementation with previously published work. Second, on the large Multi-PIE database, we illustrate the gain in performance when using more training samples per identity (class), which is made possible by the proposed scalable formulation of PLDA.

78 citations

Proceedings ArticleDOI
27 Apr 1993
TL;DR: An algorithm for attributing a sample of unconstrained speech to one of several known speakers is described, based on measurement of the similarity of distributions of features extracted from reference speech samples and from the sample to be attributed.
Abstract: An algorithm for attributing a sample of unconstrained speech to one of several known speakers is described. The algorithm is based on measurement of the similarity of distributions of features extracted from reference speech samples and from the sample to be attributed. The measure of feature distribution similarity employed is not based on any assumed form of the distributions involved. The theoretical basis of the algorithm is examined, and a plausible connection is shown to the divergence statistic of Kullback (1972). Experimental results are presented for the King telephone database and the Switchboard database. The performance of the algorithm is better than that reported for algorithms based on Gaussian modeling and robust discrimination.

78 citations
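
The 1993 abstract above describes attributing a speech sample to a known speaker by comparing feature distributions without assuming their form. The abstract does not give the measure itself, so the following is only a minimal sketch of the general idea, assuming a symmetric mean nearest-neighbour distance as the dissimilarity. The function names, the toy 2-D features, and the choice of Euclidean distance are all illustrative assumptions, not the paper's method.

```python
import math

def nn_distance(frame, frames):
    """Euclidean distance from one feature vector to its nearest neighbour in `frames`."""
    return min(math.dist(frame, other) for other in frames)

def set_dissimilarity(a, b):
    """Symmetric mean nearest-neighbour distance between two sets of feature vectors.

    No distributional form (e.g. Gaussian) is assumed; only pairwise distances are used.
    This measure is an assumption made for illustration.
    """
    a_to_b = sum(nn_distance(x, b) for x in a) / len(a)
    b_to_a = sum(nn_distance(y, a) for y in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)

def attribute(sample, references):
    """Return the reference speaker whose feature set is least dissimilar to `sample`."""
    return min(references, key=lambda name: set_dissimilarity(sample, references[name]))

# Toy 2-D "features" (hypothetical); a real system would use per-frame spectral features.
references = {
    "speaker_a": [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1)],
    "speaker_b": [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1)],
}
sample = [(2.9, 3.0), (3.1, 3.1)]
best = attribute(sample, references)
```

Here `attribute` picks the reference set with the smallest dissimilarity to the sample; the cost is quadratic in the number of frames, which is acceptable for a sketch but would need indexing or subsampling on real recordings.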

Journal ArticleDOI
TL;DR: Traditional and modern techniques which make real-world speaker verification systems robust to degradation due to the presence of ambient noise, channel variations, aging effects, and availability of limited training samples are presented.
Abstract: Even though the subject of speaker verification has been investigated for several decades, numerous challenges and new opportunities in robust recognition techniques are still being explored. In this overview paper we first provide a brief introduction to statistical pattern recognition techniques that are commonly used for speaker verification. The second part of the paper presents traditional and modern techniques which make real-world speaker verification systems robust to degradation due to the presence of ambient noise, channel variations, aging effects, and availability of limited training samples. The paper concludes with discussions on future trends and research opportunities in this area.

78 citations

Proceedings ArticleDOI
26 May 2013
TL;DR: This study assesses the performance of Probabilistic Linear Discriminant Analysis (PLDA) and i-vector normalization for a text-dependent verification task and suggests that such a scoring regime remains to be optimized.
Abstract: The importance of phonetic variability for short duration speaker verification is widely acknowledged. This paper assesses the performance of Probabilistic Linear Discriminant Analysis (PLDA) and i-vector normalization for a text-dependent verification task. We show that using a class definition based on both speaker and phonetic content significantly improves the performance of a state-of-the-art system. We also compare four models for computing the verification scores using multiple enrollment utterances and show that using PLDA intrinsic scoring obtains the best performance in this context. This study suggests that such a scoring regime remains to be optimized.

78 citations
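
The 2013 abstract above mentions i-vector normalization and scoring with multiple enrollment utterances, but does not spell out the four scoring models it compares. As a rough sketch only, the code below shows length normalization and two common ways of combining several enrollment vectors, using cosine similarity as a simple stand-in for PLDA scoring. The function names, the cosine substitute, and the toy vectors are assumptions for illustration and are not taken from the paper.

```python
import math

def length_normalize(v):
    """Scale a vector to unit Euclidean length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_score(a, b):
    """Cosine similarity, used here as a stand-in for a PLDA verification score."""
    a, b = length_normalize(a), length_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def score_by_vector_averaging(enroll, test):
    """Average the normalized enrollment vectors first, then score once."""
    normed = [length_normalize(v) for v in enroll]
    mean = [sum(col) / len(normed) for col in zip(*normed)]
    return cosine_score(mean, test)

def score_by_score_averaging(enroll, test):
    """Score each enrollment vector against the test vector, then average the scores."""
    return sum(cosine_score(v, test) for v in enroll) / len(enroll)

# Toy 2-D vectors (hypothetical); real i-vectors typically have hundreds of dimensions.
enroll = [[1.0, 0.0], [0.0, 1.0]]
test = [1.0, 1.0]
vec_avg = score_by_vector_averaging(enroll, test)
score_avg = score_by_score_averaging(enroll, test)
```

On this toy input the two strategies disagree: averaging the vectors yields a model aligned with the test vector, while averaging the scores penalizes each enrollment vector individually. That difference is one reason the choice of combination rule can matter.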


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Signal processing: 73.4K papers, 983.5K citations, 81% related
Decoding methods: 65.7K papers, 900K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420