scispace - formally typeset
Search or ask a question
Topic

Speaker recognition

About: Speaker recognition is a research topic. Over the lifetime, 14990 publications have been published within this topic receiving 310061 citations.


Papers
More filters
Book
01 Jan 1980

228 citations

Proceedings ArticleDOI
04 May 2014
TL;DR: Modifications of the basic algorithm are developed which result in significant reductions in word error rates (WERs), and the algorithms are shown to combine well with speaker adaptation by backpropagation, resulting in a 9% relative WER reduction.
Abstract: We propose providing additional utterance-level features as inputs to a deep neural network (DNN) to facilitate speaker, channel and background normalization. Modifications of the basic algorithm are developed which result in significant reductions in word error rates (WERs). The algorithms are shown to combine well with speaker adaptation by backpropagation, resulting in a 9% relative WER reduction. We address implementation of the algorithm for a streaming task.

227 citations

Proceedings ArticleDOI
01 Dec 2014
TL;DR: A system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion is proposed, and it is shown that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well.
Abstract: Speaker diarization via unsupervised i-vector clustering has gained popularity in recent years In this approach, i-vectors are extracted from short clips of speech segmented from a larger multi-speaker conversation and organized into speaker clusters, typically according to their cosine score In this paper, we propose a system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring, a method already frequently utilized in speaker recognition tasks, and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion We also demonstrate that denser sampling in the i-vector space with overlapping temporal segments provides a gain in the diarization task We test our system on the CALLHOME conversational telephone speech corpus, which includes multiple languages and a varying number of speakers, and we show that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well

226 citations

Patent
20 Mar 2007
TL;DR: In this paper, the authors present a client-server security system, which includes a client system receiving first biometric data and having a first level security authorization procedure, and a server system is provided for receiving second Biometric data.
Abstract: The present invention includes a client-server security system. The client-server security system includes a client system receiving first biometric data and having a first level security authorization procedure. In one embodiment, the first biometric data is speech data and the first level security authorization procedure includes a first speaker recognition algorithm. A server system is provided for receiving second biometric data. The server system includes a second level security authorization procedure. In one embodiment, the second biometric data is speech data and the second level security authorization procedure includes a second speaker recognition algorithm. In one embodiment, the first level security authorization procedure and the second level security authorization procedure comprise distinct biometric algorithms.

225 citations

Proceedings Article
01 Jan 1991
TL;DR: This paper presents some of the design considerations of BREF, a large read-speech corpus for French designed to provide continuous speech data for the development of dictation machines, for the evaluation of continuous speech recognition systems, and for the study of phonological variations.
Abstract: This paper presents some of the design considerations of BREF, a large read-speech corpus for French. BREF was designed to provide continuous speech data for the development of dictation machines, for the evaluation of continuous speech recognition systems (both speaker-dependent and speakerindependent), and for the study of phonological variations. The texts to be read were selected from 5 million words of the French newspaper, Le Monde. In total, 11,000 texts were selected, with selection criteria that emphasisized maximizing the number of distinct triphones. Separate text materials were selected for training and test corpora. Ninety speakers have been recorded, each providing between 5,000 and 10,000 words (approximately 40-70 min.) of speech.

225 citations


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
82% related
Feature extraction
111.8K papers, 2.1M citations
81% related
Signal processing
73.4K papers, 983.5K citations
81% related
Decoding methods
65.7K papers, 900K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023165
2022468
2021283
2020475
2019484
2018420