Topic
Speaker recognition
About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.
Papers
04 May 2014
TL;DR: Modifications of the basic algorithm are developed which result in significant reductions in word error rates (WERs), and the algorithms are shown to combine well with speaker adaptation by backpropagation, resulting in a 9% relative WER reduction.
Abstract: We propose providing additional utterance-level features as inputs to a deep neural network (DNN) to facilitate speaker, channel and background normalization. Modifications of the basic algorithm are developed which result in significant reductions in word error rates (WERs). The algorithms are shown to combine well with speaker adaptation by backpropagation, resulting in a 9% relative WER reduction. We address implementation of the algorithm for a streaming task.
227 citations
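As a rough illustration of the idea in the abstract above (feeding utterance-level features to a DNN alongside the usual acoustic input), the sketch below tiles one per-utterance vector across all frames and concatenates it to the frame-level features. The function name, the feature dimensions, and the use of plain concatenation are assumptions made for this example; the abstract does not specify how the features are combined.

```python
import numpy as np

def append_utterance_features(frames: np.ndarray, utt_vec: np.ndarray) -> np.ndarray:
    """Concatenate one utterance-level vector onto every frame-level feature vector.

    frames:  (n_frames, frame_dim) acoustic features for a single utterance
    utt_vec: (utt_dim,) utterance-level feature vector (hypothetical contents)
    """
    if frames.ndim != 2 or utt_vec.ndim != 1:
        raise ValueError("expected frames of shape (T, D) and utt_vec of shape (U,)")
    # Repeat the same utterance-level vector for each frame (read-only view, no copy).
    tiled = np.broadcast_to(utt_vec, (frames.shape[0], utt_vec.shape[0]))
    # The DNN input for each frame is [frame features | utterance features].
    return np.concatenate([frames, tiled], axis=1)

# Illustrative sizes only: 300 frames of 40-dim features, 100-dim utterance vector.
frames = np.zeros((300, 40))
utt_vec = np.ones(100)
dnn_input = append_utterance_features(frames, utt_vec)
print(dnn_input.shape)
```

Because the utterance-level vector is constant within an utterance, it gives the network the same speaker, channel, and background context at every frame.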
01 Dec 2014
TL;DR: A system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion is proposed, and it is shown that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well.
Abstract: Speaker diarization via unsupervised i-vector clustering has gained popularity in recent years. In this approach, i-vectors are extracted from short clips of speech segmented from a larger multi-speaker conversation and organized into speaker clusters, typically according to their cosine score. In this paper, we propose a system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring, a method already frequently utilized in speaker recognition tasks, and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion. We also demonstrate that denser sampling in the i-vector space with overlapping temporal segments provides a gain in the diarization task. We test our system on the CALLHOME conversational telephone speech corpus, which includes multiple languages and a varying number of speakers, and we show that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well.
226 citations
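The clustering pipeline described in the abstract above can be sketched in a few lines. This is a minimal illustration of the cosine-scoring baseline only, not the proposed PLDA system: it uses average-linkage agglomerative clustering, and the stopping threshold is a hand-picked hypothetical value rather than one obtained by unsupervised calibration. The helper `overlapping_segments` shows how a hop smaller than the window yields the denser, overlapping sampling the abstract mentions; all names and parameters are assumptions for this example.

```python
import numpy as np

def overlapping_segments(n_frames: int, win: int, hop: int) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges; hop < win produces overlapping segments."""
    return [(s, s + win) for s in range(0, n_frames - win + 1, hop)]

def cluster_by_cosine(vectors: np.ndarray, stop_threshold: float) -> np.ndarray:
    """Average-linkage agglomerative clustering on cosine similarity.

    Merging stops once the best-scoring pair of clusters falls below
    stop_threshold (a hypothetical, manually chosen value here).
    """
    x = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = x @ x.T  # pairwise cosine scores between segment vectors
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average score over all cross-cluster pairs.
                score = sim[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        if best < stop_threshold:
            break  # stopping criterion reached
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Toy stand-ins for segment-level vectors from two speakers.
vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(cluster_by_cosine(vecs, stop_threshold=0.5))
print(overlapping_segments(n_frames=10, win=4, hop=2))
```

Swapping the cosine score for a PLDA log-likelihood ratio would change only how `sim` is computed; the stopping rule would then compare calibrated scores against the threshold.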
20 Mar 2007
TL;DR: A client-server security system is presented, which includes a client system that receives first biometric data and has a first level security authorization procedure, and a server system that receives second biometric data.
Abstract: The present invention includes a client-server security system. The client-server security system includes a client system receiving first biometric data and having a first level security authorization procedure. In one embodiment, the first biometric data is speech data and the first level security authorization procedure includes a first speaker recognition algorithm. A server system is provided for receiving second biometric data. The server system includes a second level security authorization procedure. In one embodiment, the second biometric data is speech data and the second level security authorization procedure includes a second speaker recognition algorithm. In one embodiment, the first level security authorization procedure and the second level security authorization procedure comprise distinct biometric algorithms.
225 citations
01 Jan 1991
TL;DR: This paper presents some of the design considerations of BREF, a large read-speech corpus for French designed to provide continuous speech data for the development of dictation machines, for the evaluation of continuous speech recognition systems, and for the study of phonological variations.
Abstract: This paper presents some of the design considerations of BREF, a large read-speech corpus for French. BREF was designed to provide continuous speech data for the development of dictation machines, for the evaluation of continuous speech recognition systems (both speaker-dependent and speaker-independent), and for the study of phonological variations. The texts to be read were selected from 5 million words of the French newspaper Le Monde. In total, 11,000 texts were selected, with selection criteria that emphasized maximizing the number of distinct triphones. Separate text materials were selected for training and test corpora. Ninety speakers have been recorded, each providing between 5,000 and 10,000 words (approximately 40-70 min.) of speech.
225 citations
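One plausible way to select texts so as to maximize the number of distinct triphones, as the BREF abstract above describes, is a greedy coverage procedure. The sketch below is an assumption, not the authors' documented method: it treats each text as a sequence of phone symbols (the toy inputs use single characters as stand-ins for phones) and repeatedly picks the text that adds the most not-yet-covered triphones.

```python
def triphones(phones: list[str]) -> set[tuple[str, ...]]:
    """Return the set of distinct consecutive 3-phone sequences in a text."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}

def greedy_select(texts: list[list[str]], n_select: int):
    """Greedily choose up to n_select texts, maximizing newly covered triphones."""
    covered: set[tuple[str, ...]] = set()
    chosen: list[int] = []
    remaining = {i: triphones(t) for i, t in enumerate(texts)}
    while remaining and len(chosen) < n_select:
        # Pick the text contributing the most unseen triphones (first one wins ties).
        best = max(remaining, key=lambda i: len(remaining[i] - covered))
        if not remaining[best] - covered:
            break  # no remaining text adds anything new
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered

# Toy example: characters stand in for phone symbols (illustrative only).
texts = [list("abcd"), list("abcd"), list("wxyz")]
chosen, covered = greedy_select(texts, n_select=2)
print(chosen, len(covered))
```

In the toy input the duplicate text is skipped because it contributes no new triphones, which is the behavior a coverage-driven selection criterion is meant to produce.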