Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Patent
26 May 2006
TL;DR: A method is presented for authenticating a user based on a spoken phrase, the user's biometric voice print, and a device identifier. However, the method is limited to a single user and cannot be used to authenticate multiple users.
Abstract: A method (700) and system (900) for authenticating a user is provided. The method can include receiving one or more spoken utterances from a user (702), recognizing a phrase corresponding to one or more spoken utterances (704), identifying a biometric voice print of the user from one or more spoken utterances of the phrase (706), determining a device identifier associated with the device (708), and authenticating the user based on the phrase, the biometric voice print, and the device identifier (710). A location of the handset or the user can be employed as criteria for granting access to one or more resources (712).
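The decision flow described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the names `EnrolledUser`, `AuthRequest`, `cosine_similarity`, and `authenticate`, the use of cosine similarity to compare voice prints, and the 0.8 threshold are all hypothetical, since the abstract does not say how the voice print is scored.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class EnrolledUser:
    # Hypothetical enrollment record; field names are assumptions.
    phrase: str
    voice_print: List[float]
    device_id: str


@dataclass
class AuthRequest:
    recognized_phrase: str    # phrase recognized from the utterance (704)
    voice_print: List[float]  # voice features from the same utterance (706)
    device_id: str            # identifier of the device in use (708)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate(req: AuthRequest, user: EnrolledUser, threshold: float = 0.8) -> bool:
    """Accept only when phrase, voice print, and device identifier all match (710)."""
    phrase_ok = req.recognized_phrase.strip().lower() == user.phrase.strip().lower()
    voice_ok = cosine_similarity(req.voice_print, user.voice_print) >= threshold
    device_ok = req.device_id == user.device_id
    return phrase_ok and voice_ok and device_ok
```

Requiring all three factors means a stolen device or a replayed phrase alone is not enough; a real system would likely replace the exact device match and fixed threshold with calibrated scores.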

181 citations

Journal ArticleDOI
TL;DR: An improved clustering method is integrated with an existing re-segmentation algorithm and an iterative optimization scheme is implemented that demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner.
Abstract: In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.

181 citations

Journal ArticleDOI
TL;DR: A novel method is proposed that finds accurate alignments between source and target speaker utterances, for use in modifying a source speaker's utterance to sound like speech from a target speaker.

181 citations

Journal ArticleDOI
TL;DR: The most frequently used approach, based on a modified Hidden Markov Model (HMM) phonetic recognizer, is analyzed; a general framework for the local refinement of boundaries is proposed; and the performance of several pattern classification approaches is compared within this framework.
Abstract: This paper presents the results and conclusions of a thorough study on automatic phonetic segmentation. It starts with a review of the state of the art in this field. Then, it analyzes the most frequently used approach, which is based on a modified Hidden Markov Model (HMM) phonetic recognizer. For this approach, a statistical correction procedure is proposed to compensate for the systematic errors produced by context-dependent HMMs, and the use of speaker adaptation techniques is considered to increase segmentation precision. Finally, this paper explores the possibility of locally refining the boundaries obtained with the former techniques. A general framework is proposed for the local refinement of boundaries, and the performance of several pattern classification approaches (fuzzy logic, neural networks and Gaussian mixture models) is compared within this framework. The resulting phonetic segmentation scheme was able to increase the performance of a baseline HMM segmentation tool from 27.12%, 79.27%, and 97.75% of automatic boundary marks with errors smaller than 5, 20, and 50 ms, respectively, to 65.86%, 96.01%, and 99.31% in speaker-dependent mode, which is a reasonably good approximation to manual segmentation.
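The evaluation metric quoted in this abstract (the share of automatic boundary marks whose error is smaller than 5, 20, or 50 ms) can be computed as in the sketch below. The function name `boundary_accuracy` and the assumption that automatic and manual boundaries are already paired one-to-one, in milliseconds, are illustrative choices; the abstract does not describe how the paper pairs them.

```python
from typing import Dict, Sequence


def boundary_accuracy(
    auto_ms: Sequence[float],
    manual_ms: Sequence[float],
    tolerances_ms: Sequence[float] = (5, 20, 50),
) -> Dict[float, float]:
    """Percentage of automatic boundaries with absolute error below each tolerance.

    Assumes auto_ms[i] and manual_ms[i] mark the same phonetic boundary.
    """
    if len(auto_ms) != len(manual_ms) or not auto_ms:
        raise ValueError("need two non-empty boundary lists of equal length")
    errors = [abs(a - m) for a, m in zip(auto_ms, manual_ms)]
    return {
        tol: 100.0 * sum(e < tol for e in errors) / len(errors)
        for tol in tolerances_ms
    }


# Example with made-up boundary times (ms); absolute errors are 2, 13, 10, 60.
scores = boundary_accuracy([100, 253, 410, 640], [102, 240, 400, 700])
print(scores)
```

Because the tolerances are nested, the percentages can only stay equal or grow as the tolerance widens, which matches the pattern of the figures reported in the abstract.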

181 citations

Proceedings ArticleDOI
23 May 1989
TL;DR: In this paper, an alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described, based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available.
Abstract: An alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described. The goal of this investigation was to train the IBM speech recognition system with only five minutes of speech data from a new speaker instead of the usual 20 minutes without the recognition rate dropping by more than 1-2%. The approach is based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available. It is called a speaker Markov model. It is shown how the parameters of such a model can be derived and how it can be used for transforming the training set of the old speaker in order to use it in addition to the short training set of the new speaker. The adaptation algorithm was tested with 12 speakers. The average recognition rate dropped from 96.4% to 95.2% for a 5000-word vocabulary task. The decoding time increased by a factor of 1.35; this factor is often 3-5 if other adaptation algorithms are used.

180 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Signal processing: 73.4K papers, 983.5K citations, 81% related
Decoding methods: 65.7K papers, 900K citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  165
2022  468
2021  283
2020  475
2019  484
2018  420