Topic

Speaker recognition

About: Speaker recognition is a research topic. Over the lifetime, 14990 publications have been published within this topic receiving 310061 citations.


Papers
Proceedings ArticleDOI
14 May 2006
TL;DR: This paper examines the problem of kernel selection for one-versus-all (OVA) classification of multiclass data with support vector machines (SVMs) and focuses specifically on generalized linear kernels of the form k(x_1, x_2) = x_1^T R x_2, where R is a positive semidefinite matrix.
Abstract: In this paper we examine the problem of kernel selection for one-versus-all (OVA) classification of multiclass data with support vector machines (SVMs). We focus specifically on the problem of training what we refer to as generalized linear kernels, that is, kernels of the form k(x_1, x_2) = x_1^T R x_2, where R is a positive semidefinite matrix. Our approach for training k(x_1, x_2) involves first constructing a set of upper bounds on the rates of false positives and false negatives at a given score threshold. Under various conditions, minimizing these bounds leads to the closed-form solution R = W^{-1}, where W is the expected within-class covariance matrix of the data. We tested various parameterizations of R, including a diagonal parameterization that simply performs per-feature variance normalization, on the 1-conversation training condition of the SRE-2003 and SRE-2004 speaker recognition tasks. In experiments on a state-of-the-art MLLR-SVM speaker recognition system [1], the parameterization R = [see above equation in pdf file], where [see above equation in pdf file] is a smoothed estimate of W, achieves relative reductions in the minimum decision cost function (DCF) [2] of up to 22% below the results obtained when R does per-feature variance normalization.
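The closed-form solution R = W^{-1} described in this abstract can be illustrated with a minimal sketch. It assumes NumPy is available, and the function names (`within_class_covariance`, `wccn_matrix`, `generalized_linear_kernel`) are hypothetical. The small `ridge` term is only an assumption to keep W invertible; it is not the smoothed estimate of W used in the paper, which is not recoverable from this listing.

```python
import numpy as np

def within_class_covariance(X, labels):
    """Expected within-class covariance: per-class scatter pooled over all samples."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    W = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        centered = Xc - Xc.mean(axis=0)   # remove the class mean
        W += centered.T @ centered        # accumulate class scatter
    return W / X.shape[0]                 # weights each class by its frequency

def wccn_matrix(X, labels, ridge=1e-6):
    """R = (W + ridge * I)^{-1}; the ridge is an assumption, not the paper's smoothing."""
    W = within_class_covariance(X, labels)
    return np.linalg.inv(W + ridge * np.eye(W.shape[0]))

def generalized_linear_kernel(x1, x2, R):
    """k(x1, x2) = x1^T R x2 for a positive semidefinite matrix R."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return float(x1 @ R @ x2)
```

With R set to the identity matrix the kernel reduces to the ordinary dot product, and with a diagonal R it rescales each feature independently, which corresponds to the per-feature normalization baseline the abstract compares against.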

68 citations

Journal ArticleDOI
TL;DR: The proposed feature space transformation technique demonstrates a significant improvement of the performance with no addition of new features to the original input space and it is expected that this technique could provide good results in other areas such as speaker verification and/or identification.

68 citations

Patent
31 Jul 1998
TL;DR: This patent uses the rapidly available speech recognition results to provide intelligent barge-in for voice-response systems, and to count words so that sub-sequences can be output, allowing tasks related to the entire word sequence to be parallelized and/or pipelined to increase processing throughput.
Abstract: Speech recognition technology has attained maturity such that the most likely speech recognition result has been reached and is available before an energy-based termination of speech has been made. The present invention innovatively uses the rapidly available speech recognition results to provide intelligent barge-in for voice-response systems and to count words to output sub-sequences, providing paralleling and/or pipelining of tasks related to the entire word sequence to increase processing throughput.

68 citations

Journal ArticleDOI
TL;DR: The accurate speaker tracking provided by the audio-visual sensor array proved beneficial for improving performance in a microphone array-based speech recognition system, both in terms of enhancement and recognition.
Abstract: This paper addresses the problem of distant speech acquisition in multiparty meetings, using multiple microphones and cameras. Microphone array beamforming techniques present a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering. Beamforming techniques, however, rely on knowledge of the speaker location. In this paper, we present an integrated approach, in which an audio-visual multiperson tracker is used to track active speakers with high accuracy. Speech enhancement is then achieved using microphone array beamforming followed by a novel postfiltering stage. Finally, speech recognition is performed to evaluate the quality of the enhanced speech signal. The approach is evaluated on data recorded in a real meeting room for stationary speaker, moving speaker, and overlapping speech scenarios. The results show that the speech enhancement and recognition performance achieved using our approach are significantly better than those of a single table-top microphone and are comparable to those of a lapel microphone for some of the scenarios. The results also indicate that the audio-visual-based system performs significantly better than an audio-only system, both in terms of enhancement and recognition. This reveals that the accurate speaker tracking provided by the audio-visual sensor array proved beneficial for improving the recognition performance in a microphone array-based speech recognition system.

68 citations

Patent
27 Jan 1983
TL;DR: This patent describes an individual verification apparatus consisting of a verification data file (20), a speech input section (10), a data memory (30), a speech recognition unit (40), and a speaker verification unit (50).
Abstract: An individual verification apparatus comprises a verification data file (20), a speech input section (10), a data memory (30), a speech recognition unit (40), and a speaker verification unit (50). In the verification data file key codes set by customers and corresponding reference data for individual verification are registered. Speech of the key code spoken by a customer is processed by the speech input section (10) and the result is stored in the data memory (30). The speech recognition unit (40) recognizes the input key code based on the key code data stored in the data memory (30). The speaker verification unit (50) verifies the customer by comparing the key code data with speech reference data of customers having the recognized key code.
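The two-stage flow in this abstract (recognize the key code, then compare the speech only against customers registered under that key code) can be sketched as follows. Everything here is an illustrative assumption: the `VERIFICATION_FILE` contents, the feature vectors, the cosine-similarity comparison, and the `threshold` value are hypothetical and are not specified by the patent text above. The key code is assumed to have already been recognized by the speech recognition unit.

```python
import math

# Hypothetical verification data file: key code -> {customer id: reference vector}.
VERIFICATION_FILE = {
    "1234": {"alice": [1.0, 0.0, 0.5], "bob": [0.0, 1.0, 0.5]},
    "9876": {"carol": [0.5, 0.5, 1.0]},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify_customer(recognized_key_code, speech_features, threshold=0.9):
    """Return the best-matching customer for the key code, or None if none passes."""
    # Only customers registered under the recognized key code are candidates.
    candidates = VERIFICATION_FILE.get(recognized_key_code, {})
    best_id, best_score = None, threshold
    for customer_id, reference in candidates.items():
        score = cosine_similarity(speech_features, reference)
        if score >= best_score:
            best_id, best_score = customer_id, score
    return best_id
```

Restricting the comparison to customers who share the recognized key code keeps the candidate set small, so the speaker comparison runs against a handful of references instead of the whole file.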

68 citations


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
82% related
Feature extraction
111.8K papers, 2.1M citations
81% related
Signal processing
73.4K papers, 983.5K citations
81% related
Decoding methods
65.7K papers, 900K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420