Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published on this topic, receiving 310,061 citations.


Papers
Proceedings Article
01 Dec 2012
TL;DR: To reduce the false acceptance rate caused by spoofing attacks, a general anti-spoofing framework is proposed for speaker verification systems, in which a converted speech detector is adopted as a post-processing module for the speaker verification system's acceptance decision.
Abstract: Voice conversion technique, which modifies one speaker's (source) voice to sound like another speaker (target), presents a threat to automatic speaker verification. In this paper, we first present new results of evaluating the vulnerability of current state-of-the-art speaker verification systems: Gaussian mixture model with joint factor analysis (GMM-JFA) and probabilistic linear discriminant analysis (PLDA) systems, against spoofing attacks. The spoofing attacks are simulated by two voice conversion techniques: Gaussian mixture model based conversion and unit selection based conversion. To reduce false acceptance rate caused by spoofing attack, we propose a general anti-spoofing attack framework for the speaker verification systems, where a converted speech detector is adopted as a post-processing module for the speaker verification system's acceptance decision. The detector decides whether the accepted claim is human speech or converted speech. A subset of the core task in the NIST SRE 2006 corpus is used to evaluate the vulnerability of speaker verification system and the performance of converted speech detector. The results indicate that both conversion techniques can increase the false acceptance rate of GMM-JFA and PLDA system, while the converted speech detector can reduce the false acceptance rate from 31.54% and 41.25% to 1.64% and 1.71% for GMM-JFA and PLDA system on unit-selection based converted speech, respectively.

97 citations
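
The post-processing framework described in the abstract above can be read as a two-stage decision: a claim that the verifier accepts is kept only if the converted speech detector also judges it to be human speech. The following is a minimal sketch under stated assumptions; the names `Trial`, `accept` and `false_acceptance_rate`, the score scales and the thresholds are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trial:
    asv_score: float       # hypothetical verifier score (higher = more likely the target speaker)
    detector_score: float  # hypothetical detector score (higher = more likely human speech)


def accept(trial: Trial, asv_threshold: float = 0.0, human_threshold: float = 0.0) -> bool:
    """Accept only if the verifier accepts AND the detector judges the speech human."""
    if trial.asv_score < asv_threshold:
        return False  # rejected by the verifier; the detector is never consulted
    return trial.detector_score >= human_threshold


def false_acceptance_rate(spoofed_trials: Sequence[Trial], **thresholds: float) -> float:
    """Fraction of spoofed (impostor) trials that are wrongly accepted."""
    accepted = sum(accept(t, **thresholds) for t in spoofed_trials)
    return accepted / len(spoofed_trials)
```

Setting `human_threshold=float("-inf")` disables the detector, which gives a baseline false acceptance rate to compare against the cascaded decision.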

Patent
04 Aug 2014
TL;DR: A controller for a voice-controlled device is provided, which includes a setting module and a recognition module. The setting module generates a threshold according to an environmental parameter, and the recognition module compares a confidence score of speech recognition with that threshold to execute voice control accordingly.
Abstract: A controller for a voice-controlled device is provided. The controller includes a setting module and a recognition module. The setting module generates a threshold according to an environmental parameter. The recognition module compares a confidence score of speech recognition with the threshold to accordingly execute voice control.

97 citations
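
The two modules in the patent abstract above can be illustrated with a short sketch. The patent text does not say which environmental parameter is used or how it maps to a threshold, so the ambient noise level in dB, the linear rule and every constant below are assumptions made only for illustration.

```python
def set_threshold(noise_db: float, base: float = 0.5, slope: float = 0.005,
                  lowest: float = 0.3, highest: float = 0.9) -> float:
    """Setting module (hypothetical rule): demand more confidence as noise grows."""
    return min(highest, max(lowest, base + slope * noise_db))


def execute_voice_control(confidence: float, noise_db: float) -> bool:
    """Recognition module: act only if the recognition confidence reaches the threshold."""
    return confidence >= set_threshold(noise_db)
```

With these assumed constants, the same recognition confidence can trigger a command in a quiet room and be ignored in a noisy one.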

Journal Article
TL;DR: The paper concentrates on two approaches: extracting features that are robust against channel variations, and transforming the speaker models to compensate for channel effects. Combined, the methods resulted in a 38% relative improvement on the closed-set 30-s training 5-s testing condition of the NIST'95 Evaluation task.
Abstract: This paper addresses the issue of closed-set text-independent speaker identification from samples of speech recorded over the telephone. It focuses on the effects of acoustic mismatches between training and testing data, and concentrates on two approaches: (1) extracting features that are robust against channel variations and (2) transforming the speaker models to compensate for channel effects. First, an experimental study shows that optimizing the front end processing of the speech signal can significantly improve speaker recognition performance. A new filterbank design is introduced to improve the robustness of the speech spectrum computation in the front-end unit. Next, a new feature based on spectral slopes is described. Its ability to discriminate between speakers is shown to be superior to that of the traditional cepstrum. This feature can be used alone or combined with the cepstrum. The second part of the paper presents two model transformation methods that further reduce channel effects. These methods make use of a locally collected stereo database to estimate a speaker-independent variance transformation for each speech feature used by the classifier. The transformations constructed on this stereo database can then be applied to speaker models derived from other databases. Combined, the methods developed in this paper resulted in a 38% relative improvement on the closed-set 30-s training 5-s testing condition of the NIST'95 Evaluation task, after cepstral mean removal.

97 citations
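
The result in the abstract above is reported "after cepstral mean removal". As a reminder of what that baseline normalization does, here is a minimal sketch of the standard technique (subtracting the per-coefficient mean over an utterance); it is not the paper's filterbank, spectral slope feature or variance transformation, and the function name is a hypothetical choice.

```python
from typing import List


def cepstral_mean_removal(frames: List[List[float]]) -> List[List[float]]:
    """Subtract each cepstral coefficient's utterance-level mean from every frame.

    A fixed channel adds a constant offset to each cepstral coefficient, so
    removing the mean cancels that offset.
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(n_coeffs)]
    return [[frame[d] - means[d] for d in range(n_coeffs)] for frame in frames]
```

Adding the same constant vector to every frame (a simulated channel offset) leaves the output unchanged, which is the property that makes the normalization useful against channel mismatch.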

Proceedings Article
07 Sep 2001
TL;DR: A combined system for punctuation generation and speech recognition that incorporates prosodic information with acoustic and language model information is discussed, which can improve the F-measure of punctuation recognition by 19% relative.
Abstract: In this paper, we discuss a combined system for punctuation generation and speech recognition. This system incorporates prosodic information with acoustic and language model information. Experiments are conducted for both the reference transcriptions and speech recogniser outputs. For the reference transcription case, prosodic information is shown to be more useful than language model information. When these information sources are combined, we can obtain an F-measure of up to 0.7830 for punctuation recognition. A few straightforward modifications of a conventional speech recogniser allow the system to produce punctuation and speech recognition hypotheses simultaneously. The multiple hypotheses are produced by the automatic speech recogniser and are re-scored by prosodic information. When prosodic information is incorporated, the F-measure can be improved by 19% relative. At the same time, small reductions in word error rate are obtained.

97 citations
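
The abstract above reports results as an F-measure and as a relative improvement. A small sketch of both quantities follows, assuming the usual balanced F-measure (the harmonic mean of precision and recall); the abstract does not give the precision, recall or baseline values, so any numbers passed to these functions are hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def relative_improvement(baseline: float, improved: float) -> float:
    """Improvement expressed as a fraction of the baseline score."""
    return (improved - baseline) / baseline
```

For example, moving from a hypothetical F-measure of 0.50 to 0.595 is a 19% relative improvement, although the absolute gain is only 0.095.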

Journal Article
TL;DR: This paper starts with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks, and introduces a number of countermeasures to prevent spoofing attacks.
Abstract: In this paper, we present a systematic study of the vulnerability of automatic speaker verification to a diverse range of spoofing attacks. We start with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks. We then introduce a number of countermeasures to prevent spoofing attacks from both known and unknown attackers. Known attackers are spoofing systems whose output was used to train the countermeasures, while an unknown attacker is a spoofing system whose output was not available to the countermeasures during training. Finally, we benchmark automatic systems against human performance on both speaker verification and spoofing detection tasks.

97 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Signal processing: 73.4K papers, 983.5K citations, 81% related
Decoding methods: 65.7K papers, 900K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420