Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Proceedings Article
Ken-ichi Iso1, Takao Watanabe1
01 May 1990
TL;DR: A speech recognition model called the neural prediction model (NPM) is proposed, which uses a sequence of multilayer perceptrons (MLPs) as a separate nonlinear predictor for each class to represent temporal structures of speech patterns as recognition cues.
Abstract: A speech recognition model called the neural prediction model (NPM) is proposed. The model uses a sequence of multilayer perceptrons (MLPs) as a separate nonlinear predictor for each class. It is designed to represent temporal structures of speech patterns as recognition cues. In particular, temporal correlation in successive feature vectors of a speech pattern is represented in the mappings formed as MLP input-output relations. Temporal distortion of speech is efficiently normalized by a dynamic-programming technique. Recognition and training algorithms are presented based on the combination of dynamic-programming and back-propagation techniques. Evaluation experiments were conducted using ten-digit vocabulary samples uttered by 107 speakers. A 99.8% recognition accuracy was obtained. This suggests that the model is effective for speaker-independent speech recognition.

98 citations
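
The recognition step described in the abstract above (one predictor sequence per class, with temporal distortion normalized by dynamic programming) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the predictors are arbitrary callables standing in for trained MLPs, each predictor sees only the single preceding frame, the alignment is a simple left-to-right path (stay or advance by one predictor), and the names `accumulated_prediction_error` and `recognize` are hypothetical.

```python
def accumulated_prediction_error(frames, predictors):
    """Minimum accumulated squared prediction error over monotonic alignments.

    frames: list of feature vectors (lists of floats).
    predictors: list of callables, each mapping the previous frame to a
    predicted current frame (hypothetical stand-ins for trained MLPs).
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a prediction")
    inf = float("inf")
    n_states = len(predictors)

    def err(t, n):
        # Squared error of predictor n predicting frame t from frame t - 1.
        predicted = predictors[n](frames[t - 1])
        return sum((a - b) ** 2 for a, b in zip(frames[t], predicted))

    # Start at t = 1 because each prediction needs one preceding frame.
    prev = [inf] * n_states
    prev[0] = err(1, 0)
    for t in range(2, len(frames)):
        cur = [inf] * n_states
        for n in range(n_states):
            # Either stay in predictor n or advance from predictor n - 1.
            best = prev[n] if n == 0 else min(prev[n], prev[n - 1])
            if best < inf:
                cur[n] = best + err(t, n)
        prev = cur
    return prev[-1]  # the path must end in the last predictor


def recognize(frames, class_models):
    """Return the class label whose predictor sequence gives the lowest cost."""
    costs = {
        label: accumulated_prediction_error(frames, predictors)
        for label, predictors in class_models.items()
    }
    return min(costs, key=costs.get)
```

Training, which the abstract says combines dynamic programming with back-propagation, is not covered by this sketch.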

Journal Article
TL;DR: A new representation of the residue is proposed and its recognition performance is analysed through experiments in text-independent speaker verification; the results suggest a possible improvement over current speaker recognition approaches that rely only on the usual synthesis filter features.

98 citations

Journal Article
TL;DR: Experimental results based on the NIST 2010 SRE dataset suggest that the proposed VAD outperforms conventional ones whenever interview-style speech is involved, and it is demonstrated that noise reduction is vital for energy-based VAD under low SNR.

98 citations
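
For readers unfamiliar with the energy-based voice activity detection (VAD) mentioned in the summary above, a minimal baseline can be sketched as below. It is not the VAD proposed in that paper and it includes no noise reduction; the frame length, hop size, 10th-percentile noise floor and 6 dB margin are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def frame_energies_db(samples, frame_len=160, hop=80):
    """Return the log energy in dB of each frame of a mono signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small constant avoids log(0) on silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_vad(samples, frame_len=160, hop=80, margin_db=6.0):
    """Mark a frame as speech when it exceeds the noise floor by margin_db.

    The noise floor is estimated as the 10th-percentile frame energy,
    which is an assumption of this sketch.
    """
    energies = frame_energies_db(samples, frame_len, hop)
    if not energies:
        return []
    floor = sorted(energies)[len(energies) // 10]
    return [e > floor + margin_db for e in energies]
```

A fixed margin above an estimated floor of this kind becomes unreliable as the noise level approaches the speech level, which is consistent with the summary's point about low-SNR conditions.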

Posted Content
TL;DR: This paper conducts the first comprehensive and systematic study of the adversarial attacks on SR systems (SRSs) to understand their security weakness in the practical black-box setting, and proposes an adversarial attack, named FakeBob, to craft adversarial samples.
Abstract: Speaker recognition (SR) is widely used in our daily life as a biometric authentication or identification mechanism. The popularity of SR brings serious security concerns, as demonstrated by recent adversarial attacks. However, the impacts of such threats in the practical black-box setting are still open, since current attacks consider the white-box setting only. In this paper, we conduct the first comprehensive and systematic study of the adversarial attacks on SR systems (SRSs) to understand their security weakness in the practical black-box setting. For this purpose, we propose an adversarial attack, named FAKEBOB, to craft adversarial samples. Specifically, we formulate the adversarial sample generation as an optimization problem, incorporating the confidence of adversarial samples and maximal distortion to balance between the strength and imperceptibility of adversarial voices. One key contribution is a novel algorithm to estimate the score threshold, a feature in SRSs, which is then used in solving the optimization problem. We demonstrate that FAKEBOB achieves a 99% targeted attack success rate on both open-source and commercial systems. We further demonstrate that FAKEBOB is also effective on both open-source and commercial systems when played over the air in the physical world. Moreover, we have conducted a human study which reveals that it is hard for humans to differentiate the speakers of the original and adversarial voices. Last but not least, we show that four promising defense methods for adversarial attacks from the speech recognition domain become ineffective on SRSs against FAKEBOB, which calls for more effective defense methods. We highlight that our study peeks into the security implications of adversarial attacks on SRSs and aims to foster improvements in the security robustness of SRSs.

98 citations

Proceedings Article
09 May 1995
TL;DR: Text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both clean, wideband speech and telephone speech are presented and the standard degradations of filtering and additive noise do not account for all of the performance gap between the TIMIT and NTIMIT data.
Abstract: The two largest factors affecting automatic speaker identification performance are the size of the population and the degradations introduced by noisy communication channels (e.g., telephone transmission). To examine these two factors experimentally, this paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both clean, wideband speech and telephone speech. A system based on Gaussian mixture speaker models is used for speaker identification and experiments are conducted on the TIMIT and NTIMIT databases. These are believed to be the first speaker identification experiments on the complete 630-speaker TIMIT and NTIMIT databases and the largest text-independent speaker identification task reported to date. Identification accuracies of 99.5% and 60.7% are achieved on the TIMIT and NTIMIT databases, respectively. This paper also presents experiments which examine and attempt to quantify the performance loss associated with various telephone degradations by systematically degrading the TIMIT speech in a manner consistent with measured NTIMIT degradations and measuring the performance loss at each step. It is found that the standard degradations of filtering and additive noise do not account for all of the performance gap between the TIMIT and NTIMIT data. Measurements of nonlinear microphone distortions are also described which may explain the additional performance loss.

98 citations
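
The abstract above refers to speaker identification with Gaussian mixture speaker models. A minimal sketch of the scoring side is given below, assuming diagonal covariances and already-trained models; the parameter layout (a list of `(weight, mean, variance)` triples per speaker) and the function names are hypothetical and chosen only for illustration, and no training step is shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_speaker(frames, speaker_models):
    """Return (best_speaker, scores), scoring each model on all frames.

    Frames are treated as independent, so per-frame log-likelihoods add up.
    """
    scores = {
        name: sum(gmm_log_likelihood(x, gmm) for x in frames)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores
```

Because identification picks the maximum over all enrolled speakers, every additional speaker model is another chance for a wrong model to score highest, which is one way to see why population size matters in the experiments described above.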


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Signal processing: 73.4K papers, 983.5K citations, 81% related
Decoding methods: 65.7K papers, 900K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420