Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published on this topic, receiving 310,061 citations.


Papers
BookDOI
01 Jan 2008
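TL;DR: This book covers network, distributed, and embedded speech recognition, with applications including speech recognition in mobile phones, a handheld speech-to-speech translation system, automotive speech recognition, and energy-aware speech recognition for mobile devices.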
Abstract: Network Speech Recognition.- Network, Distributed and Embedded Speech Recognition: An Overview.- Speech Coding and Packet Loss Effects on Speech and Speaker Recognition.- Speech Recognition Over Mobile Networks.- Speech Recognition Over IP Networks.- Distributed Speech Recognition.- Distributed Speech Recognition Standards.- Speech Feature Extraction and Reconstruction.- Quantization of Speech Features: Source Coding.- Error Recovery: Channel Coding and Packetization.- Error Concealment.- Embedded Speech Recognition.- Algorithm Optimizations: Low Computational Complexity.- Algorithm Optimizations: Low Memory Footprint.- Fixed-Point Arithmetic.- Systems and Applications.- Software Architectures for Networked Mobile Speech Applications.- Speech Recognition in Mobile Phones.- Handheld Speech to Speech Translation System.- Automotive Speech Recognition.- Energy Aware Speech Recognition for Mobile Devices.

75 citations

Journal ArticleDOI
18 Feb 2021
TL;DR: ASVspoof 2019 was the third in a series of bi-annual challenges; this paper reports the results and the top-performing single and ensemble system submissions from 62 teams, all of which outperformed the two baseline systems, often by a substantial margin.
Abstract: The ASVspoof initiative was conceived to spearhead research in anti-spoofing for automatic speaker verification (ASV). This paper describes the third in a series of bi-annual challenges: ASVspoof 2019. With the challenge database and protocols being described elsewhere, the focus of this paper is on results and the top performing single and ensemble system submissions from 62 teams, all of which out-perform the two baseline systems, often by a substantial margin. Deeper analyses show that performance is dominated by specific conditions involving either specific spoofing attacks or specific acoustic environments. While fusion is shown to be particularly effective for the logical access scenario involving speech synthesis and voice conversion attacks, participants largely struggled to apply fusion successfully for the physical access scenario involving simulated replay attacks. This is likely the result of a lack of system complementarity, while oracle fusion experiments show clear potential to improve performance. Furthermore, while results for simulated data are promising, experiments with real replay data show a substantial gap, most likely due to the presence of additive noise in the latter. This finding, among others, leads to a number of ideas for further research and directions for future editions of the ASVspoof challenge.

75 citations

Journal ArticleDOI
TL;DR: Experiments show that the proposed robust MFCC-based feature significantly reduces the recognition error rate over a wide signal-to-noise ratio range.

75 citations

Proceedings ArticleDOI
04 May 2014
TL;DR: This work has evaluated the proposed direct SC-based adaptation method in the large scale 320-hr Switchboard task and shown that the proposed method leads to up to 8% relative reduction in word error rate in Switchboard by using only a very small number of adaptation utterances per speaker.
Abstract: Recently an effective fast speaker adaptation method using discriminative speaker code (SC) has been proposed for the hybrid DNN-HMM models in speech recognition [1]. This adaptation method depends on joint learning of a large generic adaptation neural network for all speakers as well as multiple small speaker codes using the standard back-propagation algorithm. In this paper, we propose an alternative direct adaptation in model space, where speaker codes are directly connected to the original DNN models through a set of new connection weights, which can be estimated very efficiently from all or part of the training data. As a result, the proposed method is more suitable for large scale speech recognition tasks since it eliminates the time-consuming training process needed to estimate another adaptation neural network. In this work, we have evaluated the proposed direct SC-based adaptation method in the large scale 320-hr Switchboard task. Experimental results have shown that the proposed SC-based rapid adaptation method is very effective not only for small recognition tasks but also for very large scale tasks. For example, the proposed method leads to up to 8% relative reduction in word error rate in Switchboard by using only a very small number of adaptation utterances per speaker (from 10 to a few dozen). Moreover, the extra training time required for adaptation is also significantly reduced relative to the method in [1].

75 citations
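
The direct adaptation described in the abstract above, where a speaker code feeds into an existing layer through a set of new connection weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the sigmoid activation, the single-layer setup, and the names `adapted_layer`, `W`, `b`, `B`, and `s` are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumed choice, not stated in the abstract)."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def adapted_layer(x, W, b, B, s):
    """One hidden layer with a speaker code wired in directly.

    x: input features
    W, b: original (speaker-independent) weights and bias, kept fixed
    B: hypothetical new connection weights from the speaker code to this layer
    s: speaker code for the current speaker
    """
    base = matvec(W, x)    # contribution of the original network
    shift = matvec(B, s)   # speaker-dependent offset added by the code
    return [sigmoid(a + c + d) for a, c, d in zip(base, shift, b)]
```

With an all-zero speaker code the layer reduces to the original speaker-independent layer, so in this sketch adaptation only has to estimate `B` and `s` rather than train a separate adaptation network.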

Proceedings ArticleDOI
13 May 2002
TL;DR: DYPSA is automatic and operates using the speech signal alone, without the need for an EGG or Laryngograph signal; it incorporates a new technique for estimating GCI candidates and employs dynamic programming to select the most likely candidates according to a defined cost function.
Abstract: We present the DYPSA algorithm for automatic and reliable estimation of glottal closure instants (GCIs) in voiced speech. Reliable GCI estimation is essential for closed-phase speech analysis, from which can be derived features of the vocal tract and, separately, the voice source. It has been shown that such features can be used with significant advantages in applications such as speaker recognition. DYPSA is automatic and operates using the speech signal alone without the need for an EGG or Laryngograph signal. It incorporates a new technique for estimating GCI candidates and employs dynamic programming to select the most likely candidates according to a defined cost function. We review and evaluate three existing methods and compare our new algorithm to them. Results for DYPSA show GCI detection accuracy to within ±0.25 ms on 87% of the test database and fewer than 1% false alarms and misses.

75 citations
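
The general idea of using dynamic programming to pick the most likely candidates under a cost function, as mentioned in the DYPSA abstract above, can be sketched as follows. This is a generic illustration and not the DYPSA cost function: the per-candidate costs, the pitch-period deviation penalty, and the names `select_gcis`, `expected_period`, and `period_weight` are assumptions.

```python
def select_gcis(times, local_costs, expected_period, period_weight=1.0):
    """Pick a subsequence of candidate instants with minimum total cost.

    times: candidate instants in increasing order (e.g. milliseconds)
    local_costs: per-candidate cost; more negative means more plausible
    expected_period: assumed pitch period, in the same unit as `times`
    Returns the indices of the selected candidates.
    """
    n = len(times)
    best = list(local_costs)   # best[j]: cheapest path that ends at candidate j
    prev = [None] * n          # back-pointers for recovering the path
    for j in range(n):
        for i in range(j):
            gap = times[j] - times[i]
            # Hypothetical transition cost: relative deviation from the period.
            trans = period_weight * abs(gap - expected_period) / expected_period
            total = best[i] + trans + local_costs[j]
            if total < best[j]:
                best[j] = total
                prev[j] = i
    # Backtrack from the cheapest end point.
    j = min(range(n), key=best.__getitem__)
    path = []
    while j is not None:
        path.append(j)
        j = prev[j]
    return path[::-1]
```

In this sketch, a spurious candidate that falls between two regularly spaced ones is rejected because the transition penalties it introduces outweigh its small local reward.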


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Signal processing: 73.4K papers, 983.5K citations, 81% related
Decoding methods: 65.7K papers, 900K citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420