Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published on this topic, receiving 310,061 citations.

Papers

Posted Content

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Xu Xiang¹, Shuai Wang¹, Houjun Huang, Yanmin Qian¹, Kai Yu¹

Shanghai Jiao Tong University¹

18 Jun 2019 - arXiv: Audio and Speech Processing

TL;DR: Three different margin based losses which not only separate classes but also demand a fixed margin between classes are introduced to deep speaker embedding learning and it could be demonstrated that the margin is the key to obtain more discriminative speaker embeddings.

Abstract: Recently, speaker embeddings extracted from a speaker discriminative deep neural network (DNN) yield better performance than the conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness. As a result, the embeddings are not optimal for speaker recognition tasks. In this paper, to address this issue, three different margin based losses which not only separate classes but also demand a fixed margin between classes are introduced to deep speaker embedding learning. It could be demonstrated that the margin is the key to obtain more discriminative speaker embeddings. Experiments are conducted on two public text independent tasks: VoxCeleb1 and Speaker in The Wild (SITW). The proposed approach can achieve the state-of-the-art performance, with 25% ~ 30% equal error rate (EER) reduction on both tasks when compared to strong baselines using cross entropy loss with softmax, obtaining 2.238% EER on VoxCeleb1 test set and 2.761% EER on SITW core-core test set, respectively.

67 citations
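
The margin idea in the abstract above can be illustrated with a small sketch. The abstract does not name the three losses it studies, so the code below shows only one commonly used variant, an additive margin on the target-class cosine, as an assumption. The function names and the `margin` and `scale` defaults are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def additive_margin_softmax_loss(embedding, class_weights, target,
                                 margin=0.2, scale=30.0):
    """Cross entropy over scaled cosines, with a margin subtracted
    from the target-class cosine only (illustrative sketch)."""
    unit_embedding = l2_normalize(embedding)
    cosines = [
        sum(a * b for a, b in zip(unit_embedding, l2_normalize(weights)))
        for weights in class_weights
    ]
    # The target class must beat the others by at least `margin`
    # in cosine space before its loss approaches zero.
    logits = [
        scale * (cosine - margin if index == target else cosine)
        for index, cosine in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp.
    max_logit = max(logits)
    log_sum = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_sum - logits[target]
```

With `margin=0.0` this reduces to ordinary softmax cross entropy on normalized cosines, which makes the effect of the margin easy to compare on the same inputs.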

Patent

Hybrid Speech Recognition

Detlef Koll

31 Aug 2009

TL;DR: A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce results for the same speech, and an arbitration engine produces the final output from one or both of them.

Abstract: A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.

67 citations
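
A minimal sketch of the arbitration step described in the patent abstract above. The abstract only states that the output is based on one or both results; the confidence-based rule, the tie-break in favour of the client, and every name below (`RecognitionResult`, `arbitrate`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    text: str
    confidence: float  # hypothetical score in [0, 1]


def arbitrate(client: Optional[RecognitionResult],
              server: Optional[RecognitionResult]) -> Optional[RecognitionResult]:
    """Pick an output from one or both engine results.

    Assumed rule: use whichever result is available; if both are,
    keep the one with the higher confidence (client wins ties).
    """
    if client is None:
        return server
    if server is None:
        return client
    return client if client.confidence >= server.confidence else server
```

Passing `None` for one side models the case where that engine produced no result, for example when the server did not respond in time.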

Proceedings Article

Recognition of conversational telephone speech using the JANUS speech engine

T. Zeppenfeld¹, Michael Finke, Klaus Ries, Martin Westphal, Alex Waibel

Carnegie Mellon University¹

21 Apr 1997

TL;DR: Through a number of algorithmic improvements, this work has been able to reduce error rates from more than 50% word error to 38%, measured on the official 1996 NIST evaluation test set.

Abstract: Recognition of conversational speech is one of the most challenging speech recognition tasks to date. While recognition error rates of 10% or lower can now be reached on speech dictation tasks over vocabularies in excess of 60,000 words, recognition of conversational speech has persistently resisted most attempts at improvements by way of the proven techniques to date. Difficulties arise from shorter words, telephone channel degradation, and highly disfluent and coarticulated speech. In this paper, we describe the application, adaptation, and performance evaluation of our JANUS speech recognition engine to the Switchboard conversational speech recognition task. Through a number of algorithmic improvements, we have been able to reduce error rates from more than 50% word error to 38%, measured on the official 1996 NIST evaluation test set. Improvements include vocal tract length normalization, polyphonic modeling, label boosting, speaker adaptation with and without confidence measures, and speaking mode dependent pronunciation modeling.

67 citations
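
The word error figures quoted in the abstract above are conventionally computed as word-level edit distance divided by the reference length. The sketch below shows that standard definition; it is not the NIST scoring tool, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one row at a time.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.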

Proceedings Article

On the vulnerability of automatic speaker recognition to spoofing attacks with artificial signals

Federico Alegre¹, Ravichander Vipperla¹, Nicholas Evans¹, Benoit Fauve

Institut Eurécom¹

18 Oct 2012

TL;DR: A new approach based on artificial, tone-like signals which provoke higher ASV scores than genuine client tests is introduced, demonstrating the importance of efforts to develop dedicated countermeasures to protect ASV systems from spoofing.

Abstract: Automatic speaker verification (ASV) systems are increasingly being used for biometric authentication even if their vulnerability to imposture or spoofing is now widely acknowledged. Recent work has proposed different spoofing approaches which can be used to test vulnerabilities. This paper introduces a new approach based on artificial, tone-like signals which provoke higher ASV scores than genuine client tests. Experimental results show degradations in the equal error rate from 8.5% to 77.3% and from 4.8% to 64.3% for standard Gaussian mixture model and factor analysis based ASV systems respectively. These findings demonstrate the importance of efforts to develop dedicated countermeasures, some of them trivial, to protect ASV systems from spoofing.

67 citations
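
Several abstracts in this list report the equal error rate (EER): the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping every observed score as a threshold. It assumes that higher scores mean "more likely the claimed speaker"; the function name is hypothetical, and evaluation toolkits typically interpolate between thresholds rather than picking the closest one.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER from two lists of verification scores."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap = None
    eer = None
    for threshold in thresholds:
        # Impostor trials accepted at this threshold.
        false_accept = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # Genuine trials rejected at this threshold.
        false_reject = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(false_accept - false_reject)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (false_accept + false_reject) / 2
    return eer
```

An attack that pushes impostor scores above genuine scores moves this value past 50%, which is the kind of degradation the spoofing abstract above describes.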

Journal Article

Babble Noise: Modeling, Analysis, and Applications

Nitish Krishnamurthy¹, John H. L. Hansen¹

University of Texas at Dallas¹

01 Sep 2009 - IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: This study represents effectively the first effort in developing an overall model for speech babble, and with this, contributions are made for speech system robustness in noise.

Abstract: Speech babble is one of the most challenging forms of noise interference for all speech systems. Here, a systematic approach to model its underlying structure is proposed to further the existing knowledge of speech processing in noisy environments. This paper establishes a working foundation for the analysis and modeling of babble speech. We first address the underlying model for multiple speaker babble speech, considering the number of conversations versus the number of speakers contributing to babble. Next, based on this model, we develop an algorithm to detect the range of the number of speakers within an unknown babble speech sequence. Evaluation is performed using 110 h of data from the Switchboard corpus. The number of simultaneous conversations ranges from one to nine, or one to 18 subjects speaking. A speaker conversation stream detection rate in excess of 80% is achieved with a speaker window size of ±1 speakers. Finally, the problem of in-set/out-of-set speaker recognition is considered in the context of interfering babble speech noise. Results are shown for test durations from 2-8 s, with babble speaker groups ranging from two to nine subjects. It is shown that by choosing the correct number of speakers in the background babble an overall average performance gain of 6.44% equal error rate can be obtained. This study represents effectively the first effort in developing an overall model for speech babble, and with this, contributions are made for speech system robustness in noise.

67 citations

Performance

Metrics

Papers: 15,632
Citations: 337,766

No. of papers in the topic in previous years
Year	Papers
2023	165
2022	468
2021	283
2020	475
2019	484
2018	420
