scispace - formally typeset
Search or ask a question
Topic

Speaker recognition

About: Speaker recognition is a research topic. Over the lifetime, 14990 publications have been published within this topic receiving 310061 citations.


Papers
More filters
Proceedings ArticleDOI
12 May 2019
TL;DR: It is found that diarization substantially reduces error rate when there are multiple speakers, while maintaining excellent performance on single-speaker recordings.
Abstract: Recently, deep neural networks that map utterances to fixed-dimensional embeddings have emerged as the state-of-the-art in speaker recognition. Our prior work introduced x-vectors, an embedding that is very effective for both speaker recognition and diarization. This paper combines our previous work and applies it to the problem of speaker recognition on multi-speaker conversations. We measure performance on Speakers in the Wild and report what we believe are the best published error rates on this dataset. Moreover, we find that diarization substantially reduces error rate when there are multiple speakers, while maintaining excellent performance on single-speaker recordings. Finally, we introduce an easily implemented method to remove the domain-sensitive threshold typically used in the clustering stage of a diarization system. The proposed method is more robust to domain shifts, and achieves similar results to those obtained using a well-tuned threshold.

280 citations

Proceedings ArticleDOI
23 May 1989
TL;DR: A word-spotting system using Gaussian hidden Markov models is presented and it is observed that performance can be greatly affected by the choice of features used, the covariance structure of the Gaussian models, and transformations based on energy and feature distributions.
Abstract: A word-spotting system using Gaussian hidden Markov models is presented. Several aspects of this problem are investigated. Specifically, results are reported on the use of various signal processing and feature transformation techniques. The authors have observed that performance can be greatly affected by the choice of features used, the covariance structure of the Gaussian models, and transformations based on energy and feature distributions. Due to the open-set nature of the problem, the specific techniques for modeling out-of-vocabulary speech and the choice of scoring metric can have a significant effect on performance. >

280 citations

PatentDOI
TL;DR: In this paper, a real-time speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user, where the partitioning of responsibility for speech recognition operations can be done on a client by client or connection by connection basis.
Abstract: A real-time speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. Both the client and server can dedicate a variable number of processing resources for performing speech recognition functions. The partitioning of responsibility for speech recognition operations can be done on a client by client or connection by connection basis.

279 citations

Journal ArticleDOI
TL;DR: A speech recognition model is proposed in which the transformation from an input speech signal into a sequence of phonemes is carried out largely through an active or feedback process.
Abstract: A speech recognition model is proposed in which the transformation from an input speech signal into a sequence of phonemes is carried out largely through an active or feedback process. In this process, patterns are generated internally in the analyzer according to an adaptable sequence of instructions until a best match with the input signal is obtained. Details of the process are given, and the areas where further research is needed are indicated.

278 citations

Journal ArticleDOI
TL;DR: The experimental results show that the instantaneous and transitional representations are relatively uncorrelated, thus providing complementary information for speaker recognition, and simple transmission channel variations are shown to affect both the instantaneous spectral representations and the corresponding recognition performance significantly.
Abstract: The use of instantaneous and transitional spectral representations of spoken utterances for speaker recognition is investigated. Linear-predictive-coding (LPC)-derived cepstral coefficients are used to represent instantaneous spectral information, and best linear fits of each cepstral coefficient over a specified time window are used to represent transitional information. An evaluation has been carried out using a database of isolated digit utterances over dialed-up telephone lines by 10 talkers. Two vector quantization (VQ) codebooks, instantaneous and transitional, were constructed from each speaker's training utterances. The experimental results show that the instantaneous and transitional representations are relatively uncorrelated, thus providing complementary information for speaker recognition. A rectangular window of approximately 100 ms duration provides an effective estimate of the transitional spectral features for speaker recognition. Also, simple transmission channel variations are shown to affect both the instantaneous spectral representations and the corresponding recognition performance significantly, while the transitional representations and performance are relatively resistant. >

278 citations


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
82% related
Feature extraction
111.8K papers, 2.1M citations
81% related
Signal processing
73.4K papers, 983.5K citations
81% related
Decoding methods
65.7K papers, 900K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023165
2022468
2021283
2020475
2019484
2018420