Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
PatentDOI
TL;DR: In this article, a continuous speech recognition system with a speech processor and a word recognition computer subsystem is described, which is characterized by an element for developing a graph for confluent links between confluent nodes.
Abstract: A continuous speech recognition system having a speech processor and a word recognition computer subsystem, characterized by an element for developing a graph for confluent links between confluent nodes; an element for developing a graph of boundary links between adjacent words; an element for storing an inventory of confluent links and boundary links as a coding inventory; an element for converting an unknown utterance into an encoded sequence of confluent links and boundary links corresponding to recognition sequences stored in the word recognition subsystem recognition vocabulary for speech recognition. The invention also includes a method for achieving continuous speech recognition by characterizing speech as a sequence of confluent links which are matched with candidate words. The invention also applies to isolated word speech recognition as with continuous speech recognition, except that in such case there are no boundary links.

68 citations

Posted Content
TL;DR: The "VOiCES from a Distance Challenge 2019" is designed to foster research in the area of speaker recognition and automatic speech recognition with the special focus on single channel distant/far-field audio, under noisy conditions.
Abstract: The "VOiCES from a Distance Challenge 2019" is designed to foster research in the area of speaker recognition and automatic speech recognition (ASR) with the special focus on single channel distant/far-field audio, under noisy conditions. The main objectives of this challenge are to: (i) benchmark state-of-the-art technology in the area of speaker recognition and automatic speech recognition (ASR), (ii) support the development of new ideas and technologies in speaker recognition and ASR, (iii) support new research groups entering the field of distant/far-field speech processing, and (iv) provide a new, publicly available dataset to the community that exhibits realistic distance characteristics.

67 citations

Proceedings ArticleDOI
21 Nov 2005
TL;DR: This work proposes an algorithm to incorporate detected face positions in different camera views into the Kalman filter without doing any explicit triangulation, which yields a robust source localizer that functions reliably both for segments wherein the speaker is silent, which would be detrimental for an audio only tracker, and wherein many faces appear, which would confuse a video only tracker.
Abstract: In prior work, we proposed using an extended Kalman filter to directly update position estimates in a speaker localization system based on time delays of arrival. We found that such a scheme provided superior tracking quality as compared with the conventional closed-form approximation methods. In this work, we enhance our audio localizer with video information. We propose an algorithm to incorporate detected face positions in different camera views into the Kalman filter without doing any explicit triangulation. This approach yields a robust source localizer that functions reliably both for segments wherein the speaker is silent, which would be detrimental for an audio only tracker, and wherein many faces appear, which would confuse a video only tracker. We tested our algorithm on a data set consisting of seminars held by actual speakers. Our experiments revealed that the audio-video localizer functioned better than a localizer based solely on audio or solely on video features.

67 citations

25 Jan 1995
TL;DR: The HTK large vocabulary continuous speech recognition system uses tied-state crossword context-dependent mixture Gaussian HMMs and a dynamic network decoder that can operate in a single pass to allow flexible and efficient system development, as well as multi-pass operation for use with computationally expensive acoustic and/or language models.
Abstract: This paper describes recent developments of the HTK large vocabulary continuous speech recognition system. The system uses tied-state crossword context-dependent mixture Gaussian HMMs and a dynamic network decoder that can operate in a single pass. In the last year the decoder has been extended to produce word lattices to allow flexible and efficient system development, as well as multi-pass operation for use with computationally expensive acoustic and/or language models. The system vocabulary can now be up to 65k words, the final acoustic models have been extended to be sensitive to more acoustic context (quinphones), a 4-gram language model has been used and unsupervised incremental speaker adaptation incorporated. The resulting system gave the lowest error rates on both the H1-P0 and H1-C1 hub tasks in the November 1994 ARPA CSR evaluation.

67 citations


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
82% related
Feature extraction
111.8K papers, 2.1M citations
81% related
Signal processing
73.4K papers, 983.5K citations
81% related
Decoding methods
65.7K papers, 900K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420