Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Journal ArticleDOI
TL;DR: This paper introduces a new algorithm that automatically locates the mouth region using color and motion information, segments the lip region using both color and edge information with Markov random fields, and presents comparisons of various visual features to explore their impact on recognition accuracy.
Abstract: There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Motivated by the multimodal nature of speech, the visual component is considered to yield information that is not always present in the acoustic signal and enables improved system performance over acoustic-only methods, especially in noisy environments. In this paper, we investigate the usefulness of visual speech information in HCI related applications. We first introduce a new algorithm for automatically locating the mouth region by using color and motion information and segmenting the lip region by making use of both color and edge information based on Markov random fields. We then derive a relevant set of visual speech parameters and incorporate them into a recognition engine. We present various visual feature performance comparisons to explore their impact on the recognition accuracy, including the lip inner contour and the visibility of the tongue and teeth. By using a common visual feature set, we demonstrate two applications that exploit speechreading in a joint audio-visual speech signal processing task: speech recognition and speaker verification. The experimental results based on two databases demonstrate that the visual information is highly effective for improving recognition performance over a variety of acoustic noise levels.
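As a rough, hedged illustration of the front-end idea of fusing color and motion cues to localize the mouth region, the Python sketch below uses an assumed OpenCV pipeline. It is not the authors' method (the paper segments the lip region with Markov random fields over color and edge information); the HSV lip-color bounds and motion threshold are guesses for illustration only.

```python
# Hedged sketch: localize a candidate mouth region by fusing color and motion
# cues. NOT the paper's MRF-based method; the HSV lip-color bounds, motion
# threshold, and OpenCV pipeline are illustrative assumptions.
import cv2
import numpy as np

def locate_mouth_roi(prev_frame, curr_frame):
    """Return a bounding box (x, y, w, h) for a candidate mouth region, or None."""
    # Color cue: roughly reddish, lip-like hues in HSV space (bounds are guesses).
    hsv = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2HSV)
    lip_mask = cv2.inRange(hsv, np.array([0, 60, 60]), np.array([15, 255, 255]))

    # Motion cue: frame differencing highlights the articulating mouth.
    gray_prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray_curr = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    _, motion_mask = cv2.threshold(cv2.absdiff(gray_curr, gray_prev),
                                   20, 255, cv2.THRESH_BINARY)

    # Fuse both cues and keep the largest connected region as the mouth ROI.
    fused = cv2.bitwise_and(lip_mask, motion_mask)
    contours, _ = cv2.findContours(fused, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return cv2.boundingRect(max(contours, key=cv2.contourArea))
```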

88 citations

Posted Content
TL;DR: Experimental results suggest that simple repetition and random time-reversion of utterances can reduce prediction errors by up to 18%, and that the proposed logistic margin loss function leads to unified embeddings with state-of-the-art identification and competitive verification accuracies.
Abstract: Incremental improvements in accuracy of Convolutional Neural Networks are usually achieved through use of deeper and more complex models trained on larger datasets. However, enlarging dataset and models increases the computation and storage costs and cannot be done indefinitely. In this work, we seek to improve the identification and verification accuracy of a text-independent speaker recognition system without use of extra data or deeper and more complex models by augmenting the training and testing data, finding the optimal dimensionality of embedding space and use of more discriminative loss functions. Results of experiments on VoxCeleb dataset suggest that: (i) Simple repetition and random time-reversion of utterances can reduce prediction errors by up to 18%. (ii) Lower dimensional embeddings are more suitable for verification. (iii) Use of proposed logistic margin loss function leads to unified embeddings with state-of-the-art identification and competitive verification accuracies.
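As a hedged illustration only (not the authors' code), the sketch below shows what the two augmentations described above, simple repetition and random time-reversion of an utterance, might look like on a raw 1-D waveform; the array representation and the application probabilities are assumptions.

```python
# Hedged sketch of the two augmentations described in the abstract:
# repeating an utterance and randomly reversing it in time. The 1-D numpy
# waveform representation and the probabilities are assumptions.
import numpy as np

def augment_utterance(waveform, rng, repeat_prob=0.5, reverse_prob=0.5):
    """Return an augmented copy of a 1-D waveform array."""
    out = waveform.copy()
    if rng.random() < repeat_prob:
        # Simple repetition: concatenate the utterance with itself.
        out = np.concatenate([out, out])
    if rng.random() < reverse_prob:
        # Random time-reversion: play the utterance backwards.
        out = out[::-1].copy()
    return out

# Usage (illustrative):
# rng = np.random.default_rng(0)
# augmented = augment_utterance(np.asarray(signal, dtype=np.float32), rng)
```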

88 citations

Proceedings Article
01 May 2008
TL;DR: A new, linguistically annotated video database for automatic sign language recognition is presented: the RWTH-BOSTON-400 corpus, which consists of 843 sentences, several speakers, and separate subsets for training, development, and testing.
Abstract: A new, linguistically annotated, video database for automatic sign language recognition is presented. The new RWTH-BOSTON-400 corpus, which consists of 843 sentences, several speakers and separate subsets for training, development, and testing, is described in detail. For evaluation and benchmarking of automatic sign language recognition, large corpora are needed. Recent research has focused mainly on isolated sign language recognition methods using video sequences that have been recorded under lab conditions using special hardware like data gloves. Such databases have generally consisted of only one speaker and thus have been speaker-dependent, and have had only small vocabularies. A new database access interface, which was designed and created to provide fast access to the database statistics and content, makes it possible to easily browse and retrieve particular subsets of the video database. Preliminary baseline results on the new corpora are presented. In contradistinction to other research in this area, all databases presented in this paper will be publicly available.

88 citations

Proceedings ArticleDOI
Hagai Aronowitz
04 May 2014
TL;DR: This work analyzes the sources of degradation for a particular setup in the context of an i-vector PLDA system and concludes that the main source of degradation is an i-vector dataset shift, which is compensated with inter-dataset variability compensation (IDVC) based on the nuisance attribute projection (NAP) method.
Abstract: Recently satisfactory results have been obtained in NIST speaker recognition evaluations. These results are mainly due to accurate modeling of a very large development dataset provided by LDC. However, for many realistic scenarios the use of this development dataset is limited due to a dataset mismatch. In such cases, collection of a large enough dataset is infeasible. In this work we analyze the sources of degradation for a particular setup in the context of an i-vector PLDA system and conclude that the main source for degradation is an i-vector dataset shift. As a remedy, we introduce inter dataset variability compensation (IDVC) to explicitly compensate for dataset shift in the i-vector space. This is done using the nuisance attribute projection (NAP) method. Using IDVC we managed to reduce error dramatically by more than 50% for the domain mismatch setup.
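As a hedged sketch of the IDVC idea described above (not the authors' implementation), the code below estimates a dataset-shift subspace from per-dataset i-vector means and removes it with a NAP-style projection; the interface and the number of removed directions are assumptions.

```python
# Hedged sketch of inter-dataset variability compensation (IDVC) via a
# nuisance attribute projection (NAP). The function interface and the choice
# of k are assumptions; the paper's exact recipe may differ.
import numpy as np

def idvc_projection(ivectors_by_dataset, k=2):
    """Build a projector that removes the subspace spanned by dataset means.

    ivectors_by_dataset: list of (n_i, d) arrays, one per development dataset.
    k: number of nuisance directions to remove (k <= number of datasets).
    """
    means = np.stack([x.mean(axis=0) for x in ivectors_by_dataset])  # (m, d)
    centered = means - means.mean(axis=0, keepdims=True)
    # Top right-singular vectors of the centered dataset means span the
    # dataset-shift (nuisance) subspace.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    v = vt[:k].T                                                      # (d, k)
    return np.eye(means.shape[1]) - v @ v.T                           # (d, d)

def apply_idvc(projection, ivectors):
    """Project i-vectors onto the complement of the nuisance subspace."""
    return ivectors @ projection.T
```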

88 citations

Journal Article
TL;DR: This paper introduces speaker recognition in general and discusses its relevant parameters in relation to system performance.
Abstract: The explosive growth of information technology in the last decade has made a considerable impact on the design and construction of systems for human-machine communication, which is becoming increasingly important in many aspects of life. Amongst other speech processing tasks, a great deal of attention has been devoted to developing procedures that identify people from their voices, and the design and construction of speaker recognition systems has been a fascinating enterprise pursued over many decades. This paper introduces speaker recognition in general and discusses its relevant parameters in relation to system performance.

88 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Signal processing: 73.4K papers, 983.5K citations, 81% related
Decoding methods: 65.7K papers, 900K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420