Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.

Papers

Journal Article

An introduction to voice search

Ye-Yi Wang¹, Dong Yu, Yun-Cheng Ju, Alejandro Acero•Institutions (1)

Shanghai Jiao Tong University¹

18 Apr 2008 - IEEE Signal Processing Magazine

TL;DR: This article categorized spoken dialog technology into form filling, call routing, and voice search, and reviewed the voice search technology.

Abstract: Voice search is the technology underlying many spoken dialog systems (SDSs) that provide users with the information they request with a spoken query. The information normally exists in a large database, and the query has to be compared with a field in the database to obtain the relevant information. The contents of the field, such as business or product names, are often unstructured text. This article categorized spoken dialog technology into form filling, call routing, and voice search, and reviewed the voice search technology. The categorization was made from the technological perspective. It is important to note that a single SDS may apply the technology from multiple categories. Robustness is the central issue in voice search. The technology in acoustic modeling aims at improved robustness to environment noise, different channel conditions, and speaker variance; the pronunciation research addresses the problem of unseen word pronunciation and pronunciation variance; the language model research focuses on linguistic variance; the studies in search give rise to improved robustness to linguistic variance and ASR errors; the dialog management research enables graceful recovery from confusions and understanding errors; and the learning in the feedback loop speeds up system tuning for more robust performance. While tremendous achievements have been accomplished in the past decade on voice search, large challenges remain. Many voice search dialog systems have automation rates around or below 50% in field trials.

95 citations

Proceedings Article

Connectionist speaker normalization and adaptation.

Victor Abrash, Horacio Franco, Ananth Sankar, Michael Cohen

01 Jan 1995

TL;DR: This paper explores supervised speaker adaptation and normalization in the MLP component of a hybrid hidden Markov model/multilayer perceptron version of SRI's DECIPHER™ speech recognition system.

Abstract: In speaker-independent, large-vocabulary continuous speech recognition systems, recognition accuracy varies considerably from speaker to speaker, and performance may be significantly degraded for outlier speakers such as nonnative talkers. In this paper, we explore supervised speaker adaptation and normalization in the MLP component of a hybrid hidden Markov model/multilayer perceptron version of SRI's DECIPHER™ speech recognition system. Normalization is implemented through an additional transformation network that preprocesses the cepstral input to the MLP. Adaptation is accomplished through incremental retraining of the MLP weights on adaptation data. Our approach combines both adaptation and normalization in a single, consistent manner, works with limited adaptation data, and is text-independent. We show significant improvement in recognition accuracy.

95 citations

Patent

Instantaneous context switching for speech recognition systems

Vince M. Stanford¹, Alice G. Klein¹, Norman Frederick Brickman¹•Institutions (1)

IBM¹

18 May 1995 - Journal of the Acoustical Society of America

TL;DR: An instantaneous context switching speech recognition system is disclosed which enables a speech recognition application to be changed without loading new pattern matching data into the system.

Abstract: An instantaneous context switching speech recognition system is disclosed which enables a speech recognition application to be changed without loading new pattern matching data into the system. Selectable pointer maps are included in the memory of the system which selectively change the relationship between words and phonemes between a first application context and the pattern matching logic to a second application context and the pattern matching logic.

95 citations

Proceedings Article

Text-dependent speaker recognition using PLDA with uncertainty propagation

Themos Stafylakis¹, Patrick Kenny², Pierre Ouellet, Javier Pérez³, Marcel Kockmann⁴, Pierre Dumouchel²•Institutions (4)

National Technical University of Athens¹, École de technologie supérieure², Polytechnic University of Catalonia³, Brno University of Technology⁴

25 Aug 2013

TL;DR: A phrase-dependent PLDA model with uncertainty propagation is introduced and it is shown that despite its low channel variability, improved results over the GMM-UBM model are attained.

Abstract: In this paper, we apply and enhance the i-vector-PLDA paradigm to text-dependent speaker recognition. Due to its origin in text-independent speaker recognition, this paradigm does not make use of the phonetic content of each utterance. Moreover, the uncertainty in the i-vector estimates should be taken into account in the PLDA model, due to the short duration of the utterances. To bridge this gap, a phrase-dependent PLDA model with uncertainty propagation is introduced. We examined it on the RSR-2015 dataset and we show that despite its low channel variability, improved results over the GMM-UBM model are attained.

95 citations

Proceedings Article

Improving speaker recognition performance in the domain adaptation challenge using deep neural networks

Daniel Garcia-Romero¹, Xiaohui Zhang¹, Alan V. McCree¹, Daniel Povey¹•Institutions (1)

Johns Hopkins University¹

01 Dec 2014

TL;DR: This paper explores the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC), and shows that collecting SS with a DNN trained on out-of-domain data boosts the speaker recognition performance of an out-of-domain system by more than 25%.

Abstract: Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained on out-of-domain data boosts the speaker recognition performance of an out-of-domain system by more than 25%. Moreover, we integrate the DNN in an unsupervised adaptation framework that uses agglomerative hierarchical clustering with a stopping criterion based on unsupervised calibration, and show that the initial gains of the out-of-domain system carry over to the final adapted system. Despite the fact that the DNN is trained on the out-of-domain data, the final adapted system produces a relative improvement of more than 30% with respect to the best published results on this task.

95 citations

Performance Metrics

Papers: 15,632
Citations: 337,766

No. of papers in the topic in previous years
Year	Papers
2023	165
2022	468
2021	283
2020	475
2019	484
2018	420