Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published on this topic, receiving 310,061 citations.

Papers

Proceedings Article

State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18.

Jesús Villalba¹, Nanxin Chen¹, David Snyder¹, Daniel Garcia-Romero¹, Alan V. McCree¹, Gregory Sell¹, Jonas Borgstrom², Fred Richardson², Suwon Shon², Francois Grondin², Réda Dehak³, Leibny Paola Garcia-Perera¹, Daniel Povey¹, Pedro A. Torres-Carrasquillo², Sanjeev Khudanpur¹, Najim Dehak¹

Johns Hopkins University¹, Massachusetts Institute of Technology², École Pour l'Informatique et les Techniques Avancées³

15 Sep 2019

TL;DR: Very deep x-vector architectures (Extended and Factorized TDNN, and ResNets) clearly outperformed shallower x-vectors and i-vectors in NIST SRE18, and the Extended TDNN x-vector was the best single system.

Abstract: We present a condensed description of the joint effort of JHU-CLSP, JHU-HLTCOE, MIT-LL, MIT CSAIL and LSE-EPITA for NIST SRE18. All the developed systems consisted of x-vector/i-vector embeddings with some flavor of PLDA backend. Very deep x-vector architectures (Extended and Factorized TDNN, and ResNets) clearly outperformed shallower x-vectors and i-vectors. The systems were tailored to the video (VAST) or to the telephone (CMN2) condition. The VAST data was challenging, yielding 4 times worse performance than other video-based datasets like Speakers in the Wild. We were able to calibrate the VAST data with very few development trials by using careful adaptation and score normalization methods. The VAST primary fusion yielded EER=10.18% and Cprimary=0.431. By improving calibration in post-eval, we reached Cprimary=0.369. In CMN2, we used unsupervised SPLDA adaptation based on agglomerative clustering and score normalization to correct the domain shift between English and Tunisian Arabic models. The CMN2 primary fusion yielded EER=4.5% and Cprimary=0.313. The Extended TDNN x-vector was the best single system, obtaining EER=11.1% and Cprimary=0.452 in VAST, and EER=4.95% and Cprimary=0.354 in CMN2.
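
The abstract above reports results as equal error rate (EER), the operating point where the miss rate on target trials equals the false-alarm rate on non-target trials. As a rough illustration only, the sketch below estimates EER from two lists of verification scores by sweeping every observed score as a threshold. The function name `equal_error_rate` and the toy scores are assumptions made for this example; this is not the scoring tool used in the evaluation, and it does not compute Cprimary.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying each observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    Hypothetical helper for illustration; not an official scoring tool.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # Miss rate: fraction of target trials rejected at this threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False-alarm rate: fraction of non-target trials accepted.
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their average as the EER estimate.
            best_gap = gap
            eer = (miss + false_alarm) / 2
    return eer


# Toy scores (made up): one target trial scores below one non-target trial.
targets = [2.0, 1.5, 0.3, 1.0]
nontargets = [0.1, 0.5, -1.0, -0.2]
eer = equal_error_rate(targets, nontargets)
print(f"EER = {eer:.2%}")
```

With only a handful of trials the estimate moves in coarse steps; evaluations with many trials usually interpolate between neighboring thresholds instead of picking the closest one.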

101 citations

Patent

Method and apparatus for automated speaker parameters adaptation in a deployed speaker verification system

Daniele Colibro¹, Claudio Vair¹, Kevin R. Farrell¹

Nuance Communications¹

25 Feb 2013

TL;DR: A method and apparatus adapt a speaker recognition classifier using field data from a deployed voice-based interactive system: representations of voice characteristics generated by the deployed system are collected, in association with the corresponding speakers, and used to update the classifier's parameters.

Abstract: Typical speaker verification systems usually employ speakers' audio data collected during an enrollment phase when users enroll with the system and provide respective voice samples. Due to technical, business, or other constraints, the enrollment data may not be large enough or rich enough to encompass different inter-speaker and intra-speaker variations. According to at least one embodiment, a method and apparatus employing classifier adaptation based on field data in a deployed voice-based interactive system comprise: collecting representations of voice characteristics, in association with corresponding speakers, the representations being generated by the deployed voice-based interactive system; updating parameters of the classifier, used in speaker recognition, based on the representations collected; and employing the classifier, with the corresponding parameters updated, in performing speaker recognition.

100 citations

Proceedings Article

High quality voice morphing

Hui Ye¹, Steve Young¹

University of Cambridge¹

17 May 2004

TL;DR: This paper describes a complete voice morphing system and the enhancements needed for dealing with the various artifacts, including a novel method for synthesising natural phase dispersion.

Abstract: Voice morphing is a technique for modifying a source speaker's speech to sound as if it was spoken by some designated target speaker. Most of the recent approaches to voice morphing apply a linear transformation to the spectral envelope and pitch scaling to modify the prosody. Whilst these methods are effective, they also introduce artifacts arising from the effects of glottal coupling, phase incoherence, unnatural phase dispersion and the high spectral variance of unvoiced sounds. A practical voice morphing system must account for these if high audio quality is to be preserved. This paper describes a complete voice morphing system and the enhancements needed for dealing with the various artifacts, including a novel method for synthesising natural phase dispersion. Each technique is assessed individually and the overall performance of the system evaluated using listening tests. Overall it is found that the enhancements significantly improve speaker identification scores and perceived audio quality.

100 citations

Proceedings Article

Parametric trajectory models for speech recognition

Herbert Gish, Kenney Ng

03 Oct 1996

TL;DR: Parametric trajectory models for speech recognition are extended to include time-varying covariances, and an approach for defining a metric between speech segments based on trajectory models is described; this metric is important in developing mixture models of trajectories.

Abstract: The basic motivation for employing trajectory models for speech recognition is that sequences of speech features are statistically dependent and that the effective and efficient modeling of the speech process will incorporate this dependency. In our previous work we presented an approach to modeling the speech process with trajectories. In this paper we continue our development of parametric trajectory models for speech recognition. We extend our models to include time-varying covariances and describe our approach for defining a metric between speech segments based on trajectory models; it is important in developing mixture models of trajectories.

100 citations

Proceedings Article

On the decorrelation of filter-bank energies in speech recognition

Climent Nadeu¹, Javier Hernando, Mónica Gorricho

Polytechnic University of Catalonia¹

01 Jan 1995

TL;DR: A new representation is proposed that significantly outperforms both mel-cepstrum and LPC-cepstrum techniques in recognition rate and computational cost. It consists of filtering the frequency sequence of filter-bank energies with an extremely simple filter that equalizes the variance of the cepstral coefficients.

Abstract: Cepstral coefficients are widely used in speech recognition. In this paper, we claim that they are not the best way of representing the spectral envelope, at least for some usual speech recognition systems. In fact, cepstrum has several disadvantages: poor physical meaning, need of transformation, and low capacity of adaptation to some recognition systems. In this paper, we propose a new representation that significantly outperforms both mel-cepstrum and LPC-cepstrum techniques in both recognition rate and computational cost. It consists of filtering the frequency sequence of filter-bank energies with an extremely simple filter that equalizes the variance of the cepstral coefficients. Excellent results of the new technique using a continuous observation density HMM recognition system and two very different recognition tasks, connected digits and phone recognition, are presented.

100 citations

Metrics

Papers: 15,632

Citations: 337,766

Number of papers in the topic in previous years
Year	Papers
2023	165
2022	468
2021	283
2020	475
2019	484
2018	420
