Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published on this topic, receiving 310,061 citations.

Papers

Proceedings Article

Representation Learning for Speech Emotion Recognition

Sayan Ghosh¹, Eugene Laksana¹, Louis-Philippe Morency², Stefan Scherer¹

University of Southern California¹, Carnegie Mellon University²

08 Sep 2016

TL;DR: In this paper, the authors investigated emotion recognition from spectrogram features extracted from the speech and glottal flow signals; spectrogram encoding is performed by a stacked autoencoder and an RNN (Recurrent Neural Network) is used for classification of four primary emotions.

Abstract: Speech emotion recognition is an important problem with applications as varied as human-computer interfaces and affective computing. Previous approaches to emotion recognition have mostly focused on extraction of carefully engineered features and have trained simple classifiers for the emotion task. There has been limited effort at representation learning for affect recognition, where features are learnt directly from the signal waveform or spectrum. Prior work also does not investigate the effect of transfer learning from affective attributes such as valence and activation to categorical emotions. In this paper, we investigate emotion recognition from spectrogram features extracted from the speech and glottal flow signals; spectrogram encoding is performed by a stacked autoencoder, and an RNN (Recurrent Neural Network) is used for classification of four primary emotions. We perform two experiments to improve RNN training: (1) representation learning: model training on the glottal flow signal to investigate the effect of speaker- and phonetic-invariant features on classification performance; and (2) transfer learning: RNN training on valence and activation, which is adapted to a four-emotion classification task. On the USC-IEMOCAP dataset, our proposed approach achieves performance comparable to state-of-the-art speech emotion recognition systems.

116 citations

Proceedings Article

ALIZE 3.0 - Open Source Toolkit for State-of-the-Art Speaker Recognition

Anthony Larcher, Jean-François Bonastre, Benoit Fauve¹, Kong Aik Lee², Christophe Lévy, Haizhou Li², John Mason¹, Jean-Yves Parfait

Swansea University¹, Institute for Infocomm Research Singapore²

25 Aug 2013

TL;DR: The latest version of the corpus and performance on the NIST-SRE 2010 extended task are presented; the toolkit includes a set of high-level tools dedicated to speaker recognition, based on the latest developments in the field.

Abstract: ALIZE is an open-source platform for speaker recognition. The ALIZE library implements a low-level statistical engine based on the well-known Gaussian mixture modelling. The toolkit includes a set of high-level tools dedicated to speaker recognition based on the latest developments in the field, such as Joint Factor Analysis, Support Vector Machines, i-vector modelling and Probabilistic Linear Discriminant Analysis. Since 2005, the performance of ALIZE has been demonstrated in a series of Speaker Recognition Evaluations (SREs) conducted by NIST, and the toolkit has been used by many participants in the most recent NIST-SRE 2012. This paper presents the latest version of the corpus and performance on the NIST-SRE 2010 extended task.

115 citations

Journal Article

Learning words from reliable and unreliable speakers.

Jason Scofield¹, Douglas A. Behrend²

University of Alabama¹, University of Arkansas²

01 Apr 2008-Cognitive Development

TL;DR: This paper examined whether 3- and 4-year-olds would trust a reliable speaker over an unreliable speaker when learning a new word, and whether that trust would be reversed, and the word mapping revised, when a trusted speaker later proved unreliable.

115 citations

Journal Article

An Information Theoretic Approach to Speaker Diarization of Meeting Data

Deepu Vijayasenan¹, Fabio Valente¹, Hervé Bourlard¹

Idiap Research Institute¹

01 Sep 2009-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: The paper discusses issues related to speaker diarization within an information theoretic framework, such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function.

Abstract: A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. We discuss issues related to speaker diarization using this information theoretic framework, such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. Because this approach is mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM-based system, resulting in faster-than-real-time diarization.
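The abstract names the Jensen-Shannon divergence as the distance that arises between speech segments under the IB objective. As a rough illustration only, the sketch below computes a weighted Jensen-Shannon divergence between two discrete distributions. The function names, the use of cluster-size weights, and the toy inputs are assumptions made for this example; the paper's full merge criterion and optimization algorithms are not reproduced here.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; entries with p[i] == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between discrete distributions.

    p and q are sequences of probabilities over the same outcomes.
    w_p and w_q are non-negative mixture weights summing to 1 (assumed
    here to be proportional to the sizes of the two segments).
    """
    # Mixture distribution that both inputs are compared against.
    m = [w_p * pi + w_q * qi for pi, qi in zip(p, q)]
    return w_p * kl_divergence(p, m) + w_q * kl_divergence(q, m)


# Hypothetical per-segment distributions over three relevance variables.
segment_a = [0.7, 0.2, 0.1]
segment_b = [0.1, 0.3, 0.6]
print(js_divergence(segment_a, segment_b))
```

With equal weights the divergence is symmetric, zero for identical distributions, and bounded above by ln 2, which makes it convenient as a segment-to-segment distance in a clustering loop.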

115 citations

Proceedings Article

Segmentation of speech using speaker identification

L.D. Wilcox¹, Francine Chen¹, Don Kimber¹, V. Balasubramanian¹

PARC¹

19 Apr 1994

TL;DR: This paper describes techniques for segmentation of conversational speech based on speaker identity using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks.

Abstract: This paper describes techniques for segmentation of conversational speech based on speaker identity. Speaker segmentation is performed using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker sub-networks are initialized using Baum-Welch training on data labeled by speaker, and are iteratively retrained based on the previous segmentation. If data labeled by speaker is not available, agglomerative clustering is used to approximately segment the conversational speech according to speaker prior to Baum-Welch training. The distance measure for the clustering is a likelihood ratio in which speakers are modeled by Gaussian distributions. The distance between merged segments is recomputed at each stage of the clustering, and a duration model is used to bias the likelihood ratio. Segmentation accuracy using agglomerative clustering initialization matches accuracy using initialization with speaker labeled data.
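The abstract describes an agglomerative clustering whose distance is a likelihood ratio with Gaussian speaker models. A minimal sketch of that idea follows, assuming one-dimensional features and maximum-likelihood Gaussians; the function names, the variance floor, and the toy segments are hypothetical, and the duration-model bias mentioned in the abstract is not included.

```python
import math


def ml_variance(samples):
    """Maximum-likelihood (biased) variance of a list of floats."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)


def glr_distance(seg_a, seg_b, var_floor=1e-6):
    """Log likelihood ratio between modelling two segments with separate
    Gaussians versus one shared Gaussian (1-D features assumed).

    Larger values suggest the segments come from different speakers.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    # Floor the variances so constant segments do not produce log(0).
    var_a = max(ml_variance(seg_a), var_floor)
    var_b = max(ml_variance(seg_b), var_floor)
    var_ab = max(ml_variance(list(seg_a) + list(seg_b)), var_floor)
    return 0.5 * ((n_a + n_b) * math.log(var_ab)
                  - n_a * math.log(var_a)
                  - n_b * math.log(var_b))


def closest_pair(segments):
    """Return indices (i, j) of the pair with the smallest distance,
    i.e. the pair an agglomerative step would merge next."""
    pairs = [(glr_distance(segments[i], segments[j]), i, j)
             for i in range(len(segments))
             for j in range(i + 1, len(segments))]
    _, i, j = min(pairs)
    return i, j


# Toy segments: the first two overlap heavily, the third is far away.
segments = [
    [0.0, 1.0, 2.0, 3.0],
    [0.5, 1.5, 2.5, 3.5],
    [10.0, 11.0, 12.0, 13.0],
]
print(closest_pair(segments))
```

In a full clustering loop the chosen pair would be merged into one segment and the distances involving it recomputed, which mirrors the abstract's note that the distance between merged segments is recomputed at each stage.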

115 citations

Performance

Metrics

Papers: 15,632
Citations: 337,766

No. of papers in the topic in previous years
Year	Papers
2023	165
2022	468
2021	283
2020	475
2019	484
2018	420
