Open Access
Cosine Similarity Scoring without Score Normalization Techniques.
TLDR
This paper introduces a modification to the cosine similarity that does not require explicit score normalization, relying instead on simple mean and covariance statistics from a collection of impostor speaker ivectors. Avoiding z- and t-norm also allows a new unsupervised speaker adaptation technique to be applied to models defined in the ivector space.
Abstract:
In recent work [1], a simplified and highly effective approach to speaker recognition based on the cosine similarity between low-dimensional vectors, termed ivectors, defined in a total variability space was introduced. The total variability space representation is motivated by the popular Joint Factor Analysis (JFA) approach, but does not require the complication of estimating separate speaker and channel spaces and has been shown to be less dependent on score normalization procedures, such as z-norm and t-norm. In this paper, we introduce a modification to the cosine similarity that does not require explicit score normalization, relying instead on simple mean and covariance statistics from a collection of impostor speaker ivectors. By avoiding the complication of z- and t-norm, the new approach further allows for application of a new unsupervised speaker adaptation technique to models defined in the ivector space. Experiments are conducted on the core condition of the NIST 2008 corpora, where, with adaptation, the new approach produces an equal error rate (EER) of 4.8% and min decision cost function (MinDCF) of 2.3% on all female speaker trials.
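The abstract does not give the scoring formula itself, so the following is only a minimal sketch of the general idea: plain cosine scoring between two ivectors, next to a variant that first centers each ivector with the impostor mean and scales each dimension by the impostor standard deviation (a diagonal-covariance assumption). The function names, the diagonal simplification, and the toy vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def impostor_stats(impostors):
    """Per-dimension mean and standard deviation of impostor vectors.

    Assumption: only the diagonal of the covariance is used.
    """
    n = len(impostors)
    dim = len(impostors[0])
    mean = [sum(v[d] for v in impostors) / n for d in range(dim)]
    std = [
        math.sqrt(sum((v[d] - mean[d]) ** 2 for v in impostors) / n)
        for d in range(dim)
    ]
    return mean, std


def normalized_cosine_score(target, test, mean, std):
    """Cosine score after centering and scaling with impostor statistics.

    Hypothetical illustration, not the paper's exact formula.
    Every entry of std must be non-zero.
    """
    t = [(x - m) / s for x, m, s in zip(target, mean, std)]
    u = [(x - m) / s for x, m, s in zip(test, mean, std)]
    return cosine_similarity(t, u)


# Toy usage with made-up 2-dimensional "ivectors".
impostors = [[0.0, 0.0], [2.0, 4.0]]
mean, std = impostor_stats(impostors)
raw = cosine_similarity([2.0, 4.0], [3.0, 1.0])
adjusted = normalized_cosine_score([2.0, 4.0], [3.0, 1.0], mean, std)
```

In this sketch the impostor statistics are computed once and reused for every trial, so no per-model or per-test normalization cohort has to be scored at verification time.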
Citations
Journal ArticleDOI
Text-dependent speaker verification: Classifiers, databases and RSR2015
TL;DR: The HiLAM system, based on a three-layer acoustic architecture, and an i-vector/PLDA system, outperforms the state-of-the-art i-vector system in most of the scenarios and provides a reference evaluation scheme and a reference performance on the RSR2015 database to the research community.
Proceedings ArticleDOI
i-vector based speaker recognition on short utterances
TL;DR: In this paper, a comparison of Joint Factor Analysis (JFA) and i-vector based systems is presented, including various compensation techniques: Within-Class Covariance Normalization (WCCN), LDA, Scatter Difference Nuisance Attribute Projection (SDNAP) and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA).
Proceedings ArticleDOI
PLDA for speaker verification with utterances of arbitrary duration
TL;DR: This paper shows how to quantify the uncertainty associated with the i-vector extraction process and propagate it into a PLDA classifier and finds that it led to substantial improvements in accuracy.
Journal ArticleDOI
A Study of the Cosine Distance-Based Mean Shift for Telephone Speech Diarization
TL;DR: A simple iterative Mean Shift algorithm based on the cosine distance is used to perform speaker clustering under speaker diarization conditions, and state-of-the-art results, as measured by the Diarization Error Rate and the Number of Detected Speakers on the LDC CallHome telephone corpus, are reported.
Journal ArticleDOI
Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings
TL;DR: A novel text-independent speaker verification framework based on the triplet loss and a very deep convolutional neural network architecture is investigated in this study, where a fixed-length speaker discriminative embedding is learned from sparse speech features and utilized as a feature representation for the SV tasks.
References
Journal ArticleDOI
Front-End Factor Analysis for Speaker Verification
TL;DR: An extension of previous work that proposes a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis, named the total variability space because it models both speaker and channel variabilities.
Journal ArticleDOI
A Study of Interspeaker Variability in Speaker Verification
TL;DR: It is shown that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the cross-channel condition, it is capable of performing at least as well as fusions of multiple systems of other types.
Feature Warping for Robust Speaker Verification
TL;DR: In this paper, the authors proposed a target mapping method that warps the distribution of a cepstral feature stream to a standardised distribution over a specified time interval, which is robust to channel mismatch, additive noise and to some extent, non-linear effects attributed to transducers.
Proceedings Article
Within-class covariance normalization for SVM-based speaker recognition.
TL;DR: A practical procedure for applying WCCN to an SVM-based speaker recognition system where the input feature vectors reside in a high-dimensional space and achieves improvements of up to 22% in EER and 28% in minimum decision cost function (DCF) over the previous baseline.
Proceedings Article
Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification
TL;DR: A new speaker verification system architecture based on Joint Factor Analysis (JFA) as feature extractor is presented, using the cosine kernel in the new total factor space to design two different systems: the first system is Support Vector Machines based, and the second one uses this kernel directly as a decision score.