Open Access

Cosine Similarity Scoring without Score Normalization Techniques.

TLDR
This paper introduces a modification to the cosine similarity that does not require explicit score normalization, relying instead on simple mean and covariance statistics from a collection of impostor speaker i-vectors; avoiding explicit normalization in turn enables a new unsupervised speaker adaptation technique for models defined in the i-vector space.
Abstract
In recent work [1], a simplified and highly effective approach to speaker recognition was introduced, based on the cosine similarity between low-dimensional vectors, termed i-vectors, defined in a total variability space. The total variability representation is motivated by the popular Joint Factor Analysis (JFA) approach, but it does not require the complication of estimating separate speaker and channel spaces, and it has been shown to be less dependent on score normalization procedures such as z-norm and t-norm. In this paper, we introduce a modification to the cosine similarity that does not require explicit score normalization, relying instead on simple mean and covariance statistics from a collection of impostor speaker i-vectors. By avoiding the complication of z- and t-norm, the new approach also allows a new unsupervised speaker adaptation technique to be applied to models defined in the i-vector space. Experiments are conducted on the core condition of the NIST 2008 corpora, where, with adaptation, the new approach produces an equal error rate (EER) of 4.8% and a minimum decision cost function (MinDCF) of 2.3% on all female speaker trials.
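
The abstract does not spell out the modified score, so what follows is only an illustrative sketch of the idea behind it, written in Python with the standard library. It assumes that i-vectors are length-normalized before scoring and that population (not sample) statistics are used. Under those assumptions, z-norm of a cosine score for a fixed target has a closed form. If t is the normalized target and x ranges over normalized impostor i-vectors with mean mu and covariance C, the cohort scores t.x have mean t.mu and variance t^T C t, so no impostor cohort has to be scored at test time. The function names are hypothetical, and this is not claimed to be the paper's exact formula. T-norm is analogous with the roles of target and test swapped.

```python
import math
from statistics import fmean


def unit(v):
    """Scale a vector to unit Euclidean length (cosine scoring only uses directions)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(w1, w2):
    """Plain cosine similarity between two i-vectors."""
    return dot(unit(w1), unit(w2))


def impostor_stats(impostors):
    """Mean and population covariance of length-normalized impostor i-vectors (hypothetical helper)."""
    xs = [unit(v) for v in impostors]
    dim = len(xs[0])
    mean = [fmean(x[i] for x in xs) for i in range(dim)]
    cov = [[fmean((x[i] - mean[i]) * (x[j] - mean[j]) for x in xs)
            for j in range(dim)] for i in range(dim)]
    return mean, cov


def znorm_explicit(target, test, impostors):
    """Classic z-norm: score the target against every impostor, then standardize."""
    cohort = [cosine(target, imp) for imp in impostors]
    mu = fmean(cohort)
    sigma = math.sqrt(fmean((s - mu) ** 2 for s in cohort))
    return (cosine(target, test) - mu) / sigma


def znorm_closed_form(target, test, mean, cov):
    """Same value from impostor statistics alone: mean t.mu, variance t^T C t."""
    t = unit(target)
    var = dot(t, [dot(row, t) for row in cov])
    return (dot(t, unit(test)) - dot(t, mean)) / math.sqrt(var)
```

The impostor mean and covariance are computed once, offline. After that, each trial costs one quadratic form instead of a pass over the whole impostor cohort, and the two functions agree up to floating-point rounding.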

Citations
Journal Article

Text-dependent speaker verification: Classifiers, databases and RSR2015

TL;DR: The HiLAM system, based on a three-layer acoustic architecture, and an i-vector/PLDA system, outperforms the state-of-the-art i-vector system in most scenarios and provides a reference evaluation scheme and a reference performance on the RSR2015 database to the research community.
Proceedings Article

i-vector based speaker recognition on short utterances

TL;DR: In this paper, a comparison of Joint Factor Analysis (JFA) and i-vector based systems including various compensation techniques; Within-Class Covariance Normalization (WCCN), LDA, Scatter Difference Nuisance Attribute Projection (SDNAP) and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA) is presented.
Proceedings Article

PLDA for speaker verification with utterances of arbitrary duration

TL;DR: This paper shows how to quantify the uncertainty associated with the i-vector extraction process and propagate it into a PLDA classifier and finds that it led to substantial improvements in accuracy.
Journal Article

A Study of the Cosine Distance-Based Mean Shift for Telephone Speech Diarization

TL;DR: A simple iterative Mean Shift algorithm based on the cosine distance is studied for speaker clustering under speaker diarization conditions; state-of-the-art results, as measured by the Diarization Error Rate and the Number of Detected Speakers on the LDC CallHome telephone corpus, are reported.
Journal Article

Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings

TL;DR: A novel text-independent speaker verification framework based on the triplet loss and a very deep convolutional neural network architecture is investigated in this study, in which a fixed-length speaker-discriminative embedding is learned from sparse speech features and used as the feature representation for the SV tasks.
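
The diarization study above clusters speakers with mean shift under the cosine distance. As a rough illustration only, and not that paper's implementation, the sketch below runs flat-kernel mean shift on the unit sphere. Each point repeatedly moves to the mean of all points within a cosine-distance bandwidth, and points that converge to the same mode share a cluster label. Three details are assumptions made here: each mean is renormalized to unit length, the bandwidth is 0.3, and modes are merged when they lie within half the bandwidth. The function names are hypothetical.

```python
import math


def unit(v):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_mean_shift(points, bandwidth=0.3, max_iter=50, tol=1e-9):
    """Flat-kernel mean shift under cosine distance (1 - cosine similarity).

    Returns one integer cluster label per input point (hypothetical helper)."""
    data = [unit(p) for p in points]
    modes = []
    for start in data:
        y = start
        for _ in range(max_iter):
            # Flat kernel: every point within `bandwidth` cosine distance counts equally.
            neigh = [d for d in data if 1.0 - dot(y, d) <= bandwidth]
            # Move to the neighbours' mean, projected back onto the unit sphere (assumption).
            new_y = unit([sum(col) / len(neigh) for col in zip(*neigh)])
            shift = 1.0 - dot(new_y, y)
            y = new_y
            if shift < tol:
                break
        modes.append(y)
    # Points whose modes (nearly) coincide form one cluster, i.e. one detected speaker.
    centers, labels = [], []
    for y in modes:
        for k, c in enumerate(centers):
            if 1.0 - dot(y, c) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(y)
            labels.append(len(centers) - 1)
    return labels
```

In a diarization setting, each input would be an i-vector extracted from a short speech segment, and the number of distinct labels is the number of detected speakers. No cluster count has to be fixed in advance, but the result depends on the bandwidth.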
References
Journal Article

Front-End Factor Analysis for Speaker Verification

TL;DR: An extension of previous work that proposes a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities.
Journal Article

A Study of Interspeaker Variability in Speaker Verification

TL;DR: It is shown that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the cross-channel condition, it is capable of performing at least as well as fusions of multiple systems of other types.

Feature Warping for Robust Speaker Verification

TL;DR: In this paper, the authors proposed a target mapping method that warps the distribution of a cepstral feature stream to a standardised distribution over a specified time interval, which is robust to channel mismatch, additive noise and to some extent, non-linear effects attributed to transducers.
Proceedings Article

Within-class covariance normalization for SVM-based speaker recognition.

TL;DR: A practical procedure is presented for applying WCCN to an SVM-based speaker recognition system whose input feature vectors reside in a high-dimensional space; it achieves improvements of up to 22% in EER and 28% in minimum decision cost function (DCF) over the previous baseline.
Proceedings Article

Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification

TL;DR: A new speaker verification system architecture based on Joint Factor Analysis (JFA) as a feature extractor is presented. The cosine kernel in the new total factor space is used to design two different systems: the first is based on Support Vector Machines, and the second uses this kernel directly as a decision score.
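
The feature warping reference maps each cepstral coefficient track to a standard normal distribution over a sliding time window. The sketch below is a minimal version of that idea for a single coefficient stream: the current frame's rank within the window is converted to a standard normal quantile. Several details are assumptions:

- the 301-frame default window, which is about 3 s at an assumed 10 ms frame shift;
- truncated windows at the edges of the stream;
- the tie handling.

`warp_stream` is a hypothetical name.

```python
from statistics import NormalDist


def warp_stream(values, window=301):
    """Warp one cepstral coefficient track to N(0, 1) over a sliding window.

    Hypothetical helper: frames near the edges use a truncated window."""
    half = window // 2
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = []
    for t, x in enumerate(values):
        seg = values[max(0, t - half): t + half + 1]
        n = len(seg)
        # Rank of the current value within the window (1 = smallest); ties share the average rank.
        below = sum(1 for v in seg if v < x)
        equal = sum(1 for v in seg if v == x)
        rank = below + (equal + 1) / 2
        # (rank - 0.5) / n always lies strictly between 0 and 1, so inv_cdf is defined.
        out.append(std_normal.inv_cdf((rank - 0.5) / n))
    return out
```

Only ranks within the window matter, so any monotonically increasing transform of the input leaves the output unchanged. A constant channel offset in the cepstral domain is one such transform.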