Open Access Proceedings Article

Duration mismatch compensation for i-vector based speaker recognition systems

TLDR
The effect of duration variability on the phoneme distributions of speech utterances and on i-vector length is analyzed, and it is demonstrated that, as utterance duration decreases, the number of detected unique phonemes and the i-vector length approach zero in a logarithmic and a non-linear fashion, respectively.
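The logarithmic trend in the unique-phoneme count can be checked with an ordinary least-squares fit of the count against ln(duration). The sketch below assumes a purely logarithmic model, count ≈ a + b*ln(d); the function names and the synthetic numbers are hypothetical and are not the paper's data.

```python
import numpy as np


def fit_log_trend(durations_s, counts):
    """Least-squares fit of counts ~ a + b * ln(duration); returns (a, b).

    Hypothetical helper: assumes a purely logarithmic model.
    """
    x = np.log(np.asarray(durations_s, dtype=float))
    y = np.asarray(counts, dtype=float)
    # For deg=1, np.polyfit returns coefficients highest power first: [b, a].
    b, a = np.polyfit(x, y, deg=1)
    return a, b


def predict_count(a, b, durations_s):
    """Fitted count, clipped at zero because a count cannot be negative."""
    d = np.asarray(durations_s, dtype=float)
    return np.clip(a + b * np.log(d), 0.0, None)


def zero_crossing_duration(a, b):
    """Duration (same unit as the input) at which the fitted curve reaches zero."""
    return float(np.exp(-a / b))


if __name__ == "__main__":
    # Synthetic, illustrative data only (not from the paper).
    durations = np.array([2.0, 5.0, 10.0, 20.0, 40.0])
    counts = 10.0 + 8.0 * np.log(durations)
    a, b = fit_log_trend(durations, counts)
    print(f"a={a:.2f}, b={b:.2f}, zero at {zero_crossing_duration(a, b):.3f} s")
```

Below the zero crossing exp(-a/b) the unclipped model would go negative, which is why the prediction is clipped; real phoneme counts flatten out rather than following the fit exactly.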
Abstract
Speaker recognition systems trained on long-duration utterances are known to perform significantly worse when short test segments are encountered. To address this mismatch, we analyze the effect of duration variability on the phoneme distributions of speech utterances and on i-vector length. We demonstrate that, as utterance duration decreases, the number of detected unique phonemes and the i-vector length approach zero in a logarithmic and a non-linear fashion, respectively. Treating duration variability as additive noise in the i-vector space, we propose three strategies for its compensation: i) multi-duration training of the Probabilistic Linear Discriminant Analysis (PLDA) model, ii) score calibration using log duration as a Quality Measure Function (QMF), and iii) multi-duration PLDA training with synthesized short-duration i-vectors. Experiments are designed around the 2012 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) protocol with varying test utterance durations. Experimental results demonstrate the effectiveness of the proposed schemes under short-duration test conditions, especially the QMF calibration approach.
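Strategy ii) can be sketched as a linear calibration that adds a quality term to the raw score, s_cal = w0 + w1*s + w2*q, where q is the log of the test duration in seconds. This form follows the usual FoCal/BOSARIS-style quality-aware linear calibration and is an assumption here, as is using the test duration alone (rather than, say, the minimum of enrollment and test durations) for q. The weights are fitted by logistic regression with Newton's method on labelled target/non-target trials; all function names are hypothetical.

```python
import numpy as np


def fit_qmf_calibration(scores, log_durations, labels, n_iter=50, l2=1e-3):
    """Fit w = (w0, w1, w2) for s_cal = w0 + w1*s + w2*q by logistic regression.

    labels: 1 for target trials, 0 for non-target trials.
    A small L2 penalty keeps the Newton steps stable if the data are separable.
    """
    s = np.asarray(scores, dtype=float)
    q = np.asarray(log_durations, dtype=float)
    y = np.asarray(labels, dtype=float)
    X = np.column_stack([np.ones_like(s), s, q])
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))  # target posterior
        grad = X.T @ (p - y) + l2 * w
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + l2 * np.eye(X.shape[1])
        w -= np.linalg.solve(hess, grad)  # Newton update
    return w


def apply_qmf_calibration(w, scores, log_durations, train_log_odds=0.0):
    """Calibrated score. Pass train_log_odds = log(N_target / N_nontarget)
    from the calibration set to turn the posterior log-odds into an LLR."""
    s = np.asarray(scores, dtype=float)
    q = np.asarray(log_durations, dtype=float)
    return w[0] + w[1] * s + w[2] * q - train_log_odds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4000
    labels = rng.integers(0, 2, n)
    q = np.log(rng.uniform(2.0, 40.0, n))  # synthetic log test durations (s)
    # Synthetic raw scores whose target/non-target separation grows with duration.
    raw = rng.normal(0.0, 1.0, n) + labels * (0.5 + 0.5 * q)
    w = fit_qmf_calibration(raw, q, labels)
    print("w0, w1, w2 =", np.round(w, 3))
```

With scores like the synthetic ones, where targets score higher on long segments, the fitted w2 lets the same raw score map to different log-odds depending on duration, which is the effect a duration QMF is meant to capture.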

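Strategy iii) rests on the additive-noise view, w_short ≈ w_long + n. One simple way to realize it, assumed here rather than taken from the paper, is to model n as Gaussian, estimate its mean and covariance from pairs of i-vectors extracted from the same utterance at full and truncated length, and add sampled noise to full-length training i-vectors. The synthetic i-vectors keep their speaker labels and are pooled with the originals to form a multi-duration PLDA training set; PLDA training itself is not shown, and all function names are hypothetical.

```python
import numpy as np


def estimate_duration_noise(long_ivecs, short_ivecs):
    """Mean and covariance of n = w_short - w_long over paired i-vectors (rows)."""
    diff = np.asarray(short_ivecs, dtype=float) - np.asarray(long_ivecs, dtype=float)
    return diff.mean(axis=0), np.cov(diff, rowvar=False)


def synthesize_short_ivectors(long_ivecs, mu, cov, rng):
    """Add Gaussian 'duration noise' n ~ N(mu, cov) to each long-duration i-vector."""
    long_ivecs = np.asarray(long_ivecs, dtype=float)
    noise = rng.multivariate_normal(mu, cov, size=long_ivecs.shape[0])
    return long_ivecs + noise


def build_multi_duration_set(ivecs, speaker_ids, mu, cov, rng):
    """Pool original and synthesized i-vectors; synthesized rows keep their speaker label."""
    ivecs = np.asarray(ivecs, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    synth = synthesize_short_ivectors(ivecs, mu, cov, rng)
    return np.vstack([ivecs, synth]), np.concatenate([speaker_ids, speaker_ids])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic 5-dimensional "i-vectors" with known duration noise N(0.1, 0.5 I).
    long_iv = rng.normal(size=(5000, 5))
    short_iv = long_iv + rng.normal(0.1, np.sqrt(0.5), size=long_iv.shape)
    mu, cov = estimate_duration_noise(long_iv, short_iv)
    X, spk = build_multi_duration_set(long_iv[:10], np.arange(10), mu, cov, rng)
    print(X.shape, spk.shape)
```

A single full-covariance Gaussian is the simplest choice; the noise statistics could instead be estimated separately for each target duration so that several synthetic durations are added to the pool.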


Citations
Journal Article

Speaker Recognition by Machines and Humans: A tutorial review

TL;DR: The review concludes with a comparative study of human versus machine speaker recognition, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems.
Proceedings Article

Deep neural network-based speaker embeddings for end-to-end speaker verification

TL;DR: It is shown that, given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error rate (EER) and at low miss rates.
Journal Article

Text-dependent speaker verification: Classifiers, databases and RSR2015

TL;DR: The HiLAM system, based on a three-layer acoustic architecture, is evaluated together with an i-vector/PLDA system and outperforms the state-of-the-art i-vector system in most scenarios; the paper also provides a reference evaluation scheme and reference performance on the RSR2015 database for the research community.
Proceedings Article

End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances

TL;DR: An end-to-end system is proposed that directly learns a mapping from speech features to a compact, fixed-length, speaker-discriminative embedding, with the Euclidean distance used to measure similarity within trials.
Journal Article

Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings

TL;DR: A novel text-independent speaker verification framework based on the triplet loss and a very deep convolutional neural network architecture is investigated in this study, in which a fixed-length speaker-discriminative embedding is learned from sparse speech features and used as a feature representation for speaker verification tasks.