Duration mismatch compensation for i-vector based speaker recognition systems
Taufiq Hasan, Rahim Saeidi, John H. L. Hansen, David A. van Leeuwen
- pp 7663-7667
TLDR
The effect of duration variability on phoneme distributions of speech utterances and i-vector length is analyzed, and it is demonstrated that, as utterance duration decreases, the number of detected unique phonemes and the i-vector length approach zero in a logarithmic and a non-linear fashion, respectively.
Abstract:
Speaker recognition systems trained on long-duration utterances are known to perform significantly worse when short test segments are encountered. To address this mismatch, we analyze the effect of duration variability on phoneme distributions of speech utterances and i-vector length. We demonstrate that, as utterance duration decreases, the number of detected unique phonemes and the i-vector length approach zero in a logarithmic and a non-linear fashion, respectively. Treating duration variability as additive noise in the i-vector space, we propose three strategies for its compensation: i) multi-duration training of the Probabilistic Linear Discriminant Analysis (PLDA) model, ii) score calibration using log duration as a Quality Measure Function (QMF), and iii) multi-duration PLDA training with synthesized short-duration i-vectors. Experiments are designed based on the 2012 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) protocol with varying test utterance duration. Experimental results demonstrate the effectiveness of the proposed schemes on short-duration test conditions, especially with the QMF calibration approach.
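Strategy (ii) can be illustrated with a minimal sketch. The abstract does not give the exact QMF form or how its weights are trained, so the following is an assumption: a calibrated score of the form w0 + w1 * s + w2 * log(d), where s is the raw PLDA score and d the test duration in seconds, with weights fit by plain logistic regression. The function names and the synthetic trial data are hypothetical and exist only to make the example runnable; they are not results from the paper.

```python
import numpy as np

def fit_qmf_calibration(scores, durations, labels, lr=0.1, n_iter=2000):
    """Fit w0 + w1*score + w2*log(duration) by logistic regression.

    Assumed form: the paper's actual QMF calibration may differ.
    labels: 1 for target trials, 0 for non-target trials.
    """
    X = np.column_stack([np.asarray(scores, dtype=float),
                         np.log(np.asarray(durations, dtype=float))])
    y = np.asarray(labels, dtype=float)
    # Standardize features so plain gradient descent converges smoothly.
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xn = (X - mean) / std
    w, b = np.zeros(2), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xn @ w + b)))  # predicted P(target)
        err = p - y                               # gradient of the log-loss
        w -= lr * (Xn.T @ err) / len(y)
        b -= lr * err.mean()
    # Fold the standardization back into weights on the raw features.
    w_raw = w / std
    w0 = b - np.sum(w * mean / std)
    return float(w0), float(w_raw[0]), float(w_raw[1])

def calibrate(score, duration, params):
    """Apply the fitted calibration to one trial (duration in seconds)."""
    w0, w1, w2 = params
    return w0 + w1 * score + w2 * np.log(duration)

# Synthetic trials for illustration only (not data from the paper):
# score separation is assumed to grow with log duration.
rng = np.random.default_rng(0)
n = 400
labels = rng.integers(0, 2, size=n)
durations = rng.choice([5.0, 10.0, 20.0, 40.0, 300.0], size=n)
scores = (2 * labels - 1) * np.log(durations) + rng.normal(0.0, 1.0, size=n)

params = fit_qmf_calibration(scores, durations, labels)
print("calibrated score, 10 s trial:", calibrate(2.0, 10.0, params))
```

The log-duration term lets the calibration shift the decision threshold for short segments without retraining the PLDA model itself, which is one plausible reason a score-level approach is attractive.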
Citations
Journal Article
Speaker Recognition by Machines and Humans: A tutorial review
John H. L. Hansen, Taufiq Hasan
TL;DR: A comparative study of human versus machine speaker recognition is concluded, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems.
Proceedings Article
Deep neural network-based speaker embeddings for end-to-end speaker verification
David Snyder, Pegah Ghahremani, Daniel Povey, Daniel Garcia-Romero, Yishay Carmiel, Sanjeev Khudanpur
TL;DR: It is shown that given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error-rate (EER) and at low miss rates.
Journal Article
Text-dependent speaker verification: Classifiers, databases and RSR2015
TL;DR: The HiLAM system, based on a three layer acoustic architecture, and an i-vector/PLDA system, outperforms the state-of-the-art i-vector system in most of the scenarios and provides a reference evaluation scheme and a reference performance on RSR2015 database to the research community.
Proceedings Article
End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances.
Chunlei Zhang, Kazuhito Koishida
TL;DR: An end-to-end system which directly learns a mapping from speech features to a compact fixed length speaker discriminative embedding where the Euclidean distance is employed for measuring similarity within trials.
Journal Article
Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings
TL;DR: A novel text-independent speaker verification framework based on the triplet loss and a very deep convolutional neural network architecture is investigated in this study, where a fixed-length speaker discriminative embedding is learned from sparse speech features and utilized as a feature representation for the SV tasks.
References
Journal Article
Speaker Verification Using Adapted Gaussian Mixture Models
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.
Journal Article
Front-End Factor Analysis for Speaker Verification
TL;DR: An extension of previous work that proposes a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using simple factor analysis, named the total variability space because it models both speaker and channel variabilities.
Journal Article
Noise power spectral density estimation based on optimal smoothing and minimum statistics
TL;DR: An unbiased noise estimator is developed which derives the optimal smoothing parameter for recursive smoothing of the power spectral density of the noisy speech signal by minimizing a conditional mean square estimation error criterion in each time step.
Proceedings Article
Analysis of i-vector Length Normalization in Speaker Recognition Systems.
TL;DR: The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization, which allows the use of probabilistic models with Gaussian assumptions that yield equivalent performance to that of more complicated systems based on Heavy-Tailed assumptions.
Journal Article
Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging
TL;DR: In this article, an improved minima controlled recursive averaging (IMCRA) approach is proposed for noise estimation in adverse environments involving nonstationary noise, weak speech components, and low input signal-to-noise ratio (SNR).