MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research

TL;DR: The MSR Identity Toolbox is a collection of MATLAB tools and routines for research and development in speaker recognition, and it provides many of the functionalities available in other open-source speaker recognition toolkits.
Abstract: We are happy to announce the release of the MSR Identity Toolbox: A MATLAB toolbox for speaker-recognition research. This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. It provides researchers with a test bed for developing new front-end and back-end techniques, allowing replicable evaluation of new advancements. It will also help newcomers in the field by lowering the "barrier to entry," enabling them to quickly build baseline systems for their experiments. Although the focus of this toolbox is on speaker recognition, it can also be used for other speech-related applications such as language, dialect, and accent identification. Additionally, it provides many of the functionalities available in other open-source speaker recognition toolkits (e.g., ALIZE

Citations
Journal ArticleDOI
TL;DR: A review of postevaluation studies conducted using the same dataset illustrates the rapid progress stemming from ASVspoof and outlines the need for further investigation.
Abstract: Concerns regarding the vulnerability of automatic speaker verification (ASV) technology against spoofing can undermine confidence in its reliability and form a barrier to exploitation. The absence of competitive evaluations and the lack of common datasets have hampered progress in developing effective spoofing countermeasures. This paper describes the ASV Spoofing and Countermeasures (ASVspoof) initiative, which aims to fill this void. Through the provision of a common dataset, protocols, and metrics, ASVspoof promotes a sound research methodology and fosters technological progress. This paper also describes the ASVspoof 2015 dataset, evaluation, and results with detailed analyses. A review of postevaluation studies conducted using the same dataset illustrates the rapid progress stemming from ASVspoof and outlines the need for further investigation. Priority future research directions are presented in the scope of the next ASVspoof evaluation planned for 2017.

177 citations


Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

  • ...The system was implemented using the Microsoft Research (MSR) Identity Toolbox [32]....

    [...]

Proceedings ArticleDOI
20 Aug 2017
TL;DR: This paper addresses a replay spoofing attack against a speaker recognition system by detecting that the analysed signal has passed through multiple analogue-to-digital conversions by modelling the subband spectrum and using the proposed features derived from the linear prediction analysis.
Abstract: This paper presents our contribution to the ASVspoof 2017 Challenge. It addresses a replay spoofing attack against a speaker recognition system by detecting that the analysed signal has passed through multiple analogue-to-digital (AD) conversions. Specifically, we show that most of the cues that enable the detection of replay attacks can be found in the high-frequency band of the replayed recordings. The described anti-spoofing countermeasures are based on (1) modelling the subband spectrum and (2) using the proposed features derived from the linear prediction (LP) analysis. The results of the investigated methods show a significant improvement in comparison to the baseline system of the ASVspoof 2017 Challenge. A relative equal error rate (EER) reduction of 70% was achieved for the development set and a reduction of 30% was obtained for the evaluation set.

140 citations


Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

  • ...The MSR Identity Toolbox [25] implementation of the EM GMM training and scoring was used in this research....

    [...]

Journal ArticleDOI
TL;DR: This work approaches the problem of speaker recognition from severely degraded audio data by judiciously combining two commonly used features: Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC), and concludes that MFCC and LPC capture two distinct aspects of speech, viz., speech perception and speech production.
Abstract: Speaker recognition algorithms are negatively impacted by the quality of the input speech signal. In this work, we approach the problem of speaker recognition from severely degraded audio data by judiciously combining two commonly used features: Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC). Our hypothesis rests on the observation that MFCC and LPC capture two distinct aspects of speech, viz., speech perception and speech production. A carefully crafted 1D Triplet Convolutional Neural Network (1D-Triplet-CNN) is used to combine these two features in a novel manner, thereby enhancing the performance of speaker recognition in challenging scenarios. Extensive evaluation on multiple datasets, different types of audio degradations, multi-lingual speech, varying length of audio samples, etc. conveys the efficacy of the proposed approach over existing speaker recognition methods, including those based on iVector and xVector.

104 citations


Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

  • ...2) iVector-PLDA [20] Based Speaker Verification Experiments: To obtain a second baseline performance on the experiments laid out in Tables I, II, III and IV, we perform iVector-PLDA based speaker recognition experiments using the implementation in the MSR identity toolkit [45]....

    [...]

  • ...implementation of xVector algorithm was used together with the gaussian PLDA implementation given in the MSR identity toolkit [45] for performing the xVector-PLDA based speaker recognition experiments....

    [...]

Journal ArticleDOI
TL;DR: This paper starts with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks, and introduces a number of countermeasures to prevent spoofing attacks.
Abstract: In this paper, we present a systematic study of the vulnerability of automatic speaker verification to a diverse range of spoofing attacks. We start with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks. We then introduce a number of countermeasures to prevent spoofing attacks from both known and unknown attackers. Known attackers are spoofing systems whose output was used to train the countermeasures, while an unknown attacker is a spoofing system whose output was not available to the countermeasures during training. Finally, we benchmark automatic systems against human performance on both speaker verification and spoofing detection tasks.

97 citations


Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

  • ...We used three WSJ databases (WSJ0, WSJ1, and WSJCAM) and the Resource Management database (RM1) for training the UBM, eigenspaces, and LDA....

    [...]

Proceedings ArticleDOI
26 May 2013
TL;DR: This study presents systems submitted by the Center for Robust Speech Systems from UTDallas to NIST SRE 2018, and investigates three alternative front-end speaker embedding frameworks, finding them to be both complementary and effective in achieving overall improved speaker recognition performance.
Abstract: In this study, we present systems submitted by the Center for Robust Speech Systems (CRSS) from UTDallas to NIST SRE 2018 (SRE18). Three alternative front-end speaker embedding frameworks are investigated: (i) i-vector, (ii) x-vector, and (iii) a modified triplet speaker embedding system (t-vector). Similar to the previous SRE, language mismatch between training and enrollment/test data, the so-called domain mismatch, remains a major challenge in this evaluation. In addition, SRE18 also introduces a small portion of audio from an unstructured video corpus in which speaker detection/diarization needs to be effectively integrated into speaker recognition for system robustness. In our system development, we focused on: (i) building novel deep neural network based speaker discriminative embedding systems as utterance level feature representations, (ii) exploring alternative dimension reduction methods, back-end classifiers, and score normalization techniques which can incorporate unlabeled in-domain data for domain adaptation, (iii) finding improved data set configurations for the speaker embedding network, LDA/PLDA, and score calibration training, and (iv) finally, investigating effective score calibration and fusion strategies. The final resulting systems are shown to be both complementary and effective in achieving overall improved speaker recognition performance.

79 citations


Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

  • ...The MSR-Identity toolkit is adopted for the back-end implementation [20]....

    [...]

References
Book
01 Jan 1972
TL;DR: This completely revised second edition presents an introduction to statistical pattern recognition, which is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field.
Abstract: This completely revised second edition presents an introduction to statistical pattern recognition. Pattern recognition in general covers a wide range of problems: it is applied to engineering problems, such as character readers and wave form analysis, as well as to brain modeling in biology and psychology. Statistical decision and estimation, which are the main subjects of this book, are regarded as fundamental to the study of pattern recognition. This book is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field. Each chapter contains computer projects as well as exercises.

10,526 citations

Journal ArticleDOI
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations


Book
01 Sep 1990

4,384 citations


"MSR Identity Toolbox v1.0: A MATLAB..." refers methods in this paper

  • ...The dimensionality of the i-vectors are normally reduced through linear discriminant analysis (with Fisher criterion [9]) to annihilate the non-speaker related directions (e....

    [...]

  • ...
      • Sufficient statistics computation for observations given the GMM (compute_bw_stats) [4, 12]
      • Total variability subspace learning using EM (train_tv_space) [4, 12, 13]
      • i-vector extraction (extract_ivector) [4, 12, 13]
      • Linear discriminant analysis (lda) [9]
      • i-vector length normalization, centering, whitening, and Gaussian probabilistic LDA using EM (gplda-em) [10, 11, 14]
      • PLDA-based verification trial scoring (score_gplda_trials) [11, 14]...

    [...]
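
The excerpt above mentions i-vector centering and length normalization as preprocessing before Gaussian PLDA. As a rough illustration of just those two steps, here is a minimal Python sketch. It is not the toolbox's MATLAB code: the function names `center` and `length_normalize` and the toy vectors are assumptions made up for this example, and whitening is left out to keep it short.

```python
import math

def center(vectors):
    """Subtract the per-dimension mean of the set from every vector."""
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return [[v[d] - mean[d] for d in range(dim)] for v in vectors]

def length_normalize(vector):
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0.0 else [x / norm for x in vector]

# Toy 3-dimensional "i-vectors" (made-up numbers, for illustration only).
ivectors = [[3.0, 4.0, 0.0], [1.0, 0.0, 2.0], [2.0, 5.0, 1.0]]

# Center the set first, then project each vector onto the unit sphere.
normalized = [length_normalize(v) for v in center(ivectors)]
```

After this step every vector in `normalized` has unit length, so differences in vector magnitude no longer influence whatever back-end is applied next.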

Journal ArticleDOI
TL;DR: An extension of previous work proposing a new speaker representation for speaker verification, in which a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis; it is named the total variability space because it models both speaker and channel variabilities.
Abstract: This paper presents an extension of our previous work which proposes a new speaker representation for speaker verification. In this modeling, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities. Two speaker verification systems are proposed which use this new representation. The first system is a support vector machine-based system that uses the cosine kernel to estimate the similarity between the input data. The second system directly uses the cosine similarity as the final decision score. We tested three channel compensation techniques in the total variability space, which are within-class covariance normalization (WCCN), linear discriminant analysis (LDA), and nuisance attribute projection (NAP). We found that the best results are obtained when LDA is followed by WCCN. We achieved an equal error rate (EER) of 1.12% and MinDCF of 0.0094 using the cosine distance scoring on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation dataset. We also obtained 4% absolute EER improvement for both-gender trials on the 10 s-10 s condition compared to the classical joint factor analysis scoring.

3,526 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This paper describes face data as resulting from a generative model which incorporates both within-individual and between-individual variation, and calculates the likelihood that the differences between face images are entirely due to within-individual variability.
Abstract: Many current face recognition algorithms perform badly when the lighting or pose of the probe and gallery images differ. In this paper we present a novel algorithm designed for these conditions. We describe face data as resulting from a generative model which incorporates both within-individual and between-individual variation. In recognition we calculate the likelihood that the differences between face images are entirely due to within-individual variability. We extend this to the non-linear case where an arbitrary face manifold can be described and noise is position-dependent. We also develop a "tied" version of the algorithm that allows explicit comparison across quite different viewing conditions. We demonstrate that our model produces state of the art results for (i) frontal face recognition (ii) face recognition under varying pose.

1,099 citations