MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research

Seyed Omid Sadjadi, Malcolm Slaney, Larry Heck

01 Sep 2013

TL;DR: The MSR Identity Toolbox is released, which contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition, and provides many of the functionalities available in other open-source speaker recognition toolkits.

Abstract: We are happy to announce the release of the MSR Identity Toolbox: A MATLAB toolbox for speaker-recognition research. This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. It provides researchers with a test bed for developing new front-end and back-end techniques, allowing replicable evaluation of new advancements. It will also help newcomers in the field by lowering the "barrier to entry," enabling them to quickly build baseline systems for their experiments. Although the focus of this toolbox is on speaker recognition, it can also be used for other speech related applications such as language, dialect, and accent identification. Additionally, it provides many of the functionalities available in other open-source speaker recognition toolkits (e.g., ALIZE

Citations

Journal Article

ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge

Zhizheng Wu¹, Junichi Yamagishi¹, Tomi Kinnunen², Cemal Hanilci², Mohammed Sahidullah², Aleksandr Sizov², Nicholas Evans³, Massimiliano Todisco³

University of Edinburgh¹, University of Eastern Finland², Institut Eurécom³

17 Feb 2017-IEEE Journal of Selected Topics in Signal Processing

TL;DR: A review of postevaluation studies conducted using the same dataset illustrates the rapid progress stemming from ASVspoof and outlines the need for further investigation.

Abstract: Concerns regarding the vulnerability of automatic speaker verification (ASV) technology against spoofing can undermine confidence in its reliability and form a barrier to exploitation. The absence of competitive evaluations and the lack of common datasets has hampered progress in developing effective spoofing countermeasures. This paper describes the ASV Spoofing and Countermeasures (ASVspoof) initiative, which aims to fill this void. Through the provision of a common dataset, protocols, and metrics, ASVspoof promotes a sound research methodology and fosters technological progress. This paper also describes the ASVspoof 2015 dataset, evaluation, and results with detailed analyses. A review of postevaluation studies conducted using the same dataset illustrates the rapid progress stemming from ASVspoof and outlines the need for further investigation. Priority future research directions are presented in the scope of the next ASVspoof evaluation planned for 2017.

177 citations

Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

...The system was implemented using the Microsoft Research (MSR) Identity Toolbox [32]....

Proceedings Article

Audio Replay Attack Detection Using High-Frequency Features.

Marcin Witkowski¹, Stanisław Kacprzak¹, Piotr Żelasko¹, Konrad Kowalczyk¹, Jakub Galka¹

AGH University of Science and Technology¹

20 Aug 2017

TL;DR: This paper addresses a replay spoofing attack against a speaker recognition system by detecting that the analysed signal has passed through multiple analogue-to-digital conversions by modelling the subband spectrum and using the proposed features derived from the linear prediction analysis.

Abstract: This paper presents our contribution to the ASVspoof 2017 Challenge. It addresses a replay spoofing attack against a speaker recognition system by detecting that the analysed signal has passed through multiple analogue-to-digital (AD) conversions. Specifically, we show that most of the cues that enable to detect the replay attacks can be found in the high-frequency band of the replayed recordings. The described anti-spoofing countermeasures are based on (1) modelling the subband spectrum and (2) using the proposed features derived from the linear prediction (LP) analysis. The results of the investigated methods show a significant improvement in comparison to the baseline system of the ASVspoof 2017 Challenge. A relative equal error rate (EER) reduction by 70% was achieved for the development set and a reduction by 30% was obtained for the evaluation set.

140 citations

Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

...The MSR Identity Toolbox [25] implementation of the EM GMM training and scoring was used in this research....

Journal Article

Fusing MFCC and LPC Features Using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals

Anurag Chowdhury¹, Arun Ross¹

Michigan State University¹

01 Jan 2020-IEEE Transactions on Information Forensics and Security

TL;DR: This work approaches the problem of speaker recognition from severely degraded audio data by judiciously combining two commonly used features: Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC), and concludes that MFCC and LPC capture two distinct aspects of speech, viz., speech perception and speech production.

Abstract: Speaker recognition algorithms are negatively impacted by the quality of the input speech signal. In this work, we approach the problem of speaker recognition from severely degraded audio data by judiciously combining two commonly used features: Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC). Our hypothesis rests on the observation that MFCC and LPC capture two distinct aspects of speech, viz., speech perception and speech production. A carefully crafted 1D Triplet Convolutional Neural Network (1D-Triplet-CNN) is used to combine these two features in a novel manner, thereby enhancing the performance of speaker recognition in challenging scenarios. Extensive evaluation on multiple datasets, different types of audio degradations, multi-lingual speech, varying length of audio samples, etc. convey the efficacy of the proposed approach over existing speaker recognition methods, including those based on iVector and xVector.

104 citations

Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

...2) iVector-PLDA [20] Based Speaker Verification Experiments: To obtain a second baseline performance on the experiments laid out in Tables I, II, III and IV, we perform iVector-PLDA based speaker recognition experiments using the implementation in the MSR identity toolkit [45]....
...implementation of xVector algorithm was used together with the gaussian PLDA implementation given in the MSR identity toolkit [45] for performing the xVector-PLDA based speaker recognition experiments....

Journal Article

Anti-spoofing for text-independent speaker verification: an initial database, comparison of countermeasures, and human performance

Zhizheng Wu¹, Phillip L. De Leon², Cenk Demiroglu³, Ali Khodabakhsh³, Simon King¹, Zhen-Hua Ling⁴, Daisuke Saito⁵, Bryan Stewart², Tomoki Toda⁶, Mirjam Wester¹, Junichi Yamagishi¹

University of Edinburgh¹, New Mexico State University², Özyeğin University³, University of Science and Technology of China⁴, University of Tokyo⁵, Nagoya University⁶

01 Apr 2016-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: This paper starts with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks, and introduces a number of countermeasures to prevent spoofing attacks.

Abstract: In this paper, we present a systematic study of the vulnerability of automatic speaker verification to a diverse range of spoofing attacks. We start with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks. We then introduce a number of countermeasures to prevent spoofing attacks from both known and unknown attackers. Known attackers are spoofing systems whose output was used to train the countermeasures, while an unknown attacker is a spoofing system whose output was not available to the countermeasures during training. Finally, we benchmark automatic systems against human performance on both speaker verification and spoofing detection tasks.

97 citations

Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

...We used three WSJ databases (WSJ0, WSJ1, and WSJCAM) and the Resource Management database (RM1) for training the UBM, eigenspaces, and LDA....

Proceedings Article

UTD-CRSS Systems for 2018 NIST Speaker Recognition Evaluation

Chunlei Zhang¹, Fahimeh Bahmaninezhad¹, Shivesh Ranjan¹, Harishchandra Dubey¹, Wei Xia¹, John H. L. Hansen¹

University of Texas at Dallas¹

May 2019

TL;DR: This study presents systems submitted by the Center for Robust Speech Systems from UTDallas to NIST SRE 2018, and investigates three alternative front-end speaker embedding frameworks, finding them to be both complementary and effective in achieving overall improved speaker recognition performance.

Abstract: In this study, we present systems submitted by the Center for Robust Speech Systems (CRSS) from UTDallas to NIST SRE 2018 (SRE18). Three alternative front-end speaker embedding frameworks are investigated, that includes: (i) i-vector, (ii) x-vector, (iii) and a modified triplet speaker embedding system (t-vector). Similar to the previous SRE, language mismatch between training and enrollment/test data, the so-called domain mismatch, remains as a major challenge in this evaluation. In addition, SRE18 also introduces a small portion of audio from an unstructured video corpus in which speaker detection/diarization is supposedly needed to be effectively integrated into speaker recognition for system robustness. In our system development, we focused on: (i) building novel deep neural network based speaker discriminative embedding systems as utterance level feature representations, (ii) exploring alternative dimension reduction methods, back-end classifiers, score normalization techniques which can incorporate unlabeled in-domain data for domain adaptation, (iii) finding an improved data set configurations for the speaker embedding network, LDA/PLDA, and score calibration training (v) and finally, investigating effective score calibration and fusion strategies. The final resulting systems are shown to be both complementary and effective in achieving overall improved speaker recognition performance.

79 citations

Cites methods from "MSR Identity Toolbox v1.0: A MATLAB..."

...The MSR-Identity toolkit is adopted for the back-end implementation [20]....

References

Book

Introduction to Statistical Pattern Recognition

Keinosuke Fukunaga

01 Jan 1972

TL;DR: This completely revised second edition presents an introduction to statistical pattern recognition, which is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field.

Abstract: This completely revised second edition presents an introduction to statistical pattern recognition. Pattern recognition in general covers a wide range of problems: it is applied to engineering problems, such as character readers and wave form analysis, as well as to brain modeling in biology and psychology. Statistical decision and estimation, which are the main subjects of this book, are regarded as fundamental to the study of pattern recognition. This book is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field. Each chapter contains computer projects as well as exercises.

10,526 citations

Journal Article

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds¹, Thomas F. Quatieri¹, Robert B. Dunn¹

Massachusetts Institute of Technology¹

01 Jan 2000-Digital Signal Processing

TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations

Book

Introduction to statistical pattern recognition (2nd ed.)

Keinosuke Fukunaga¹

Purdue University¹

01 Sep 1990

4,384 citations

"MSR Identity Toolbox v1.0: A MATLAB..." refers methods in this paper

...The dimensionality of the i-vectors are normally reduced through linear discriminant analysis (with Fisher criterion [9]) to annihilate the non-speaker related directions (e....
...
• Sufficient statistics computation for observations given the GMM (compute_bw_stats) [4, 12]
• Total variability subspace learning using EM (train_tv_space) [4, 12, 13]
• i-vector extraction (extract_ivector) [4, 12, 13]
• Linear discriminant analysis (lda) [9]
• i-vector length normalization, centering, whitening, and Gaussian probabilistic LDA using EM (gplda-em) [10, 11, 14]
• PLDA-based verification trial scoring (score_gplda_trials) [11, 14]
...
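
Two of the preprocessing steps named in the excerpt above, centering and length normalization of i-vectors, can be illustrated with a minimal Python sketch. This is not the toolbox's MATLAB code. The function names (`mean_vector`, `center`, `length_normalize`), the toy vectors, and the order of operations (center first, then normalize) are assumptions made for illustration, and whitening, LDA, and PLDA training are omitted.

```python
import math

def mean_vector(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]

def center(vectors, mean):
    """Subtract a mean vector (typically estimated on training data) from each vector."""
    return [[x - m for x, m in zip(v, mean)] for v in vectors]

def length_normalize(vectors):
    """Scale each vector to unit Euclidean length; zero vectors are left unchanged."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v] if norm > 0 else list(v))
    return out

# Toy 3-dimensional "i-vectors" (hypothetical values, for illustration only).
ivectors = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 5.0, 2.0]]
mu = mean_vector(ivectors)
normalized = length_normalize(center(ivectors, mu))
```

After these steps every non-zero vector lies on the unit sphere, which is the property length normalization is meant to provide before Gaussian PLDA modelling.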

Journal Article

Front-End Factor Analysis for Speaker Verification

Najim Dehak¹, Patrick Kenny, Réda Dehak², Pierre Dumouchel, Pierre Ouellet

Massachusetts Institute of Technology¹, École Pour l'Informatique et les Techniques Avancées²

01 May 2011-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An extension of the previous work which proposes a new speaker representation for speaker verification, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis, named the total variability space because it models both speaker and channel variabilities.

Abstract: This paper presents an extension of our previous work which proposes a new speaker representation for speaker verification. In this modeling, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities. Two speaker verification systems are proposed which use this new representation. The first system is a support vector machine-based system that uses the cosine kernel to estimate the similarity between the input data. The second system directly uses the cosine similarity as the final decision score. We tested three channel compensation techniques in the total variability space, which are within-class covariance normalization (WCCN), linear discriminate analysis (LDA), and nuisance attribute projection (NAP). We found that the best results are obtained when LDA is followed by WCCN. We achieved an equal error rate (EER) of 1.12% and MinDCF of 0.0094 using the cosine distance scoring on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation dataset. We also obtained 4% absolute EER improvement for both-gender trials on the 10 s-10 s condition compared to the classical joint factor analysis scoring.
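
The cosine similarity scoring mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration of the scoring rule, not the authors' system. The names `cosine_score` and `verify`, the threshold value, and the toy vectors are hypothetical, and the channel compensation steps (LDA, WCCN, NAP) are assumed to have been applied beforehand.

```python
import math

def cosine_score(target, test):
    """Cosine similarity between a target-speaker vector and a test vector."""
    dot = sum(a * b for a, b in zip(target, test))
    norm_target = math.sqrt(sum(a * a for a in target))
    norm_test = math.sqrt(sum(b * b for b in test))
    if norm_target == 0 or norm_test == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_target * norm_test)

def verify(target, test, threshold):
    """Accept the trial when the cosine score reaches the decision threshold."""
    return cosine_score(target, test) >= threshold

# Toy 2-dimensional vectors and a hypothetical threshold of 0.5.
accepted = verify([1.0, 0.0], [1.0, 1.0], 0.5)
rejected = verify([1.0, 0.0], [0.0, 1.0], 0.5)
```

Because the score depends only on the angle between the two vectors, it is insensitive to their magnitudes, so no separate enrollment model has to be trained for each target speaker.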

3,526 citations

Proceedings Article

Probabilistic Linear Discriminant Analysis for Inferences About Identity

Simon J. D. Prince¹, James H. Elder²

University College London¹, York University²

26 Dec 2007

TL;DR: This paper describes face data as resulting from a generative model which incorporates both within-individual and between-individual variation, and calculates the likelihood that the differences between face images are entirely due to within-individual variability.

Abstract: Many current face recognition algorithms perform badly when the lighting or pose of the probe and gallery images differ. In this paper we present a novel algorithm designed for these conditions. We describe face data as resulting from a generative model which incorporates both within-individual and between-individual variation. In recognition we calculate the likelihood that the differences between face images are entirely due to within-individual variability. We extend this to the non-linear case where an arbitrary face manifold can be described and noise is position-dependent. We also develop a "tied" version of the algorithm that allows explicit comparison across quite different viewing conditions. We demonstrate that our model produces state of the art results for (i) frontal face recognition (ii) face recognition under varying pose.

1,099 citations