Open Access
MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research
TL;DR: The MSR Identity Toolbox has been released; it contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition, and provides many of the functionalities available in other open-source speaker recognition toolkits.
Abstract:
We are happy to announce the release of the MSR Identity Toolbox: A MATLAB toolbox for speaker-recognition research. This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. It provides researchers with a test bed for developing new front-end and back-end techniques, allowing replicable evaluation of new advancements. It will also help newcomers in the field by lowering the "barrier to entry," enabling them to quickly build baseline systems for their experiments. Although the focus of this toolbox is on speaker recognition, it can also be used for other speech-related applications such as language, dialect, and accent identification. Additionally, it provides many of the functionalities available in other open-source speaker recognition toolkits (e.g., ALIZE…
Citations
Posted Content
Joint Sound Source Separation and Speaker Recognition
Jeroen Zegers, Hugo Van hamme
TL;DR: It is shown how state-of-the-art multichannel NMF for blind source separation can be easily extended to incorporate speaker recognition, and that this joint approach outperforms the sequential approach of first applying source separation and then speaker recognition based on state-of-the-art i-vector techniques.
Journal Article
Evaluation of Batvox 3.1 under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)
Cuiling Zhang, Chang Tang
TL;DR: The results show that the performance of the Batvox 3.1 system continued to improve as the amount of training data increased.
Proceedings Article
Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers
TL;DR: A new language recognition technique is described, based on applying the philosophy of Shifted Delta Coefficients to phone log-likelihood ratio (PLLR) features; it allows the incorporation of long-span phonetic information at a frame-by-frame level while dealing with the temporal length of each phone unit.
Proceedings Article
Robust speaker recognition by means of acoustic transmission channel matching: An acoustic parameter estimation approach
TL;DR: This paper proposes to estimate noise and reverberation and incorporate them into individual training examples to create virtually matched channels; it details the proposed method and results, and discusses the method's potential and limitations.
Journal Article
The neural oscillatory markers of phonetic convergence during verbal interaction
Sankar Mukherjee, Leonardo Badino, Pauline M. Hilt, Alice Tomassini, Alberto Inuggi, Luciano Fadiga, Noël Nguyen, Alessandro D'Ausilio
TL;DR: Evidence is provided that mutual adaptation of speech phonetic targets correlates with specific alpha and beta oscillatory dynamics, reinforcing the suggestion that perception and production processes are highly interdependent and co-constructed during a conversation.
References
Book
Introduction to Statistical Pattern Recognition
TL;DR: This completely revised second edition presents an introduction to statistical pattern recognition, which is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field.
Journal Article
Speaker Verification Using Adapted Gaussian Mixture Models
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.
Journal Article
Front-End Factor Analysis for Speaker Verification
TL;DR: This extension of previous work proposes a new speaker representation for speaker verification, in which a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis; it is named the total variability space because it models both speaker and channel variabilities.
Proceedings Article
Probabilistic Linear Discriminant Analysis for Inferences About Identity
TL;DR: This paper describes face data as resulting from a generative model that incorporates both within-individual and between-individual variation, and calculates the likelihood that the differences between face images are entirely due to within-individual variability.