Open Access
MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research
TLDR
The MSR Identity Toolbox is a newly released collection of MATLAB tools and routines for research and development in speaker recognition; it provides many of the functionalities available in other open-source speaker recognition toolkits.
Abstract:
We are happy to announce the release of the MSR Identity Toolbox: A MATLAB toolbox for speaker-recognition research. This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. It provides researchers with a test bed for developing new front-end and back-end techniques, allowing replicable evaluation of new advancements. It will also help newcomers in the field by lowering the "barrier to entry," enabling them to quickly build baseline systems for their experiments. Although the focus of this toolbox is on speaker recognition, it can also be used for other speech-related applications such as language, dialect, and accent identification. Additionally, it provides many of the functionalities available in other open-source speaker recognition toolkits (e.g., ALIZE…
Citations
Journal Article
Automatic Speaker Recognition System in Adverse Conditions — Implication of Noise and Reverberation on System Performance
TL;DR: This paper investigates the behavior of typical speaker recognition systems in adverse retrieval phases. The results show that noise and reverberation can degrade recognition performance, and that both reverberation time and direct-to-reverberation ratio affect recognition accuracy.
Journal Article
Who shall I say is calling? Validation of a caller recognition procedure in Bornean flanged male orangutan (Pongo pygmaeus wurmbii) long calls
TL;DR: This paper evaluates the accuracy of acoustic caller recognition in long calls (LCs) emitted by Bornean male orangutans (Pongo pygmaeus wurmbii), using two data sets: the first consists of high-quality recordings taken during individual focal follows (N = 224 LCs by 14 males), and the second consists of LC recordings with variable microphone-caller distances from an acoustic localization system (N = 123 LCs by 10 males).
Proceedings Article
Emotion Analysis Using Audio/Video, EMG and EEG: A Dataset and Comparison Study
TL;DR: It is found that the most effective stages for emotion recognition from EEG occur after the emotion has been expressed, suggesting that the neural signals conveying an emotion are long-lasting.
Journal Article
Automatic accent identification as an analytical tool for accent robust automatic speech recognition
TL;DR: i-vector based AID analysis provides a principled approach to the selection of training material for accent robust ASR, and it is speculated that this may generalize to other detection technologies and other types of variability, such as Speaker Identification and speaker variability.
Proceedings Article
Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection.
TL;DR: A 1-D convolutional neural network is proposed whose input is the log-probabilities of the speech frames on the GMM components, with pooling used to extract global characteristics of the speech.
References
Book
Introduction to Statistical Pattern Recognition
TL;DR: This completely revised second edition presents an introduction to statistical pattern recognition, which is appropriate as a text for introductory courses in pattern recognition and as a reference book for workers in the field.
Journal Article
Speaker Verification Using Adapted Gaussian Mixture Models
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.
Journal Article
Front-End Factor Analysis for Speaker Verification
TL;DR: Extending previous work, this paper proposes a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space, defined using simple factor analysis and named the total variability space because it models both speaker and channel variabilities.
Proceedings Article
Probabilistic Linear Discriminant Analysis for Inferences About Identity
TL;DR: This paper describes face data as resulting from a generative model which incorporates both within-individual and between-individual variation, and calculates the likelihood that the differences between face images are entirely due to within-individual variability.