MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research

TLDR
The MSR Identity Toolbox is a collection of MATLAB tools and routines for research and development in speaker recognition. It provides many of the functionalities available in other open-source speaker recognition toolkits.
Abstract
We are happy to announce the release of the MSR Identity Toolbox: A MATLAB toolbox for speaker-recognition research. This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. It provides researchers with a test bed for developing new front-end and back-end techniques, allowing replicable evaluation of new advancements. It will also help newcomers to the field by lowering the "barrier to entry," enabling them to quickly build baseline systems for their experiments. Although the focus of this toolbox is on speaker recognition, it can also be used for other speech-related applications such as language, dialect, and accent identification. Additionally, it provides many of the functionalities available in other open-source speaker recognition toolkits (e.g., ALIZE
