Open Access

Neural Network Bottleneck Features for Language Identification

TLDR
On this type of noisy data, it is shown that on average, the BN features provide a 45% relative improvement in the Cavg or Equal Error Rate (EER) metrics across several test duration conditions, with respect to the single best acoustic features.
Abstract
This paper presents the application of Neural Network Bottleneck (BN) features in Language Identification (LID). BN features are generally used for Large Vocabulary Speech Recognition in conjunction with conventional acoustic features, such as MFCC or PLP. We compare the BN features to several common types of acoustic features used in state-of-the-art LID systems. The test set is from the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded radio communication channels. On this type of noisy data, we show that on average, the BN features provide a 45% relative improvement in the Cavg or Equal Error Rate (EER) metrics across several test duration conditions, with respect to our single best acoustic features.

Index Terms: language identification, noisy speech, robust feature extraction
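The core idea behind BN features (forwarding acoustic frames through a network trained for speech recognition and keeping the activations of a narrow hidden layer as frame-level features, discarding the layers above it) can be sketched as a plain forward pass. The layer sizes, ReLU nonlinearity, and function names below are illustrative assumptions, not the paper's actual network:

```python
import numpy as np

def bottleneck_features(frames, weights, biases, bn_layer):
    """Forward frames through an MLP and return the linear
    activations of the narrow (bottleneck) layer, ignoring all
    layers above it.
    frames: (T, D) stacked acoustic frames
    weights/biases: per-layer parameter lists
    bn_layer: index of the bottleneck layer (illustrative)."""
    h = frames
    for i in range(bn_layer + 1):
        h = h @ weights[i] + biases[i]
        if i < bn_layer:
            h = np.maximum(h, 0.0)  # hidden nonlinearity (ReLU assumed)
    # Linear bottleneck outputs serve as the low-dimensional features
    return h
```

In practice the bottleneck outputs would replace (or augment) MFCC/PLP frames as input to the downstream LID back-end.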


Citations
Journal ArticleDOI

Deep Neural Network Approaches to Speaker and Language Recognition

TL;DR: This work presents the application of a single DNN for both SR and LR using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and the NIST 2011 language recognition evaluation (LRE11) benchmarks, and demonstrates large performance gains.
Proceedings ArticleDOI

Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System

TL;DR: In this article, a unified and interpretable end-to-end system for both speaker and language recognition is developed, where the encoding layer plays a role in aggregating the variable-length input sequence into an utterance level representation.
Proceedings ArticleDOI

Spoken Language Recognition using X-vectors.

TL;DR: This paper applies x-vectors to the task of spoken language recognition, and experiments with several variations of the x-vector framework, finding that the best performing system uses multilingual bottleneck features, data augmentation, and a discriminative Gaussian classifier.
Posted Content

A Unified Deep Neural Network for Speaker and Language Recognition

TL;DR: In this paper, a unified DNN approach was used for both speaker and language recognition, yielding substantial performance improvements on the 2013 Domain Adaptation Challenge speaker recognition task (55% reduction in EER for the out-of-domain condition) and on the NIST 2011 Language Recognition Evaluation (48% reduction for the 30s test condition).
Proceedings ArticleDOI

Advances in deep neural network approaches to speaker recognition

TL;DR: This work considers two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses a DNN during feature modeling, and several methods of DNN feature processing are applied to bring significantly greater robustness to microphone speech.
References
Journal ArticleDOI

Front-End Factor Analysis for Speaker Verification

TL;DR: An extension of previous work that proposes a new speaker representation for speaker verification: a low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis, named the total variability space because it models both speaker and channel variabilities.
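The i-vector of this framework is the posterior mean of a latent factor given zeroth- and first-order Baum-Welch statistics, w = (I + Tᵀ Σ⁻¹ N T)⁻¹ Tᵀ Σ⁻¹ F. A minimal numpy sketch, assuming a diagonal-covariance UBM and the variable names below (which are illustrative, not from the paper):

```python
import numpy as np

def extract_ivector(T, Sigma_inv, N, F):
    """Posterior-mean i-vector from Baum-Welch statistics.
    T: (C*D, R) total variability matrix
    Sigma_inv: (C*D,) inverse diagonal UBM covariances
    N: (C,) per-component occupancy counts (zeroth-order stats)
    F: (C*D,) centered first-order stats."""
    CD, R = T.shape
    D = CD // N.shape[0]
    N_full = np.repeat(N, D)              # expand counts to feature dims
    TtSig = T.T * Sigma_inv               # T^T Sigma^{-1}, shape (R, C*D)
    L = np.eye(R) + (TtSig * N_full) @ T  # posterior precision matrix
    return np.linalg.solve(L, TtSig @ F)  # posterior mean = i-vector
```

The simplification paper cited below is precisely about cheapening the per-utterance construction of L, which dominates this computation.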
Proceedings Article

Approaches to Language Identification using Gaussian Mixture Models and Shifted Delta Cepstral Features

TL;DR: Two GMM-based approaches to language identification that use shifted delta cepstra (SDC) feature vectors to achieve LID performance comparable to that of the best phone-based systems are described.
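SDC features stack k blocks of delta cepstra computed at shifted offsets, parameterized as N-d-P-k (with 7-1-3-7 the usual LID configuration): block i of frame t is c(t + iP + d) − c(t + iP − d). A minimal sketch; the edge-padding strategy is an assumption for handling utterance boundaries:

```python
import numpy as np

def sdc(cepstra, d=1, P=3, k=7):
    """Shifted delta cepstral features (N-d-P-k scheme; N is the
    cepstral dimension of the input).
    cepstra: (T, N) cepstral frames -> returns (T, N*k)."""
    T = cepstra.shape[0]
    pad = d + (k - 1) * P  # largest shift used by any block
    padded = np.pad(cepstra, ((pad, pad), (0, 0)), mode='edge')
    blocks = []
    for i in range(k):
        # delta at offset i*P: c(t + iP + d) - c(t + iP - d)
        plus = padded[pad + i * P + d : pad + i * P + d + T]
        minus = padded[pad + i * P - d : pad + i * P - d + T]
        blocks.append(plus - minus)
    return np.hstack(blocks)
```

With 7-dimensional cepstra and k = 7 this yields the familiar 49-dimensional SDC vector per frame.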

Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms

Patrick Kenny
TL;DR: A full account of the algorithms needed to carry out a joint factor analysis of speaker and session variability in a training set in which each speaker is recorded over many different channels and the practical limitations that will be encountered if these algorithms are implemented on very large data sets are discussed.
Proceedings ArticleDOI

Short-time Gaussianization for robust speaker verification

TL;DR: It is shown that one of the recent techniques used for speaker recognition, feature warping can be formulated within the framework of Gaussianization, and around 20% relative improvement in both equal error rate (EER) and minimum detection cost function (DCF) is obtained on NIST 2001 cellular phone data evaluation.
Proceedings ArticleDOI

Simplification and optimization of i-vector extraction

TL;DR: Under certain assumptions, the formulas for i-vector extraction (also used in i-vector extractor training) can be simplified, leading to faster and more memory-efficient code.