Open Access

Neural Network Bottleneck Features for Language Identification

TLDR
On this type of noisy data, it is shown that on average, the BN features provide a 45% relative improvement in the Cavg or Equal Error Rate (EER) metrics across several test duration conditions, with respect to the single best acoustic features.
Abstract
This paper presents the application of Neural Network Bottleneck (BN) features in Language Identification (LID). BN features are generally used for Large Vocabulary Speech Recognition in conjunction with conventional acoustic features, such as MFCC or PLP. We compare the BN features to several common types of acoustic features used in state-of-the-art LID systems. The test set is from the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded radio communication channels. On this type of noisy data, we show that on average, the BN features provide a 45% relative improvement in the Cavg or Equal Error Rate (EER) metrics across several test duration conditions, with respect to our single best acoustic features.

Index Terms: language identification, noisy speech, robust feature extraction
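The core idea behind BN features (forwarding acoustic frames through a network trained for speech recognition and keeping the activations of a narrow hidden layer as frame-level features, discarding the layers above it) can be sketched as a plain forward pass. The layer sizes, ReLU nonlinearity, and function names below are illustrative assumptions, not the paper's actual network:

```python
import numpy as np

def bottleneck_features(frames, weights, biases, bn_layer):
    """Forward frames through an MLP and return the linear
    activations of the narrow (bottleneck) layer, ignoring all
    layers above it.
    frames: (T, D) stacked acoustic frames
    weights/biases: per-layer parameter lists
    bn_layer: index of the bottleneck layer (illustrative)."""
    h = frames
    for i in range(bn_layer + 1):
        h = h @ weights[i] + biases[i]
        if i < bn_layer:
            h = np.maximum(h, 0.0)  # hidden nonlinearity (ReLU assumed)
    # Linear bottleneck outputs serve as the low-dimensional features
    return h
```

In practice the bottleneck outputs would replace (or augment) MFCC/PLP frames as input to the downstream LID back-end.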


Citations
Journal ArticleDOI

Deep Neural Network Approaches to Speaker and Language Recognition

TL;DR: This work presents the application of a single DNN for both SR and LR using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and the NIST 2011 language recognition evaluation (LRE11) benchmarks, and demonstrates large performance gains.
Proceedings ArticleDOI

Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System

TL;DR: In this article, a unified and interpretable end-to-end system for both speaker and language recognition is developed, where the encoding layer plays a role in aggregating the variable-length input sequence into an utterance level representation.
Proceedings ArticleDOI

Spoken Language Recognition using X-vectors.

TL;DR: This paper applies x-vectors to the task of spoken language recognition, and experiments with several variations of the x-vector framework, finding that the best performing system uses multilingual bottleneck features, data augmentation, and a discriminative Gaussian classifier.
Posted Content

A Unified Deep Neural Network for Speaker and Language Recognition

TL;DR: In this paper, a unified DNN approach was used for both speaker and language recognition, yielding substantial performance improvements on the 2013 Domain Adaptation Challenge speaker recognition task (55% reduction in EER for the out-of-domain condition) and on the NIST 2011 Language Recognition Evaluation (48% reduction for the 30s test condition).
Proceedings ArticleDOI

Advances in deep neural network approaches to speaker recognition

TL;DR: This work considers two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses a DNN during feature modeling, and several methods of DNN feature processing are applied to bring significantly greater robustness to microphone speech.
References
Journal ArticleDOI

Front-End Factor Analysis for Speaker Verification

TL;DR: An extension of previous work that proposes a new speaker representation for speaker verification: a low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis, named the total variability space because it models both speaker and channel variabilities.
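The i-vector of this framework is the posterior mean of a latent factor given zeroth- and first-order Baum-Welch statistics, w = (I + Tᵀ Σ⁻¹ N T)⁻¹ Tᵀ Σ⁻¹ F. A minimal numpy sketch, assuming a diagonal-covariance UBM and the variable names below (which are illustrative, not from the paper):

```python
import numpy as np

def extract_ivector(T, Sigma_inv, N, F):
    """Posterior-mean i-vector from Baum-Welch statistics.
    T: (C*D, R) total variability matrix
    Sigma_inv: (C*D,) inverse diagonal UBM covariances
    N: (C,) per-component occupancy counts (zeroth-order stats)
    F: (C*D,) centered first-order stats."""
    CD, R = T.shape
    D = CD // N.shape[0]
    N_full = np.repeat(N, D)              # expand counts to feature dims
    TtSig = T.T * Sigma_inv               # T^T Sigma^{-1}, shape (R, C*D)
    L = np.eye(R) + (TtSig * N_full) @ T  # posterior precision matrix
    return np.linalg.solve(L, TtSig @ F)  # posterior mean = i-vector
```

The simplification paper cited below is precisely about cheapening the per-utterance construction of L, which dominates this computation.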
Proceedings Article

Approaches to Language Identification using Gaussian Mixture Models and Shifted Delta Cepstral Features

TL;DR: Two GMM-based approaches to language identification that use shifted delta cepstra (SDC) feature vectors to achieve LID performance comparable to that of the best phone-based systems are described.
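SDC features stack k blocks of delta cepstra computed at shifted offsets, parameterized as N-d-P-k (with 7-1-3-7 the usual LID configuration): block i of frame t is c(t + iP + d) − c(t + iP − d). A minimal sketch; the edge-padding strategy is an assumption for handling utterance boundaries:

```python
import numpy as np

def sdc(cepstra, d=1, P=3, k=7):
    """Shifted delta cepstral features (N-d-P-k scheme; N is the
    cepstral dimension of the input).
    cepstra: (T, N) cepstral frames -> returns (T, N*k)."""
    T = cepstra.shape[0]
    pad = d + (k - 1) * P  # largest shift used by any block
    padded = np.pad(cepstra, ((pad, pad), (0, 0)), mode='edge')
    blocks = []
    for i in range(k):
        # delta at offset i*P: c(t + iP + d) - c(t + iP - d)
        plus = padded[pad + i * P + d : pad + i * P + d + T]
        minus = padded[pad + i * P - d : pad + i * P - d + T]
        blocks.append(plus - minus)
    return np.hstack(blocks)
```

With 7-dimensional cepstra and k = 7 this yields the familiar 49-dimensional SDC vector per frame.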

Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms

Patrick Kenny
TL;DR: A full account of the algorithms needed to carry out a joint factor analysis of speaker and session variability in a training set in which each speaker is recorded over many different channels and the practical limitations that will be encountered if these algorithms are implemented on very large data sets are discussed.
Proceedings ArticleDOI

Short-time Gaussianization for robust speaker verification

TL;DR: It is shown that one of the recent techniques used for speaker recognition, feature warping can be formulated within the framework of Gaussianization, and around 20% relative improvement in both equal error rate (EER) and minimum detection cost function (DCF) is obtained on NIST 2001 cellular phone data evaluation.
Proceedings ArticleDOI

Simplification and optimization of i-vector extraction

TL;DR: Under certain assumptions, the formulas for i-vector extraction (also used in i-vector extractor training) can be simplified, leading to faster and more memory-efficient code.