Improving of Open-Set Language Identification by Using Deep SVM and Thresholding Functions

doi:10.1109/AICCSA.2017.119

Proceedings Article

- pp 796-802

TLDR

This paper proposes a deep SVM-based LID back-end system to improve target language identification and defines three OOS thresholding formulations, which are used to decide whether a speech segment belongs to a target language or an OOS language.

Abstract:

State-of-the-art language identification (LID) systems are based on an iVector feature extractor front-end followed by a multi-class recognition back-end. Identification accuracy degrades considerably when LID systems face open-set languages. Compared with the in-set identification task, the open-set task more adequately mimics the real challenge of language identification. In this paper, we propose an approach to the problem of out-of-set (OOS) data detection in the context of open-set language identification with zero knowledge of the OOS languages. The study emphasizes both in-set (target) language identification and OOS language detection. Accordingly, we propose a deep SVM-based LID back-end system to improve target language identification. In addition, we define three OOS thresholding formulations, which are used to decide whether a speech segment belongs to a target language or an OOS language. The experimental results demonstrate the effectiveness of the deep SVM back-end system compared with state-of-the-art techniques. Moreover, the thresholding functions perfectly detect and reject the OOS data. A relative decrease of 6% in Equal Error Rate (EER) over classical OOS detection methods is reported in discriminating target and OOS languages.
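The abstract does not spell out the three thresholding formulations, so the following is only a generic sketch of the two ideas it names: rejecting a segment as OOS when the back-end's scores fall below a threshold, and measuring target-versus-OOS discrimination with the EER. The function names `open_set_decision` and `equal_error_rate`, the maximum-score rejection rule, and the threshold sweep are illustrative assumptions, not the authors' method.

```python
import numpy as np


def open_set_decision(scores, labels, threshold):
    """Return the top-scoring target language, or "OOS" if that score is too low.

    Assumption: a simple maximum-score rule stands in for the paper's
    (unspecified) thresholding formulations.
    """
    scores = np.asarray(scores, dtype=float)
    best = int(np.argmax(scores))
    return labels[best] if scores[best] >= threshold else "OOS"


def equal_error_rate(target_scores, oos_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    target_scores: detection scores of segments that truly are target languages.
    oos_scores:    detection scores of segments that truly are OOS.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    oos_scores = np.asarray(oos_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, oos_scores]))

    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        miss = np.mean(target_scores < t)       # targets wrongly rejected
        false_alarm = np.mean(oos_scores >= t)  # OOS segments wrongly accepted
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            # The EER is the operating point where both error rates meet.
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer
```

With hypothetical scores, `open_set_decision([0.3, 0.4, 0.3], ["en", "fr", "de"], 0.5)` rejects the segment as `"OOS"`, because no target language reaches the threshold.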

Citations

Journal Article

Deep learning for spoken language identification: Can we visualize speech signal patterns?

Himadri Mukherjee, +6 more

- 01 Dec 2019 -

Neural Computing and Applications

TL;DR: This paper proposes to use speech signal patterns for spoken language identification, where image-based features are used, and reports a highest accuracy of 99.96%, which outperforms the state-of-the-art reported results.

Journal Article

Image-based features for speech signal classification

Himadri Mukherjee, +4 more

- 01 Dec 2020 -

Multimedia Tools and Applications

TL;DR: This paper proposes image-based features for speech signal classification, on the grounds that different patterns can be identified by visualizing speech signals; the highest accuracy obtained was 94.51%.

Journal Article

Linear Predictive Coefficients-Based Feature to Identify Top-Seven Spoken Languages

Himadri Mukherjee, +5 more

- 15 Jun 2020 -

International Journal of Pattern Recogni...

TL;DR: Speech recognition in a multilingual scenario is not trivial when multiple languages are used in one conversation, and the language must be identified before speech recognition as such...

Journal Article

Modernizing Open-Set Speech Language Identification

Mustafa Eyceoz, +2 more

- 20 May 2022 -

arXiv.org

TL;DR: This work tackles the open-set task by adapting two modern-day state-of-the-art approaches to closed-set language identification: the first using a CRNN with attention and the second using a TDNN.

Journal Article

Addressing the semi-open set dialect recognition problem under resource-efficient considerations

Spandan Dey, +1 more

- 01 Jul 2023 -

Speech Communication

TL;DR: In this article, the authors propose a semi-open set approach for the spoken dialect recognition task, where a closed set model is exposed to unknown class inputs and utterances from other unknown classes are also included.

References

Book

The Nature of Statistical Learning Theory

Vladimir Vapnik

TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

Journal Article

Front-End Factor Analysis for Speaker Verification

Najim Dehak, +4 more

- 01 May 2011 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: An extension of previous work that proposes a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using simple factor analysis and is named the total variability space because it models both speaker and channel variabilities.

Proceedings Article

Support vector machines for multi-class pattern recognition.

Jason Weston, +1 more

TL;DR: A formulation of the SVM is proposed that enables a multi-class pattern recognition problem to be solved in a single optimisation and a similar generalization of linear programming machines is proposed.

Proceedings Article

Kernel Methods for Deep Learning

Youngmin Cho, +1 more

TL;DR: A new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets is introduced; these kernels can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that the authors call multilayer kernel machines (MKMs).

Journal Article

A Study of Interspeaker Variability in Speaker Verification

Patrick Kenny, +4 more

- 01 Jul 2008 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: It is shown that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the cross-channel condition, it is capable of performing at least as well as fusions of multiple systems of other types.

Related Papers (5)

NAP for high level language identification

Improving Transformer Based End-to-End Code-Switching Speech Recognition Using Language Identification

Statistical modeling of heterogeneous features for speech processing tasks

Posterior-thresholding feature extraction for paralinguistic speech classification

Towards a generic approach for automatic speech recognition error detection and classification
