Proceedings Article
Improving of Open-Set Language Identification by Using Deep SVM and Thresholding Functions
Ilyes Rebai, Yassine BenAyed, Walid Mahdi
pp. 796-802
TL;DR: This paper proposes a deep SVM-based LID back-end system to improve target-language identification and defines three out-of-set (OOS) thresholding formulations, which are used to decide whether a speech segment belongs to a target language or an OOS language.
Abstract:
State-of-the-art language identification (LID) systems are based on an iVector feature-extractor front-end followed by a multi-class recognition back-end. Identification accuracy degrades considerably when LID systems face open-set languages. Compared with the in-set (closed-set) identification task, the open-set task better mimics the real-world challenge of language identification. In this paper, we propose an approach to out-of-set (OOS) data detection in the context of open-set language identification with no prior knowledge of the OOS languages. This study emphasizes both in-set (target) language identification and OOS language detection. Accordingly, we propose a deep SVM-based LID back-end system to improve target-language identification. In addition, we define three OOS thresholding formulations, which are used to decide whether a speech segment belongs to a target language or an OOS language. The experimental results demonstrate the effectiveness of the deep SVM back-end system compared with state-of-the-art techniques. Moreover, the thresholding functions effectively detect and reject the OOS data: in discriminating target from OOS languages, a relative decrease of 6% in equal error rate (EER) is reported over classical OOS detection methods.
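The open-set decision and the EER metric mentioned in the abstract can be illustrated with a small sketch. This is not the authors' method: the abstract does not spell out the three thresholding formulations, so `classify_open_set` uses a generic max-score rule as a stand-in (reject a segment as OOS when even its best target-language score falls below a threshold), and `equal_error_rate` is a standard threshold sweep. Both function names and all scores are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def classify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Hypothetical max-score rule: return the top-scoring target language,
    or None (treat the segment as OOS) if even that score is below `threshold`."""
    best_lang = max(scores, key=scores.get)
    return best_lang if scores[best_lang] >= threshold else None


def equal_error_rate(target_scores: List[float], oos_scores: List[float]) -> Tuple[float, float]:
    """Approximate the EER of a target-vs-OOS detector by trying every observed
    score as the acceptance threshold.

    miss rate        = share of target segments scored below the threshold (wrongly rejected)
    false-alarm rate = share of OOS segments scored at or above it (wrongly accepted)
    Returns (eer, threshold) where the two rates are closest; the EER estimate is their mean.
    """
    best_gap, best_eer, best_t = float("inf"), 0.0, 0.0
    for t in sorted(set(target_scores) | set(oos_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)
        fa = sum(s >= t for s in oos_scores) / len(oos_scores)
        if abs(miss - fa) < best_gap:
            best_gap, best_eer, best_t = abs(miss - fa), (miss + fa) / 2, t
    return best_eer, best_t


if __name__ == "__main__":
    # Made-up back-end scores for one segment over three target languages.
    segment = {"english": 0.2, "french": 1.4, "arabic": 0.7}
    print(classify_open_set(segment, threshold=1.0))  # french
    print(classify_open_set(segment, threshold=1.5))  # None -> rejected as OOS

    # Made-up best-language scores for target-language and OOS segments.
    eer, t = equal_error_rate([2.0, 1.5, 1.2, 0.4], [0.9, 0.3, 0.1, 1.3])
    print(eer, t)  # 0.25 1.2
```

Raising the threshold rejects more OOS segments but also more genuine target segments; the EER is the operating point where misses and false alarms balance. For scale, a 6% relative decrease means the new EER is 0.94 times the baseline, so a hypothetical 10.0% baseline EER would fall to 9.4%.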
Citations
Journal Article
Deep learning for spoken language identification: Can we visualize speech signal patterns?
Himadri Mukherjee, Subhankar Ghosh, Shibaprasad Sen, Obaidullah Sk, K. C. Santosh, Santanu Phadikar, Kaushik Roy
TL;DR: This paper proposes using speech signal patterns, represented as image-based features, for spoken language identification, and reports a highest accuracy of 99.96%, which outperforms previously reported state-of-the-art results.
Journal Article
Image-based features for speech signal classification
TL;DR: This paper proposes image-based features for speech signal classification, on the premise that different signals can be distinguished by visualizing their speech patterns; the highest accuracy obtained was 94.51%.
Journal Article
Linear Predictive Coefficients-Based Feature to Identify Top-Seven Spoken Languages
TL;DR: Speech recognition in a multilingual scenario is not trivial when multiple languages are used in one conversation, and the language must be identified before speech recognition as such...
Journal Article
Modernizing Open-Set Speech Language Identification
TL;DR: This work tackles the open-set task by adapting two modern-day state-of-the-art approaches to closed-set language identification: the first using a CRNN with attention and the second using a TDNN.
Journal Article
Addressing the semi-open set dialect recognition problem under resource-efficient considerations
Spandan Dey, Goutam Saha
TL;DR: In this article, the authors propose a semi-open-set approach to spoken dialect recognition, in which a closed-set model is exposed to inputs from unknown classes and utterances from other unknown classes are also included.