Proceedings ArticleDOI

Improving of Open-Set Language Identification by Using Deep SVM and Thresholding Functions

01 Oct 2017-pp 796-802
TL;DR: This paper proposes a deep SVM based LID back-end system to improve target language identification and defines three OOS thresholding formulations, which are used to decide whether a speech segment belongs to a target or an OOS language.
Abstract: State-of-the-art language identification (LID) systems are based on an iVector feature extractor front-end followed by a multi-class recognition back-end. Identification accuracy degrades considerably when LID systems face open-set languages. Compared with the in-set identification task, the open-set task better mimics the real challenge of language identification. In this paper, we propose an approach to the problem of out-of-set (OOS) data detection in the context of open-set language identification with zero knowledge of the OOS languages. The main feature of this study is its emphasis on in-set (target) language identification on the one hand and on OOS language detection on the other. Accordingly, we propose a deep SVM based LID back-end system to improve target language identification. Along with that, we define three OOS thresholding formulations, which are used to decide whether a speech segment belongs to a target or an OOS language. The experimental results demonstrate the effectiveness of the deep SVM back-end system as compared to state-of-the-art techniques. Besides that, the thresholding functions perfectly detect and reject the OOS data. A relative decrease of 6% in Equal Error Rate (EER) is reported over classical OOS detection methods in discriminating target and OOS languages.
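The abstract does not spell out the three thresholding formulations, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a back-end that returns one score per target language, rejects a segment as OOS when the best score falls below a threshold, and estimates the EER between target and OOS segments by sweeping that threshold. The function names, the max-score rule, and the example scores are all hypothetical.

```python
from typing import Optional, Sequence


def open_set_decision(scores: Sequence[float],
                      languages: Sequence[str],
                      threshold: float) -> Optional[str]:
    """Return the best-scoring target language, or None to reject as OOS.

    Assumption: a simple max-score rule stands in for the paper's
    (unspecified) thresholding formulations.
    """
    if not scores or len(scores) != len(languages):
        raise ValueError("scores and languages must be non-empty and aligned")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return languages[best] if scores[best] >= threshold else None


def equal_error_rate(target_scores: Sequence[float],
                     oos_scores: Sequence[float]) -> float:
    """Estimate the EER by sweeping the threshold over all observed scores.

    A target segment is falsely rejected when its score is below the
    threshold; an OOS segment is falsely accepted when its score is at or
    above it. The estimate is taken where the two rates are closest.
    """
    candidates = sorted(set(target_scores) | set(oos_scores))
    best_gap, best_eer = None, None
    for t in candidates:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in oos_scores) / len(oos_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    langs = ["en", "fr", "de"]  # hypothetical target set
    print(open_set_decision([0.2, 1.3, -0.5], langs, threshold=0.5))
    print(open_set_decision([0.2, 0.3, -0.5], langs, threshold=0.5))
    print(equal_error_rate([2.0, 1.5, 1.0, 0.2], [0.5, 0.1, -0.3, -1.0]))
```

In practice the threshold would be tuned on held-out data; a single global threshold is the simplest choice and is used here only for illustration.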
Citations
Journal ArticleDOI
TL;DR: This paper proposes to use speech signal patterns for spoken language identification, where image-based features are used, and reports a highest accuracy of 99.96%, which outperforms the state-of-the-art reported results.
Abstract: Western countries entertain speech recognition-based applications. This does not happen to a similar extent in East Asia. Language complexity could potentially be one of the primary reasons behind this lag. Besides, multilingual countries like India need to be considered so that language identification (words and phrases) can be possible through speech signals. Unlike previous works, in this paper we propose to use speech signal patterns for spoken language identification, where image-based features are used. The concept is primarily inspired by the fact that a speech signal can be read/visualized. In our experiment, we use spectrograms (for image data) and deep learning for spoken language classification. Using the IIIT-H Indic speech database for Indic languages, we achieve the highest accuracy of 99.96%, which outperforms the state-of-the-art reported results. Furthermore, for a relative decrease of 4018.60% in the signal-to-noise ratio, a decrease of only 0.50% in accuracy indicates that our concept is fairly robust.

20 citations

Journal ArticleDOI
TL;DR: This paper proposes image-based features for speech signal classification because it is possible to identify different patterns by visualizing their speech patterns and the highest accuracy of 94.51% was obtained.
Abstract: Like other applications under the purview of pattern classification, analyzing speech signals is crucial. People often mix different languages while talking, which makes this task complicated. This happens mostly in India, since different languages are used from one state to another. The southern part of India in particular is affected by this situation, so distinguishing its languages is important. In this paper, we propose image-based features for speech signal classification because it is possible to identify different patterns by visualizing their speech patterns. Modified Mel frequency cepstral coefficient (MFCC) features, namely MFCC-Statistics Grade (MFCC-SG), were extracted, visualized by plotting techniques, and thereafter fed to a convolutional neural network. In this study, we used the top 4 languages, namely Telugu, Tamil, Malayalam, and Kannada. Experiments were performed on more than 900 hours of data collected from YouTube, leading to over 150000 images, and the highest accuracy of 94.51% was obtained.

8 citations


Cites methods from "Improving of Open-Set Language Iden..."

  • ...[29] used deep SVM for detecting out of set languages in the task of language identification and presented 3 formulations for the out of set languages as well....

Journal ArticleDOI
TL;DR: Speech recognition in multilingual scenario is not trivial in the case when multiple languages are used in one conversation and language must be identified before speech recognition as such...
Abstract: Speech recognition in multilingual scenario is not trivial in the case when multiple languages are used in one conversation. Language must be identified before we process speech recognition as such...

7 citations

Journal ArticleDOI
TL;DR: This work tackles the open-set task by adapting two modern-day state-of-the-art approaches to closed-set language identification: the first using a CRNN with attention and the second using a TDNN.
Abstract: While most modern speech Language Identification methods are closed-set, we want to see if they can be modified and adapted for the open-set problem. When switching to the open-set problem, the solution gains the ability to reject an audio input when it fails to match any of our known language options. We tackle the open-set task by adapting two modern-day state-of-the-art approaches to closed-set language identification: the first using a CRNN with attention and the second using a TDNN. In addition to enhancing our input feature embeddings using MFCCs, log spectral features, and pitch, we will be attempting two approaches to out-of-set language detection: one using thresholds, and the other essentially performing a verification task. We will compare both the performance of the TDNN and the CRNN, as well as our detection approaches.
Journal ArticleDOI
TL;DR: In this article, the authors proposed a semi-open set approach for the spoken dialect recognition task, where a closed set model is exposed to unknown class inputs and utterances from other unknown classes are also included.
References
Proceedings ArticleDOI
04 Dec 2013
TL;DR: This paper combines kernels at each layer and then optimizes over an estimate of the support vector machine leave-one-out error rather than the dual objective function to improve performance on a variety of datasets.
Abstract: Deep learning methods have predominantly been applied to large artificial neural networks. Despite their state-of-the-art performance, these large networks typically do not generalize well to datasets with limited sample sizes. In this paper, we take a different approach by learning multiple layers of kernels. We combine kernels at each layer and then optimize over an estimate of the support vector machine leave-one-out error rather than the dual objective function. Our experiments on a variety of datasets show that each layer successively increases performance with only a few base kernels.

57 citations

Proceedings Article
01 Jan 2008
TL;DR: This paper presents a description of the MIT Lincoln Laboratory language recognition system, a fusion of four core recognizers, two based on tokenization and two based on spectral similarity, submitted to the NIST 2007 Language Recognition Evaluation (LRE).
Abstract: This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2007 Language Recognition Evaluation. This system consists of a fusion of four core recognizers, two based on tokenization and two based on spectral similarity. Results for NIST’s 14-language detection task are presented for both the closed-set and open-set tasks and for the 30, 10 and 3 second durations. On the 30 second 14-language closed set detection task, the system achieves a 1% equal error rate.

45 citations


"Improving of Open-Set Language Iden..." refers methods in this paper

  • ...[22], [23] NIST 2009 OOS modeling GMM, SVM and tokenizer Tokheim [24] NIST 2003 Thresholding function UBM-GMM Zhang and Hansen [18] NIST LRE 2009 OOS modeling GMM behravan et al....

Journal ArticleDOI
TL;DR: This paper proposes to optimize the network over an adaptive backpropagation MLMKL framework, using the gradient ascent method instead of the dual objective function or the estimation of the leave-one-out error, and achieves high performance.
Abstract: The multiple kernel learning (MKL) approach has been proposed for kernel methods and has shown high performance in solving some real-world applications. It consists of learning the optimal kernel from one layer of multiple predefined kernels. Unfortunately, this approach is not rich enough to solve relatively complex problems. With the emergence and the success of the deep learning concept, multilayer multiple kernel learning (MLMKL) methods were inspired by the idea of deep architecture. They were introduced in order to improve the conventional MKL methods. Such architectures tend to learn deep kernel machines by exploring the combinations of multiple kernels in a multilayer structure. However, existing MLMKL methods often have trouble with the optimization of the network for two or more layers. Additionally, they do not always outperform the simplest method of combining multiple kernels (i.e., MKL). In order to improve the effectiveness of MKL approaches, we introduce, in this paper, a novel backpropagation MLMKL framework. Specifically, we propose to optimize the network over an adaptive backpropagation algorithm. We use the gradient ascent method instead of the dual objective function or the estimation of the leave-one-out error. We test our proposed method through a large set of experiments on a variety of benchmark data sets. We have successfully optimized the system over many layers. Empirical results over an extensive set of experiments show that our algorithm achieves high performance compared to the traditional MKL approach and existing MLMKL methods.

36 citations


"Improving of Open-Set Language Iden..." refers background or methods in this paper

  • ...It aims at replacing the single kernel SVM approach by a network of kernels as in deep neural network [13]–[17]....

  • ...Additional details on the MLMKL procedure is described in [17]....

  • ...In MLMKL framework, all base kernels in antecedent layers are combined so as to form new inputs to the base kernels in subsequent layers [17]....
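
The layered combination described in these excerpts can be sketched roughly as follows. This is an illustrative two-layer toy and not the procedure of [17]: the choice of base kernels, the fixed weights, and the second-layer functions acting on the first-layer kernel value are all assumptions, and no weight learning (backpropagation or otherwise) is shown.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def linear(x: Vector, y: Vector) -> float:
    """Linear base kernel: the dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) base kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def poly(x: Vector, y: Vector, degree: int = 2) -> float:
    """Inhomogeneous polynomial base kernel."""
    return (linear(x, y) + 1.0) ** degree


def layer1(x: Vector, y: Vector, weights: Sequence[float]) -> float:
    """First layer: weighted sum of the base kernels (hypothetical weights)."""
    base = (linear(x, y), rbf(x, y), poly(x, y))
    return sum(w * k for w, k in zip(weights, base))


def layer2(x: Vector, y: Vector,
           w1: Sequence[float], w2: Sequence[float]) -> float:
    """Second layer: base functions applied to the combined layer-1 value.

    Assumption: an identity map and a degree-2 polynomial of the layer-1
    kernel value; both keep positive semi-definiteness when all weights
    are non-negative.
    """
    k1 = layer1(x, y, w1)
    transformed = (k1, (k1 + 1.0) ** 2)
    return sum(w * k for w, k in zip(w2, transformed))


if __name__ == "__main__":
    w1 = (0.5, 0.25, 0.25)  # hypothetical layer-1 weights
    w2 = (0.5, 0.5)         # hypothetical layer-2 weights
    x, y = [1.0, 0.0], [0.0, 1.0]
    print(layer2(x, x, w1, w2), layer2(x, y, w1, w2))
```

In a real MLMKL system the weights at every layer would be learned rather than fixed, and the resulting kernel would be passed to an SVM solver.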

Proceedings ArticleDOI
06 Sep 2015
TL;DR: Evaluations on specific dialect recognition tasks show that the DBN based i-vector can achieve significant and consistent performance gains over conventional GMM-UBM and DNN based i-vector methods.
Abstract: This paper presents a unified i-vector framework for language identification (LID) based on deep bottleneck networks (DBN) trained for automatic speech recognition (ASR). The framework covers both front-end feature extraction and back-end modeling stages. The outputs from different layers of a DBN are exploited to improve the effectiveness of the i-vector representation by incorporating a mixture of acoustic and phonetic information. Furthermore, a universal model is derived from the DBN with a LID corpus. This is a somewhat inverse process to the GMM-UBM method, in which the GMM of each language is mapped from a GMM-UBM. Evaluations on specific dialect recognition tasks show that the DBN based i-vector can achieve significant and consistent performance gains over conventional GMM-UBM and DNN based i-vector methods. The generalization capability of this framework is also evaluated using DBNs trained on Mandarin and English corpora. Index Terms: Language Identification, Deep Neural Network, Deep Bottleneck Feature, i-vector representation

35 citations


"Improving of Open-Set Language Iden..." refers background in this paper

  • ...Since then, different alternatives have been introduced including i-vectors based on bottleneck features [7], [8]....

Proceedings ArticleDOI
22 Sep 2008
TL;DR: This paper describes the acoustic language recognition subsystems of Brno University of Technology (BUT) which contributed to the BUT main submission to the NIST LRE 2007 and the complementarity of the approaches is analyzed.
Abstract: This paper describes the acoustic language recognition subsystems of Brno University of Technology (BUT) which contributed to the BUT main submission to the NIST LRE 2007. Two main techniques are employed in the subsystems: discriminative training in terms of Maximum Mutual Information, and channel compensation in terms of eigenchannel adaptation in both the model and feature domains. The complementarity of the approaches is analyzed.

28 citations


"Improving of Open-Set Language Iden..." refers background in this paper

  • ...It consistently outperforms its high-level counterparts, including Gaussian mixture models (GMM) [2], [4], [11] and Gaussian Mixture Model-Universal Background Model (GMM-UBM) [2], [12]....
