Topic

Subspace Gaussian Mixture Model

About: Subspace Gaussian Mixture Model is a research topic. Over the lifetime, 42 publications have been published within this topic receiving 5740 citations.

Papers
Proceedings Article
01 Jan 2011
TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Abstract: We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

5,857 citations

Journal ArticleDOI
TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.

304 citations
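The defining idea of the SGMM, as described above, is that every HMM state shares the same set of Gaussians and projection matrices; each state contributes only a low-dimensional vector from which its full GMM parameters are expanded. A minimal NumPy sketch of that expansion (dimensions and random parameters here are illustrative, not from any paper):

```python
import numpy as np

# Hypothetical dimensions for illustration:
# D = feature dim, S = subspace dim, I = shared Gaussians, J = HMM states
D, S, I, J = 13, 10, 4, 3
rng = np.random.default_rng(0)

# Globally shared parameters (identical for every HMM state)
M = rng.normal(size=(I, D, S))      # mean-projection matrices, one per Gaussian
w = rng.normal(size=(I, S))         # weight-projection vectors
Sigma = np.stack([np.eye(D)] * I)   # shared covariances (toy: identity)

# Per-state subspace vectors: the only state-specific parameters
v = rng.normal(size=(J, S))

def state_gmm(j):
    """Expand state j's low-dimensional vector into full GMM parameters:
    means mu_ji = M_i v_j, mixture weights via a softmax of w_i^T v_j."""
    means = M @ v[j]                        # shape (I, D)
    logits = w @ v[j]                       # shape (I,)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                # normalized mixture weights
    return means, weights, Sigma

means, weights, _ = state_gmm(0)
print(means.shape, weights.shape)  # (4, 13) (4,)
```

The appeal is parameter sharing: the state-specific storage is S numbers per state rather than I full Gaussians, which is why the approach suits low-resource settings.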

Journal ArticleDOI
TL;DR: A cross-corpus acoustic normalization procedure is used which is a variant of speaker adaptive training (SAT) (Mohan et al., 2012a) and provides the best speech recognition performance for both languages.

41 citations

Proceedings ArticleDOI
01 Dec 2011
TL;DR: It is shown that multilingually trained SGMM shared parameters result in lower word error rates (WERs) than using those from a single source language, and that regularizing the estimation of the SGMM state vectors by penalizing their ℓ1-norm helps to overcome numerical instabilities and leads to lower WERs.
Abstract: We investigate cross-lingual acoustic modelling for low resource languages using the subspace Gaussian mixture model (SGMM). We assume the presence of acoustic models trained on multiple source languages, and use the global subspace parameters from those models for improved modelling in a target language with limited amounts of transcribed speech. Experiments on the GlobalPhone corpus using Spanish, Portuguese, and Swedish as source languages and German as target language (with 1 hour and 5 hours of transcribed audio) show that multilingually trained SGMM shared parameters result in lower word error rates (WERs) than using those from a single source language. We also show that regularizing the estimation of the SGMM state vectors by penalizing their ℓ1-norm helps to overcome numerical instabilities and leads to lower WERs.

35 citations
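The ℓ1 penalty on the state vectors mentioned above induces sparsity. One standard way such a penalty enters an estimation procedure (a generic illustration, not the paper's exact update) is through the soft-thresholding proximal operator:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_1: shrinks every coefficient toward
    zero by lam and zeroes out those smaller than lam in magnitude."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Toy state vector: small entries are pruned, large ones shrink by lam
v = np.array([0.05, -0.8, 1.5, -0.02])
print(soft_threshold(v, 0.1))
```

Zeroed coefficients keep the state vectors bounded and well-conditioned, which matches the paper's motivation of avoiding numerical instabilities when target-language data is scarce.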

Journal ArticleDOI
TL;DR: Pitch-adaptive front-end signal processing in deriving the Mel-frequency cepstral coefficient (MFCC) features is explored to reduce sensitivity to pitch variation; the effectiveness of existing speaker normalization techniques remains intact even with the use of the proposed pitch-adaptive MFCCs.

35 citations

Network Information
Related Topics (5)
Hidden Markov model
28.3K papers, 725.3K citations
81% related
Feature learning
15.5K papers, 684.7K citations
75% related
Feature vector
48.8K papers, 954.4K citations
74% related
Natural language
31.1K papers, 806.8K citations
73% related
Feature extraction
111.8K papers, 2.1M citations
72% related
Performance
Metrics
No. of papers in the topic in previous years:

Year  Papers
2021  3
2020  3
2019  3
2018  2
2017  2
2016  2