Approaches to Language Identification using Gaussian Mixture Models and Shifted Delta Cepstral Features

Open Access Proceedings Article

TLDR

Two GMM-based approaches to language identification that use shifted delta cepstra (SDC) feature vectors to achieve LID performance comparable to that of the best phone-based systems are described.

Abstract:

Published results indicate that automatic language identification (LID) systems that rely on multiple-language phone recognition and n-gram language modeling produce the best performance in formal LID evaluations. By contrast, Gaussian mixture model (GMM) systems, which measure acoustic characteristics, are far more efficient computationally but have tended to provide inferior levels of performance. This paper describes two GMM-based approaches to language identification that use shifted delta cepstra (SDC) feature vectors to achieve LID performance comparable to that of the best phone-based systems. The approaches include both acoustic scoring and a recently developed GMM tokenization system that is based on a variation of phonetic recognition and language modeling. System performance is evaluated on both the CallFriend and OGI corpora.
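
The shifted delta cepstra mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used N-d-P-k parameterization (N cepstral coefficients per frame, delta spread d, block shift P, k stacked blocks), where block i at frame t is c(t + iP + d) - c(t + iP - d). The function name, the default values d=1, P=3, k=7, and the edge-padding at utterance boundaries are assumptions made for illustration; the abstract does not state which settings or boundary handling the paper uses.

```python
import numpy as np

def shifted_delta_cepstra(cep, d=1, P=3, k=7):
    """Stack k shifted delta blocks for each frame.

    cep: array of shape (T, N) holding N cepstral coefficients per frame.
    Returns an array of shape (T, N * k).
    Defaults (d=1, P=3, k=7) are illustrative assumptions.
    """
    cep = np.asarray(cep, dtype=float)
    T, _ = cep.shape
    # Edge-pad so that every index t + i*P +/- d is valid.
    # Repeating the boundary frames is an assumption, not the paper's method.
    padded = np.pad(cep, ((d, (k - 1) * P + d), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        # Position of frame t + i*P inside the padded array.
        center = np.arange(T) + d + i * P
        # Delta for block i: c(t + i*P + d) - c(t + i*P - d).
        blocks.append(padded[center + d] - padded[center - d])
    return np.hstack(blocks)
```

With these defaults, a (T, 7) cepstral array yields a (T, 49) feature array, which could then be passed to whatever per-language GMM scoring is in use.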

Citations

Journal Article

Support vector machines for speaker and language recognition

William M. Campbell, +4 more

- 01 Apr 2006 -

Computer Speech & Language

TL;DR: This work considers the application of SVMs to speaker and language recognition and uses a sequence kernel that compares sequences of feature vectors and produces a measure of similarity to build upon a simpler mean-squared error classifier to produce a more accurate system.

Proceedings Article

A Comparison of Features for Synthetic Speech Detection

Md. Sahidullah, +2 more

TL;DR: Comparative results indicate that features representing spectral information in the high-frequency region, dynamic information of speech, and detailed information related to subband characteristics are considerably more useful for the synthetic speech detection task.

Journal Article

Spoken Language Recognition: From Fundamentals to Practice

Haizhou Li, +2 more

TL;DR: This paper attempts to provide an introductory tutorial on the fundamentals of the theory and the state-of-the-art solutions of spoken language recognition, from both phonological and computational aspects.

Journal Article

A Vector Space Modeling Approach to Spoken Language Identification

Haizhou Li, +2 more

- 01 Jan 2007 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances.

Proceedings Article

Language Recognition in iVectors Space

David Martínez González, +4 more

TL;DR: To recognize language in the iVector space, three different linear classifiers are experimented with: one based on a generative model, where classes are modeled by Gaussian distributions with a shared covariance matrix, and two discriminative classifiers, namely linear Support Vector Machine and Logistic Regression.

References

Linguistic Data Consortium

Treebank Penn

Journal Article

Comparison of four approaches to automatic language identification of telephone speech

M.A. Zissman

- 01 Jan 1996 -

IEEE Transactions on Speech and Audio Pr...

TL;DR: Four approaches for automatic language identification of speech utterances are compared: Gaussian mixture model (GMM) classification; single-language phone recognition followed by language-dependent, interpolated n-gram language modeling (PRLM); parallel PRLM, which uses multiple single-language phone recognizers, each trained in a different language; and language-dependent parallel phone recognition (PPR).

Proceedings Article

The OGI multi-language telephone speech corpus.

Yeshwant K. Muthusamy, +2 more

TL;DR: The recording protocol, data collection procedure, ongoing corpus development, preliminary results of the statistical evaluation of the 10 languages, and plans to provide orthographic transcriptions of the speech are described.

Proceedings Article

Language identification using Gaussian mixture model tokenization

Pedro A. Torres-Carrasquillo, +2 more

TL;DR: Phone tokenization followed by n-gram language modeling has consistently provided good results for the task of language identification; this work generalizes the technique by using Gaussian mixture models as the basis for tokenizing.

Proceedings Article

Language identification using shifted delta cepstra

M.A. Kohler, +1 more

TL;DR: This work describes a method for finding the optimal parameters for identifying a set of languages, specifies these parameters for a language identification task, and provides a performance comparison.
