Proceedings ArticleDOI

Improved phone-cluster adaptive training acoustic model

TLDR
A Two-stage Phone-CAT model is proposed in which the phonetic subspace dimension is increased to the number of monophone states; this model still retains the center-phone-capturing property of the state-specific vectors in basic Phone-CAT.
Abstract
Phone-cluster adaptive training (Phone-CAT) is a subspace-based acoustic modeling technique inspired by cluster adaptive training (CAT) and the subspace Gaussian mixture model (SGMM). This paper explores three extensions to the basic Phone-CAT model to improve its recognition performance: increasing the phonetic subspace dimension, including sub-states, and including a speaker subspace. The latter two extensions are implemented similarly to their SGMM counterparts, since both acoustic models share a similar subspace framework. However, because the phonetic subspace dimension of Phone-CAT is constrained to equal the number of monophones, the first extension is not straightforward to implement. We propose a Two-stage Phone-CAT model in which the phonetic subspace dimension is increased to the number of monophone states. This model still retains the center-phone-capturing property of the state-specific vectors in basic Phone-CAT. Experiments on a 33-hour training subset of the Switchboard database show improvements in the recognition performance of the basic Phone-CAT model when the proposed extensions are included.
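
For intuition, here is a minimal sketch (plain NumPy, with illustrative sizes and variable names that are not from the paper, and clusters simplified to a single Gaussian each) of the interpolation at the heart of a Phone-CAT-style model: each tied state's mean is a weighted combination of monophone cluster means, with the weight vector's dimension equal to the number of monophones in basic Phone-CAT and to the number of monophone states in the proposed Two-stage variant.

    import numpy as np

    # Illustrative sizes only: 42 monophones with 3 HMM states each, 39-dim features,
    # and a hypothetical set of 500 tied triphone states.
    num_monophones, states_per_phone, feat_dim, num_tied_states = 42, 3, 39, 500

    # Basic Phone-CAT: one cluster (basis) per monophone, so the state-specific
    # interpolation vector has dimension num_monophones.
    cluster_means = np.random.randn(num_monophones, feat_dim)
    state_vectors = np.random.rand(num_tied_states, num_monophones)
    state_vectors /= state_vectors.sum(axis=1, keepdims=True)   # interpolation weights per tied state
    state_means = state_vectors @ cluster_means                 # one mean per tied state

    # Two-stage Phone-CAT (as proposed): the phonetic subspace dimension grows from
    # the number of monophones to the number of monophone states (42 * 3 here),
    # while the state-specific vectors keep their center-phone interpretation.
    expanded_dim = num_monophones * states_per_phone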


References
Proceedings Article

The Kaldi Speech Recognition Toolkit

TL;DR: Describes the design of Kaldi, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state transducers, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Journal ArticleDOI

A Study of Interspeaker Variability in Speaker Verification

TL;DR: Shows that a large joint factor analysis model trained in this way, when tested on the core condition, the extended-data condition and the cross-channel condition, is capable of performing at least as well as fusions of multiple systems of other types.
Journal ArticleDOI

Cluster adaptive training of hidden Markov models

TL;DR: Examines cluster adaptive training (CAT), an adaptation scheme requiring very few parameters that may be viewed as a simple extension of speaker clustering, in which a linear interpolation of all the cluster means is used as the mean for a particular speaker.
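
As a rough illustration of the interpolation described in this summary, the sketch below (plain NumPy; the dimensions, cluster means and weights are random placeholders rather than trained CAT parameters) forms a speaker-dependent mean as a weighted combination of cluster means.

    import numpy as np

    # Hypothetical example: 4 speaker clusters, 39-dimensional acoustic features.
    num_clusters, feat_dim = 4, 39
    cluster_means = np.random.randn(num_clusters, feat_dim)   # one mean per cluster
    speaker_weights = np.array([0.5, 0.2, 0.2, 0.1])          # per-speaker interpolation weights

    # CAT: the speaker-dependent mean is a linear interpolation of the cluster means.
    speaker_mean = speaker_weights @ cluster_means             # shape (feat_dim,)
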
Proceedings ArticleDOI

Subspace Gaussian Mixture Models for speech recognition

TL;DR: Describes an acoustic modeling approach in which all phonetic states share a common Gaussian mixture model structure, with the means and mixture weights varying in a subspace of the total parameter space; this style of acoustic model allows for a much more compact representation.
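
The subspace idea in this summary can be sketched as follows (plain NumPy; all dimensions and parameter values are illustrative placeholders, not the paper's): each state is described by a low-dimensional vector, and shared projection matrices map that vector to the state's Gaussian means and mixture weights.

    import numpy as np

    # Illustrative sizes only.
    num_gauss, feat_dim, subspace_dim, num_states = 8, 39, 10, 3

    M = np.random.randn(num_gauss, feat_dim, subspace_dim)    # shared mean-projection matrices M_i
    w = np.random.randn(num_gauss, subspace_dim)              # shared weight-projection vectors w_i
    v = np.random.randn(num_states, subspace_dim)             # low-dimensional state vectors v_j

    # State-dependent means and mixture weights derived from the shared subspace:
    # mu_{ji} = M_i v_j  and  w_{ji} = softmax_i(w_i^T v_j).
    means = np.einsum('ifs,js->jif', M, v)                    # shape (num_states, num_gauss, feat_dim)
    logits = v @ w.T                                          # shape (num_states, num_gauss)
    weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)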