Proceedings ArticleDOI

Improved phone-cluster adaptive training acoustic model

TLDR
A Two-stage Phone-CAT model is proposed in which the phonetic subspace dimension is increased to the number of monophone states; this model still retains the center-phone-capturing property of the state-specific vectors in basic Phone-CAT.
Abstract
Phone-cluster adaptive training (Phone-CAT) is a subspace-based acoustic modeling technique inspired by cluster adaptive training (CAT) and the subspace Gaussian mixture model (SGMM). This paper explores three extensions to the basic Phone-CAT model to improve its recognition performance: increasing the phonetic subspace dimension, including sub-states, and including a speaker subspace. The latter two extensions are implemented similarly to their SGMM counterparts, since both acoustic models share a similar subspace framework. However, because the phonetic subspace dimension of Phone-CAT is constrained to equal the number of monophones, the first extension is not straightforward to implement. We propose a Two-stage Phone-CAT model in which the phonetic subspace dimension is increased to the number of monophone states. This model still retains the center-phone-capturing property of the state-specific vectors in basic Phone-CAT. Experiments on a 33-hour training subset of the Switchboard database show improvements in the recognition performance of the basic Phone-CAT model when the proposed extensions are included.
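
For intuition, here is a minimal sketch (plain NumPy, with illustrative sizes and variable names that are not from the paper, and clusters simplified to a single Gaussian each) of the interpolation at the heart of a Phone-CAT-style model: each tied state's mean is a weighted combination of monophone cluster means, with the weight vector's dimension equal to the number of monophones in basic Phone-CAT and to the number of monophone states in the proposed Two-stage variant.

    import numpy as np

    # Illustrative sizes only: 42 monophones with 3 HMM states each, 39-dim features,
    # and a hypothetical set of 500 tied triphone states.
    num_monophones, states_per_phone, feat_dim, num_tied_states = 42, 3, 39, 500

    # Basic Phone-CAT: one cluster (basis) per monophone, so the state-specific
    # interpolation vector has dimension num_monophones.
    cluster_means = np.random.randn(num_monophones, feat_dim)
    state_vectors = np.random.rand(num_tied_states, num_monophones)
    state_vectors /= state_vectors.sum(axis=1, keepdims=True)   # interpolation weights per tied state
    state_means = state_vectors @ cluster_means                 # one mean per tied state

    # Two-stage Phone-CAT (as proposed): the phonetic subspace dimension grows from
    # the number of monophones to the number of monophone states (42 * 3 here),
    # while the state-specific vectors keep their center-phone interpretation.
    expanded_dim = num_monophones * states_per_phone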


References
Proceedings Article

The Kaldi Speech Recognition Toolkit

TL;DR: Describes the design of Kaldi, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state transducers, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Journal ArticleDOI

A Study of Interspeaker Variability in Speaker Verification

TL;DR: Shows that a large joint factor analysis model trained in this way, when tested on the core condition, the extended-data condition and the cross-channel condition, is capable of performing at least as well as fusions of multiple systems of other types.
Journal ArticleDOI

Cluster adaptive training of hidden Markov models

TL;DR: Examines cluster adaptive training (CAT), an adaptation scheme requiring very few parameters that may be viewed as a simple extension of speaker clustering, in which a linear interpolation of all the cluster means is used as the mean for a particular speaker.
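
As a rough illustration of the interpolation described in this summary, the sketch below (plain NumPy; the dimensions, cluster means and weights are random placeholders rather than trained CAT parameters) forms a speaker-dependent mean as a weighted combination of cluster means.

    import numpy as np

    # Hypothetical example: 4 speaker clusters, 39-dimensional acoustic features.
    num_clusters, feat_dim = 4, 39
    cluster_means = np.random.randn(num_clusters, feat_dim)   # one mean per cluster
    speaker_weights = np.array([0.5, 0.2, 0.2, 0.1])          # per-speaker interpolation weights

    # CAT: the speaker-dependent mean is a linear interpolation of the cluster means.
    speaker_mean = speaker_weights @ cluster_means             # shape (feat_dim,)
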
Proceedings ArticleDOI

Subspace Gaussian Mixture Models for speech recognition

TL;DR: Describes an acoustic modeling approach in which all phonetic states share a common Gaussian mixture model structure, with the means and mixture weights varying in a subspace of the total parameter space; this style of acoustic model allows for a much more compact representation.
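
The subspace idea in this summary can be sketched as follows (plain NumPy; all dimensions and parameter values are illustrative placeholders, not the paper's): each state is described by a low-dimensional vector, and shared projection matrices map that vector to the state's Gaussian means and mixture weights.

    import numpy as np

    # Illustrative sizes only.
    num_gauss, feat_dim, subspace_dim, num_states = 8, 39, 10, 3

    M = np.random.randn(num_gauss, feat_dim, subspace_dim)    # shared mean-projection matrices M_i
    w = np.random.randn(num_gauss, subspace_dim)              # shared weight-projection vectors w_i
    v = np.random.randn(num_states, subspace_dim)             # low-dimensional state vectors v_j

    # State-dependent means and mixture weights derived from the shared subspace:
    # mu_{ji} = M_i v_j  and  w_{ji} = softmax_i(w_i^T v_j).
    means = np.einsum('ifs,js->jif', M, v)                    # shape (num_states, num_gauss, feat_dim)
    logits = v @ w.T                                          # shape (num_states, num_gauss)
    weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)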