Automatic generation of subword units for speech recognition systems

doi:10.1109/89.985546

Journal Article

Rita Singh, +2 more

07 Aug 2002

IEEE Transactions on Speech and Audio Pr...

Vol. 10, Iss. 2, pp. 89-99

TLDR

This paper presents a complete probabilistic formulation for the automatic design of subword units and dictionary, given only the acoustic data and their transcriptions, and permits easy incorporation of external sources of information, such as the spellings of words in terms of a nonideographic script.

Abstract:

Large vocabulary continuous speech recognition (LVCSR) systems traditionally represent words in terms of smaller subword units. Both during training and during recognition, they require a mapping table, called the dictionary, which maps words into sequences of these subword units. The performance of the LVCSR system depends critically on the definition of the subword units and the accuracy of the dictionary. In current LVCSR systems, both these components are manually designed. While manually designed subword units generalize well, they may not be the optimal units of classification for the specific task or environment for which an LVCSR system is trained. Moreover, when human expertise is not available, it may not be possible to design good subword units manually. There is clearly a need for data-driven design of these LVCSR components. In this paper, we present a complete probabilistic formulation for the automatic design of subword units and dictionary, given only the acoustic data and their transcriptions. The proposed framework permits easy incorporation of external sources of information, such as the spellings of words in terms of a nonideographic script.
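The dictionary described in the abstract can be pictured as a lookup table from words to sequences of subword units. The following is a minimal sketch of that data structure only, not the paper's method: the unit labels, the `DICTIONARY` contents, and the helper names `grapheme_fallback` and `transcribe` are all hypothetical. The one-unit-per-letter fallback is just one assumed way that word spellings might serve as an external source of information.

```python
from typing import Dict, List

# Hypothetical toy dictionary: each word maps to a sequence of subword units.
# The unit symbols are illustrative phone-like labels, not taken from the paper.
DICTIONARY: Dict[str, List[str]] = {
    "speech": ["S", "P", "IY", "CH"],
    "units": ["Y", "UW", "N", "AH", "T", "S"],
}


def grapheme_fallback(word: str) -> List[str]:
    """Spell a word as one unit per letter (an assumed spelling-based guess)."""
    return [ch.upper() for ch in word if ch.isalpha()]


def transcribe(words: List[str], dictionary: Dict[str, List[str]]) -> List[str]:
    """Expand a word-level transcription into a flat subword-unit sequence."""
    units: List[str] = []
    for word in words:
        key = word.lower()
        # Use the dictionary entry when present, otherwise fall back to spelling.
        units.extend(dictionary.get(key, grapheme_fallback(key)))
    return units
```

Calling `transcribe(["speech", "data"], DICTIONARY)` expands the first word through its dictionary entry and the second, which has no entry, through its spelling. In a data-driven design, both the unit inventory and the table entries would be estimated from acoustic data rather than written by hand as they are here.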
