Journal Article

Automatic generation of subword units for speech recognition systems

TLDR
This paper presents a complete probabilistic formulation for the automatic design of subword units and dictionary, given only the acoustic data and their transcriptions, and permits easy incorporation of external sources of information, such as the spellings of words in terms of a nonideographic script.
Abstract
Large vocabulary continuous speech recognition (LVCSR) systems traditionally represent words in terms of smaller subword units. Both during training and during recognition, they require a mapping table, called the dictionary, which maps words into sequences of these subword units. The performance of the LVCSR system depends critically on the definition of the subword units and the accuracy of the dictionary. In current LVCSR systems, both these components are manually designed. While manually designed subword units generalize well, they may not be the optimal units of classification for the specific task or environment for which an LVCSR system is trained. Moreover, when human expertise is not available, it may not be possible to design good subword units manually. There is clearly a need for data-driven design of these LVCSR components. In this paper, we present a complete probabilistic formulation for the automatic design of subword units and dictionary, given only the acoustic data and their transcriptions. The proposed framework permits easy incorporation of external sources of information, such as the spellings of words in terms of a nonideographic script.
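The role of the dictionary described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy `DICTIONARY`, the phoneme-like unit labels, and the helper names `spell_out` and `transcription_to_units` are hypothetical. The spelling fallback only hints at how spellings in a nonideographic script might serve as an external source of information; it is not the paper's probabilistic formulation.

```python
from typing import Dict, List

# Hypothetical toy dictionary: each word maps to a sequence of subword units.
# The unit labels are illustrative phoneme-like symbols, not units from the paper.
DICTIONARY: Dict[str, List[str]] = {
    "speech": ["S", "P", "IY", "CH"],
    "unit": ["Y", "UW", "N", "IH", "T"],
}


def spell_out(word: str) -> List[str]:
    """Fallback (an assumption): treat each letter of the spelling as one unit."""
    return [ch.upper() for ch in word if ch.isalpha()]


def transcription_to_units(
    transcription: str, dictionary: Dict[str, List[str]]
) -> List[str]:
    """Expand a word-level transcription into a flat sequence of subword units."""
    units: List[str] = []
    for word in transcription.lower().split():
        # Use the dictionary entry when one exists, otherwise fall back to spelling.
        units.extend(dictionary.get(word, spell_out(word)))
    return units


if __name__ == "__main__":
    print(transcription_to_units("speech unit", DICTIONARY))
```

In a manually designed system both the unit inventory and the entries of such a table are written by hand; the paper's stated goal is to derive both from the acoustic data and transcriptions instead.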


Citations
Posted Content

Speech Recognition by Machine, A Review

TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a speech recognition system and to identify research topics and applications that are at the forefront of this exciting and challenging field.
Proceedings Article

Grapheme Based Speech Recognition

TL;DR: Grapheme-based speech recognizers in three languages (English, German, and Spanish) are trained and compared to their phoneme-based counterparts, and the results show that for languages with a close grapheme-to-phoneme relation, grapheme-based modeling is as good as phoneme-based modeling.
Proceedings Article

An auto-encoder based approach to unsupervised learning of subword units

TL;DR: An auto-encoder-based method for the unsupervised identification of subword units is presented; the encoded representation of speech produced by standard auto-encoders is more effective than Gaussian posteriorgrams in a spoken query classification task.
Proceedings Article

Towards Unsupervised Training of Speaker Independent Acoustic Models.

TL;DR: This paper investigates the feasibility of using the results of a number of recent efforts to automatically discover repeated spoken terms without a recognizer as constraints for unsupervised acoustic model training, and starts with a relatively small set of word types.
Journal Article

A new independent component analysis for speech recognition and separation

TL;DR: A novel nonparametric likelihood ratio (NLR) objective function for independent component analysis (ICA) is derived through the statistical hypothesis test of independence of random observations and applied to unsupervised learning of unknown pronunciation variations.
References
Book

Introduction to Automata Theory, Languages, and Computation

TL;DR: This book is a rigorous exposition of formal languages and models of computation, with an introduction to computational complexity, appropriate for upper-level computer science undergraduates who are comfortable with mathematical arguments.
Book

Fundamentals of speech recognition

TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
Journal Article

The mathematics of statistical machine translation: parameter estimation

TL;DR: The authors describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another.
Journal Article

Estimation of probabilities from sparse data for the language model component of a speech recognizer

TL;DR: The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data, and compares favorably to other proposed methods.