Journal Article
Automatic generation of subword units for speech recognition systems
TL;DR
This paper presents a complete probabilistic formulation for the automatic design of subword units and dictionary, given only the acoustic data and their transcriptions, and permits easy incorporation of external sources of information, such as the spellings of words in terms of a nonideographic script.
Abstract:
Large vocabulary continuous speech recognition (LVCSR) systems traditionally represent words in terms of smaller subword units. Both during training and during recognition, they require a mapping table, called the dictionary, which maps words into sequences of these subword units. The performance of the LVCSR system depends critically on the definition of the subword units and the accuracy of the dictionary. In current LVCSR systems, both these components are manually designed. While manually designed subword units generalize well, they may not be the optimal units of classification for the specific task or environment for which an LVCSR system is trained. Moreover, when human expertise is not available, it may not be possible to design good subword units manually. There is clearly a need for data-driven design of these LVCSR components. In this paper, we present a complete probabilistic formulation for the automatic design of subword units and dictionary, given only the acoustic data and their transcriptions. The proposed framework permits easy incorporation of external sources of information, such as the spellings of words in terms of a nonideographic script.
Citations
Posted Content
Speech Recognition by Machine, A Review
M. A. Anusuya, S. K. Katti
TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in the various stages of a speech recognition system, and to identify research topics and applications that are at the forefront of this exciting and challenging field.
Proceedings Article
Grapheme Based Speech Recognition
TL;DR: Grapheme based speech recognizers in three languages - English, German, and Spanish - are trained and compared to their phoneme based counterparts, and the results show that for languages with a close grapheme-to-phoneme relation, grapheme based modeling is as good as the phoneme based one.
Proceedings Article
An auto-encoder based approach to unsupervised learning of subword units
TL;DR: An auto-encoder based method for the unsupervised identification of subword units is presented, and the encoded representation of speech produced by standard auto-encoders is shown to be more effective than Gaussian posteriorgrams in a spoken query classification task.
Proceedings Article
Towards Unsupervised Training of Speaker Independent Acoustic Models.
Aren Jansen, Kenneth Church
TL;DR: This paper investigates the feasibility of using the results of a number of recent efforts to automatically discover repeated spoken terms without a recognizer as constraints for unsupervised acoustic model training, and starts with a relatively small set of word types.
Journal Article
A new independent component analysis for speech recognition and separation
Jen-Tzung Chien, Bo-Cheng Chen
TL;DR: A novel nonparametric likelihood ratio (NLR) objective function for independent component analysis (ICA) derived through the statistical hypothesis test of independence of random observations and applied for unsupervised learning of unknown pronunciation variations.
References
Journal Article
Maximum likelihood from incomplete data via the EM algorithm
Book
Introduction to Automata Theory, Languages, and Computation
TL;DR: This book is a rigorous exposition of formal languages and models of computation, with an introduction to computational complexity, appropriate for upper-level computer science undergraduates who are comfortable with mathematical arguments.
Book
Fundamentals of speech recognition
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
Journal Article
The mathematics of statistical machine translation: parameter estimation
TL;DR: The authors describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another.
Journal Article
Estimation of probabilities from sparse data for the language model component of a speech recognizer
TL;DR: The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data, and compares favorably to other proposed methods.