Proceedings Article
Lexicon-building methods for an acoustic sub-word based speech recognizer
Kuldip K. Paliwal
- Vol. 1990, pp 729-732
TL;DR: The use of an acoustic subword unit (ASWU)-based speech recognition system for the recognition of isolated words is discussed and it is shown that the use of a modified k-means algorithm on the likelihoods derived through the Viterbi algorithm provides the best deterministic-type of word lexicon.
Abstract:
The use of an acoustic subword unit (ASWU)-based speech recognition system for the recognition of isolated words is discussed. Some methods are proposed for generating the deterministic and the statistical types of word lexicon. It is shown that the use of a modified k-means algorithm on the likelihoods derived through the Viterbi algorithm provides the best deterministic type of word lexicon. However, the ASWU-based speech recognizer performs better with the statistical type of word lexicon than with the deterministic type. Improving the design of the word lexicon makes it possible to narrow considerably the gap in recognition performance between the whole word unit (WWU)-based and the ASWU-based speech recognizers. Further improvements are expected by designing the word lexicon better.
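The idea of choosing a deterministic lexicon entry from Viterbi likelihoods can be illustrated with a small sketch. This is not the paper's modified k-means procedure: it only shows the simpler step of scoring candidate ASWU sequences against a word's training tokens and keeping the best-scoring one. It assumes single-state units with unit-variance Gaussian emissions, and all names (`frame_loglik`, `viterbi_loglik`, `pick_pronunciation`, the unit labels and toy data) are hypothetical.

```python
import math

def frame_loglik(frame, mean):
    """Log-likelihood of one frame under a unit-variance Gaussian (constants dropped)."""
    return -0.5 * sum((x - m) ** 2 for x, m in zip(frame, mean))

def viterbi_loglik(frames, units, unit_means):
    """Best-path log-likelihood of `frames` aligned left to right with `units`.

    Assumption: each unit is a single state that must consume at least one frame.
    """
    n_states = len(units)
    if len(frames) < n_states:
        return -math.inf  # too few frames to visit every unit
    neg_inf = -math.inf
    prev = [neg_inf] * n_states
    prev[0] = frame_loglik(frames[0], unit_means[units[0]])
    for frame in frames[1:]:
        cur = [neg_inf] * n_states
        for s in range(n_states):
            stay = prev[s]                               # remain in the same unit
            advance = prev[s - 1] if s > 0 else neg_inf  # move on from the previous unit
            best = max(stay, advance)
            if best > neg_inf:
                cur[s] = best + frame_loglik(frame, unit_means[units[s]])
        prev = cur
    return prev[-1]  # the path must end in the last unit

def pick_pronunciation(tokens, candidates, unit_means):
    """Return the candidate with the highest total Viterbi log-likelihood over all tokens."""
    def total(candidate):
        return sum(viterbi_loglik(t, candidate, unit_means) for t in tokens)
    return max(candidates, key=total)

# Toy data: three hypothetical units with 1-D means, two tokens of one word.
unit_means = {"a": [0.0], "b": [5.0], "c": [10.0]}
tokens = [
    [[0.1], [0.2], [4.9], [5.1]],
    [[-0.1], [5.2], [4.8]],
]
candidates = [("a", "b"), ("a", "c"), ("b", "a")]

best = pick_pronunciation(tokens, candidates, unit_means)
print(best)  # ('a', 'b')
```

A statistical lexicon would instead keep several candidates per word together with their weights, rather than committing to the single best sequence.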
Citations
Moving beyond the 'beads-on-a-string' model of speech
TL;DR: Problems with the phoneme as the basic subword unit in speech recognition are raised, suggesting that finer-grained control is needed to capture the sort of pronunciation variability observed in spontaneous speech.
Journal Article
A method for the construction of acoustic Markov models for words
TL;DR: A method for combining phonetic and fenonic models is presented and results of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks are reported.
Journal Article
Joint lexicon, acoustic unit inventory and model design
Michiel Bacchiani, Mari Ostendorf +1 more
TL;DR: A joint solution is presented to the related problems of learning a unit inventory and corresponding lexicon from data. On a speaker-independent read speech task with a 1k vocabulary, the proposed algorithm outperforms phone-based systems at both high and low complexities.
Journal Article
Maximum likelihood modelling of pronunciation variation
Trym Holter, Torbjørn Svendsen +1 more
TL;DR: A maximum-likelihood-based algorithm is presented for fully automatic data-driven modelling of pronunciation, given a set of subword hidden Markov models (HMMs) and acoustic tokens of a word, creating a consistent framework for optimisation of automatic speech recognition systems.
Proceedings Article
Joint Learning of Phonetic Units and Word Pronunciations for ASR
TL;DR: An unsupervised alternative to the conventional manual approach for creating pronunciation dictionaries is proposed. It requires no language-specific knowledge and jointly discovers the phonetic inventory and the letter-to-sound mapping rules in a language using only transcribed data.
References
Journal Article
High performance connected digit recognition using hidden Markov models
TL;DR: An enhanced analysis feature set consisting of both instantaneous and transitional spectral information is used and the hidden-Markov-model (HMM)-based connected-digit recognizer in speaker-trained, multispeaker, and speaker-independent modes is tested.
Proceedings Article
On the automatic segmentation of speech signals
Torbjørn Svendsen, F. Soong +1 more
TL;DR: Three different approaches for automatically segmenting speech into phonetic units are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach.
Proceedings Article
A segment model based approach to speech recognition
TL;DR: The proposed segment model was tested on a speaker-trained, isolated-word speech recognition task with a vocabulary of 1109 basic English words. The average word recognition accuracy was 85%, rising to 96% and 98% for the top 3 and top 5 candidates, respectively.
Proceedings Article
Word recognition using whole word and subword models
TL;DR: A unified framework for creating effective basic models of speech is discussed, and the relative advantages of each type of speech unit are pointed out based on the results of a series of recognition experiments.
Journal Article
A model-based connected-digit recognition system using either hidden Markov models or templates
TL;DR: A unified system for automatically recognizing fluently spoken digit strings based on whole-word reference units is presented, which can use either hidden Markov model (HMM) technology or template-based technology and contains features from both approaches.