
Showing papers in "Computer Speech & Language in 1999"


Journal ArticleDOI
TL;DR: This work surveys the most widely used algorithms for smoothing models for language n-gram modeling, and presents an extensive empirical comparison of several of these smoothing techniques, including those described by Jelinek and Mercer (1980), and introduces methodologies for analyzing smoothing algorithm efficacy in detail.

1,948 citations


Journal ArticleDOI
TL;DR: Performance benefits have been demonstrated from incorporating a linear trajectory description and additionally from modelling variability in the mid-point parameter, and theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs are presented.

108 citations


Journal ArticleDOI
TL;DR: Four representative approaches to automatic phonemization on the same test dictionary are compared, with best translation results obtained with PbA at approximately 72% words correct on a reasonably large pronouncing dictionary, indicating that automatic pronunciation of text is not a solved problem.

75 citations


Journal ArticleDOI
TL;DR: A set of cross-word decision-tree state-clustered context-dependent hidden Markov models is used to define a set of subphone units to be used in a concatenation synthesizer, which produces speech that is both natural sounding and highly intelligible.

69 citations


Journal ArticleDOI
TL;DR: Ways in which to quantify the performance of confidence measures in terms of their discrimination power and bias are discussed, and two different performance metrics are analyzed: the classification equal error rate and the normalized mutual information metric.

65 citations


Journal ArticleDOI
TL;DR: A set of related confidence measures for large vocabulary continuous speech recognition (LVCSR) based on local phone posterior probability estimates output by an acceptor HMM acoustic model are introduced and it is argued that acoustic confidence measures may be used to inform the search for improved pronunciation models.

50 citations


Journal ArticleDOI
TL;DR: The similarity weighting approach gives a 3–5% reduction in word error rate over a domain-specific n-gram language model, providing some of the largest language modeling gains reported for the Switchboard task in recent years.

43 citations


Journal ArticleDOI
TL;DR: A method allowing the two approaches to be combined within a backoff framework is presented, and it is demonstrated that this technique greatly improves language model perplexities for sparse training sets, and offers significantly improved size vs. performance tradeoffs when compared with standard trigram models.

33 citations


Journal ArticleDOI
TL;DR: This paper proposes a new language modeling approach to capture the preferred relationships between words over a short or long distance through the concept of MI-Trigger pairs and finds that the MI-Trigger-based modeling has better performance than word bigram modeling.

25 citations


Journal ArticleDOI
TL;DR: An approach of automatic selection of phonetically distributed sentence sets for speaker adaptation is presented, and the concept is applied to the task of Mandarin speech recognition with very large vocabulary, both in isolated syllable and continuous speech modes.

24 citations


Journal ArticleDOI
TL;DR: An overview of the German version of the Bell Labs text-to-speech system, a high-quality concatenative synthesis system with extensive text analysis capabilities, is presented.

Journal ArticleDOI
TL;DR: In this paper, a family of maximum likelihood (ML) techniques that aim at reducing an acoustic mismatch between the training and testing conditions of hidden Markov model (HMM)-based automatic speech recognition (ASR) systems is presented.

Journal ArticleDOI
TL;DR: A new approach using fuzzy implication was used to design a consonant/vowel segmentation method with a high accuracy rate and robustness to background noise for Mandarin syllable recognition systems.

Journal ArticleDOI
TL;DR: A POS (part-of-speech)-dependent multiple pronunciation dictionary generation method using HMM-state confusions spanning several phonemes that makes it possible to recover missing words that are lost during the first pass of the search process in continuous speech recognition using a single pronunciation dictionary.

Journal ArticleDOI
TL;DR: Model simulation experiments demonstrate that the auditory rate-place code constructed at the output of the network model is capable of reliable representation, with possible modification and/or enhancement, of the prominent spectral characteristics of the utterances displayed in wideband spectrograms.

Journal ArticleDOI
TL;DR: The portability of a stochastic semantic analyser from a setting of human–machine interactions (air travel information services and a multimodal multimedia automated service kiosk) into the more open one of human-to-human interactions (ESST) is investigated.

Journal ArticleDOI
TL;DR: Three techniques are presented to reduce the time required to perform the word-spotting search: approximation of the full keyword plus filler recognition pass using the pre-computed Viterbi filler hypothesis; restricting the search space by dynamically matching the KPS against the filler path; and Gaussian Selection.

Journal ArticleDOI
TL;DR: A new statistical framework, derived from Bayesian statistics, is introduced to perform a triphone model from less context-dependent models, based on the mixture-Gaussian hidden Markov models (HMMs) incorporating state-level parameter tying.

Journal ArticleDOI
TL;DR: Two new temporal features for robust processing of speech signals, with emphasis on microphone variations, are presented, including a parametrized temporal filter (PTF); the RASTA features are inferior to the PTF features both in quiet conditions and in the presence of microphone variations.