Showing papers in "Computer Speech & Language in 1999"

Probabilistic-trajectory segmental HMMs

TL;DR: This work surveys the most widely-used algorithms for smoothing models for language n-gram modeling, presents an extensive empirical comparison of several of these smoothing techniques, including those described by Jelinek and Mercer (1980), and introduces methodologies for analyzing smoothing algorithm efficacy in detail.

1,948 citations

Journal Article

Wendy J. Holmes¹, Martin J. Russell¹•Institutions (1)

University of St Andrews¹

Evaluating the pronunciation component of text-to-speech systems for English: a performance comparison of different approaches

TL;DR: Performance benefits have been demonstrated from incorporating a linear trajectory description and additionally from modelling variability in the mid-point parameter, and theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs are presented.

108 citations

Journal Article

Robert I. Damper¹,², Yannick Marchand¹, M.J. Adamson¹, Kjell Gustafson³•Institutions (3)

University of Southampton¹, Oregon Health & Science University², Royal Institute of Technology³

A hidden Markov-model-based trainable speech synthesizer

TL;DR: Four representative approaches to automatic phonemization are compared on the same test dictionary, with the best translation results obtained with PbA at approximately 72% words correct on a reasonably large pronouncing dictionary, indicating that automatic pronunciation of text is not a solved problem.

75 citations

Journal Article

Robert E. Donovan¹, Philip C. Woodland¹•Institutions (1)

University of Cambridge¹

Evaluation of word confidence for speech recognition systems

TL;DR: A set of cross-word decision-tree state-clustered context-dependent hidden Markov models is used to define a set of subphone units for a concatenation synthesizer, which produces speech that is both natural-sounding and highly intelligible.

69 citations

Journal Article

Man-Hung Siu¹, Herbert Gish¹•Institutions (1)

BBN Technologies¹

Confidence measures from local posterior probability estimates

TL;DR: Ways to quantify the performance of confidence measures in terms of their discrimination power and bias are discussed, and two different performance metrics are analyzed: the classification equal error rate and the normalized mutual information metric.

65 citations

Journal Article

Gethin Williams¹, Steve Renals¹•Institutions (1)

University of Sheffield¹

Relevance weighting for combining multi-domain data for n-gram language modeling

TL;DR: A set of related confidence measures for large vocabulary continuous speech recognition (LVCSR), based on local phone posterior probability estimates output by an acceptor HMM acoustic model, is introduced, and it is argued that acoustic confidence measures may be used to inform the search for improved pronunciation models.

50 citations

Journal Article

Rukmini Iyer¹, Mari Ostendorf¹•Institutions (1)

Boston University¹

Variable-length category n-gram language models

TL;DR: The similarity weighting approach gives a 3–5% reduction in word error rate over a domain-specific n-gram language model, providing some of the largest language modeling gains reported for the Switchboard task in recent years.

43 citations

Journal Article

Thomas Niesler¹, Philip C. Woodland¹•Institutions (1)

University of Cambridge¹

Interpolation of n-gram and mutual-information based trigger pair language models for Mandarin speech recognition

TL;DR: A method allowing the two approaches to be combined within a backoff framework is presented, and it is demonstrated that this technique greatly improves language model perplexities for sparse training sets, and offers significantly improved size vs. performance tradeoffs when compared with standard trigram models.

33 citations

Journal Article

Zhou Guodong¹, Lua Kimteng¹•Institutions (1)

National University of Singapore¹

Automatic selection of phonetically distributed sentence sets for speaker adaptation with application to large vocabulary Mandarin speech recognition

TL;DR: This paper proposes a new language modeling approach that captures the preferred relationships between words over a short or long distance through the concept of MI-Trigger pairs, and finds that MI-Trigger-based modeling performs better than word bigram modeling.

25 citations

Journal Article

Jia-Lin Shen¹, Hsin-Min Wang¹, Ren-Yuan Lyu², Lin-Shan Lee•Institutions (2)

Academia Sinica¹, National Taiwan University²

The Bell Labs German text-to-speech system

TL;DR: An approach to the automatic selection of phonetically distributed sentence sets for speaker adaptation is presented, and the concept is applied to very-large-vocabulary Mandarin speech recognition in both isolated-syllable and continuous speech modes.

24 citations

Journal Article

Bernd Möbius¹•Institutions (1)

University of Stuttgart¹

Integrated bias removal techniques for robust speech recognition

TL;DR: An overview of the German version of the Bell Labs text-to-speech system, a high-quality concatenative synthesis system with extensive text analysis capabilities, is presented.

Journal Article

Craig T. Lawrence¹, Mazin G. Rahim²•Institutions (2)

University of Maryland, College Park¹, AT&T Labs²

Consonant/vowel segmentation for Mandarin syllable recognition

TL;DR: A family of maximum likelihood (ML) techniques is presented for reducing the acoustic mismatch between the training and testing conditions of hidden Markov model (HMM)-based automatic speech recognition (ASR) systems.

Journal Article

M.-T. Lin¹, Ching-Kuen Lee¹, Chin-Yi Lin¹•Institutions (1)

Tatung University¹

Multiple pronunciation dictionary using HMM-state confusion characteristics

TL;DR: A new approach based on fuzzy implication is used to design a consonant/vowel segmentation method with a high accuracy rate and robustness to background noise for Mandarin syllable recognition systems.

Journal Article

Yumi Wakita¹, Harald Singer, Yoshinori Sagisaka•Institutions (1)

Panasonic¹

A layered neural network interfaced with a cochlear model for the study of speech encoding in the auditory system

TL;DR: A POS (part-of-speech)-dependent method for generating a multiple pronunciation dictionary from HMM-state confusions spanning several phonemes is presented; it makes it possible to recover words that are lost during the first pass of the search process in continuous speech recognition using a single pronunciation dictionary.

Journal Article

Hamid Sheikhzadeh¹, Li Deng¹•Institutions (1)

University of Waterloo¹

Stochastically-based semantic analysis for machine translation

TL;DR: Model simulation experiments demonstrate that the auditory rate-place code constructed at the output of the network model is capable of reliable representation, with possible modification and/or enhancement, of the prominent spectral characteristics of the utterances displayed in wideband spectrograms.

Journal Article

Wolfgang Minker¹, Marsal Gavalda², Alex Waibel²,³•Institutions (3)

Centre national de la recherche scientifique¹, Carnegie Mellon University², Karlsruhe Institute of Technology³

Low-cost implementation of open set keyword spotting

TL;DR: The portability of a stochastic semantic analyser from a setting of human–machine interactions (air travel information services and a multimodal multimedia automated service kiosk) to the more open one of human-to-human interactions (ESST) is investigated.

Journal Article

Kate Knill¹, Steve Young¹•Institutions (1)

University of Cambridge¹

A Bayesian triphone model

TL;DR: Three techniques are presented to reduce the time required to perform the word-spotting search: approximation of the full keyword plus filler recognition pass using the pre-computed Viterbi filler hypothesis; restricting the search space by dynamically matching the KPS against the filler path; and Gaussian Selection.

Journal Article

Ji Ming¹, Francis Jack Smith¹•Institutions (1)

Queen's University Belfast¹

New temporal features for robust speech recognition with emphasis on microphone variations

TL;DR: A new statistical framework, derived from Bayesian statistics, is introduced to construct a triphone model from less context-dependent models, based on mixture-Gaussian hidden Markov models (HMMs) incorporating state-level parameter tying.

Journal Article

Jia-Lin Shen¹, Wen-Liang Hwang¹•Institutions (1)

Academia Sinica¹