
Showing papers by "Dong Yu" published in 2004


PatentDOI
17 Sep 2004
TL;DR: In this article, a method of identifying a sequence of formant trajectory values is provided in which the target values and per-segment durations for the formant are applied to a finite impulse response filter.
Abstract: A method of identifying a sequence of formant trajectory values is provided in which a sequence of target values is identified for a formant as step functions. The target values and the duration of each segment target for the formant are applied to a finite impulse response filter to form a sequence of formant trajectory values. The parameters of this filter, as well as the durations of the targets for each phone, can be modified to produce many kinds of target undershooting effects in a contextually assimilated manner. The procedure for producing the formant trajectory values does not require any acoustic data from speech.
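The filtering step described above can be sketched as follows. This is a minimal illustration, not the patent's exact filter design: the function name, the causal moving-average taps, and the edge-padding choice are all assumptions. It shows how a short segment's step-function target is undershot when the filter is long relative to the segment's duration.

```python
# Sketch: forming a formant trajectory by applying an FIR filter to
# step-function formant targets. Names, taps, and padding are
# illustrative assumptions, not the patented parameters.

def fir_formant_trajectory(targets, durations, taps):
    """targets: per-segment formant target values in Hz (step functions).
    durations: per-segment durations in frames.
    taps: FIR filter coefficients (assumed normalized to sum to 1)."""
    # Expand the step-function targets into a frame-level sequence.
    steps = []
    for t, d in zip(targets, durations):
        steps.extend([t] * d)
    # Causal FIR filtering, padding with the first target at the start.
    trajectory = []
    for i in range(len(steps)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = max(i - k, 0)
            acc += h * steps[j]
        trajectory.append(acc)
    return trajectory

# A short middle segment (e.g. a reduced vowel) never reaches its
# 2000 Hz target when the filter spans more frames than the segment,
# producing the contextually assimilated undershoot the text describes.
traj = fir_formant_trajectory([500.0, 2000.0, 500.0], [10, 3, 10],
                              taps=[0.2] * 5)
```

Lengthening the middle segment's duration, or shortening the filter, lets the trajectory reach the target, which is how modifying filter parameters and target durations controls the degree of undershooting.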

24 citations


Proceedings ArticleDOI
Dong Yu, Mei-Yuh Hwang, Peter K. L. Mau, Alex Acero, Li Deng
04 Oct 2004
TL;DR: An enhanced two-pass pronunciation learning algorithm is introduced that utilizes the output from both an n-gram phoneme recognizer and a Letter-to-Sound component to adapt automatic speech recognition systems used in dictation systems through unsupervised learning from users' error corrections.
Abstract: We propose an approach to adapting automatic speech recognition systems used in dictation systems through unsupervised learning from users' error corrections. Three steps are involved in the adaptation: 1) infer whether the user is correcting a speech recognition error or simply editing the text, 2) infer the most probable cause of the error, and 3) adapt the system accordingly. To adapt the system effectively, we introduce an enhanced two-pass pronunciation learning algorithm that utilizes the output from both an n-gram phoneme recognizer and a Letter-to-Sound component. Our experiments show that we can obtain greater than 10% relative word error rate reduction using the proposed approaches. Learning new words gives the largest performance gain, while adapting pronunciations and using a cache language model also produce small gains.
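The idea of combining two pronunciation sources can be sketched as below. This is an illustrative sketch only: the function names, the scoring scheme, and the agreement bonus are assumptions, not the paper's actual two-pass algorithm.

```python
# Sketch: choosing a pronunciation for a new word by pooling candidates
# from a phoneme-recognizer pass and a Letter-to-Sound (LTS) pass.
# The scoring and the agreement bonus are illustrative assumptions.

def learn_pronunciation(recognizer_hyps, lts_hyps, agreement_bonus=0.5):
    """Each argument is a list of (phoneme_string, score) pairs,
    higher score = better. Returns the best-scoring phoneme string."""
    scores = {}
    for phones, s in recognizer_hyps:
        scores[phones] = scores.get(phones, 0.0) + s
    for phones, s in lts_hyps:
        # Reward candidates that both sources propose, the intuition
        # behind using recognizer and LTS output together.
        bonus = agreement_bonus if phones in scores else 0.0
        scores[phones] = scores.get(phones, 0.0) + s + bonus
    return max(scores, key=scores.get)

# A candidate supported by both sources wins over a slightly
# higher-scoring recognizer-only candidate.
best = learn_pronunciation(
    recognizer_hyps=[("d ao ng", 0.6), ("d oh ng", 0.3)],
    lts_hyps=[("d oh ng", 0.7)],
)
```

The design intuition is that the acoustic pass captures how the user actually said the word while the LTS pass keeps the hypothesis space constrained by spelling, so agreement between the two is strong evidence.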

14 citations


01 Nov 2004
TL;DR: Recent progress on the new development, implementation, and evaluation of the structured speech model with statistically characterized hidden trajectories, offering significantly more power in parsimonious modeling of long-span context dependency is reported.
Abstract: We report in this paper our recent progress on the development, implementation, and evaluation of the structured speech model with statistically characterized hidden trajectories. The unidirectional coarticulation modeling in the hidden trajectory models presented at previous EARS workshops has been extended to bi-directionality (forward as well as backward in the temporal dimension), offering significantly more power in parsimonious modeling of long-span context dependency. This new type of model, when appropriately implemented, also exhibits the property of contextually assimilated phonetic reduction or phonetic target undershooting that is prevalent in casual, fluent speech (e.g., conversational speech). Experiments on large-scale N-best rescoring (N=1000) have demonstrated substantially lower phone recognition errors achieved by the model compared with a context-dependent (triphone) HMM system built with HTK. When the "error propagation" effect of the long-span acoustic model is artificially removed in the N-best rescoring paradigm (by adding the reference hypotheses to the 1000-best list), the error rate is cut down further in a dramatic manner.
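The N-best rescoring paradigm described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the scoring function here is a stand-in for the hidden trajectory model's likelihood, and the optional reference argument emulates the "error propagation" diagnostic of adding the reference hypothesis to the 1000-best list.

```python
# Sketch of N-best rescoring: a first-pass decoder produces N
# hypotheses, and a second, more powerful model re-ranks them.
# score_fn stands in for the hidden trajectory model's score.

def rescore_nbest(nbest, score_fn, reference=None):
    """nbest: list of first-pass hypothesis strings.
    score_fn: maps a hypothesis to a model score (higher = better).
    reference: if given, is added to the list before rescoring,
    emulating the diagnostic described in the abstract."""
    hyps = list(nbest)
    if reference is not None and reference not in hyps:
        hyps.append(reference)
    # The rescoring model can only pick the best hypothesis present:
    # if the first pass dropped the correct string, rescoring alone
    # cannot recover it, which is the "error propagation" effect.
    return max(hyps, key=score_fn)
```

The key limitation the code makes visible is that rescoring is bounded by the quality of the N-best list, which is why adding the reference hypothesis isolates the rescoring model's own accuracy from first-pass search errors.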

1 citation