Modeling the prosody of hidden events for improved word recognition.

Open Access Proceedings Article

TLDR

This paper investigates a new approach that penalizes word hypotheses inconsistent with prosodic features such as duration and pitch, using a language model modified to represent hidden events such as sentence boundaries and various forms of disfluency.

Abstract:

We investigate a new approach for using speech prosody as a knowledge source for speech recognition. The idea is to penalize word hypotheses that are inconsistent with prosodic features such as duration and pitch. To model the interaction between words and prosody, we modify the language model to represent hidden events such as sentence boundaries and various forms of disfluency, and combine it with decision trees that predict such events from prosodic features. N-best rescoring experiments on the Switchboard corpus show a small but consistent reduction in word error as a result of this modeling. We conclude with a preliminary analysis of the types of errors that are corrected by the prosodically informed model.
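
The N-best rescoring idea described in the abstract can be sketched as a weighted combination of log scores over a list of competing hypotheses. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hypothesis` fields, the weight values, and the function names are hypothetical, and the prosody score is assumed to be supplied by a separate model (for instance, decision-tree posteriors for hidden events given prosodic features).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (all field names are illustrative)."""
    words: List[str]
    acoustic_logprob: float   # log score from the acoustic model
    lm_logprob: float         # log score from the (hidden-event) language model
    prosody_logprob: float    # log score from the prosodic model (assumed given)


def combined_score(h: Hypothesis, lm_weight: float = 1.0,
                   prosody_weight: float = 0.5) -> float:
    """Weighted sum of log scores; the weights are hypothetical tuning knobs."""
    return (h.acoustic_logprob
            + lm_weight * h.lm_logprob
            + prosody_weight * h.prosody_logprob)


def rescore(nbest: List[Hypothesis], lm_weight: float = 1.0,
            prosody_weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(nbest, key=lambda h: combined_score(h, lm_weight, prosody_weight))
```

With `prosody_weight=0.0` the ranking depends only on the acoustic and language model scores; raising the weight lets a hypothesis that is more consistent with the prosody overtake one that scores slightly better on the other two terms.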

Citations

Proceedings Article

SRILM – An Extensible Language Modeling Toolkit

Andreas Stolcke

TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation are discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.

Proceedings Article

Spontaneous speech: how people really talk and why engineers should care.

Elizabeth Shriberg

TL;DR: An overview is given of four fundamental properties of spontaneous speech that present challenges for spoken language applications because they violate assumptions often applied in automatic processing technology.

Automatic detection and classification of prosodic events

Andrew Rosenberg

TL;DR: This thesis describes work on the automatic detection and classification of prosodic events – specifically, pitch accents and prosodic phrase boundaries, and presents three proof-of-concept applications showing that access to hypothesized prosodic event information can be used to improve the performance of downstream spoken language processing tasks.

Book Chapter

Prosody Modeling for Automatic Speech Recognition and Understanding

Elizabeth Shriberg, +1 more

TL;DR: A number of applications of the prosody framework are surveyed, and results for automatic sentence segmentation and disfluency detection, topic segmentation, dialog act labeling, and word recognition are given.

Direct Modeling of Prosody: An Overview of Applications in Automatic Speech Processing

E. Shriberg, +1 more

TL;DR: This work describes a “direct modeling” approach to using prosody in various speech technology tasks that does not involve any hand-labeling or modeling of prosodic events such as pitch accents or boundary tones, and focuses on spontaneous speech from a variety of contexts.

References

Journal Article

Classification and Regression Trees.

John Van Ryzin, +4 more

01 Mar 1986

Journal of the American Statistical Asso...

Proceedings Article

SWITCHBOARD: telephone speech corpus for research and development

J.J. Godfrey, +2 more

TL;DR: SWITCHBOARD is a large multispeaker corpus of conversational speech and text that should be of interest to researchers in speaker authentication and large vocabulary speech recognition.

Journal Article

A Maximum Likelihood Approach to Continuous Speech Recognition

Lalit R. Bahl, +2 more

01 Feb 1983

IEEE Transactions on Pattern Analysis an...

TL;DR: This paper describes a number of statistical models for use in speech recognition, with special attention to determining the parameters for such models from sparse data, and describes two decoding methods appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks.

Detection and Correction of Repairs in Human-Computer Dialog

John Bear, +2 more

TL;DR: The authors present criteria and techniques for automatically detecting the presence of a repair, locating it, and making the appropriate correction. These techniques integrate knowledge from several sources: pattern matching, syntactic and semantic analysis, and acoustics.

Proceedings Article

Integrating multiple knowledge sources for detection and correction of repairs in human-computer dialog

John Bear, +2 more

TL;DR: The authors present criteria and techniques for automatically detecting the presence of a repair, locating it, and making the appropriate correction. Preliminary results show that pattern matching is effective at detecting repairs without excessive overgeneration.
