Showing papers by "Dan Jurafsky published in 2000"

PDF

Open Access

Book•

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

[...]

Dan Jurafsky, James Martin

01 Jan 2000

TL;DR: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation.

...read moreread less

Abstract: From the Publisher: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora.Methodology boxes are included in each chapter. Each chapter is built around one or more worked examples to demonstrate the main idea of the chapter. Covers the fundamental algorithms of various fields, whether originally proposed for spoken or written language to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation. Emphasis on web and other practical applications. Emphasis on scientific evaluation. Useful as a reference for professionals in any of the areas of speech and language processing.

...read moreread less

3,794 citations

Journal Article•DOI•

Dialogue act modeling for automatic tagging and recognition of conversational speech

[...]

Andreas Stolcke¹, Noah Coccaro², Rebecca Bates³, Paul Taylor⁴, Carol Van Ess-Dykema, Klaus Ries⁵, Elizabeth Shriberg¹, Dan Jurafsky², Rachel Martin⁶, Marie Meteer⁷ - Show less +6 more•Institutions (7)

SRI International¹, University of Colorado Boulder², University of Washington³, University of Edinburgh⁴, Carnegie Mellon University⁵, Johns Hopkins University⁶, BBN Technologies⁷

01 Sep 2000-Computational Linguistics

TL;DR: The authors proposed a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as STATEMENT, QUESTION, BACKCHANNEL, AGREEMENT, DISAGREEMENT and APOLOGY.

...read moreread less

Abstract: We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as STATEMENT, QUESTION, BACKCHANNEL, AGREEMENT, DISAGREEMENT, and APOLOGY. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.

...read moreread less

1,094 citations

Proceedings Article•DOI•

Automatic labeling of semantic roles

[...]

Daniel Gildea¹, Dan Jurafsky²•Institutions (2)

University of California, Berkeley¹, University of Colorado Boulder²

03 Oct 2000

TL;DR: This work presents a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame, derived from parse trees and hand-annotated training data.

...read moreread less

Abstract: We present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. Various lexical and syntactic features are derived from parse trees and used to derive statistical classifiers from hand-annotated training data.

...read moreread less

944 citations

Posted Content•

Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech

[...]

Elizabeth Shriberg, Rebecca Bates, Andreas Stolcke, Paul Taylor, Dan Jurafsky, Klaus Ries, Noah Coccaro, Rachel Martin, Marie Meteer, C. Van Ess-Dykema - Show less +6 more

11 Jun 2000-arXiv: Computation and Language

TL;DR: It is suggested that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.

...read moreread less

Abstract: Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study is based on more than 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy, and speaking rate) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. Performance was evaluated for prosody models alone, and after combining the prosody models with word information -- either from true words or from the output of an automatic speech recognizer. For an overall classification task, as well as three subtasks, prosody made significant contributions to classification. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone, especially for the case of recognized words. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.

...read moreread less

271 citations

Proceedings Article•DOI•

Knowledge-free induction of morphology using latent semantic analysis

[...]

Patrick Schone¹, Dan Jurafsky¹•Institutions (1)

University of Colorado Boulder¹

13 Sep 2000

TL;DR: A semantics-only algorithm for learning morphology which only proposes affixes when the stem and stem-plus-affix are sufficiently similar semantically and it is shown that this approach provides morphology induction results that rival a current state-of-the-art system.

...read moreread less

Abstract: Morphology induction is a subproblem of important tasks like automatic learning of machine-readable dictionaries and grammar induction. Previous morphology induction approaches have relied solely on statistics of hypothesized stems and affixes to choose which affixes to consider legitimate. Relying on stem-and-affix statistics rather than semantic knowledge leads to a number of problems, such as the inappropriate use of valid affixes ("ally" stemming to "all"). We introduce a semantic-based algorithm for learning morphology which only proposes affixes when the stem and stem-plus-affix are sufficiently similar semantically. We implement our approach using Latent Semantic Analysis and show that our semantics-only approach provides morphology induction results that rival a current state-of-the-art system.

...read moreread less

233 citations

Posted Content•

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

[...]

Andreas Stolcke, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Dan Jurafsky, Paul Taylor, Rachel Martin, C. Van Ess-Dykema, Marie Meteer - Show less +6 more

11 Jun 2000-arXiv: Computation and Language

TL;DR: A probabilistic integration of speech recognition with dialogue modeling is developed, to improve both speech recognition and dialogue act classification accuracy.

...read moreread less

Abstract: We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.

...read moreread less

211 citations

Proceedings Article•DOI•

Verb Subcategorization Frequency Differences between Business- News and Balanced Corpora: The Role of Verb Sense

[...]

Douglas Roland¹, Dan Jurafsky¹, Lise Menn¹, Susanne Gahl², Elezabeth Elder¹, Chris Riddoch¹ - Show less +2 more•Institutions (2)

University of Colorado Boulder¹, Harvard University²

07 Oct 2000

TL;DR: The authors explored the differences in verb subcategorization frequencies across several corpora in an effort to obtain stable cross corpus subcategoriesization probabilities for use in norming psychological experiments.

...read moreread less

Abstract: We explore the differences in verb subcategorization frequencies across several corpora in an effort to obtain stable cross corpus subcategorization probabilities for use in norming psychological experiments For the 64 single sense verbs we looked at, subcategorization preferences were remarkably stable between British and American corpora, and between balanced corpora and financial news corpora Of the verbs that did show differences, these differences were generally found between the balanced corpora and the financial news data We show that all or nearly all of these shifts in subcategorization are realised via (often subtle) word sense differences This is an interesting observation in itself, and also suggests that stable cross corpus subcategorization frequencies may be found when verb sense is adequately controlled

...read moreread less

31 citations