Showing papers by "Dan Jurafsky" published in 2003

Open Access

Journal Article

Effects of disfluencies, predictability, and utterance position on word form variation in English conversation

Alan Bell¹, Dan Jurafsky, Eric Fosler-Lussier, Cynthia Girand, Michelle L. Gregory, Daniel Gildea

University of Colorado Boulder¹

28 Jan 2003-Journal of the Acoustical Society of America

TL;DR: This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation or a more reduced or lenited pronunciation, based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample from conversations from the Switchboard corpus.

Abstract: Function words, especially frequently occurring ones such as the, that, and, and of, vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., ði, ðæt, ænd, ʌv) or a more reduced or lenited pronunciation (e.g., ðə, ðɨt, n, ə). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample from conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and whether final obstruents were present or not. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that made high-frequency monosyllabic function words more likely to be longer or have a fuller form (1) when neighboring disfluencies (such as filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; (3) when the word is either utterance initial or utterance final. Looking at the phenomenon in a different way, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the differences among the ten function words in their response to the factors.

383 citations

Proceedings Article

Semantic role parsing: adding semantic structure to unstructured text

Sameer Pradhan¹, Kadri Hacioglu¹, Wayne H. Ward¹, James Martin¹, Dan Jurafsky¹

University of Colorado Boulder¹

19 Nov 2003

TL;DR: The authors formulate the semantic parsing problem as a classification problem using support vector machines and use a hand-labeled training set and a set of features drawn from earlier work together with some feature enhancements.

Abstract: There is an ever-growing need to add structure in the form of semantic markup to the huge amounts of unstructured text data now available. We present the technique of shallow semantic parsing, the process of assigning a simple WHO did WHAT to WHOM, etc., structure to sentences in text, as a useful tool in achieving this goal. We formulate the semantic parsing problem as a classification problem using support vector machines. Using a hand-labeled training set and a set of features drawn from earlier work together with some feature enhancements, we demonstrate a system that performs better than all other published results on shallow semantic parsing.

92 citations

Journal Article

Syntactic frame and verb bias in aphasia: plausibility judgments of undergoer-subject sentences

Susanne Gahl¹, Lise Menn², Gail Ramsberger², Dan Jurafsky², Elizabeth Elder², Molly Rewega³, Audrey L. Holland³

Harvard University¹, University of Colorado Boulder², University of Arizona³

01 Nov 2003-Brain and Cognition

TL;DR: This study investigates three factors that have been argued to define "canonical form" in sentence comprehension: syntactic structure, semantic role, and frequency of usage. It shows that sentences whose structure matches the lexical bias of the main verb are significantly easier than sentences in which structure and lexical bias do not match.

33 citations

Issues in recognition of Spanish-accented spontaneous English

A Ikeno, Bryan L. Pellom, Daniel M. Cer, Ashley Thornton, Jason Brenier, Dan Jurafsky, Wayne H. Ward, William Byrne

16 Apr 2003

TL;DR: The crucial importance of training on the Hispanic-accented data for acoustic model performance is shown, and the tendency of Spanish-accented speakers to use longer, and presumably less-reduced, schwa vowels than native-English speakers is described.

Abstract: We describe a recognition experiment and two analytic experiments on a database of strongly Hispanic-accented English. We show the crucial importance of training on the Hispanic-accented data for acoustic model performance, and describe the tendency of Spanish-accented speakers to use longer, and presumably less-reduced, schwa vowels than native-English speakers.

19 citations

Proceedings Article

The Effect of Rhythm on Structural Disambiguation in Chinese

Honglin Sun¹, Dan Jurafsky¹

University of Colorado Boulder¹

11 Jul 2003

TL;DR: This paper systematically surveys the distribution of rhythm in constructions in Chinese from the statistical data acquired from a shallow tree bank, and shows that using the probabilistic rhythm feature significantly improves the performance of the shallow parser.

Abstract: The length of a constituent (number of syllables in a word or number of words in a phrase), or rhythm, plays an important role in Chinese syntax. This paper systematically surveys the distribution of rhythm in constructions in Chinese from the statistical data acquired from a shallow tree bank. Based on our survey, we then used the rhythm feature in a practical shallow parsing task by using rhythm as a statistical feature to augment a PCFG model. Our results show that using the probabilistic rhythm feature significantly improves the performance of our shallow parser.
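As an illustration only, one simple way to use rhythm as a statistical feature alongside a PCFG is to multiply each rule probability by the probability of the observed length pattern given that rule. The abstract does not spell out the paper's actual formulation, so this is a sketch under that assumption: the names `score_analysis` and `best_rule`, the rule labels, and every probability below are hypothetical placeholders, not values from the paper.

```python
import math

# Hypothetical rule probabilities P(rule); not taken from the paper.
RULE_PROB = {
    "NP -> N N": 0.6,
    "VP -> V N": 0.4,
}

# Hypothetical rhythm probabilities P(length pattern | rule), where a pattern
# is the tuple of syllable counts of the two words in the construction.
RHYTHM_PROB = {
    ("NP -> N N", (2, 1)): 0.30,
    ("NP -> N N", (1, 2)): 0.05,
    ("VP -> V N", (2, 1)): 0.05,
    ("VP -> V N", (1, 2)): 0.40,
}


def score_analysis(rule, pattern, floor=1e-6):
    """Return log P(rule) + log P(pattern | rule), flooring unseen patterns."""
    rhythm = RHYTHM_PROB.get((rule, pattern), floor)
    return math.log(RULE_PROB[rule]) + math.log(rhythm)


def best_rule(pattern):
    """Pick the candidate rule with the highest rhythm-augmented score."""
    return max(RULE_PROB, key=lambda rule: score_analysis(rule, pattern))
```

With these placeholder numbers, the rule probabilities alone would always prefer `NP -> N N`, while the rhythm term lets a (1, 2) syllable pattern tip the decision toward `VP -> V N`.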

10 citations

Journal Article

Beyond canonical form: verb-frame frequency affects verb production and comprehension

Lise Menn, Susanne Gahl, Audrey Holland, Gail Ramsberger, Dan Jurafsky

01 Oct 2003-Brain and Language

TL;DR: This paper found that the unaccusative verb frame (the apple dropped) is no more difficult than the intransitive with agent subject (Gottfried, Menn, & Holland, 1997, using repetition; Gahl, Menn, Ramsberger, Jurafsky, Elder, & Rewega et al., 2001, using plausibility judgement).

2 citations

Book

Identifying semantic relations in text

Daniel Gildea¹, Dan Jurafsky²

International Computer Science Institute¹, University of Colorado Boulder²

01 Jan 2003

TL;DR: This work presents a statistical system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence, based on statistical classifiers trained on roughly 50,000 sentences hand labeled with semantic roles in the FrameNet semantic labeling project.

Abstract: Over the past decade, natural language processing has been transformed by the adoption of statistical methods. The statistical approach began with shallow problems such as part-of-speech tagging, progressed to syntactic parsing, and is now being applied to higher-level semantic tasks. We present a statistical system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence. The system operates at the level of frame semantics, which provide us with an intermediate representation between the detail of complete theories of semantics and simpler domain-specific slot-filler representations. Given an input sentence, the system labels constituents with roles such as SPEAKER, MESSAGE, and TOPIC, identifying participants in various types of actions or states. The system is based on statistical classifiers that were trained on roughly 50,000 sentences hand labeled with semantic roles in the FrameNet semantic labeling project. We then parsed each training sentence and extracted various lexical and syntactic features, including the syntactic category of the constituent, its grammatical function, and position in the sentence. These features were combined with knowledge of the target verb, noun, or adjective, as well as information such as the prior probabilities of various combinations of semantic roles. We also used various methods of lexical clustering to generalize across possible fillers of roles. Test sentences were parsed, annotated with these features, and then passed through the classifiers. Our system achieves 80% accuracy in identifying the semantic role of presegmented constituents. At the harder task of simultaneously segmenting constituents and identifying their semantic role, the system achieved 65% precision and 61% recall.
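The abstract reports precision and recall separately for the joint segmentation-and-labeling task. As a small worked example, the two can be combined into a single F1 score (their harmonic mean). The F1 value is derived here from the two reported figures and is not itself stated in the abstract; the function name `f1_score` is just an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the harder task of
# simultaneously segmenting constituents and identifying their roles.
f1 = f1_score(0.65, 0.61)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.629"
```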

1 citation