
Showing papers on "Phrase published in 2003"


Proceedings ArticleDOI
27 May 2003
TL;DR: The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translation.
Abstract: We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several, previously proposed phrase-based translation models. Within our framework, we carry out a large number of experiments to understand better and explain why phrase-based models out-perform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models does not have a strong impact on performance. Learning only syntactically motivated phrases degrades the performance of our systems.

3,778 citations
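
The abstract above mentions "heuristic learning of phrase translations from word-based alignments". A minimal sketch of one common way to do this is shown below, assuming the usual consistency criterion: a phrase pair is kept only if no word inside it is aligned to a word outside it. The function name `extract_phrase_pairs`, the toy sentence, and the alignment are hypothetical illustrations, not the authors' implementation. The sketch also omits extension over unaligned boundary words and the lexical weighting step.

```python
from typing import List, Set, Tuple


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    alignment: Set[Tuple[int, int]],
    max_len: int = 3,
) -> List[Tuple[str, str]]:
    """Collect phrase pairs that are consistent with a word alignment.

    `alignment` holds (source_index, target_index) links. This is a
    simplified, assumed formulation of alignment-consistent extraction.
    """
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency check: every link touching the target span must
            # originate inside the source span.
            consistent = all(
                s_start <= s <= s_end
                for (s, t) in alignment
                if t_start <= t <= t_end
            )
            if consistent:
                pairs.append(
                    (
                        " ".join(src[s_start : s_end + 1]),
                        " ".join(tgt[t_start : t_end + 1]),
                    )
                )
    return pairs


# Toy example with a crossing alignment (hypothetical data).
example = extract_phrase_pairs(
    ["a", "b", "c"], ["x", "y", "z"], {(0, 1), (1, 0), (2, 2)}
)
```

In this toy case the span "b c" yields no phrase pair, because its target span would also cover "y", which is aligned to "a" outside the span. The default `max_len=3` is only a nod to the abstract's observation that phrases longer than three words add little.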


Journal ArticleDOI
TL;DR: The authors explored the basis by which thematic dependencies can be evaluated in advance of linguistic input that unambiguously signals those dependencies, and found that verb-based information is not limited to anticipating the immediately following (grammatical) object, but can also anticipate later occurring objects.

795 citations


Proceedings ArticleDOI
05 Apr 2003
TL;DR: This paper describes and publishes a collection of 500 phrases for evaluations of text entry methods, and discusses the merits of a pre-defined phrase set along with methodological considerations such as attaining generalizable results and the possible addition of punctuation and other characters.
Abstract: In evaluations of text entry methods, participants enter phrases of text using a technique of interest while performance data are collected. This paper describes and publishes (via the internet) a collection of 500 phrases for such evaluations. Utility programs are also provided to compute statistical properties of the phrase set, or any other phrase set. The merits of using a pre-defined phrase set are described as are methodological considerations, such as attaining results that are generalizable and the possible addition of punctuation and other characters.

721 citations
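
The abstract above says utility programs are provided to compute statistical properties of a phrase set. The sketch below illustrates what such a computation might look like. It is not the published utility; the function name `phrase_set_stats`, the chosen statistics, and the sample phrases are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def phrase_set_stats(phrases: List[str]) -> Dict[str, object]:
    """Compute simple descriptive statistics for a phrase set.

    Hypothetical helper: which statistics the original utilities report
    is not stated in the abstract, so these are illustrative choices.
    """
    cleaned = [p.strip() for p in phrases if p.strip()]
    if not cleaned:
        raise ValueError("phrase set is empty")
    word_counts = [len(p.split()) for p in cleaned]
    char_counts = [len(p) for p in cleaned]
    # Case-insensitive letter frequencies over the whole set.
    letters = Counter(ch for p in cleaned for ch in p.lower() if ch.isalpha())
    n = len(cleaned)
    return {
        "phrases": n,
        "mean_words": sum(word_counts) / n,
        "mean_chars": sum(char_counts) / n,
        "letter_freq": letters,
    }


# Made-up sample phrases, not taken from the published set.
stats = phrase_set_stats(["hello world", "a b c"])
```

The same function can be pointed at any other phrase set, which matches the abstract's remark that the utilities work on "any other phrase set".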


Proceedings Article
01 Apr 2003
TL;DR: Though the result is a little worse than the most up-to-date phrase structure based parsers, it looks satisfactorily accurate considering that the parser uses no information from phrase structures.
Abstract: In this paper, we propose a method for analyzing word-word dependencies in a deterministic bottom-up manner using Support Vector Machines. We experimented with dependency trees converted from Penn Treebank data, and achieved over 90% accuracy of word-word dependency. Though the result is a little worse than the most up-to-date phrase structure based parsers, it looks satisfactorily accurate considering that our parser uses no information from phrase structures.

700 citations


Dissertation
01 Jan 2003
TL;DR: This thesis argues that phase heads cannot be stranded, derives this ban from anti-locality effects within a derivational model of the grammar, and advocates a version of Mirror Theory that predicts apparent downward Head Movement.
Abstract: This thesis studies movement operations in natural languages. It is observed that certain heads – C°, v°, and, in most languages, P° – cannot be stranded; the complements of these heads never move without pied-piping the heads in question. This is surprising since (a) extraction out of CP, vP, and PP is possible in principle and (b) the complement categories of these heads, TP, VP, and DP or PP, are movable. Evidence for the more contentious of these claims is provided in chapters 3 and 4. Chapter 4 also investigates the ramifications of these facts for theories of adposition stranding. All heads in question have independently been argued to project what Chomsky (2000) calls ‘phases’. The generalization is that phase heads cannot be stranded. Chapter 2 derives the ban against stranding phase heads within a derivational model of the grammar. The effect of phases on successive cyclicity is the following: To be licit, movement out of a phase must pass through the specifier position of that phase. The idea of the account is that every step of movement must establish a relation between the moved item and some other element in the phrase marker which is in a well-defined sense closer than the relation they were in prior to movement. Movement from complement to specifier position within the same phrase never achieves this. In fact, any movement within the same phrase is in effect too short to achieve this. There are then well-defined anti-locality effects, which fall out from considerations of local economy. The ban against stranding phase heads now follows. A category can leave its containing phase only by passing through its specifier position. Since complements cannot reach the specifier position in the same phrase, the complements of phase heads cannot move away. Head Movement is prohibited by the same economy based reasoning. Chapter 5 focuses on Head Movement, advocating a version of Brody’s (2000) Mirror Theory. In contrast to standard theories of Head Movement, Mirror Theory predicts what looks like downward Head Movement to be possible. Data from VP-ellipsis in English show that this prediction of Mirror Theory is correct.

419 citations


01 Jan 2003
TL;DR: The ToBI conventions are a consensus system for labelling spoken utterances that segregates tags for different types of phonological events and structures into parallel quasi-independent tiers, marking the phonologically contrastive intonational events (Tones) separately from the hierarchy of inter-word junctures (Break-Indices) with which some of these pitch events are associated.
Abstract: The ToBI conventions are a consensus system for labelling spoken utterances that segregates tags for different types of phonological events and structures into parallel quasi-independent tiers. Most notably, the conventions specify a way to mark the phonologically contrastive intonational events (Tones) separately from the hierarchy of inter-word junctures (Break-Indices) with which some of these pitch events are associated. The original ToBI conventions are language-specific; they were intended to cover the phonologically contrastive tones of Mainstream American English. However, other annotation conventions based on the same general design principles have now been proposed for several other English varieties and for a number of other languages. This function of the original ToBI system as a general model for developing language-specific annotation conventions makes it possible to compare prosodic systems across languages using a common vocabulary, and to search for universals. This chapter is an overview of the original ToBI system. It reviews the design of the original system and its foundations in basic and applied research. It describes the inter-disciplinary community of users and uses for which the system was intended, and it outlines how the consensus model of American English intonation and inter-word juncture was achieved by finding points of useful intersection among the research interests and knowledge embodied in this community. It thus identifies the practical principles for designing prosodic annotation conventions that emerged in the course of developing, testing, and using this particular system.

Preprint draft of Chapter 2 of Sun-Ah Jun (ed.) (in press) Prosodic models and transcription: Towards prosodic typology. Oxford University Press. Please do not cite without permission of authors and editor.

Figure 2.1. Audio waveform, F0 contour, and MAE_ToBI xlabel windows for utterance Okay... They have a couple flights.
Figure 2.2. Audio waveform, F0 contour, and MAE_ToBI xlabel windows for utterance The Pentagon reports fighting in six southern Iraqi cities.
Figure 2.3. Audio waveform, F0 contour, and MAE_ToBI xlabel windows for utterance Uhh... Quincy. Could I have the number to uh ... Shore Cab?

Notes
1 In accounts by British language teachers and phoneticians before the 1980s, the ‘nucleus’ of an intonation contour was modeled as a holistic dynamic tonal event governing the part of the contour beginning at the most stressed syllable. When this nucleus occurs far from the end of the contour, then, the pitch pattern on material after the nuclear stress is called the ‘tail’. The general shape of the intonation contour over accented syllables before the nucleus is then the ‘head’.
2 Note that there are only four basic break index values, ordered from 0 to 4, with a “hole” at 2. In the original Price et al. (1991) use of break indices, the value 2 represented a perceived boundary strength intermediate between a normal word boundary and a larger phrase boundary, and was used to mark a number of imprecisely-defined phenomena. The ToBI system restricts the use of this label to an explicit subset of these phenomena, namely inter-word junctures where there is ambiguity between a 1 and a 3 either because there is a phrase tone without the duration lengthening appropriate to a 3, or a lengthening appropriate to a 3 but no phrase tone. This means that ToBI labels do not recognise a prosodic constituent comparable to Selkirk’s (1995) “Minor Phrase” unless this is equated with Beckman and Pierrehumbert’s (1986) tonally marked “intermediate phrase”. Labellers who postulate and perceive a constituent boundary that is larger than a “Prosodic Word” but smaller than the lowest intonationally marked constituent are encouraged to mark these events in a comments tier (see Section 2.5).
3 The break index value ‘0’ was intended to mark a boundary between two orthographic words which is perceived to be considerably reduced in strength from a “normal” word boundary. The MAE_ToBI conventions suggest that this sense of close grouping should be associated with such segmental sandhi phenomena as the flapping of final /t/ in utterances such as Got a dime?, the palatalisation of final /t/ in We sent you the cheque., and so on, i.e., phenomena that have been cited by phonologists as evidence of multi-word prosodic constituents such as the “Prosodic Word” or a “Clitic Group” (see Hayes 1989, Selkirk 1995, Peperkamp 1999, and the references they cite for discussion of different theoretical views of these constituents). A break index value of ‘1’ is then a “normal” word boundary. A more precise definition of these levels is desirable, but not yet feasible, because corpus research on such phenomena as flapping and palatalisation lags considerably behind research on the phonetic correlates of prosodic grouping at the intermediate phrase and intonational phrase level.
4 This meant omitting break indices 5 and 6 from the Price et al. (1991) model, since these two break index values could not be identified with a categorically marked level of prosodic structure such as the intonational phrase. Rather, they were intended to encode the percept of (possibly recursive) higher-level groupings above the intonational phrase.
5 EMU is a set of tools for creating and analyzing speech databases. It includes a powerful search engine that can find segments and events based on their sequential and hierarchical contexts. For example, if a MAE spoken language database has associated word labels, and if those labels are hierarchically organised into intermediate phrases and intonation phrases, with associated MAE_ToBI labels, it is straightforward to query for every instance in the database of a word with an associated L+H* pitch accent that is also the last accent in its intermediate phrase and followed by a !H- phrase accent. The EMU readable version of the Guidelines to ToBI Labelling is available at http://www.shlrc.mq.edu.au/emu/emu-tobi.shtml.
6 We note that no site seems to have rigorously adopted the practice envisioned by the original ToBI group of marking silences automatically, on the Misc tier.
7 The intermediate phrase in Greek, like the intermediate phrase in English, is defined by the presence of a phrase accent after the nuclear pitch accent (see Grice, Ladd, & Arvaniti, 1999, for discussion of the cross-linguistic applicability of this concept). Thus, the use of 2 as a marker of two types of tones-breaks mismatch in English has resulted in different numbers correspond to levels that are defined in the same way in the two languages.
8 See http://www.georgetown.edu/luperfoy/Discourse-Treebank/dri-home.html.
9 See http://www.ling.ohio-state.edu/~tobi/ame_tobi/annotation_conventions.html for this utterance. Hirst (1999: 73) reports that the sliding head “has been described as typical of Scottish accents” and suggests that it “is probably gaining ground throughout England possibly due to the influence of American speech where the pattern is very common”. Our impression is that it is more characteristic of Australian and New Zealand varieties, particularly those with a strong Scottish English substrate, than it is of mainstream American varieties; see, e.g., Fletcher and Harrington 1996; Ainsworth 2000.

403 citations


Journal ArticleDOI
TL;DR: An asymmetry in the interplay between syntax and semantics during on-line sentence comprehension is revealed, suggesting that semantic integration is influenced by syntactic processing.
Abstract: This study investigated the effects of combined semantic and syntactic violations in relation to the effects of single semantic and single syntactic violations on language-related event-related brain potential (ERP) effects (N400 and P600/SPS). Syntactic violations consisted of a mismatch in grammatical gender or number features of the definite article and the noun in sentence-internal or sentence-final noun phrases (NPs). Semantic violations consisted of semantically implausible adjective–noun combinations in the same NPs. Combined syntactic and semantic violations were a summation of these two respective violation types. ERPs were recorded while subjects read the sentences with the different types of violations and the correct control sentences. ERP effects were computed relative to ERPs elicited by the sentence-internal or sentence-final nouns. The size of the N400 effect to the semantic violation was increased by an additional syntactic violation (the syntactic boost). In contrast, the size of the P600/SPS to the syntactic violation was not affected by an additional semantic violation. This suggests that in the absence of syntactic ambiguity, the assignment of syntactic structure is independent of semantic context. However, semantic integration is influenced by syntactic processing. In the sentence-final position, additional global processing consequences were obtained as a result of earlier violations in the sentence. The resulting increase in the N400 amplitude to sentence-final words was independent of the nature of the violation. A speeded anomaly detection task revealed that it takes substantially longer to detect semantic than syntactic anomalies. These results are discussed in relation to the latency and processing characteristics of the N400 and P600/SPS effects. Overall, the results reveal an asymmetry in the interplay between syntax and semantics during on-line sentence comprehension.

395 citations


Journal Article
LI Sheng-mei
TL;DR: This sentence pattern typically shows the features of proverbs like "秀才秀才,错字布袋" in language structure, semantic meaning and pragmatic function.
Abstract: Sentence patterns like "秀才秀才,错字布袋" are unique in grammatical structure, semantic structure and pragmatic function. The typical feature of this pattern is that the same word or phrase is repeated at the very beginning. It has two parts: (1) The preceding part ("秀才秀才") includes a word and its repeated form, which is different from reduplication in grammar and continual repetition in rhetoric. This part can have referential functions in particular situations; and (2) The main function of the last part ("错字布袋") is to interpret the preceding one. It is the semantic focus of the whole sentence. This sentence pattern typically shows the features of proverbs like "秀才秀才,错字布袋" in language structure, semantic meaning and pragmatic function.

367 citations


Journal ArticleDOI
TL;DR: In this article, a referential communication task was used to determine the conditions under which speakers produce and listeners use prosodic cues to distinguish alternative meanings of a syntactically ambiguous phrase.

351 citations


Journal ArticleDOI
TL;DR: This work examines the relation between phrasal structure and the control and coordination of articulation within a dynamical systems model of speech production, and reviews how speakers modulate the spatiotemporal organization of articulatory gestures as a function of their phrasality.

306 citations


Patent
22 Aug 2003
TL;DR: Text searching is performed using phrase frequency analysis and phrase co-occurrence analysis, often with factor matrix analysis applied to select high technical content phrases for possible inclusion in a new query. The techniques can also be used to determine levels of emphasis within a collection of data, determine the desirability of conflating search terms, detect symmetry or asymmetry between two text elements within a collection of documents, generate a taxonomy of documents within the collection, and perform literature-based problem solving.
Abstract: Text searching is achieved by techniques including phrase frequency analysis and phrase-co-occurrence analysis. In many cases, factor matrix analysis is also advantageously applied to select high technical content phrases to be analyzed for possible inclusion within a new query. The described techniques may be used to retrieve data, determine levels of emphasis within a collection of data, determine the desirability of conflating search terms, detect symmetry or asymmetry between two text elements within a collection of documents, generate a taxonomy of documents within a collection, and perform literature-based problem solving. (This abstract is intended only to aid those searching patents, and is not intended to limit the disclosure of claims in any manner.)
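
The abstract above names phrase frequency analysis and phrase co-occurrence analysis without giving details. The sketch below shows one simple reading of those two terms, assuming phrases are word n-grams and co-occurrence means appearing in the same document. The helper names (`ngrams`, `phrase_frequencies`, `phrase_cooccurrence`) and the sample documents are hypothetical and are not taken from the patent; the factor matrix analysis step is not covered.

```python
from collections import Counter
from itertools import combinations
from typing import List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-word phrases in a token list."""
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def phrase_frequencies(docs: List[str], n: int = 2) -> Counter:
    """Count how often each n-word phrase occurs across all documents."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n))
    return counts


def phrase_cooccurrence(docs: List[str], n: int = 2) -> Counter:
    """Count, for each unordered phrase pair, the documents containing both."""
    pairs: Counter = Counter()
    for doc in docs:
        # Sorting gives each unordered pair a single canonical key.
        present = sorted(set(ngrams(doc.lower().split(), n)))
        pairs.update(combinations(present, 2))
    return pairs


# Made-up sample documents.
docs = ["text mining finds phrases", "text mining finds patterns"]
freq = phrase_frequencies(docs)
cooc = phrase_cooccurrence(docs)
```

Phrases with high frequency or strong co-occurrence with known query terms could then be reviewed as candidates for a new query, which is the use the abstract describes.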

Book ChapterDOI
01 Jan 2003
TL;DR: This chapter describes the applicability of the eye-tracking method in studying global text processing, an area with few eye-tracking studies compared with work on basic reading processes and syntactic parsing.
Abstract: This chapter describes the applicability of the eye-tracking method in studying global text processing. Eye tracking is used to study basic reading processes and syntactic parsing, but there are few studies where eye tracking is employed to examine global text processing. As one moves from the study of lexical processing to syntactic processing, the potential units of analysis increase in both number and size. There are four relevant levels of processing in the study of syntactic processing: (1) the word at which a parsing choice is expected to be made or a syntactic ambiguity to reveal itself, (2) the phrase, (3) the clause, and (4) the whole sentence. Related to the increase in the number and size of potentially interesting units of analysis, the mental processing associated with syntactic processes is more complex and varied than the mental processing associated with lexical processing. Thus, syntactic effects on eye movements are correspondingly more complex than lexical effects on eye movements.

Journal ArticleDOI
TL;DR: Many data discussed in this paper indicate that there is no evidence of (covert) tenses in Chinese; challenging work therefore remains for those who have claimed that Tense Phrase is projected in Chinese phrase structures.
Abstract: This paper discusses how Chinese, a so-called tenseless language, determines its temporal reference. For simplex sentences without time adverb or aspectual marker, I show that temporal reference is correlated with aktionsart or grammatical viewpoint. For sentences with an aspectual marker, I discuss the temporal semantics of le and guo in detail, showing how their tense/aspectual meanings contribute to temporal reference. I propose to analyze le as an event realization operator and guo as an anteriority operator. For subordinate clauses, I show that temporal reference of complement clauses of verbs is basically determined by verbal semantics of individual verbs, which may impose some temporal restriction on the temporal location of the embedded event. As for relative clauses and temporal adverbial clauses, many different factors such as lexical verbal semantics, referential properties of determiners, lifetime effect of noun phrases, semantic or pragmatic constraints on temporal connectives, inference rules and world knowledge, etc., all interact to help determine temporal reference. Many data discussed in this paper indicate that there is no evidence of (covert) tenses in Chinese. Therefore, challenging work remains for those who have claimed that Tense Phrase is projected in Chinese phrase structures.

Proceedings ArticleDOI
28 Jul 2003
TL;DR: This paper investigates the use of concept-based document representations to supplement word- or phrase-based features, and proposes to use AdaBoost to optimally combine weak hypotheses based on both types of features.
Abstract: Term-based representations of documents have found wide-spread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard lexical semantics and, as a consequence, are not sufficiently robust with respect to variations in word usage. In this paper we investigate the use of concept-based document representations to supplement word- or phrase-based features. The utilized concepts are automatically extracted from documents via probabilistic latent semantic analysis. We propose to use AdaBoost to optimally combine weak hypotheses based on both types of features. Experimental results on standard benchmarks confirm the validity of our approach, showing that AdaBoost achieves consistent improvements by including additional semantic features in the learned ensemble.

Journal ArticleDOI
TL;DR: This paper investigated grammatical feature selection during noun phrase production in German and Dutch, and found that the gender congruency effect is really a determiner congruency effect and that grammatical features are an automatic consequence of lexical node selection.

Journal ArticleDOI
TL;DR: The authors examined the effects of syntactic (tense) violations occurring on regularly versus irregularly inflected verbs using event-related brain potentials (ERPs) and found a reliable N400 effect for verb frequency and a reliable P600 effect for grammaticality.
Abstract: We examined the effects of syntactic (tense) violations occurring on regularly versus irregularly inflected verbs using event-related brain potentials (ERPs). Participants read sentences in which the main verb varied in terms of regularity (regular vs. irregular), frequency (high vs. low), and grammaticality (tense violation vs. no tense violation). For regular verbs, we found a reliable N400 effect for verb frequency and a reliable P600 effect for grammaticality, with no interaction between lexical frequency and grammaticality. For irregular verbs, we found interactions between lexical frequency and grammaticality, with tense violations on high-frequency forms (*will stood) eliciting a much earlier P600 response than tense violations on low-frequency forms (*will knelt). We discuss the implications of these results with respect to morphological parsing, the time course of syntactic feature analysis, and their consequent effects on temporal properties of ERP components.

Journal ArticleDOI
TL;DR: In this article, the authors define mission statements as "enduring statements of purpose" that distinguish one organization from other similar enterprises, and suggest that a well-crafted mission statement can provide advantages or benefits to a company.
Abstract: Analyses mission statements and defines them as “enduring statements of purpose” that distinguish one organization from other similar enterprises. Suggests that a well‐crafted mission statement can provide advantages or benefits to a company. States mission statements need to be longer than a phrase or sentence, but not a two‐page document, and not overly specific with regard to values, percentages, numbers, goals, or strategies. Concludes that better mission statements will give rewarding payoffs, meaning enhanced personal and business performance.

Book
01 Jan 2003
TL;DR: This tutorial discusses how meaning distinction is shown in phrases and discusses grammar and lexis in relation to regularity and variation.
Abstract: Preface. Acknowledgements.
LEVEL 1
TASK 1: HOW MEANINGS ARE SHOWN. Theme: meaning distinction. Word/phrase: block
TASK 2: UNDERLYING REGULARITY. Theme: regularity and variation. Word/phrase: gamut
TASK 3: WORDS AS LIABILITIES. Theme: semantic prosody. Word/phrase: regime
TASK 4: LITERAL AND METAPHORICAL. Theme: meaning in phrases. Word/phrase: free hand
TASK 5: MEANING FOCUS. Theme: co-selection. Word/phrase: physical
LEVEL 2
TASK 6: SPECIALISED MEANING. Theme: lexical item. Word/phrase: brook
TASK 7: SUBTLE DISTINCTIONS. Theme: meaning in phrases. Word/phrase: best thing
TASK 8: MEANING FLAVOUR. Theme: co-selection. Word/phrase: incur
TASK 9: EXTENSIONS OF GRAMMAR. Theme: grammar and lexis. Word/phrase: borders on
TASK 10: MEANING AND CONTEXT. Theme: meaning in proximity. Word/phrase: lap
LEVEL 3
TASK 11: WORDS DIFFICULT TO DEFINE. Theme: lexical item. Word/phrase: budge
TASK 12: AD HOC MEANING. Theme: meaning in proximity. Word/phrase: veritable
TASK 13: GRAMMATICAL FRAMES. Theme: regularity and variation. Word/phrase: about as
TASK 14: HIDDEN MEANINGS. Theme: semantic prosody. Word/phrase: happen
LEVEL 4
TASK 15: CLOSELY RELATED MEANINGS. Theme: meaning distinction. Word/phrase: manage
TASK 16: ONE AND ONE IS NOT EXACTLY TWO. Theme: lexical item. Word/phrase: true feelings
TASK 17: COMMON WORDS. Theme: meaning in phrases. Word/phrase: place
TASK 18: SINGULAR AND PLURAL. Theme: grammar and lexis. Word/phrase: eye/clock
Glossary.

Journal ArticleDOI
TL;DR: Results suggest that both French adults and 13-month-old American infants perceive phonological phrase boundaries as natural word boundaries, and that they do not attempt lexical access on pairs of syllables which span such a boundary.

Patent
13 Feb 2003
TL;DR: A speaker identity claim (SIC) utterance is received and recognized; a first dynamic phrase (FDP) is then generated and the user is prompted to speak it.
Abstract: A speaker identity claim (SIC) utterance is received and recognized. The SIC utterance is compared with a voice profile registered under the SIC, and a first verification decision is based thereon. A first dynamic phrase (FDP) is generated, and a user is prompted to speak same. An FDP utterance is received, and compared with the voice profile registered under the SIC to make a second verification decision. If the second verification decision indicates a high or low confidence level, the speaker identity claim is accepted or rejected, respectively. If the verification decision indicates a medium confidence level, a second dynamic phrase (SDP) is generated, and the user is prompted to speak same. An SDP utterance is received, and compared with the voice profile registered under the SIC to make a third verification decision. The speaker identity claim is accepted or rejected based on the third verification decision.
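
The decision flow in the abstract above can be sketched as a short function. This is only an illustration under stated assumptions: match scores are taken to lie in [0, 1], the thresholds `low`, `high`, and `final` are invented values, and the names `confidence_level` and `verify_claim` are hypothetical. The abstract does not say how the first verification decision feeds into later steps, so the sketch merely records it.

```python
from typing import Callable, List, Tuple


def confidence_level(score: float, low: float = 0.4, high: float = 0.8) -> str:
    """Map a match score to a confidence band (thresholds are assumptions)."""
    if score >= high:
        return "high"
    if score < low:
        return "low"
    return "medium"


def verify_claim(
    score_sic: Callable[[], float],
    score_fdp: Callable[[], float],
    score_sdp: Callable[[], float],
    low: float = 0.4,
    high: float = 0.8,
    final: float = 0.6,
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Return (accepted, decision trace) for a speaker identity claim.

    Each score_* callable stands in for comparing one utterance with the
    voice profile registered under the claimed identity.
    """
    # First decision: recorded only; its downstream use is not specified.
    decisions = [("first", confidence_level(score_sic(), low, high))]
    second = confidence_level(score_fdp(), low, high)
    decisions.append(("second", second))
    if second == "high":
        return True, decisions
    if second == "low":
        return False, decisions
    # Medium confidence: prompt for a second dynamic phrase and decide on it.
    accepted = score_sdp() >= final
    decisions.append(("third", "accept" if accepted else "reject"))
    return accepted, decisions


# Medium confidence on the first dynamic phrase triggers the second one.
accepted, trace = verify_claim(lambda: 0.9, lambda: 0.5, lambda: 0.7)
```

Passing callables rather than precomputed scores means the second dynamic phrase is only requested when the second decision lands in the medium band, mirroring the conditional prompt in the abstract.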

Journal ArticleDOI
TL;DR: The authors investigated the effects of first language word-level reading skills on the development of English as a second language (ESL) word-level reading skills and found that Japanese ESL learners had significantly faster and more accurate word recognition skills compared to a proficiency-matched Arab ESL group.
Abstract: This study investigated the effects of first language word-level reading skills on the development of English as a second language (ESL) word-level reading skills. A crosslinguistic analysis indicates that native Arabic and Japanese speakers are likely to encounter different types of ESL word-level reading difficulties. Specifically, native Arab speakers are likely to exhibit difficulties with prelexical ESL word recognition processes, whereas native Japanese speakers are likely to exhibit difficulties with on-line ESL word integration processes that integrate words into phrase/clause structures for comprehension. Results from a lexical decision task showed that a group of Japanese ESL learners had significantly faster and more accurate word recognition skills compared to a proficiency-matched Arab ESL group. In contrast, both groups read words within sentences in a sentence reading task at the same speed, though the Arab ESL group was significantly more accurate in integrating words into larger phrase and clause units and comprehending them than the Japanese ESL group. These results indicate that Arab and Japanese ESL students have different word-level reading difficulties, implicating different learning needs and pedagogical interventions for developing ESL reading proficiency.

Journal ArticleDOI
TL;DR: Using a word detection technique that minimizes task demands, this study evaluates attentional and processing speed resources during the comprehension of simple sentences without subordinate clauses and of sentences containing subject-relative and object-relative center-embedded subordinate clauses. It finds that PD patients have poor sensitivity to phonetic errors embedded in unbound grammatical morphemes, regardless of the clausal structure of the sentence.

Journal ArticleDOI
TL;DR: An investigation of the parser's responses to a syntactic violation and a semantic violation found that the parser constructs distinct syntactic and semantic analyses of a sentence and that this characteristic holds cross-linguistically.

Journal ArticleDOI
TL;DR: This paper examined how the presence of disfluencies in a spoken sentence might affect processes that assign syntactic structure (i.e., parsing) and found that disfluency can influence the parser by signaling a particular structure; at the same time, for the parser, a disfluency might be any interruption to the flow of speech.

Proceedings ArticleDOI
12 Apr 2003
TL;DR: The clue alignment approach, which is proposed in this paper, makes it possible to combine association clues taking different kinds of linguistic information into account and allows a dynamic tokenization into token units of varying size.
Abstract: In this paper, a word alignment approach is presented which is based on a combination of clues. Word alignment clues indicate associations between words and phrases. They can be based on features such as frequency, part-of-speech, phrase type, and the actual wordform strings. Clues can be found by calculating similarity measures or learned from word aligned data. The clue alignment approach, which is proposed in this paper, makes it possible to combine association clues taking different kinds of linguistic information into account. It allows a dynamic tokenization into token units of varying size. The approach has been applied to an English/Swedish parallel text with promising results.

01 Jan 2003
TL;DR: The work aims to find out how to process, in a systematic way, the information that all members of a set enjoy a property expressed by an adjective; if it works, ‘membered’ will become a reserved word and the work with it will be automated.
Abstract: The information that all members of a set enjoy a property expressed by an adjective can be processed in a systematic way. The purpose of the work is to find out how to do that. If it works, ‘membered’ will become a reserved word and the work with it will be automated. I have chosen membered rather than inhabited because of the compatibility with the Automath terminology. The phrase τ inhabits θ could be translated to τ is θ in Mizar.

Journal ArticleDOI
TL;DR: It is argued that this type of allomorphy is not conditioned by a syntactic adjacency condition, and is found when the head and phrase in question are contained in the same prosodic phrase at the interface that connects syntax and phonology (PF).
Abstract: This paper deals with a class of morphological alternations that seem to involve syntactic adjacency. More specifically, it deals with alternative realizations of syntactic terminals that occur when a particular phrase immediately follows a particular head. We argue that this type of allomorphy is not conditioned by a syntactic adjacency condition. Instead, it is found when the head and phrase in question are contained in the same prosodic phrase at the interface that connects syntax and phonology (PF). We illustrate our approach with six case studies, concerning agreement weakening in Dutch and Arabic, pronoun weakening in Middle Dutch and Celtic, and pro-drop in Old French and Arabic.

Journal ArticleDOI
Mattias Heldner
TL;DR: In this paper, the authors show that increases in overall intensity and spectral emphasis are reliable acoustic correlates of focal accents in Swedish, and that spectral emphasis is more reliable than overall intensity.

Book ChapterDOI
22 Sep 2003
TL;DR: This work presents a model of natural language generation from semantics using the FrameNet semantic role and frame ontology, and is able to identify null-instantiated roles, which commonly occur in the corpus and whose identification is crucial to natural language interpretation.
Abstract: Determining the semantic role of sentence constituents is a key task in determining sentence meanings lying behind a veneer of variant syntactic expression. We present a model of natural language generation from semantics using the FrameNet semantic role and frame ontology. We train the model using the FrameNet corpus and apply it to the task of automatic semantic role and frame identification, producing results competitive with previous work (about 70% role labeling accuracy). Unlike previous models used for this task, our model does not assume that the frame of a sentence is known, and is able to identify null-instantiated roles, which commonly occur in our corpus and whose identification is crucial to natural language interpretation.

Journal ArticleDOI
TL;DR: The authors report two experiments testing the hypotheses that (a) the direction of a probability phrase can be predicted from properties of its membership function, (b) this relation is invariant across contexts, and (c) strong modifiers intensify phrase directionality; the results support (a) and (b) but give only limited support for (c).
Abstract: Teigen and Brun have suggested that, distinct from their numerical implications, most probability phrases are either positive or negative, in that they encourage one to think of reasons why the target event will or will not occur. We report two experiments testing our hypotheses that (a) the direction of a phrase can be predicted from properties of its membership function, (b) this relation is invariant across contexts, and (c), originally formulated by Teigen and Brun (1999), that strong modifiers intensify phrase directionality. For each phrase, participants encoded membership functions by judging the degree to which it described the numerical probabilities 0.0, 0.1, …, 1.0, and also completed sentences including the target phrase. The types of reasons given in the sentence completion task were used to determine the phrase's directionality. The results support our hypotheses (a) and (b) regarding the relation between directionality and the membership functions, but we found only limited support for hypothesis (c) regarding the effects of modifiers on directionality. A secondary goal, to validate an efficient method of encoding membership functions, was also achieved. Copyright © 2003 John Wiley & Sons, Ltd.
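To make the notion of a membership function over the probabilities 0.0, 0.1, …, 1.0 concrete, the sketch below summarizes one set of ratings by its peak and its centroid. The abstract does not say which properties of the membership function the authors used, so both summaries, the function name, and the example ratings are assumptions chosen purely for illustration.

```python
# The eleven probabilities at which each phrase is rated.
PROBABILITIES = [i / 10 for i in range(11)]


def summarize_membership(ratings):
    """Return (peak, centroid) of a membership function.

    `ratings` holds eleven degrees of membership in [0, 1], one for each
    probability in PROBABILITIES. The peak is the probability with the
    highest rating (first one on ties); the centroid is the
    rating-weighted mean probability.
    """
    if len(ratings) != len(PROBABILITIES):
        raise ValueError("expected one rating per probability")
    total = sum(ratings)
    if total == 0:
        raise ValueError("ratings must not all be zero")
    peak = PROBABILITIES[ratings.index(max(ratings))]
    centroid = sum(p * m for p, m in zip(PROBABILITIES, ratings)) / total
    return peak, centroid


# Hypothetical ratings for a phrase centered on even odds.
example_ratings = [0, 0, 0, 0.2, 0.6, 1.0, 0.6, 0.2, 0, 0, 0]
peak, centroid = summarize_membership(example_ratings)
```

A centroid that sits below or above the peak indicates a skewed function, which is one simple way a membership function's shape could be related to a phrase's direction.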