
Showing papers on "Phrase" published in 1995


Journal ArticleDOI
TL;DR: This paper identifies some linguistic properties of technical terminology and uses them to formulate an algorithm for identifying technical terms in running text; the terminology identification algorithm is motivated by these linguistic properties.
Abstract: This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
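A minimal sketch of this style of term identifier, in Python: noun-phrase-shaped tag sequences (adjectives and nouns, optionally bridged by a single preposition) that repeat in the text are proposed as technical terms. The tag pattern, length cap, and repetition threshold below are illustrative assumptions, not the paper's exact specification.

```python
import re
from collections import Counter

# Candidate terms are word sequences whose POS tags match a noun-phrase
# pattern: a run of adjectives (A) and nouns (N), optionally bridged by one
# preposition (P), ending in a noun. Multi-word candidates seen at least
# min_count times are kept.
TAG_PATTERN = re.compile(r"^((A|N)+|(A|N)*NP(A|N)*)N$")

def candidate_terms(tagged, max_len=4, min_count=2):
    """tagged: list of (word, tag) pairs, tags drawn from {'A','N','P',...}."""
    counts = Counter()
    for i in range(len(tagged)):
        for j in range(i + 2, min(i + max_len, len(tagged)) + 1):
            words, tags = zip(*tagged[i:j])
            if TAG_PATTERN.match("".join(tags)):
                counts[" ".join(words)] += 1
    return [term for term, n in counts.items() if n >= min_count]

toy = [("stochastic", "A"), ("context-free", "A"), ("grammar", "N"),
       ("of", "P"), ("the", "D"), ("stochastic", "A"),
       ("context-free", "A"), ("grammar", "N")]
print(candidate_terms(toy))
# ['stochastic context-free grammar', 'context-free grammar']
```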

794 citations


Book ChapterDOI
01 Aug 1995
TL;DR: An account is needed of the grammatical notions relevant to code-switching that can be used both to characterise specific instances of intra-sentential switching and to relate the various proposals in the literature to each other.
Abstract: This chapter discusses some of the descriptive tools that can be used for the analysis. It illustrates the issues involved in trying to unify the grammatical constraints on borrowing with those on code-mixing, in terms of the notion of local coherence imposed by language indices. A discourse-oriented way of determining the base-language is to take it to be the language of the conversation. In a structurally oriented model, some element or set of elements determines the base-language: often the main verb, which is the semantic kernel of the sentence, assigning the different semantic roles and determining the state or event expressed by the clause, is taken to determine the base-language. A very complicated issue concerns the relation between qualitative structural and quantitative distributional analysis. Code-switching is the use of two languages in one clause or utterance. As such, code-switching is different from lexical borrowing, which involves the incorporation of lexical elements from one language in the lexicon of another language.

229 citations


Book ChapterDOI
TL;DR: The thesis presented is that muscular tension and motion can be used to phrase human-computer dialogues to reinforce the chunking of low-level tasks that correspond to the higher-level primitives of the mental model that is being established.
Abstract: The use of physical gestures to reinforce cognitive chunking is discussed. The thesis presented is that muscular tension and motion can be used to phrase human-computer dialogues. These phrases can be used to reinforce the chunking of low-level tasks that correspond to the higher-level primitives of the mental model that we are trying to establish. The relationship of such gestures to the issue of compatibility is also discussed. Finally, we suggest how to improve the use of grammar-based models in analysing and designing interaction languages.

214 citations


Journal ArticleDOI
TL;DR: The authors provided support for the claim that there are two functional projections in full noun phrases, determiner phrase (DP) and number phrase (NumP), based on an analysis of the dual marker in Modern Hebrew.
Abstract: This paper provides support for the claim that there are two functional projections in full noun phrases, Determiner Phrase (DP) and Number Phrase (NumP), based on an analysis of the dual marker in Modern Hebrew. The assumption of two nominal functional categories permits a structural account of differences in the distribution of elements that function as first/second person pronouns and those that function as third person pronouns. It is hypothesized that 1st/2nd person pronouns are DPs which contain only the head D and that this head is specified for person, number and gender. In contrast, 3rd person pronouns have a more complex structure, where D is specified for person and Num is specified for number and gender. Similarities between past tense agreement and 1st/2nd person pronouns on the one hand and between present tense agreement and 3rd person pronouns on the other suggest that the same nominal functional categories that act as pronouns also act as agreement. In other words, the difference between pronouns and agreement lies not in their category, but in their role in the syntax. Finally, this view of pronouns and agreement is applied to complex null subject phenomena in Modern Hebrew. In order to account for the fact that the distribution of null subjects varies across persons and across tenses, we propose a matching condition on both the category and content of the null pronoun and agreement.

213 citations


Patent
24 Jan 1995
TL;DR: In this article, a phrasebook approach for translating phrases from a first language into a second language comprises a store holding a collection of phrases in the second language, and the output may be in text, or, using speech synthesis, in voiced form.
Abstract: A language translation system for translating phrases from a first language into a second language comprises a store holding a collection of phrases in the second language. Phrases input in the first language are each characterized on the basis of one or more keywords, and the corresponding phrase in the second language is output. Such a phrasebook approach enables what is effectively rapid and accurate translation, even from speech. Since the phrases in the second language are prepared in advance and held in store, there need be no problems of poor translation or ungrammatical construction. The output may be in text, or, using speech synthesis, in voiced form. With appropriate choice of keywords it is possible to characterize a large number of relatively long and complex phrases with just a few keywords.
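A toy sketch of the keyword idea in Python: each stored target-language phrase is indexed by a small set of source-language keywords, and an input utterance retrieves the best-covered phrase. The phrase table and the overlap score are invented for illustration; the patent does not specify this particular scheme.

```python
# Keyword sets index prepared target-language phrases; because the outputs
# are stored whole, they are grammatical by construction.
PHRASEBOOK = {
    frozenset({"where", "station"}): "Où est la gare ?",
    frozenset({"how", "much", "cost"}): "Combien ça coûte ?",
    frozenset({"help"}): "Au secours !",
}

def translate(utterance):
    words = set(utterance.lower().split())
    # pick the phrase whose keyword set is best covered by the input
    best = max(PHRASEBOOK, key=lambda keys: len(keys & words) / len(keys))
    return PHRASEBOOK[best] if best & words else None

print(translate("could you tell me where the station is"))  # Où est la gare ?
```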

141 citations


Proceedings Article
20 Aug 1995
TL;DR: This paper proposes a dependency-based method for evaluating broad-coverage parsers, which offers several advantages over previous methods based on phrase boundaries; the error count score is not only more intuitively meaningful than other scores, but also more relevant to semantic interpretation.
Abstract: With the emergence of broad-coverage parsers, quantitative evaluation of parsers becomes increasingly more important. We propose a dependency-based method for evaluating broad-coverage parsers. The method offers several advantages over previous methods that are based on phrase boundaries. The error count score we propose here is not only more intuitively meaningful than other scores, but also more relevant to semantic interpretation. We will also present an algorithm for transforming constituency trees into dependency trees so that the evaluation method is applicable to both dependency and constituency grammars. Finally, we discuss a set of operations for modifying dependency trees that can be used to eliminate inconsequential differences among different parse trees and allow us to selectively evaluate different aspects of a parser.
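A rough sketch of a dependency-based error count in the spirit of the paper: each parse is reduced to per-word head attachments, and the score counts words whose head the parser got wrong. The exact scoring details are assumptions for illustration.

```python
def dependency_errors(gold, predicted):
    """gold, predicted: dicts mapping each word index to its head index."""
    return sum(1 for dep, head in gold.items() if predicted.get(dep) != head)

# "the old man ate fish": 1-based word indices, head 0 marks the root.
gold = {1: 3, 2: 3, 3: 4, 4: 0, 5: 4}
pred = {1: 3, 2: 4, 3: 4, 4: 0, 5: 4}  # mis-attaches "old" to the verb
print(dependency_errors(gold, pred))    # 1
```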

135 citations


Journal ArticleDOI
TL;DR: This article found that sentence context has a dramatic effect on single-word processing, and that high and low-frequency words elicit different ERPs at the beginning of sentences but this effect is suppressed by a meaningful sentence context.
Abstract: Interactions between sentences and the individual words that comprise them are reviewed in studies using the event-related brain potential (ERP). Results suggest that, for ambiguous words preceded by a biasing sentence context, context is used at an early stage to constrain the relevant sense of a word rather than select among multiple active senses. A study comparing associative single-word context and sentence-level context also suggests that sentence context influences the earliest stage of semantic analysis, but that the ability to use sentence context effectively is more demanding of working memory than the ability to use single-word contexts. Another indication that sentence context has a dramatic effect on single-word processing was the observation that high- and low-frequency words elicit different ERPs at the beginnings of sentences but that this effect is suppressed by a meaningful sentence context.

131 citations


Patent
Ronald M. Kaplan, Atty T. Mullins
05 May 1995
TL;DR: In this paper, a computerized multilingual translation dictionary includes a set of word and phrases for each of the languages it contains, plus a mapping that indicates for each word or phrase in one language what the corresponding translations in the other languages are.
Abstract: A computerized multilingual translation dictionary includes a set of words and phrases for each of the languages it contains, plus a mapping that indicates for each word or phrase in one language what the corresponding translations in the other languages are. The set of words and phrases for each language is divided up among corresponding concept groups based on an abstract pivot language. The words and phrases are encoded as token numbers assigned by a word-number mapper and laid out in a sequence that can be searched fairly rapidly with a simple linear scan. The complex associations of words and phrases to particular pivot-language senses are represented by including a list of pivot-language sense numbers with each word or phrase. The preferred coding of these sense numbers is by means of a bit vector for each word, where each bit corresponds to a particular pivot element in the abstract language, and the bit is ON if the given word is a translation of that pivot element. Then, determining whether a word in language 1 translates to a word in language 2 requires only a bit-wise intersection of their associated bit-vectors. Each word or phrase is prefixed by its bit-vector token number, so the bit-vector tokens do double duty: they also act as separators between the tokens of one phrase and those of another. A pseudo-Huffman compression scheme is used to reduce the size of the token stream. Because of the frequency skew for the bit-vector tokens, this produces a very compact encoding.
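The bit-vector test is easy to make concrete. In this minimal sketch, each word carries a bit vector over pivot senses, and two words count as translations exactly when their vectors share a set bit; the pivot senses and vocabulary are invented for illustration.

```python
PIVOT_SENSES = ["canine animal", "hunting dog", "to follow"]

SENSE_BITS = {                 # bit i is ON if the word covers pivot sense i
    ("en", "dog"):    0b011,   # canine animal, hunting dog
    ("fr", "chien"):  0b001,   # canine animal
    ("fr", "suivre"): 0b100,   # to follow
}

def is_translation(w1, w2):
    # a single bit-wise AND decides translation equivalence
    return bool(SENSE_BITS[w1] & SENSE_BITS[w2])

print(is_translation(("en", "dog"), ("fr", "chien")))   # True
print(is_translation(("en", "dog"), ("fr", "suivre")))  # False
```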

126 citations


Patent
06 Mar 1995
TL;DR: This paper used a glossary to reflect common phrases and words pertaining to any specific subject matter, as a source for retrieving words and phrases from abbreviations, and used this glossary for very fast entry of text into a computer.
Abstract: A system for very fast entry of text into a computer. The system uses a current glossary, which may be custom generated to reflect common phrases and words pertaining to any specific subject matter, as a source for retrieving words and phrases from abbreviations. Text entry is input into the system through the entry of word abbreviations, phrase abbreviations and text entries. The system uses non-fixed abbreviations for words and phrases. A word abbreviation starts with the initial of the word and includes a subset of its other letters. A phrase abbreviation starts with the initials of its first word, or two words, and includes a subset of the initials of its other words. Words and phrases satisfying a specific abbreviation are displayed in advisory tables. A desired word or phrase may be selected using an expansion command. The expanded term is then entered into the system and thereby permits entry of lengthy phrases and words with minimal text entry. Finally, the system proposes phrases that are likely continuations of the last entered words, allowing fast selection and input of such continuations.
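A small sketch of the non-fixed abbreviation rule in Python: an abbreviation matches a glossary word if it starts with the word's initial and its remaining letters appear, in order, among the word's other letters. Reading "subset" as an ordered subsequence is an assumption, and the glossary is invented.

```python
def matches(abbrev, word):
    if not abbrev or abbrev[0] != word[0]:
        return False
    rest = iter(word[1:])
    return all(ch in rest for ch in abbrev[1:])  # ordered subsequence check

GLOSSARY = ["diagnosis", "diagnostic", "patient", "prescription"]

def expansions(abbrev):
    """All glossary words matching the abbreviation (an 'advisory table')."""
    return [w for w in GLOSSARY if matches(abbrev, w)]

print(expansions("dgns"))  # ['diagnosis', 'diagnostic']
print(expansions("pt"))    # ['patient', 'prescription']
```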

121 citations


Book
03 May 1995
TL;DR: In this paper, an autosegmental analysis of Palermo Italian yes-no interrogatives is presented, where pitch configurations are expressed in terms of H(igh) and L(ow) tones which are either part of a pitch accent or at intermediate or intonation phrase boundaries.
Abstract: In Palermo Italian yes-no interrogatives, if the last syllable of a phrase is unstressed, the nuclear pitch contour is rising-falling, whereas if it is stressed, the contour is simply rising. An autosegmental analysis, where pitch configurations are expressed in terms of H(igh) and L(ow) tones which are either part of a pitch accent or at intermediate or intonation phrase boundaries, is shown to offer the flexibility necessary to account for such context-dependent variation. The interrogative marker consists of a L*+H pitch accent. There is no paradigmatic contrast on the intermediate phrase boundary tone (it is always L) which means that its function is purely delimitative. This tone is only fully realised when a postaccentual syllable is available to carry it; technically, it requires a secondary attachment to a syllable. The absence of the falling part of the L*+HL (L) configuration in phrases with no postaccentual syllable is thus explained.

112 citations


Journal ArticleDOI
TL;DR: The authors investigated the application of the late closure principle in Italian reading-time experiments and found that it applies to initial parsing in Italian without being affected by the thematic structure of the complex noun phrase.
Abstract: Four reading-time experiments investigated the application of the late closure principle in Italian. The experiments tested the principle governing the initial attachment of different types of modifiers (relative clause, adjectival phrase, and prepositional phrase) to a complex noun phrase. By manipulating the type of preposition within the complex noun phrase, the authors investigated the role of the thematic structure in initial and final parsing. The results show that the late closure principle applies to initial parsing in Italian without being affected by the thematic structure of the complex noun phrase. Final interpretation, however, shows an effect of pragmatic preference and an effect of thematic structure on syntactic revisions. The results are discussed in terms of a parsing model that adopts syntactic parsing strategies and makes modular use of linguistic information. The purpose of this research was to assess whether late closure, an assumed universal sentence parsing principle (Frazier, 1978), applies in Italian. In this article we study the attachment of different types of modifiers to complex noun phrases, drawing a distinction between initial and final interpretation and trying to identify what variables (syntactic, semantic, and pragmatic) affect what stage in the comprehension process. The results of four on-line experiments conducted in Italian are presented. Kimball (1973) and Frazier and Fodor (1978) proposed strategies that apply to the initial parsing of a sentence as soon as each word is perceived. Some examples of such strategies are right association (Kimball, 1973), minimal attachment and late closure (Frazier & Fodor, 1978), superstrategy (Fodor, 1979), recent filler strategy (Frazier, Clifton, & Randall, 1983), active filler strategy (Frazier, 1987), and the minimal chain principle (De Vincenzi, 1991). The basic idea in all of these strategies is that they are directly derived from a simple principle: Choose to do whatever costs the least effort in terms of computation to interpret the incoming linguistic input before it decays. This choice is motivated by a basic cognitive reason, namely, the restrictions on short-term memory in terms of memory and computational space and the fact that

Journal ArticleDOI
TL;DR: In this experiment, pianists memorized and performed polyphonic music in which the serial distance and phrase structure between the entrances of 2 musical voices were varied, suggesting structural as well as linear constraints on the planning of complex sequences.
Abstract: Two factors influence the range of planning in music performance: the structural content of musical events and the serial distances between them. In this experiment, pianists memorized and performed polyphonic music (which contained multiple simultaneous voices) in which the serial distance and phrase structure between the entrances of 2 musical voices were varied. The distance between each musical element and its influence on other elements was assessed in production errors and interonset timing measures. Errors and timing measures offered converging evidence for interactive effects of serial distance and phrase structure; intervening phrase boundaries reduced the serial distances over which musical elements influenced one another. These findings suggest structural as well as linear constraints on the planning of complex sequences.

Patent
Yen-Lu Chow, Hsiao-Wuen Hon
04 Oct 1995
TL;DR: In this article, a prefix/body/suffix language model for phrase-based search in a speech recognition system and an apparatus for constructing and/or searching through the language model are presented.
Abstract: A method of constructing a language model for a phrase-based search in a speech recognition system and an apparatus for constructing and/or searching through the language model. The method includes the step of separating a plurality of phrases into a plurality of words in a prefix word, body word, and suffix word structure. Each of the phrases has a body word and optionally a prefix word and a suffix word. The words are grouped into a plurality of prefix word classes, a plurality of body word classes, and a plurality of suffix word classes in accordance with a set of predetermined linguistic rules. Each of the respective prefix, body, and suffix word classes includes a number of prefix words of same category, a number of body words of same category, and a number of suffix words of same category, respectively. The prefix, body, and suffix word classes are then interconnected together according to the predetermined linguistic rules. A method of organizing a phrase search based on the above-described prefix/body/suffix language model is also described. The words in each of the prefix, body, and suffix classes are organized into a lexical tree structure. A phrase start lexical tree structure is then created for the words of all the prefix classes and the body classes having a word which can start one of the plurality of phrases while still maintaining connections of these prefix and body classes within the language model.
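A bare-bones lexical tree of the kind the patent organizes each word class into, sketched here over letters rather than phone units (a simplification): words sharing an initial subsequence share a path, so a search can expand many candidates at once.

```python
def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = w              # mark a complete word at this node
    return root

def completions(trie, prefix):
    node = trie
    for ch in prefix:              # walk down the shared path
        if ch not in node:
            return []
        node = node[ch]
    words, stack = [], [node]
    while stack:                   # collect every word below this node
        n = stack.pop()
        for key, val in n.items():
            if key == "$":
                words.append(val)
            else:
                stack.append(val)
    return words

body_class = build_trie(["call", "called", "calling", "cancel"])
print(sorted(completions(body_class, "call")))  # ['call', 'called', 'calling']
```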

Book
01 Jun 1995
TL;DR: This book proposes a theory of phrase structure in which structures are built by a simple adjunction operation and specifiers are solely characterised by agreement, and introduces a new system of projection called the relativized X-bar theory.
Abstract: The book proposes a theory of phrase structure in which structures are built by a simple adjunction operation, and specifiers are solely characterised by agreement. Having introduced some of the basic notions of the principle-and-parameters theory in Chapter 1, Chapter 2 discusses and illustrates the fundamental difference between lexical and functional categories: Lexical categories have Lexical Conceptual Structure in the sense of Hale and Keyser (1986), whereas functional categories lack such intrinsic semantic property. Instead, functional categories possess agreement features which connect two distinct syntactic categories. Based on this fundamental difference, a new system of projection called the relativized X-bar theory is introduced. Chapter 3 explores various consequences of the projection system introduced in Chapter 2. In Chapter 4, the discussion focuses on the phrase structural properties of Japanese.

Journal ArticleDOI
TL;DR: The restricted semantic ellipsis hypothesis is committed to an enormous number of multiply ambiguous expressions, the introduction of which gains us no extra explanatory power, so the doctrine must be rejected.
Abstract: The restricted semantic ellipsis hypothesis, we have argued, is committed to an enormous number of multiply ambiguous expressions, the introduction of which gains us no extra explanatory power. We should, therefore, reject it. We should also spurn the original version since: (a) it entails the restricted version and (b) it incorrectly declares that, whenever a speaker makes an assertion by uttering an unembedded word or phrase, the expression uttered has illocutionary force. Once rejected, the semantic ellipsis hypothesis cannot account for the many exceptions to the syntactic ellipsis hypothesis. So, we can safely infer that the Claim is true. (1) The Claim: Speakers can make assertions by uttering ordinary, unembedded, words and phrases. To the degree that the Claim really is in tension with the primacy of sentences (i.e., the view that (a) only sentences can be used to make assertions and (b) only sentences are meaningful in isolation), this doctrine must also be rejected.

Journal ArticleDOI
TL;DR: This paper focuses on the size of a randomly selected phrase and the average number of phrases of a given size (the so-called average profile of phrase sizes).
Abstract: Consider the parsing algorithm developed by Lempel and Ziv (1978) that partitions a sequence of length n into variable phrases (blocks) such that a new block is the shortest substring not seen in the past as a phrase. In practice, the following parameters are of interest: the number of phrases, the size of a phrase, the number of phrases of a given size, and so forth. In this paper, we focus on the size of a randomly selected phrase, and the average number of phrases of a given size (the so-called average profile of phrase sizes). These parameters can be efficiently analyzed through a digital search tree representation. For a memoryless source with unequal probabilities of symbol generation (the so-called asymmetric Bernoulli model), we prove that the size of a typical phrase is asymptotically normally distributed with mean and variance explicitly computed. In terms of digital search trees, we prove the normal limiting distribution of the typical depth (i.e., the length of a path from the root to a randomly selected node). The latter finding is proved by a technique that belongs to the toolkit of the "analytical analysis of algorithms", and it seems to be novel in the context of data compression.
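The parsing rule itself fits in a few lines of Python; this sketch returns the phrase sequence whose sizes the paper analyzes (in the digital search tree view, each phrase is a root-to-node path).

```python
def lz78_phrases(s):
    seen, phrases, current = set(), [], ""
    for ch in s:
        current += ch
        if current not in seen:    # shortest substring not yet a phrase
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:                    # leftover tail repeats an earlier phrase
        phrases.append(current)
    return phrases

print(lz78_phrases("ababababa"))   # ['a', 'b', 'ab', 'aba', 'ba']
```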

Journal ArticleDOI
TL;DR: Analysis of emergent sentence production patterns in a subset of Broca's subjects who evinced sentence production (and comprehension) deficits involving "complex" sentences in which noun phrases have been moved out of their canonical positions indicated not only improved sentence production abilities in all subjects under study, but also, in many cases, generalization of sentence production across linguistic lines.

Journal ArticleDOI
TL;DR: In this article, three strategies for translating a metaphor to a new context are discussed in semantic, pragmatic, and communicative terms, and examined in a case study of the interaction of participants in the communicative act.

Proceedings ArticleDOI
09 May 1995
TL;DR: It is shown that the predecessor-word identity provided by a first bigram decoding might be used to constrain the word graph without impairing the next pass, and tested for the North American Business corpus used in the November '94 evaluation.
Abstract: We address the problem of using word graphs (or lattices) for the integration of complex knowledge sources like long span language models or acoustic cross-word models, in large vocabulary continuous speech recognition. A method for efficiently constructing a word graph is reviewed and two ways of exploiting it are presented. By assuming the word pair approximation, a phrase level search is possible while in the other case a general graph decoder is set up. We show that the predecessor-word identity provided by a first bigram decoding might be used to constrain the word graph without impairing the next pass. This procedure has been applied to 64 k-word trigram decoding in conjunction with an incremental unsupervised speaker adaptation scheme. Experimental results are given for the North American Business corpus used in the November '94 evaluation.

Patent
Allen Louis Gorin
15 Sep 1995
TL;DR: In this paper, a set of meaningful phrases are determined by a grammatical inference algorithm which operates on a predetermined corpus of speech utterances, each such utterance being associated with a specific task objective, and wherein each utterance is marked with its associated task objective.
Abstract: A methodology for automated task selection is provided, where the selected task is identified in natural speech of a user making such a selection. A set of meaningful phrases are determined by a grammatical inference algorithm which operates on a predetermined corpus of speech utterances, each such utterance being associated with a specific task objective, and wherein each utterance is marked with its associated task objective. Each meaningful phrase developed by the grammatical inference algorithm can be characterized as having both a Mutual Information value and a Salience value (relative to an associated task objective) above a predetermined threshold.
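An illustrative sketch of the two filters in Python: a candidate phrase is kept as meaningful when its words co-occur more than chance (mutual information) and the phrase is predictive of a task objective (salience, taken here as the peak conditional probability of a task given the phrase). These are standard formulas applied to an invented corpus, not the patent's exact algorithm.

```python
import math
from collections import Counter

corpus = [("collect call please", "collect"),
          ("make a collect call", "collect"),
          ("what is my balance", "billing"),
          ("collect call to boston", "collect")]

tokens = [w for text, _ in corpus for w in text.split()]
bigrams = [b for text, _ in corpus
           for b in zip(text.split(), text.split()[1:])]
uni, bi = Counter(tokens), Counter(bigrams)

def mutual_information(w1, w2):
    # pointwise MI of the pair against its unigram marginals
    p12 = bi[(w1, w2)] / len(bigrams)
    p1, p2 = uni[w1] / len(tokens), uni[w2] / len(tokens)
    return math.log2(p12 / (p1 * p2)) if p12 else float("-inf")

def salience(phrase):
    # peak conditional probability of a task objective given the phrase
    tasks = [task for text, task in corpus if phrase in text]
    return max(Counter(tasks).values()) / len(tasks) if tasks else 0.0

print(round(mutual_information("collect", "call"), 2))  # ~2.77
print(salience("collect call"))                         # 1.0
```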

Journal ArticleDOI
TL;DR: ACTS is an automatic Chinese text segmentation prototype for Chinese full text retrieval that applies partial syntactic analysis—the analysis of morphemes, words, and phrases.
Abstract: Text segmentation is a prerequisite for text retrieval systems. Chinese texts cannot be readily segmented into words because they do not contain word boundaries. ACTS is an automatic Chinese text segmentation prototype for Chinese full text retrieval. It applies partial syntactic analysis—the analysis of morphemes, words, and phrases. The idea was originally largely inspired by experiments on English morpheme and phrase-analysis-based text retrieval, which are particularly germane to Chinese, because neither Chinese nor English texts have morpheme and phrase boundaries. ACTS is built on the hypothesis that Chinese words and phrases exceeding two characters can be characterized by a grammar that describes the concatenation behavior of the morphological and syntactic categories of their formatives. This is examined through three procedures: (1) Segmentation—texts are divided into one and two character segments by matching against a dictionary; (2) Category disambiguation—the syntactic categories of segments are determined according to context; (3) Parsing—the segments are analyzed based on the grammar, and subsequently combined into compound and complex words for indexing and retrieval. The experimental results, based on a small sample of 30 texts, show that most significant words and phrases in these texts can be extracted with a high degree of accuracy. © 1995 John Wiley & Sons, Inc.
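A much-simplified sketch of step (1) in Python: greedy left-to-right matching that takes a two-character dictionary segment where possible and falls back to single characters. The toy dictionary is invented, and ACTS itself goes on to disambiguate categories and parse the segments.

```python
DICTIONARY = {"中国", "文字", "中", "国", "文", "字", "很", "美"}

def segment(text):
    segments, i = [], 0
    while i < len(text):
        if text[i:i + 2] in DICTIONARY:   # prefer a two-character segment
            segments.append(text[i:i + 2])
            i += 2
        else:                             # fall back to a single character
            segments.append(text[i])
            i += 1
    return segments

print(segment("中国文字很美"))  # ['中国', '文字', '很', '美']
```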

Proceedings ArticleDOI
E.P. Giachin
09 May 1995
TL;DR: Two procedures for automatically determining frequent phrases (within the framework of a probabilistic language model) in an unlabeled training set of written sentences are discussed; one procedure is optimal in that it minimises the set perplexity.
Abstract: In some speech recognition tasks, such as man-machine dialogue systems, the spoken sentences include several recurrent phrases. A bigram language model does not adequately represent these phrases because it underestimates their probability. A better approach consists of modeling phrases as if they were individual dictionary elements. They are inserted as additional entries into the word lexicon, on which bigrams are finally computed. This paper discusses two procedures for automatically determining frequent phrases (within the framework of a probabilistic language model) in an unlabeled training set of written sentences. One procedure is optimal since it minimises the set perplexity. The other, based on information theoretic criteria, ensures that the resulting model has a high statistical robustness. The two procedures are tested on a 762-word spontaneous speech recognition task. They give similar results and provide a moderate improvement over standard bigrams.
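A toy sketch of the perplexity criterion: treat a recurrent word pair as a single dictionary entry and check whether the training-set perplexity of a bigram model drops. The corpus and the single candidate phrase are invented; the paper's procedures search over many candidates.

```python
import math
from collections import Counter

corpus = ["i would like to book a flight",
          "i would like a timetable",
          "yes i would like that"]

def perplexity(sentences):
    # maximum-likelihood bigram model with a sentence-start token
    tokens = [w for s in sentences for w in ["<s>"] + s.split()]
    bigrams = list(zip(tokens, tokens[1:]))
    uni, bi = Counter(tokens), Counter(bigrams)
    logp = sum(math.log(bi[b] / uni[b[0]]) for b in bigrams)
    return math.exp(-logp / len(bigrams))

# candidate phrase: fuse "would like" into one lexicon entry
merged = [s.replace("would like", "would_like") for s in corpus]
print(round(perplexity(corpus), 2), round(perplexity(merged), 2))
```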


DOI
01 Jan 1995
TL;DR: The authors used Gaussian distribution classifiers to detect accents, phrase boundaries, and sentence modality in spontaneous speech, yielding recognition rates of 78 percent for accents, 80 percent for phrase boundaries and 85 percent for sentence modalities.
Abstract: In this paper detectors for accents, phrase boundaries, and sentence modality are described which derive prosodic features only from the speech signal and its fundamental frequency to support other modules of a speech understanding system in an early analysis stage, or in cases where no word hypotheses are available. A new method for interpolating and decomposing the fundamental frequency is suggested. The detectors' underlying Gaussian distribution classifiers were trained and tested with approximately 50 minutes of spontaneous speech, yielding recognition rates of 78 percent for accents, 80 percent for phrase boundaries, and 85 percent for sentence modality.
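A minimal sketch of a Gaussian distribution classifier of the kind described, reduced to a single prosodic feature: each class is a univariate Gaussian over pause duration, and a position is assigned to the class with the higher log-likelihood. The class statistics are invented; the detectors above use richer feature sets derived from the signal and its fundamental frequency.

```python
import math

CLASSES = {
    "boundary":    (0.25, 0.010),   # (mean pause in s, variance); assumed
    "no_boundary": (0.05, 0.002),
}

def log_likelihood(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(pause):
    # pick the class whose Gaussian assigns the feature the higher likelihood
    return max(CLASSES, key=lambda c: log_likelihood(pause, *CLASSES[c]))

print(classify(0.18))  # boundary
print(classify(0.06))  # no_boundary
```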

Journal ArticleDOI
TL;DR: The present paper treats discontinuity in this way, by residuation with respect to three adjunctions: + (associative), (.,.) (split-point marking), and W (wrapping), and it is shown how the resulting methods apply to discontinuous functors, quantifier scope and quantifier scope ambiguity, pied piping, and subject and object antecedent reflexivisation.
Abstract: Discontinuity refers to the character of many natural language constructions wherein signs differ markedly in their prosodic and semantic forms. As such it presents interesting demands on monostratal computational formalisms which aspire to descriptive adequacy. Pied piping, in particular, is argued by Pollard (1988) to motivate phrase structure-style feature percolation. In the context of categorial grammar, Bach (1981, 1984), Moortgat (1988, 1990, 1991) and others have sought to provide categorial operators suited to discontinuity. These attempts encounter certain difficulties with respect to model theory and/or proof theory, difficulties which the current proposals are intended to resolve.

Journal ArticleDOI
TL;DR: This article considers a series of hypotheses and concludes that double-access readings involve de re attitude reports about state individuals, an account that employs the techniques proposed by Cresswell and von Stechow (1982).
Abstract: This article deals with the semantics of "double-access" sentences. They are defined as English sentences which have a past tense morpheme in the matrix clause and a present tense morpheme in a subordinate clause in the immediate scope of the matrix past tense. They receive a very peculiar interpretation, which we will refer to as a "double-access interpretation." The episode described in the embedded clause makes reference to two times: the time referred to by the matrix predicate and the utterance time of the whole sentence. Previous studies on this construction are largely descriptive and do not attempt to analyze it formally, with one important exception. Abusch (1991) addresses the problems connected with the construction and proposes that double-access interpretations involve de re attitudes about intervals. Her proposal contains an important insight and provides one possible account of the double-access construction. My proposal was independently developed at approximately the same time as Abusch's and offers an alternative explanation for the phenomena. I consider a series of hypotheses and conclude that double-access readings involve de re attitude reports about state individuals. This account is couched in an eventuality-based framework and employs the techniques proposed by Cresswell and von Stechow (1982). In order to yield the desired reading, the tense must first adjoin to the complement S, then to the matrix S, leaving two traces in the process.

Journal ArticleDOI
TL;DR: This paper will consider how knowledge of ontological domains and knowledge of lexical meaning work together in the interpretation of linguistic expressions and argues for an approach which takes into account a set of semantic coercion operations to meet sortal constraints.
Abstract: This paper is concerned with some aspects of the relationship between ontological knowledge and natural language understanding. More specifically, I will consider how knowledge of ontological domains and knowledge of lexical meaning work together in the interpretation of linguistic expressions. An essential assumption is that, in accordance with ontological distinctions, there are various semantic sorts into which linguistic expressions can be divided. The specific purpose of the paper is to explore how under these conditions the intricate problem of systematic ambiguity can be dealt with. Here the term "systematic ambiguity" stands for the phenomenon that a word or a phrase has several possible meanings which are systematically related to one another and from which a suitable meaning can be selected depending on the linguistic and non-linguistic context of use. Taking into consideration that many predicative expressions impose on their arguments certain sortal selection restrictions, I will deal with the phenomenon that a systematically ambiguous word or phrase in some cases adapts itself to the semantic format of the expression it is combined with. Such an adaptation, eliminating one or more possible meanings of the word or phrase, is in fact a coercion of its semantic sort. I will argue for an approach which takes into account a set of semantic coercion operations to meet sortal constraints. Moreover, I will show how such sort coercions performed in language understanding are sanctioned by world knowledge.

Posted Content
TL;DR: The syntax of extraposition in the HPSG framework is investigated using English and German data, and an analysis using a nonlocal dependency and lexical rules is provided.
Abstract: This paper investigates the syntax of extraposition in the HPSG framework. We present English and German data (partly taken from corpora), and provide an analysis using lexical rules and a nonlocal dependency. The condition for binding this dependency is formulated relative to the antecedent of the extraposed phrase, which entails that no fixed site for extraposition exists. Our analysis accounts for the interaction of extraposition with fronting and coordination, and predicts constraints on multiple extraposition.


20 Nov 1995
TL;DR: This research creates decision tree models that predict the prosodic markers, and a dynamical system model that generates F0 and energy contours, a unique approach that incorporates traditional methods of F0 generation into a model whose parameters are estimated automatically from labeled speech.
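As a schematic illustration of the dynamical-system idea mentioned above (not the thesis's estimated model), the sketch below evolves an unobserved state toward per-frame targets supplied by the prediction module and emits F0 as a noisy observation of that state. All coefficients here are invented; the thesis estimates such parameters from labeled speech with an EM algorithm.

```python
import random

def generate_f0(targets, a=0.8, noise=1.0, seed=0):
    """targets: per-frame F0 targets (Hz) from accent/phrase predictions."""
    random.seed(seed)
    state, contour = targets[0], []
    for t in targets:
        state = a * state + (1 - a) * t        # state smoothly tracks targets
        contour.append(state + random.gauss(0, noise))  # noisy observed F0
    return contour

# a declining toy phrase contour with one raised pitch-accent target
targets = [120] * 10 + [160] * 5 + [100] * 10
print([round(f) for f in generate_f0(targets)])
```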
Abstract: Higher quality speech synthesis is needed to make text-to-speech technology useful in more applications, and prosody--the suprasegmental aspects of speech that supply information about sentence meaning--is one of the aspects of synthesis technology most needing improvement. The goal here is to develop automatically trainable computational models for prosody that can be incorporated into existing text-to-speech synthesizers. This model is constructed in two modules: the first predicts abstract prosodic markers from text, and the second generates fundamental frequency (F0) and energy contours from the abstract markers and text. This research draws on recent developments in linguistic theory to provide the structure for the models, and on recent advances in statistical modeling to provide a formalism for automatically generating the model parameters. Because statistical models are automatically trained, they have advantages over rule-based models, particularly that they can be easily modified to different speaking styles via retraining on a different corpus. Specifically, this research creates decision tree models that predict the prosodic markers, and a dynamical system model that generates F0 and energy contours. Classification trees in conjunction with a Markov sequence assumption predict pitch accents and phrase tone types. Additionally, regression trees estimate F0 range and prominence levels. These trees use linguistically motivated features that are derived from text such as lexical stress and part-of-speech. The model for F0 and energy generation is a unique approach that incorporates traditional methods of F0 generation into a model whose parameters are estimated automatically from labeled speech. F0 and energy are generated with a state-space dynamical system model that assumes there is an unobserved state vector corresponding to the noisy observations of F0 and energy. Parameters are specified to capture segment, syllable, and phrase level effects. Since there is unobserved data, parameters are estimated using a non-traditional method based upon the EM algorithm. These two models are evaluated, independently and together, in quantitative and perceptual tests that demonstrate improvements in the quality of text-to-speech synthesis. These models are also demonstrated to be useful in prosody recognition applications.