Showing papers on "Phrase published in 1992"


Proceedings ArticleDOI
31 Mar 1992
TL;DR: An implementation of a part-of-speech tagger based on a hidden Markov model that enables robust and accurate tagging with few resource requirements; accuracy exceeds 96%.
Abstract: We present an implementation of a part-of-speech tagger based on a hidden Markov model. The methodology enables robust and accurate tagging with few resource requirements. Only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and optimizations which result in high-speed operation. Three applications for tagging are described: phrase recognition; word sense disambiguation; and grammatical function assignment.

737 citations
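
The tagger in the entry above is based on a hidden Markov model. As a rough illustration only, the sketch below decodes the most probable tag sequence with the Viterbi algorithm, a standard choice for HMMs; the abstract does not say which decoding procedure the authors use. The tag set, the probabilities, and the `floor` smoothing value are all made-up assumptions, and training from a lexicon and unlabeled text is not shown.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a toy HMM."""
    # best[i][t] = (log-probability of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{}]
    for t in tags:
        p = start_p.get(t, floor) * emit_p[t].get(words[0], floor)
        best[0][t] = (math.log(p), None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = math.log(emit_p[t].get(words[i], floor))
            best[i][t] = max(
                (best[i - 1][s][0] + math.log(trans_p[s].get(t, floor)) + emit, s)
                for s in tags
            )
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Hypothetical toy model (not taken from the paper).
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

tagged = viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P)
```

Working in log space avoids numerical underflow on longer sentences, which is why the sketch sums logarithms rather than multiplying probabilities.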


Proceedings ArticleDOI
01 Jun 1992
TL;DR: It is shown that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for syntactic phrase indexing than for word-based indexing.
Abstract: Syntactic phrase indexing and term clustering have been widely explored as text representation techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing languages on a text categorization task, enabling us to study their properties in isolation from query interpretation issues. We show that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for syntactic phrase indexing than for word-based indexing. We also present results suggesting that traditional term clustering methods are unlikely to provide significantly improved text representations. An improved probabilistic text categorization method is also presented.

667 citations


Patent
12 Aug 1992
TL;DR: An information retrieval system and method are provided in which an operator inputs one or more query words which are used to determine a search key for searching through a corpus of documents, and which returns any matches between the search key and the corpus of documents as a phrase containing the word data matching the query word(s), a non-stop (content) word next adjacent to the matching word data, and all intervening stop-words between the matching word data and the next adjacent non-stop word.
Abstract: An information retrieval system and method are provided in which an operator inputs one or more query words which are used to determine a search key for searching through a corpus of documents, and which returns any matches between the search key and the corpus of documents as a phrase containing the word data matching the query word(s), a non-stop (content) word next adjacent to the matching word data, and all intervening stop-words between the matching word data and the next adjacent non-stop word. The operator, after reviewing one or more of the returned phrases can then use one or more of the next adjacent non-stop-words as new query words to reformulate the search key and perform a subsequent search through the document corpus. This process can be conducted iteratively, until the appropriate documents of interest are located. The additional non-stop-words from each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the "new" content words.

497 citations
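
The phrase-returning step in the patent entry above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `phrases_for`, the whitespace tokenization, and the tiny `STOP_WORDS` set are all hypothetical.

```python
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # assumed, deliberately tiny

def phrases_for(query, text):
    """For each occurrence of `query`, return the match, the intervening
    stop-words, and the next adjacent non-stop (content) word."""
    tokens = text.lower().split()
    results = []
    for i, token in enumerate(tokens):
        if token != query:
            continue
        phrase = [token]
        for following in tokens[i + 1:]:
            phrase.append(following)
            if following not in STOP_WORDS:
                break  # reached the next content word
        results.append(" ".join(phrase))
    return results

hits = phrases_for("retrieval", "Retrieval of the documents and retrieval in a corpus")
```

The last word of each returned phrase is the candidate new query word that the operator could feed into the next search iteration.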


Journal ArticleDOI
01 Mar 1992-Language
TL;DR: In this paper, a phrase-structural analysis of topic and focus for three Mayan languages (Tzotzil, Jakaltek, Tz'utujil) is presented.
Abstract: Most Mayan languages are 'basically' predicate-initial, but various phrases occur before the predicate when they are focussed or topicalized. This paper assumes the framework of Chomsky 1986 and presents a phrase-structural analysis of topic and focus for three Mayan languages (Tzotzil, Jakaltek, Tz'utujil). Three distinct entities are distinguished: the focus and two types of topic, termed here 'internal' and 'external'. Each is argued to occupy a distinct structural position. At the heart of the analysis is an account of intonational phrasing and the distribution of several intonational phrase clitics in Tzotzil and Jakaltek. An algorithm is proposed for deriving intonational phrase structure from surface structure. Syntactic evidence further supports the phrase-structural differences established on prosodic grounds.

229 citations


Journal ArticleDOI
TL;DR: In this article, judgments from 129 naive untrained Ss were used to isolate 60 speech-related gestures and their lexical affiliates (i.e., the accompanying word or phrase judged as related in meaning) from 221 narratives provided by 17 videotaped Ss.
Abstract: Seventeen Ss were videotaped as they provided narrative descriptions of 13 photographs. Judgments from 129 naive untrained Ss were used to isolate 60 speech-related gestures and their lexical affiliates (i.e., the accompanying word or phrase judged as related in meaning) from these 221 narratives. A computer-video interface measured each gesture, and a 3rd group of Ss rated word familiarity of each lexical affiliate.

222 citations


Journal ArticleDOI
TL;DR: This work investigates the usefulness of a number of textual features and additional intonational features in predicting the location of one particular intonational feature, intonational phrase boundaries, in natural speech.

205 citations


Patent
08 Jul 1992
TL;DR: The database information retrieval system includes a parser for parsing a natural language input query into constituent phrases with an analysis of the syntax of the phrase, and a retrieval execution unit for retrieving data from the database on the basis of the database retrieval formula produced by the collating unit.
Abstract: The database information retrieval system includes a parser for parsing a natural language input query into constituent phrases with an analysis of the syntax of the phrase. The parser may make use of tables and/or dictionaries to aid in terminology identification and grammatical syntax analysis. The system also includes virtual tables for converting phrases from the natural language query into retrieval keys that are possessed by the database. The virtual tables account for particles or terms that modify the phrases in the natural language query. A collating unit is provided in the system for preparing a query or retrieval formula executable in the database from the retrieval keys provided by a virtual table the collating unit selects. Lastly, the system contains a retrieval execution unit for retrieving data from the database on the basis of the database retrieval formula produced by the collating unit.

199 citations


Book ChapterDOI
01 Jan 1992
TL;DR: According to the compilers of the BBI Combinatory Dictionary of English (1986:ix), in English as in any other language there are many fixed, identifiable, non-idiomatic phrases and constructions, called recurrent combinations, fixed combinations or collocations.
Abstract: According to Benson, Benson and Ilson, the compilers of the BBI Combinatory Dictionary of English (1986:ix), ‘In English as in any other language there are many fixed, identifiable, non-idiomatic phrases and constructions. Such groups of words are called recurrent combinations, fixed combinations or collocations.’ These authors give the example of murder and its collocates. On hearing the noun we immediately recall from memory the verb to commit and the phrase to commit a murder. It is only later that other verbs like to investigate, to describe or to witness appear. To commit a murder is a far more fixed collocation than those with the other three verbs. Relative fixedness is a characteristic of collocations. Another characteristic is their non-idiomaticity. Although idioms are another category of fixed sequences, their meaning is often non-combinatory, i.e. it cannot be decoded from the meanings of their constituents; on the contrary, the meaning of collocations is always transparent.

166 citations


Journal ArticleDOI
TL;DR: In this article, the author provides cross-linguistic evidence for a functional projection between D and NP, called "Number Phrase" (NumP), whose head is the locus of number specification (singular or plural) of a full noun phrase.
Abstract: In this paper I provide cross-linguistic evidence for a functional projection between D and NP, which I call “Number Phrase” (NumP). In a full noun phrase, the head of this projection is, among other things, the locus of number specification (singular or plural) of a noun phrase. Pronominal noun phrases are distinguished from full noun phrases by the fact that they lack a lexical projection, i.e., they lack a NP. The existence of two distinct functional categories predicts the existence of at least two classes of pronouns, those of the category D, and those of the category Num. In both Modern Hebrew and Haitian, there is evidence that this prediction is borne out.

153 citations


Journal ArticleDOI
TL;DR: The authors showed that context can override the syntactic parsing preference to attach prepositional phrases high to the verb in sentences with postnominal prepositional phrases (PPs).

101 citations


Journal ArticleDOI
TL;DR: The data support models of sentence parsing that postulate that the parsing of a sentence is based upon structurally based principles and the influence of semantic or pragmatic information makes itself felt only after the initial parsing decision has been made.

Journal ArticleDOI
TL;DR: This paper reports two subject-paced reading experiments that examined the way in which discourse information exerts its influence in sentence comprehension.
Abstract: Two subject-paced reading experiments were carried out to examine the way in which discourse information exerts its influence in sentence comprehension.

Proceedings Article
01 Jan 1992
TL;DR: This work tested the effectiveness of an existing phrase extraction program, FASIT, in the SMART retrieval environment, to address whether syntactic phrase indexing scales up to large databases.
Abstract: In keeping with our interest in answering the simple question of whether syntactic phrase indexing scales up to large databases, we tested the effectiveness of an existing program for phrase extraction, FASIT, in the SMART retrieval environment.

Patent
30 Nov 1992
TL;DR: In this paper, a machine translation system reads the first language sentence and second language sentence which are mutually equivalent, analyzes the morphemes and phrases of the sentences, identifies the word correspondence between the first sentence and the second sentence with reference to the bilingual dictionary, and generates a translation template by replacing the corresponding words of the first and second sentences with variables.
Abstract: To automatically generate translation templates containing variables which can be replaced with various words or phrases from a bilingual pair of sentences, the machine translation system reads the first language sentence and second language sentence which are mutually equivalent, analyzes the morphemes and phrases of the sentences, identifies the word correspondence between the first language sentence and the second language sentence with reference to the bilingual dictionary, generates a translation template by replacing the corresponding words of the first language sentence and second language sentence with variables which are mutually correspondent, extracts the phrase correspondence between the first language sentence and the second language sentence, generates a generalized template wherein the corresponding phrases are replaced with variables, and generates a partial template wherein the corresponding phrases are separated. By doing this, a translation template can be learned (automatically generated) from bilingual pair of sentences, and high quality translation can be obtained.
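
The first template-generation step described above (replacing corresponding words in a bilingual sentence pair with mutually correspondent variables) might look roughly like the sketch below. It is an illustration under assumptions, not the patent's method: `make_template`, the `X1`, `X2` variable naming, the whitespace tokenization, and the toy English-French dictionary are hypothetical, and morpheme analysis and phrase-level templates are omitted.

```python
def make_template(source, target, dictionary):
    """Replace word pairs linked by `dictionary` with shared variables X1, X2, ..."""
    src_tokens = source.split()
    tgt_tokens = target.split()
    counter = 0
    for i, word in enumerate(src_tokens):
        translation = dictionary.get(word)
        if translation in tgt_tokens:
            counter += 1
            variable = f"X{counter}"
            src_tokens[i] = variable
            tgt_tokens[tgt_tokens.index(translation)] = variable
    return " ".join(src_tokens), " ".join(tgt_tokens)

# Hypothetical bilingual dictionary covering content words only.
toy_dictionary = {"eat": "mange", "apples": "pommes"}
template = make_template("I eat apples", "je mange des pommes", toy_dictionary)
```

Words without a dictionary correspondence stay literal in the template, so only the aligned slots become replaceable.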

Patent
13 Mar 1992
TL;DR: In this article, a method and apparatus for facilitating the contextual translation of textual entries within an interactive software application which is executable within a data processing system is presented, allowing contextual variations to be observed and compensated for.
Abstract: A method and apparatus for facilitating the contextual translation of textual entries within an interactive software application which is executable within a data processing system. A contextual translation procedure is created and inserted into a selected interactive software application. During execution of the selected interactive software application, display screens (18) containing textual entries (30) are typically generated. Upon encountering a screen requiring translation, an operator may invoke the contextual translation procedure, causing a translation viewport (36) to be displayed. Individual translatable phrases within the display screen are then selected and displayed within the translation viewport. A contextual translation may then be entered into the translation viewport by a translator and displayed therein in proximity to the selected translatable phrase. Upon completion of translation of a selected phrase, the newly created translation is substituted for the existing phrase within the display screen, automatically altering the display area, if necessary, and a subsequent translatable phrase may be selected. In this manner, translation may occur during actual execution of an interactive software application, permitting contextual variations to be observed and compensated for.

Journal ArticleDOI
TL;DR: A working model of the processes involved in the production of noun phrases is proposed, and two picture-word interference experiments showed interference effects from semantically related noun distractors, extending findings from production of single words to the production of phrases.

Patent
Edward Allen Fishel
27 Mar 1992
TL;DR: In this article, a hypertext network is provided for use in an interactive data processing system including a processor device (12), a memory (14), a user input device (20) and a display (22).
Abstract: A hypertext network (28) is provided for use in an interactive data processing system including a processor device (12), a memory (14), a user input device (20) and a display (22). The hypertext network comprises a plurality of user selectable stored information modules. Each module includes a predefined descriptive header and at least some modules include a link reference phrase to another one of the plurality of user selectable modules. Responsive to a user selection entry, the associated user selectable module is displayed and a marker character is stored with each link reference phrase to the selected displayed module. The marker character is stored in an input field that is accessed by a user to select a particular link reference phrase and the reference phrase text is not defined as a user input field, so that the user cannot accidentally change the reference phrase. The stored marker characters are erased or set to blank characters when the hypertext network is exited so that all marker character input fields are blank each time access to the hypertext network is initiated.

Book ChapterDOI
01 May 1992
TL;DR: This chapter reviews evidence that pitch downtrend in Japanese is primarily due to downstep, a downward pitch register shift triggered by lexical accents, rather than a phonetic process which occurs as a function of time, more or less independently of the linguistic structure of utterances (the earlier view; see e.g. Fujisaki and Sudo 1971a).
Abstract: Introduction. One of the most significant findings about Japanese intonation in the past decade or so has been the existence of downstep. At least since the 1960s, the most widely accepted view had been that pitch downtrend is essentially a phonetic process which occurs as a function of time, more or less independently of the linguistic structure of utterances (see e.g. Fujisaki and Sudo 1971a). Against this view, Poser (1984) showed that downtrend in Japanese is primarily due to a downward pitch register shift (“catathesis” or “downstep”), which is triggered by (lexically given) accents of minor intonational phrases, and which occurs iteratively within the larger domain of the so-called major phrase. The validity of this phonological account of downtrend has subsequently been confirmed by Beckman and Pierrehumbert (Beckman and Pierrehumbert 1986; Pierrehumbert and Beckman 1988) and myself (Kubozono 1988a, 1989). Consider first the pair of examples in (1). uma'i nomi'mono “tasty drink” nomi'mono “sweet drink” The phrase in (1a) consists of two lexically accented words while (1b) consists of an unaccented word (of which Tokyo Japanese has many) and an accented word. Downstep in Japanese looks like figures 15.1a and 15.2 (solid line), where an accented phrase causes the lowering of pitch register for subsequent phrases, accented and unaccented alike, in comparison with the sequences in which the first phrase is unaccented (i.e. figures 15.1b and 15.2, dotted line).

Journal ArticleDOI
TL;DR: LUKE is a tool that allows a knowledge base builder to create an English language interface by associating words and phrases with knowledge base entities; it draws its power from a large set of heuristics about how words are typically used to describe the world.
Abstract: Very large knowledge bases (KB's) constitute an important step for artificial intelligence and will have significant effects on the field of natural language processing. This thesis addresses the problem of effectively acquiring two large bodies of formalized knowledge: knowledge about the world (a KB), and knowledge about words (a lexicon). The central observation is that these two bodies of knowledge are highly redundant. For example, the syntactic behavior of a noun (or a verb) is highly correlated with certain physical properties of the object (or event) to which it refers. It should be possible to take advantage of this type of redundancy in order to greatly reduce both the time and expertise required to build large KB's and lexicons. This thesis describes LUKE, a software tool that allows a knowledge base builder to create an English language interface by associating words and phrases with KB entities. LUKE assumes no linguistic expertise on the part of the user, because that expertise is built directly into the tool itself. LUKE draws its power from a large set of heuristics about how words are typically used to describe the world. These heuristics exploit the redundancy between linguistic and world knowledge. When a word or phrase is associated with some KB entity, LUKE is able to accurately guess features of the word based on features of the KB entity. LUKE can also hypothesize new words and word senses based on the existence of others. All of LUKE's hypotheses are displayed to the user for verification, using a format designed to tap the user's basic linguistic intuitions. LUKE stores its lexicon in the KB. Truth maintenance links ensure that changes in the KB are automatically propagated to the lexicon. LUKE compiles lexical entries into data structures convenient for natural language parsing and generation programs. 
Lexicons acquired by LUKE have been used by KBNL, a knowledge-based natural language system, for applications in information retrieval, machine translation, and KB navigation. This work identifies several dozen heuristics that encode redundancies between linguistic representations and representations of world knowledge. It also demonstrates the usefulness of these heuristics in a working lexical acquisition system.

Patent
05 Oct 1992
TL;DR: A control device selects appropriate phrases corresponding to the present vehicle position and reads out the selected phrases from the buffer (4-1) successively, in response to a request signal supplied from a request sensor (7), which is output in response to the driver's request.
Abstract: A vehicle navigation apparatus in accordance with the present invention includes a CDROM (2) for storing all the data required for navigation processing such as maps and voice data, and a buffer (4-1) for accessing CDROM (2) to prepare beforehand a phrase necessary for the navigation guidance. A control device (1) selects appropriate phrases corresponding to the present vehicle position and reads out the selected phrases from the buffer (4-1) successively, in response to a request signal supplied from a request sensor (7) which is output in response to the driver's request. The thus processed navigation information data is transformed into voice signals through a decoder (4-2) and output from a speaker (5) as a voice response. The control device (1) checks whether or not a phrase prepared in the buffer (4-1) is a frequently used phrase and replaces this phrase with a new phrase if this phrase is not frequently used. Furthermore, the control device (1) sets discriminating flags to the selected phrases and continuously reads out these phrases with the discriminating flags, in a predetermined order, in accordance with the driver's request, so as to execute voice response processing.

Journal ArticleDOI
TL;DR: This paper describes a process whereby a morpho-syntactic analysis of phrases or user queries is used to generate a structured representation of text, together with a matching process that scores the degree of match between phrases; experiments evaluating the matching and scoring indicate the method to be quite useful.
Abstract: The application of automatic natural language processing techniques to the indexing and the retrieval of text information has been a target of information retrieval researchers for some time. Incorporating semantic-level processing of language into retrieval has led to conceptual information retrieval, which is effective but usually restricted in its domain. Using syntactic-level analysis is domain-independent, but has not yet yielded significant improvements in retrieval quality. This paper describes a process whereby a morpho-syntactic analysis of phrases or user queries is used to generate a structured representation of text. A process of matching these structured representations is then described that generates a metric value or score indicating the degree of match between phrases. This scoring can then be used for ranking the phrases. In order to evaluate the effectiveness or quality of the matching and scoring of phrases, some experiments are described that indicate the method to be quite useful. Ultimately the phrase-matching technique described here would be used as part of an overall document retrieval strategy, and some future work towards this direction is outlined.

Patent
Hiroko Okuda
29 Dec 1992
TL;DR: An automatic melody composer composes a melody on a phrase by phrase basis using a phrase database storage, which stores data of many and various phrases, and a generating index database storage, which stores records of music generating index.
Abstract: An automatic melody composer composes a melody on a phrase by phrase basis. A phrase database storage stores data of many and various phrases. A generating index database storage stores records of music generating index. Each record describes or indicates appropriate phrases in the phrase database and how melody is modified or developed in moving from one musical time unit to another. A decoder selects an index record from the generating index database, retrieves, from the phrase database, phrase components according to the index record, and combines them into a phrase. Thus, the composer quickly provides a natural melody formed with a chain of phrases without requiring complicated data processing.

Proceedings ArticleDOI
23 Aug 1992
TL;DR: A language-independent approach to the derivation of pitch accent, based on focus-accent theory, is implemented in the program PROS-3, which was developed as part of the ESPRIT project POLYGLOT and provides an integrated environment for modelling the syntax-to-prosody interface of a multi-lingual text-to-speech system.
Abstract: One of the problems that must be addressed by a text-to-speech system is the derivation of pitch accent, marking the distinction between "given" and "new" information in an utterance. This paper discusses a language-independent approach to this problem, which is based on focus-accent theory (e.g. Ladd 1978, Gussenhoven 1984, Baart 1987), and implemented in my program PROS-3. This program has been developed as part of the ESPRIT project POLYGLOT, and provides an integrated environment for modelling the syntax-to-prosody interface of a multi-lingual text-to-speech system. The program operates in the following manner. First, the input text is parsed using a variation of context-free phrase-structure rules, augmented with information about "argument" structure of phrases. Next, the syntactic representation is mapped onto a metrical tree. The metrical tree is then used to derive locations for pitch accents, as well as phonological and intonational phrase boundaries. In this approach, differences between languages are modelled entirely by the syntactic rules. Also, the system is strictly declarative, in the sense that once a piece of information is added by a rule, it is never removed. In this respect, our approach differs radically from systems which make use of derivational rules (e.g. Quené & Kager 1992). Such systems tend to become extremely complex, hard to verify and almost impossible to maintain or extend (Quené & Dirksen 1990, Dirksen & Quené in press). By contrast, in PROS-3 there is a conspicuous relation between theory and implementation, and the program can be extended in a number of ways. Below, I will focus on two major rules from focus-accent theory: Default Accent and Rhythmic Deaccenting. The first rule is used to model deaccenting of "given" information, e.g. the pronouns it, her and es in the English, Dutch and German sentences of (1), (2) and (3), respectively.

Patent
08 Sep 1992
TL;DR: In this paper, a speech training system enables students to rapidly acquire and perfect their pronunciation of English phrases by speaking along with videos presenting English phrases accompanied by conventionally-spelled English text and characters representing the correct mouth positions.
Abstract: This speech training system enables students to rapidly acquire and perfect their pronunciation of English phrases by speaking along with videos presenting English phrases accompanied by conventionally-spelled English text and characters representing the correct mouth positions. Students learn the mouth positions, then listen to a phrase, speak it simultaneously following the mouth position characters, and read the text. Students of this method can compare their mouth movements to a model of standard pronunciation visually and auditorily. Students of English thus have a more reliable audiovisual means of learning and practicing correct English pronunciation.

Proceedings ArticleDOI
31 Mar 1992
TL;DR: An efficient algorithm for chart-based phrase structure parsing of natural language that is tailored to the problem of extracting specific information from unrestricted texts where many of the words are unknown and much of the text is irrelevant to the task is presented.
Abstract: We present an efficient algorithm for chart-based phrase structure parsing of natural language that is tailored to the problem of extracting specific information from unrestricted texts where many of the words are unknown and much of the text is irrelevant to the task. The parser gains algorithmic efficiency through a reduction of its search space. As each new edge is added to the chart, the algorithm checks only the topmost of the edges adjacent to it, rather than all such edges as in conventional treatments. The resulting spanning edges are insured to be the correct ones by carefully controlling the order in which edges are introduced so that every final constituent covers the longest possible span. This is facilitated through the use of phrase boundary heuristics based on the placement of function words, and by heuristic rules that permit certain kinds of phrases to be deduced despite the presence of unknown words. A further reduction in the search space is achieved by using semantic rather than syntactic categories on the terminal and nonterminal edges, thereby reducing the amount of ambiguity and thus the number of edges, since only edges with a valid semantic interpretation are ever introduced.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: Two approaches for adapting a specific syllable trigram model to a new task are described: one uses a small amount of text data similar to the target task, and the other uses supervised learning using the most recent input phrases.
Abstract: The authors describe two approaches for adapting a specific syllable trigram model to a new task. One uses a small amount of text data similar to the target task, and the other uses supervised learning using the most recent input phrases. The effect of each adaptation is verified with syllable perplexity and phrase recognition. Where the syntactic knowledge was only the syllable trigram model, the perplexity was reduced from 54.5 to 18.1 for the adaptation using 100 phrases of similar text, and was reduced to 14.6 by the supervised learning. The recognition rates were also improved from 42.3% to 46.6% and 50.9%, respectively. Text similarity for speech recognition is also studied.
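
For readers unfamiliar with the perplexity figures quoted above, the following sketch shows the usual definition: two raised to the negative average log2 probability that a model assigns to each unit of a test sequence. The probabilities below are invented for illustration and have nothing to do with the paper's syllable trigram model or its reported values.

```python
import math

def perplexity(probabilities):
    """Perplexity given the probability a model assigned to each unit in a sequence."""
    average_log2 = sum(math.log2(p) for p in probabilities) / len(probabilities)
    return 2 ** (-average_log2)

# Hypothetical per-syllable probabilities from two models on the same sequence.
unadapted = perplexity([0.25, 0.25, 0.25, 0.25])
adapted = perplexity([0.5, 0.5, 0.25, 0.25])
```

A model that assigns higher probability to what was actually said gets a lower perplexity, which is the direction of the improvement the abstract reports after adaptation.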

01 Jan 1992
TL;DR: Two statistical methods of applying syntactic constraints to the output of a handwritten word recognizer on input consisting of sentences/phrases based on syntactic categories associated with words are discussed.
Abstract: The output of handwritten word recognizers tends to be very noisy due to factors such as variable handwriting styles, distortions in the image data, etc. In order to compensate for this behaviour, several choices of the word recognizer are initially considered but eventually reduced to a single choice based on constraints posed by the particular domain. In the case of handwritten sentence/phrase recognition, linguistic constraints may be applied in order to improve the results of the word recognizer. Linguistic constraints can be applied as (i) a purely post-processing operation or (ii) in a feedback loop to the word recognizer. This paper discusses two statistical methods of applying syntactic constraints to the output of a handwritten word recognizer on input consisting of sentences/phrases. Both methods are based on syntactic categories (tags) associated with words. The first is a purely statistical method, the second is a hybrid method which combines higherlevel syntactic information (hypertags) with statistical information regarding transitions between hypertags. We show the utility of both these approaches in the problem of handwritten sentence/phrase recognition.

Journal ArticleDOI
TL;DR: College readers read and answered questions on 12 short essays; essays formatted so that points between phrases had fractional extra space added to them were comprehended better than normally formatted text.
Abstract: College readers read and answered questions on 12 short essays. Essays formatted so that points between phrases had fractional extra space added to them were comprehended better than normally formatted text. These improvements were specific to average readers. Practically, the results justify classroom research on the benefits of phrase-sensitive formatting; theoretically, the results add to existing evidence that poor to average readers specifically lack perceptual strategies for grouping word sequences into phrases.

Journal ArticleDOI
TL;DR: To determine whether sparrows use the ratio rather than the difference to identify ascending songs, the authors transposed a low-frequency song upward in frequency, maintaining a normal frequency difference but altering the frequency ratio, and transposed a high-frequency song downward in the same way.
Abstract: In the ascending songs of white-throated sparrows (Zonotrichia albicollis), the first note (Phrase 1) is sung lower than the remaining notes (Phrase 2). The production of this frequency change is more reliably predicted by the frequency ratio (Phrase 2 / Phrase 1) than by the frequency difference (Phrase 2 - Phrase 1). To determine whether sparrows use the ratio rather than the difference to identify ascending songs, we transposed a low-frequency song upward in frequency, maintaining a normal frequency difference but altering the frequency ratio, and we transposed a high-frequency song downward in the same way.
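
The manipulation described above relies on a simple arithmetic fact: shifting both phrases by the same number of hertz preserves their difference but changes their ratio. The sketch below works through one case; the frequencies and the size of the shift are hypothetical values chosen for illustration, not measurements from the study.

```python
def ratio_and_difference(phrase1_hz, phrase2_hz):
    """Return (Phrase 2 / Phrase 1, Phrase 2 - Phrase 1) for an ascending song."""
    return phrase2_hz / phrase1_hz, phrase2_hz - phrase1_hz

# Hypothetical low-frequency song and an upward transposition by a constant offset.
low_song = (3000.0, 3500.0)
shift_hz = 1000.0
transposed = (low_song[0] + shift_hz, low_song[1] + shift_hz)

original_ratio, original_difference = ratio_and_difference(*low_song)
shifted_ratio, shifted_difference = ratio_and_difference(*transposed)
```

Here the difference stays at 500 Hz in both songs, while the ratio drops from about 1.167 to 1.125, so a listener relying on ratio and one relying on difference would judge the transposed song differently.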

Journal ArticleDOI
TL;DR: Sentence length as well as lexical and syntactic complexity were manipulated in two sentence repetition experiments; the results showed how linguistic complexity affects sentence memory and implied that the amnesic deficit did not involve a generalized difficulty for materials of similar length but was specific to certain representational types and processing routines.