
Showing papers in "Computational Linguistics in 1998"


Journal Article
Hinrich Schütze1
TL;DR: This paper presents context-group discrimination, a disambiguation algorithm based on clustering, and demonstrates its good performance for a sample of natural and artificial ambiguous words.
Abstract: This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.

1,382 citations
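The clustering step described above can be sketched roughly as follows (a toy illustration, not Schütze's actual Word Space implementation; the example contexts and sense centroids are invented):

```python
from collections import Counter
from math import sqrt

def context_vector(context_words):
    """Bag-of-words vector for one occurrence of the ambiguous word."""
    return Counter(context_words)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_sense(context, sense_centroids):
    """Assign the sense cluster whose centroid is closest in vector space."""
    v = context_vector(context)
    return max(sense_centroids, key=lambda s: cosine(v, sense_centroids[s]))

# Two toy "sense clusters" for the ambiguous word "bank"
centroids = {
    "finance": Counter({"money": 3, "loan": 2, "account": 2}),
    "river":   Counter({"water": 3, "shore": 2, "fish": 1}),
}
print(assign_sense(["deposit", "money", "account"], centroids))  # finance
```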


Journal Article
TL;DR: In view of recent progress in word sense disambiguation, the authors take stock of research in this field over the past 50 years and consider the next steps to be taken.
Abstract: Given the progress recently made in the field of word sense disambiguation, the authors take stock of the state of research in this field over the past 50 years and consider the next steps to be taken. First, they review the main approaches to word sense disambiguation: from the earliest attempts made within machine translation to the current corpus-based methods, by way of methods based on artificial intelligence and methods using knowledge bases. Second, they examine the problems that remain open (the role of context, the division of senses, the evaluation of results) and propose some directions for future research.

1,021 citations


Journal ArticleDOI
TL;DR: A statistical classifier is described that combines topical context with local cues to identify a word sense and is used to disambiguate a noun, a verb, and an adjective.
Abstract: Corpus-based approaches to word sense identification have flexibility and generality but suffer from a knowledge acquisition bottleneck. We show how knowledge-based techniques can be used to open the bottleneck by automatically locating training corpora. We describe a statistical classifier that combines topical context with local cues to identify a word sense. The classifier is used to disambiguate a noun, a verb, and an adjective. A knowledge base in the form of WordNet's lexical relations is used to automatically locate training examples in a general text corpus. Test results are compared with those from manually tagged training examples.

517 citations
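The combination of topical context with local cues can be illustrated by a naive-Bayes-style score (a hedged sketch; the feature names, priors, and likelihoods below are invented, and the paper's actual classifier differs in detail):

```python
from math import log

def classify(features, sense_priors, likelihoods):
    """Naive-Bayes score: log prior plus log likelihood of each observed cue."""
    scores = {}
    for sense, prior in sense_priors.items():
        score = log(prior)
        for f in features:
            score += log(likelihoods[sense].get(f, 1e-6))  # smoothing floor
        scores[sense] = score
    return max(scores, key=scores.get)

# Topical cues ("topic:...") and local cues ("prev:...", "next:...")
# contribute to the same score.
priors = {"line/cord": 0.5, "line/text": 0.5}
likelihoods = {
    "line/cord": {"topic:fishing": 0.4, "prev:fishing": 0.3},
    "line/text": {"topic:writing": 0.5, "next:of": 0.2},
}
print(classify(["topic:fishing", "prev:fishing"], priors, likelihoods))
```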


Journal Article
TL;DR: A generative model for performing backwards transliteration from Japanese to English is described and evaluated, incorporating several distinct stages in the transliteration process.
Abstract: It is challenging to translate names and technical terms across languages with different alphabets and sound inventories. These items are commonly transliterated, i.e., replaced with approximate phonetic equivalents. For example, "computer" in English comes out as "konpyuutaa" in Japanese. Translating such items from Japanese back to English is even more challenging, and of practical interest, as transliterated items make up the bulk of text phrases not found in bilingual dictionaries. We describe and evaluate a method for performing backwards transliterations by machine. This method uses a generative model, incorporating several distinct stages in the transliteration process.

438 citations


Journal Article
Mark Johnson1
TL;DR: A simple node relabeling transformation is described that improves a treebank PCFG-based parser's average precision and recall by around 8%, or approximately half of the performance difference between a simple PCFG model and the best broad-coverage parsers available today.
Abstract: The kinds of tree representations used in a treebank corpus can have a dramatic effect on performance of a parser based on the PCFG estimated from that corpus, causing the estimated likelihood of a tree to differ substantially from its frequency in the training corpus. This paper points out that the Penn II treebank representations are of the kind predicted to have such an effect, and describes a simple node relabeling transformation that improves a treebank PCFG-based parser's average precision and recall by around 8%, or approximately half of the performance difference between a simple PCFG model and the best broad-coverage parsers available today. This performance variation comes about because any PCFG, and hence the corpus of trees from which the PCFG is induced, embodies independence assumptions about the distribution of words and phrases. The particular independence assumptions implicit in a tree representation can be studied theoretically and investigated empirically by means of a tree transformation / detransformation process.

407 citations
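The paper's specific relabeling scheme is not reproduced here; as a hedged illustration of the general kind of tree transformation involved, the sketch below applies parent annotation to trees represented as nested (label, children...) tuples:

```python
def parent_annotate(tree, parent="TOP"):
    """Relabel each nonterminal with its parent's label, e.g. NP -> NP^S.
    This weakens the PCFG's independence assumptions by making a rule's
    probability sensitive to where the node occurs. Leaves are unchanged."""
    if isinstance(tree, str):
        return tree
    label, children = tree[0], tree[1:]
    new_label = f"{label}^{parent}"
    return (new_label,) + tuple(parent_annotate(c, label) for c in children)

t = ("S", ("NP", "she"), ("VP", ("V", "runs")))
print(parent_annotate(t))
# ('S^TOP', ('NP^S', 'she'), ('VP^S', ('V^VP', 'runs')))
```

A matching detransformation (stripping the `^...` suffixes) recovers the original treebank representation, which is the transformation/detransformation process the abstract refers to.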


Journal Article
TL;DR: Questions are raised concerning the strategy of evaluating systems for definite description interpretation by comparing their results with a standardized annotation; the study also notes the large number of discourse-new definites and the presence of definites that did not seem to require a complete disambiguation.
Abstract: We present the results of a study of the use of definite descriptions in written texts aimed at assessing the feasibility of annotating corpora with information about definite description interpretation. We ran two experiments, in which subjects were asked to classify the uses of definite descriptions in a corpus of 33 newspaper articles, containing a total of 1,412 definite descriptions. We measured the agreement among annotators about the classes assigned to definite descriptions, as well as the agreement about the antecedent assigned to those definites that the annotators classified as being related to an antecedent in the text. The most interesting result of this study from a corpus annotation perspective was the rather low agreement (K = 0.63) that we obtained using versions of Hawkins's and Prince's classification schemes; better results (K = 0.76) were obtained using the simplified scheme proposed by Fraurud that includes only two classes, first-mention and subsequent-mention. The agreement about antecedents was also not complete. These findings raise questions concerning the strategy of evaluating systems for definite description interpretation by comparing their results with a standardized annotation. From a linguistic point of view, the most interesting observations were the great number of discourse-new definites in our corpus (in one of our experiments, about 50% of the definites in the collection were classified as discourse-new, 30% as anaphoric, and 18% as associative/bridging) and the presence of definites that did not seem to require a complete disambiguation.

310 citations
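The agreement figures (K = 0.63, K = 0.76) are kappa scores, i.e., observed agreement corrected for chance. The standard Cohen's kappa computation can be sketched as follows (the toy annotator labels are invented for illustration):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators.
    kappa = (P_observed - P_expected) / (1 - P_expected)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Expected chance agreement from each annotator's marginal distribution
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["first", "first", "subseq", "subseq", "first", "subseq"]
b = ["first", "subseq", "subseq", "subseq", "first", "subseq"]
print(round(cohen_kappa(a, b), 3))  # 0.667
```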


Journal ArticleDOI
TL;DR: A computational model for recognizing intentional structure and utilizing it in discourse processing is provided based on the collaborative planning framework of SharedPlans (Grosz and Kraus 1996).
Abstract: An agent's ability to understand an utterance depends upon its ability to relate that utterance to the preceding discourse. The agent must determine whether the utterance begins a new segment of the discourse, completes the current segment, or contributes to it. The intentional structure of the discourse, comprised of discourse segment purposes and their interrelationships, plays a central role in this process (Grosz and Sidner 1986). In this paper, we provide a computational model for recognizing intentional structure and utilizing it in discourse processing. The model is based on the collaborative planning framework of SharedPlans (Grosz and Kraus 1996).

221 citations


Journal Article
Hang Li1, Naoki Abe1
TL;DR: In this paper, the problem of generalizing values of a case frame slot for a verb is viewed as that of estimating a conditional probability distribution over a partition of words, and a new generalization method based on the minimum description length (MDL) principle is proposed.
Abstract: A new method for automatically acquiring case frame patterns from large corpora is proposed. In particular, the problem of generalizing values of a case frame slot for a verb is viewed as that of estimating a conditional probability distribution over a partition of words, and a new generalization method based on the Minimum Description Length (MDL) principle is proposed. In order to assist with efficiency, the proposed method makes use of an existing thesaurus and restricts its attention to those partitions that are present as "cuts" in the thesaurus tree, thus reducing the generalization problem to that of estimating a "tree cut model" of the thesaurus tree. An efficient algorithm is given, which provably obtains the optimal tree cut model for the given frequency data of a case slot, in the sense of MDL. Case frame patterns obtained by the method were used to resolve PP-attachment ambiguity. Experimental results indicate that the proposed method improves upon or is at least comparable with existing methods.

179 citations
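The MDL-based choice among tree cuts can be caricatured as follows (a toy sketch, not Li and Abe's exact coding scheme; the word frequencies and thesaurus classes are invented). The key trade-off is that a finer cut fits the data better but costs more to describe:

```python
from math import log2

def description_length(cut, freqs, total):
    """MDL score of a thesaurus cut: parameter cost plus data code length.
    cut: a partition of the words into classes; freqs: observed counts."""
    k = len(cut)
    model_cost = (k / 2) * log2(total)  # cost of describing k parameters
    data_cost = 0.0
    for cls in cut:
        cls_freq = sum(freqs.get(w, 0) for w in cls)
        p_class = cls_freq / total
        for w in cls:
            f = freqs.get(w, 0)
            if f:
                # probability is spread uniformly within a class,
                # as in a tree cut model
                data_cost -= f * log2(p_class / len(cls))
    return model_cost + data_cost

freqs = {"car": 8, "bus": 7, "dog": 1, "cat": 0}
total = sum(freqs.values())
fine = [["car"], ["bus"], ["dog"], ["cat"]]
coarse = [["car", "bus"], ["dog", "cat"]]
best = min([fine, coarse], key=lambda c: description_length(c, freqs, total))
# With these counts the coarser cut wins: its extra data cost is
# outweighed by the smaller parameter cost.
```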


Journal Article
TL;DR: Experiments show that this method can learn even from very sparse training data, achieving over 92% correct disambiguation performance.
Abstract: We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method can learn even from very sparse training data, achieving over 92% correct disambiguation performance.

144 citations


Journal Article
TL;DR: It is shown that production probabilities estimated from parsed or unparsed sentences always yield proper distributions, so an estimated system is automatically proper.
Abstract: The assignment of probabilities to the productions of a context-free grammar may generate an improper distribution: the probability of all finite parse trees is less than one. The condition for proper assignment is rather subtle. Production probabilities can be estimated from parsed or unparsed sentences, and the question arises as to whether or not an estimated system is automatically proper. We show here that estimated production probabilities always yield proper distributions.

127 citations
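The estimator in question (for the parsed-sentence case) is the usual relative-frequency one, under which each production's probability is its count divided by the count of its left-hand side, so the probabilities for each nonterminal sum to one. A minimal sketch, with toy trees invented for illustration:

```python
from collections import Counter

def estimate_pcfg(trees):
    """Relative-frequency estimation of production probabilities.
    Trees are (label, child, ...) tuples; leaves are plain strings."""
    rule_counts = Counter()
    lhs_counts = Counter()

    def collect(tree):
        if isinstance(tree, str):
            return
        label, children = tree[0], tree[1:]
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for c in children:
            collect(c)

    for t in trees:
        collect(t)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

trees = [
    ("S", ("NP", "she"), ("VP", "runs")),
    ("S", ("NP", "he"), ("VP", "sleeps")),
]
probs = estimate_pcfg(trees)
# For each left-hand side, the estimated probabilities sum to one.
```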


Journal Article
TL;DR: It is shown that the phonological descriptions possible within the view of constraint interaction embodied in Optimality Theory remain within the class of rational relations when GEN is itself a rational relation and each constraint distinguishes among finitely many regular sets of candidates.
Abstract: It has been argued that rule-based phonological descriptions can uniformly be expressed as mappings carried out by finite-state transducers, and therefore fall within the class of rational relations. If this property of generative capacity is an empirically correct characterization of phonological mappings, it should hold of any sufficiently restrictive theory of phonology, whether it utilizes constraints or rewrite rules. In this paper, we investigate the conditions under which the phonological descriptions that are possible within the view of constraint interaction embodied in Optimality Theory (Prince and Smolensky 1993) remain within the class of rational relations. We show that this is true when GEN is itself a rational relation, and each of the constraints distinguishes among finitely many regular sets of candidates.

Journal Article
TL;DR: This work proposes and evaluates several figures of merit for best-first parsing, and identifies an easily computable figure of merit that provides excellent performance on various measures and two different grammars.
Abstract: Best-first parsing methods for natural language try to parse efficiently by considering the most likely constituents first. Some figure of merit is needed by which to compare the likelihood of constituents, and the choice of this figure has a substantial impact on the efficiency of the parser. While several parsers described in the literature have used such techniques, there is little published data on their efficacy, much less attempts to judge their relative merits. We propose and evaluate several figures of merit for best-first parsing, and we identify an easily computable figure of merit that provides excellent performance on various measures and two different grammars.

Journal Article
TL;DR: The authors proposed an example sampling method for example-based word sense disambiguation, which is characterized by the reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system.
Abstract: This paper proposes an efficient example sampling method for example-based word sense disambiguation systems. To construct a database of practical size, a considerable overhead for manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity of searching a large-sized database poses a considerable problem (overhead for search). To counter these problems, our method selectively samples a smaller-sized effective subset from a given example set for use in word sense disambiguation. Our method is characterized by the reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system. The system progressively collects examples by selecting those with greatest utility. The paper reports the effectiveness of our method through experiments on about one thousand sentences. Compared to experiments with other example sampling methods, our method reduced both the overhead for supervision and the overhead for search, without degrading the performance of the system.
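Selective sampling by training utility can be sketched generically (a hedged illustration: the utility function below is a simple uncertainty score, not the paper's actual utility measure, and the example pool is invented):

```python
def sample_by_utility(pool, utility, k):
    """Greedy selective sampling: repeatedly take the unlabeled example
    with the highest utility score, so only the most informative
    examples are sent for manual sense annotation."""
    chosen, remaining = [], list(pool)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=utility)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy utility: prefer examples whose current sense probability is
# closest to 0.5, i.e., those the system is least sure about.
sense_prob = {"ex1": 0.95, "ex2": 0.55, "ex3": 0.50, "ex4": 0.80}
utility = lambda e: -abs(sense_prob[e] - 0.5)
print(sample_by_utility(sense_prob, utility, 2))  # ['ex3', 'ex2']
```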

Journal ArticleDOI
TL;DR: A classifier whose accuracy may be sufficient for word sense disambiguation is described, along with an extension of the training method that uses information extracted from unlabeled examples to improve classification accuracy when there are few labeled training examples available.
Abstract: A word sense disambiguator that is able to distinguish among the many senses of common words that are found in general-purpose, broad-coverage lexicons would be useful. For example, experiments have shown that, given accurate sense disambiguation, the lexical relations encoded in lexicons such as WordNet can be exploited to improve the effectiveness of information retrieval systems. This paper describes a classifier whose accuracy may be sufficient for such a purpose. The classifier combines the output of a neural network that learns topical context with the output of a network that learns local context to distinguish among the senses of highly ambiguous words. The accuracy of the classifier is tested on three words, the noun line, the verb serve, and the adjective hard; the classifier has an average accuracy of 87%, 90%, and 81%, respectively, when forced to choose a sense for all test cases. When the classifier is not forced to choose a sense and is trained on a subset of the available senses, it rejects test cases containing unknown senses as well as test cases it would misclassify if forced to select a sense. Finally, when there are few labeled training examples available, we describe an extension of our training method that uses information extracted from unlabeled examples to improve classification accuracy.

Journal ArticleDOI
TL;DR: A heuristic approach capable of automatically clustering senses in a machine-readable dictionary (MRD) and an implementation of the method for clustering definition sentences in the Longman Dictionary of Contemporary English (LDOCE) is described.
Abstract: This paper describes a heuristic approach capable of automatically clustering senses in a machine-readable dictionary (MRD). Including these clusters in the MRD-based lexical database offers several positive benefits for word sense disambiguation (WSD). First, the clusters can be used as a coarser sense division, so unnecessarily fine sense distinction can be avoided. The clustered entries in the MRD can also be used as materials for supervised training to develop a WSD system. Furthermore, if the algorithm is run on several MRDs, the clusters also provide a means of linking different senses across multiple MRDs to create an integrated lexical database. An implementation of the method for clustering definition sentences in the Longman Dictionary of Contemporary English (LDOCE) is described. To this end, the topical word lists and topical cross-references in the Longman Lexicon of Contemporary English (LLOCE) are used. Nearly half of the senses in the LDOCE can be linked precisely to a relevant LLOCE topic using a simple heuristic. With the definitions of senses linked to the same topic viewed as a document, topical clustering of the MRD senses bears a striking resemblance to retrieval of relevant documents for a given query in information retrieval (IR) research. Relatively well-established IR techniques of weighting terms and ranking document relevancy are applied to find the topical clusters that are most relevant to the definition of each MRD sense. Finally, we describe an implemented version of the algorithms for the LDOCE and the LLOCE and assess the performance of the proposed approach in a series of experiments and evaluations.
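The IR-style ranking of topics for each definition can be sketched with tf-idf term weighting (a rough illustration; the topic word lists are invented and the paper's weighting scheme differs in detail):

```python
from collections import Counter
from math import log

def tfidf_rank(definition_words, topic_docs):
    """Rank topics by summed tf-idf weight of the definition's words,
    treating each topic's word list as a document and the definition
    as the query, as in document retrieval."""
    n_docs = len(topic_docs)
    df = Counter()  # document frequency of each term
    for words in topic_docs.values():
        for w in set(words):
            df[w] += 1
    scores = {}
    for topic, words in topic_docs.items():
        tf = Counter(words)
        scores[topic] = sum(
            tf[w] * log(n_docs / df[w])
            for w in definition_words if w in tf
        )
    return max(scores, key=scores.get)

topics = {
    "FINANCE": ["money", "bank", "loan", "interest"],
    "GEOGRAPHY": ["river", "bank", "slope", "land"],
}
print(tfidf_rank(["land", "beside", "a", "river"], topics))  # GEOGRAPHY
```

Note that a term like "bank" that appears in every topic gets idf log(1) = 0, so it contributes nothing to the ranking, which is the desired behavior for uninformative terms.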

Journal Article
TL;DR: The paper discusses some classes of contextual grammars---mainly those with "maximal use of selectors"---giving some arguments that these grammars can be considered a good model for natural language syntax, and proposes some ideas for associating a structure to the generated words, in the form of a tree or of a dependence relation.
Abstract: The paper discusses some classes of contextual grammars---mainly those with "maximal use of selectors"---giving some arguments that these grammars can be considered a good model for natural language syntax. A contextual grammar produces a language starting from a finite set of words and iteratively adding contexts to the currently generated words, according to a selection procedure: each context has associated with it a selector, a set of words; the context is adjoined to any occurrence of such a selector in the word to be derived. In grammars with maximal use of selectors, a context is adjoined only to selectors for which no superword is a selector. Maximality can be defined either locally or globally (with respect to all selectors in the grammar). The obtained families of languages are incomparable with that of Chomsky context-free languages (and with other families of languages that contain linear languages and that are not "too large"; see Section 5) and have a series of properties supporting the assertion that these grammars are a possible adequate model for the syntax of natural languages. They are able to straightforwardly describe all the usual restrictions appearing in natural (and artificial) languages, which lead to the non-context-freeness of these languages: reduplication, crossed dependencies, and multiple agreements; however, there are center-embedded constructions that cannot be covered by these grammars. While these assertions concern only the weak generative capacity of contextual grammars, some ideas are also proposed for associating a structure to the generated words, in the form of a tree, or of a dependence relation (as considered in descriptive linguistics and also similar to that in link grammars).

Journal ArticleDOI
TL;DR: Graphical presentations can communicate information in relational data sets succinctly and effectively; the paper addresses the generation of novel graphical presentations that represent many attributes and relations.
Abstract: Graphical presentations can be used to communicate information in relational data sets succinctly and effectively. However, novel graphical presentations that represent many attributes and relation...



Journal Article
TL;DR: In linguistics it has not been possible to use the standard criteria and assumptions of science because the ancients placed linguistics not in the physical domain but in the logical domain where concepts and theories do not represent parts of the natural world.
Abstract: In linguistics it has not been possible to use the standard criteria and assumptions of science because the ancients placed our discipline not in the physical domain but in the logical domain where concepts and theories do not represent parts of the natural world. Many of the problems facing linguistics follow inevitably, for example the difficulties that linguistics experiences in agreeing on grammatical theory. One symptom is the long-standing difficulty in testing the depth hypothesis, which came out of early MT research. Sampson (1997) attempted recently to test the depth hypothesis by a computer analysis of a grammatically annotated corpus of English. It is shown that this attempted test and his attempt at defending the testability of the depth hypothesis are invalid. But clues from the depth hypothesis have led to new foundations for general linguistics put forth in the book (Yngve 1996) that Sampson (1998) reviewed. This work reconstitutes linguistics in the physical domain where the criteria and assumptions of science can be applied. Sampson's review of this book contains a number of serious errors and inaccuracies.


