
Showing papers in "Computational Linguistics in 1998"


Journal Article
Hinrich Schütze1
TL;DR: This paper presents context-group discrimination, a disambiguation algorithm based on clustering, and demonstrates its good performance for a sample of natural and artificial ambiguous words.
Abstract: This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.

1,382 citations
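The clustering step described above can be sketched roughly as follows (a toy illustration, not Schütze's actual Word Space implementation; the example contexts and sense centroids are invented):

```python
from collections import Counter
from math import sqrt

def context_vector(context_words):
    """Bag-of-words vector for one occurrence of the ambiguous word."""
    return Counter(context_words)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_sense(context, sense_centroids):
    """Assign the sense cluster whose centroid is closest in vector space."""
    v = context_vector(context)
    return max(sense_centroids, key=lambda s: cosine(v, sense_centroids[s]))

# Two toy "sense clusters" for the ambiguous word "bank"
centroids = {
    "finance": Counter({"money": 3, "loan": 2, "account": 2}),
    "river":   Counter({"water": 3, "shore": 2, "fish": 1}),
}
print(assign_sense(["deposit", "money", "account"], centroids))  # finance
```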


Journal Article
TL;DR: In view of recent progress in word sense disambiguation, the authors take stock of research in this field over the past 50 years and consider the next steps to be taken.
Abstract: Given the progress recently made in the field of word sense disambiguation, the authors take stock of the state of research in this field over the past 50 years and consider the next steps to be taken. First, they review the main approaches to word sense disambiguation: from the earliest attempts made within machine translation to the current corpus-based methods, by way of methods based on artificial intelligence and methods using knowledge bases. Second, they examine the problems that remain open (the role of context, the division of senses, the evaluation of results) and propose some directions for future research.

1,021 citations


Journal ArticleDOI
TL;DR: A statistical classifier is described that combines topical context with local cues to identify a word sense and is used to disambiguate a noun, a verb, and an adjective.
Abstract: Corpus-based approaches to word sense identification have flexibility and generality but suffer from a knowledge acquisition bottleneck. We show how knowledge-based techniques can be used to open the bottleneck by automatically locating training corpora. We describe a statistical classifier that combines topical context with local cues to identify a word sense. The classifier is used to disambiguate a noun, a verb, and an adjective. A knowledge base in the form of WordNet's lexical relations is used to automatically locate training examples in a general text corpus. Test results are compared with those from manually tagged training examples.

517 citations
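The combination of topical context with local cues can be illustrated by a naive-Bayes-style score (a hedged sketch; the feature names, priors, and likelihoods below are invented, and the paper's actual classifier differs in detail):

```python
from math import log

def classify(features, sense_priors, likelihoods):
    """Naive-Bayes score: log prior plus log likelihood of each observed cue."""
    scores = {}
    for sense, prior in sense_priors.items():
        score = log(prior)
        for f in features:
            score += log(likelihoods[sense].get(f, 1e-6))  # smoothing floor
        scores[sense] = score
    return max(scores, key=scores.get)

# Topical cues ("topic:...") and local cues ("prev:...", "next:...")
# contribute to the same score.
priors = {"line/cord": 0.5, "line/text": 0.5}
likelihoods = {
    "line/cord": {"topic:fishing": 0.4, "prev:fishing": 0.3},
    "line/text": {"topic:writing": 0.5, "next:of": 0.2},
}
print(classify(["topic:fishing", "prev:fishing"], priors, likelihoods))
```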


Journal Article
TL;DR: A generative model for performing backwards transliteration from Japanese to English is described and evaluated, incorporating several distinct stages in the transliteration process.
Abstract: It is challenging to translate names and technical terms across languages with different alphabets and sound inventories. These items are commonly transliterated, i.e., replaced with approximate phonetic equivalents. For example, "computer" in English comes out as "konpyuutaa" in Japanese. Translating such items from Japanese back to English is even more challenging, and of practical interest, as transliterated items make up the bulk of text phrases not found in bilingual dictionaries. We describe and evaluate a method for performing backwards transliterations by machine. This method uses a generative model, incorporating several distinct stages in the transliteration process.

438 citations


Journal Article
Mark Johnson1
TL;DR: A simple node relabeling transformation is described that improves a treebank PCFG-based parser's average precision and recall by around 8%, or approximately half of the performance difference between a simple PCFG model and the best broad-coverage parsers available today.
Abstract: The kinds of tree representations used in a treebank corpus can have a dramatic effect on performance of a parser based on the PCFG estimated from that corpus, causing the estimated likelihood of a tree to differ substantially from its frequency in the training corpus. This paper points out that the Penn II treebank representations are of the kind predicted to have such an effect, and describes a simple node relabeling transformation that improves a treebank PCFG-based parser's average precision and recall by around 8%, or approximately half of the performance difference between a simple PCFG model and the best broad-coverage parsers available today. This performance variation comes about because any PCFG, and hence the corpus of trees from which the PCFG is induced, embodies independence assumptions about the distribution of words and phrases. The particular independence assumptions implicit in a tree representation can be studied theoretically and investigated empirically by means of a tree transformation / detransformation process.

407 citations
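The paper's specific relabeling scheme is not reproduced here; as a hedged illustration of the general kind of tree transformation involved, the sketch below applies parent annotation to trees represented as nested (label, children...) tuples:

```python
def parent_annotate(tree, parent="TOP"):
    """Relabel each nonterminal with its parent's label, e.g. NP -> NP^S.
    This weakens the PCFG's independence assumptions by making a rule's
    probability sensitive to where the node occurs. Leaves are unchanged."""
    if isinstance(tree, str):
        return tree
    label, children = tree[0], tree[1:]
    new_label = f"{label}^{parent}"
    return (new_label,) + tuple(parent_annotate(c, label) for c in children)

t = ("S", ("NP", "she"), ("VP", ("V", "runs")))
print(parent_annotate(t))
# ('S^TOP', ('NP^S', 'she'), ('VP^S', ('V^VP', 'runs')))
```

A matching detransformation (stripping the `^...` suffixes) recovers the original treebank representation, which is the transformation/detransformation process the abstract refers to.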


Journal Article
TL;DR: Questions are raised concerning the strategy of evaluating systems for definite description interpretation by comparing their results with a standardized annotation; the study also notes the large number of discourse-new definites and the presence of definites that did not seem to require a complete disambiguation.
Abstract: We present the results of a study of the use of definite descriptions in written texts aimed at assessing the feasibility of annotating corpora with information about definite description interpretation. We ran two experiments, in which subjects were asked to classify the uses of definite descriptions in a corpus of 33 newspaper articles, containing a total of 1,412 definite descriptions. We measured the agreement among annotators about the classes assigned to definite descriptions, as well as the agreement about the antecedent assigned to those definites that the annotators classified as being related to an antecedent in the text. The most interesting result of this study from a corpus annotation perspective was the rather low agreement (K = 0.63) that we obtained using versions of Hawkins's and Prince's classification schemes; better results (K = 0.76) were obtained using the simplified scheme proposed by Fraurud that includes only two classes, first-mention and subsequent-mention. The agreement about antecedents was also not complete. These findings raise questions concerning the strategy of evaluating systems for definite description interpretation by comparing their results with a standardized annotation. From a linguistic point of view, the most interesting observations were the great number of discourse-new definites in our corpus (in one of our experiments, about 50% of the definites in the collection were classified as discourse-new, 30% as anaphoric, and 18% as associative/bridging) and the presence of definites that did not seem to require a complete disambiguation.

310 citations
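The agreement figures (K = 0.63, K = 0.76) are kappa scores, i.e., observed agreement corrected for chance. The standard Cohen's kappa computation can be sketched as follows (the toy annotator labels are invented for illustration):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators.
    kappa = (P_observed - P_expected) / (1 - P_expected)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Expected chance agreement from each annotator's marginal distribution
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["first", "first", "subseq", "subseq", "first", "subseq"]
b = ["first", "subseq", "subseq", "subseq", "first", "subseq"]
print(round(cohen_kappa(a, b), 3))  # 0.667
```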


Journal ArticleDOI
TL;DR: A computational model for recognizing intentional structure and utilizing it in discourse processing is provided based on the collaborative planning framework of SharedPlans (Grosz and Kraus 1996).
Abstract: An agent's ability to understand an utterance depends upon its ability to relate that utterance to the preceding discourse. The agent must determine whether the utterance begins a new segment of the discourse, completes the current segment, or contributes to it. The intentional structure of the discourse, comprised of discourse segment purposes and their interrelationships, plays a central role in this process (Grosz and Sidner 1986). In this paper, we provide a computational model for recognizing intentional structure and utilizing it in discourse processing. The model is based on the collaborative planning framework of SharedPlans (Grosz and Kraus 1996).

221 citations


Journal Article
Hang Li1, Naoki Abe1
TL;DR: In this paper, the problem of generalizing values of a case frame slot for a verb is viewed as that of estimating a conditional probability distribution over a partition of words, and a new generalization method based on the minimum description length (MDL) principle is proposed.
Abstract: A new method for automatically acquiring case frame patterns from large corpora is proposed. In particular, the problem of generalizing values of a case frame slot for a verb is viewed as that of estimating a conditional probability distribution over a partition of words, and a new generalization method based on the Minimum Description Length (MDL) principle is proposed. In order to assist with efficiency, the proposed method makes use of an existing thesaurus and restricts its attention to those partitions that are present as "cuts" in the thesaurus tree, thus reducing the generalization problem to that of estimating a "tree cut model" of the thesaurus tree. An efficient algorithm is given, which provably obtains the optimal tree cut model for the given frequency data of a case slot, in the sense of MDL. Case frame patterns obtained by the method were used to resolve PP-attachment ambiguity. Experimental results indicate that the proposed method improves upon or is at least comparable with existing methods.

179 citations
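The MDL-based choice among tree cuts can be caricatured as follows (a toy sketch, not Li and Abe's exact coding scheme; the word frequencies and thesaurus classes are invented). The key trade-off is that a finer cut fits the data better but costs more to describe:

```python
from math import log2

def description_length(cut, freqs, total):
    """MDL score of a thesaurus cut: parameter cost plus data code length.
    cut: a partition of the words into classes; freqs: observed counts."""
    k = len(cut)
    model_cost = (k / 2) * log2(total)  # cost of describing k parameters
    data_cost = 0.0
    for cls in cut:
        cls_freq = sum(freqs.get(w, 0) for w in cls)
        p_class = cls_freq / total
        for w in cls:
            f = freqs.get(w, 0)
            if f:
                # probability is spread uniformly within a class,
                # as in a tree cut model
                data_cost -= f * log2(p_class / len(cls))
    return model_cost + data_cost

freqs = {"car": 8, "bus": 7, "dog": 1, "cat": 0}
total = sum(freqs.values())
fine = [["car"], ["bus"], ["dog"], ["cat"]]
coarse = [["car", "bus"], ["dog", "cat"]]
best = min([fine, coarse], key=lambda c: description_length(c, freqs, total))
# With these counts the coarser cut wins: its extra data cost is
# outweighed by the smaller parameter cost.
```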


Journal Article
TL;DR: Experiments show that this method can learn even from very sparse training data, achieving over 92% correct disambiguation performance.
Abstract: We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method can learn even from very sparse training data, achieving over 92% correct disambiguation performance.

144 citations


Journal Article
TL;DR: It is shown that production probabilities estimated from parsed or unparsed sentences always yield proper distributions, so an estimated system is automatically proper.
Abstract: The assignment of probabilities to the productions of a context-free grammar may generate an improper distribution: the probability of all finite parse trees is less than one. The condition for proper assignment is rather subtle. Production probabilities can be estimated from parsed or unparsed sentences, and the question arises as to whether or not an estimated system is automatically proper. We show here that estimated production probabilities always yield proper distributions.

127 citations
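The estimator in question (for the parsed-sentence case) is the usual relative-frequency one, under which each production's probability is its count divided by the count of its left-hand side, so the probabilities for each nonterminal sum to one. A minimal sketch, with toy trees invented for illustration:

```python
from collections import Counter

def estimate_pcfg(trees):
    """Relative-frequency estimation of production probabilities.
    Trees are (label, child, ...) tuples; leaves are plain strings."""
    rule_counts = Counter()
    lhs_counts = Counter()

    def collect(tree):
        if isinstance(tree, str):
            return
        label, children = tree[0], tree[1:]
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for c in children:
            collect(c)

    for t in trees:
        collect(t)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

trees = [
    ("S", ("NP", "she"), ("VP", "runs")),
    ("S", ("NP", "he"), ("VP", "sleeps")),
]
probs = estimate_pcfg(trees)
# For each left-hand side, the estimated probabilities sum to one.
```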


Journal Article
TL;DR: It is shown that the phonological descriptions possible within the view of constraint interaction embodied in Optimality Theory remain within the class of rational relations when GEN is itself a rational relation and each constraint distinguishes among finitely many regular sets of candidates.
Abstract: It has been argued that rule-based phonological descriptions can uniformly be expressed as mappings carried out by finite-state transducers, and therefore fall within the class of rational relations. If this property of generative capacity is an empirically correct characterization of phonological mappings, it should hold of any sufficiently restrictive theory of phonology, whether it utilizes constraints or rewrite rules. In this paper, we investigate the conditions under which the phonological descriptions that are possible within the view of constraint interaction embodied in Optimality Theory (Prince and Smolensky 1993) remain within the class of rational relations. We show that this is true when GEN is itself a rational relation, and each of the constraints distinguishes among finitely many regular sets of candidates.

Journal Article
TL;DR: This work proposes and evaluates several figures of merit for best-first parsing, and identifies an easily computable figure of merit that provides excellent performance on various measures and two different grammars.
Abstract: Best-first parsing methods for natural language try to parse efficiently by considering the most likely constituents first. Some figure of merit is needed by which to compare the likelihood of constituents, and the choice of this figure has a substantial impact on the efficiency of the parser. While several parsers described in the literature have used such techniques, there is little published data on their efficacy, much less attempts to judge their relative merits. We propose and evaluate several figures of merit for best-first parsing, and we identify an easily computable figure of merit that provides excellent performance on various measures and two different grammars.

Journal Article
TL;DR: The authors proposed an example sampling method for example-based word sense disambiguation, which is characterized by the reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system.
Abstract: This paper proposes an efficient example sampling method for example-based word sense disambiguation systems. To construct a database of practical size, a considerable overhead for manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity of searching a large-sized database poses a considerable problem (overhead for search). To counter these problems, our method selectively samples a smaller-sized effective subset from a given example set for use in word sense disambiguation. Our method is characterized by the reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system. The system progressively collects examples by selecting those with greatest utility. The paper reports the effectiveness of our method through experiments on about one thousand sentences. Compared to experiments with other example sampling methods, our method reduced both the overhead for supervision and the overhead for search, without degrading the performance of the system.
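Selective sampling by training utility can be sketched generically (a hedged illustration: the utility function below is a simple uncertainty score, not the paper's actual utility measure, and the example pool is invented):

```python
def sample_by_utility(pool, utility, k):
    """Greedy selective sampling: repeatedly take the unlabeled example
    with the highest utility score, so only the most informative
    examples are sent for manual sense annotation."""
    chosen, remaining = [], list(pool)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=utility)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy utility: prefer examples whose current sense probability is
# closest to 0.5, i.e., those the system is least sure about.
sense_prob = {"ex1": 0.95, "ex2": 0.55, "ex3": 0.50, "ex4": 0.80}
utility = lambda e: -abs(sense_prob[e] - 0.5)
print(sample_by_utility(sense_prob, utility, 2))  # ['ex3', 'ex2']
```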

Journal ArticleDOI
TL;DR: A classifier whose accuracy may be sufficient for word sense disambiguation is described, along with an extension of the training method that uses information extracted from unlabeled examples to improve classification accuracy when there are few labeled training examples available.
Abstract: A word sense disambiguator that is able to distinguish among the many senses of common words that are found in general-purpose, broad-coverage lexicons would be useful. For example, experiments have shown that, given accurate sense disambiguation, the lexical relations encoded in lexicons such as WordNet can be exploited to improve the effectiveness of information retrieval systems. This paper describes a classifier whose accuracy may be sufficient for such a purpose. The classifier combines the output of a neural network that learns topical context with the output of a network that learns local context to distinguish among the senses of highly ambiguous words. The accuracy of the classifier is tested on three words, the noun line, the verb serve, and the adjective hard; the classifier has an average accuracy of 87%, 90%, and 81%, respectively, when forced to choose a sense for all test cases. When the classifier is not forced to choose a sense and is trained on a subset of the available senses, it rejects test cases containing unknown senses as well as test cases it would misclassify if forced to select a sense. Finally, when there are few labeled training examples available, we describe an extension of our training method that uses information extracted from unlabeled examples to improve classification accuracy.

Journal ArticleDOI
TL;DR: A heuristic approach capable of automatically clustering senses in a machine-readable dictionary (MRD) and an implementation of the method for clustering definition sentences in the Longman Dictionary of Contemporary English (LDOCE) is described.
Abstract: This paper describes a heuristic approach capable of automatically clustering senses in a machine-readable dictionary (MRD). Including these clusters in the MRD-based lexical database offers several positive benefits for word sense disambiguation (WSD). First, the clusters can be used as a coarser sense division, so unnecessarily fine sense distinction can be avoided. The clustered entries in the MRD can also be used as materials for supervised training to develop a WSD system. Furthermore, if the algorithm is run on several MRDs, the clusters also provide a means of linking different senses across multiple MRDs to create an integrated lexical database. An implementation of the method for clustering definition sentences in the Longman Dictionary of Contemporary English (LDOCE) is described. To this end, the topical word lists and topical cross-references in the Longman Lexicon of Contemporary English (LLOCE) are used. Nearly half of the senses in the LDOCE can be linked precisely to a relevant LLOCE topic using a simple heuristic. With the definitions of senses linked to the same topic viewed as a document, topical clustering of the MRD senses bears a striking resemblance to retrieval of relevant documents for a given query in information retrieval (IR) research. Relatively well-established IR techniques of weighting terms and ranking document relevancy are applied to find the topical clusters that are most relevant to the definition of each MRD sense. Finally, we describe an implemented version of the algorithms for the LDOCE and the LLOCE and assess the performance of the proposed approach in a series of experiments and evaluations.
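The IR-style ranking of topics for each definition can be sketched with tf-idf term weighting (a rough illustration; the topic word lists are invented and the paper's weighting scheme differs in detail):

```python
from collections import Counter
from math import log

def tfidf_rank(definition_words, topic_docs):
    """Rank topics by summed tf-idf weight of the definition's words,
    treating each topic's word list as a document and the definition
    as the query, as in document retrieval."""
    n_docs = len(topic_docs)
    df = Counter()  # document frequency of each term
    for words in topic_docs.values():
        for w in set(words):
            df[w] += 1
    scores = {}
    for topic, words in topic_docs.items():
        tf = Counter(words)
        scores[topic] = sum(
            tf[w] * log(n_docs / df[w])
            for w in definition_words if w in tf
        )
    return max(scores, key=scores.get)

topics = {
    "FINANCE": ["money", "bank", "loan", "interest"],
    "GEOGRAPHY": ["river", "bank", "slope", "land"],
}
print(tfidf_rank(["land", "beside", "a", "river"], topics))  # GEOGRAPHY
```

Note that a term like "bank" that appears in every topic gets idf log(1) = 0, so it contributes nothing to the ranking, which is the desired behavior for uninformative terms.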

Journal Article
TL;DR: The paper discusses some classes of contextual grammars---mainly those with "maximal use of selectors"---giving some arguments that these grammars can be considered a good model for natural language syntax, and proposes some ideas for associating a structure to the generated words, in the form of a tree or of a dependence relation.
Abstract: The paper discusses some classes of contextual grammars---mainly those with "maximal use of selectors"---giving some arguments that these grammars can be considered a good model for natural language syntax. A contextual grammar produces a language starting from a finite set of words and iteratively adding contexts to the currently generated words, according to a selection procedure: each context has associated with it a selector, a set of words; the context is adjoined to any occurrence of such a selector in the word to be derived. In grammars with maximal use of selectors, a context is adjoined only to selectors for which no superword is a selector. Maximality can be defined either locally or globally (with respect to all selectors in the grammar). The obtained families of languages are incomparable with that of Chomsky context-free languages (and with other families of languages that contain linear languages and that are not "too large"; see Section 5) and have a series of properties supporting the assertion that these grammars are a possible adequate model for the syntax of natural languages. They are able to straightforwardly describe all the usual restrictions appearing in natural (and artificial) languages, which lead to the non-context-freeness of these languages: reduplication, crossed dependencies, and multiple agreements; however, there are center-embedded constructions that cannot be covered by these grammars. While these assertions concern only the weak generative capacity of contextual grammars, some ideas are also proposed for associating a structure to the generated words, in the form of a tree, or of a dependence relation (as considered in descriptive linguistics and also similar to that in link grammars).

Journal ArticleDOI
TL;DR: Graphical presentations can communicate information in relational data sets succinctly and effectively; the paper addresses the generation of novel graphical presentations that represent many attributes and relations.
Abstract: Graphical presentations can be used to communicate information in relational data sets succinctly and effectively. However, novel graphical presentations that represent many attributes and relation...



Journal Article
TL;DR: In linguistics it has not been possible to use the standard criteria and assumptions of science because the ancients placed linguistics not in the physical domain but in the logical domain where concepts and theories do not represent parts of the natural world.
Abstract: In linguistics it has not been possible to use the standard criteria and assumptions of science because the ancients placed our discipline not in the physical domain but in the logical domain where concepts and theories do not represent parts of the natural world. Many of the problems facing linguistics follow inevitably, for example the difficulties that linguistics experiences in agreeing on grammatical theory. One symptom is the long-standing difficulty in testing the depth hypothesis, which came out of early MT research. Sampson (1997) attempted recently to test the depth hypothesis by a computer analysis of a grammatically annotated corpus of English. It is shown that this attempted test and his attempt at defending the testability of the depth hypothesis are invalid. But clues from the depth hypothesis have led to new foundations for general linguistics put forth in the book (Yngve 1996) that Sampson (1998) reviewed. This work reconstitutes linguistics in the physical domain where the criteria and assumptions of science can be applied. Sampson's review of this book contains a number of serious errors and inaccuracies.


