
Showing papers in "Computational Linguistics in 2003"


Journal ArticleDOI
TL;DR: An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models.
Abstract: We present and compare various methods for computing word alignments using statistical or heuristic models. We consider the five alignment models presented in Brown, Della Pietra, Della Pietra, and Mercer (1993), the hidden Markov alignment model, smoothing techniques, and refinements. These statistical models are compared with two heuristic models based on the Dice coefficient. We present different methods for combining word alignments to perform a symmetrization of directed statistical alignment models. As evaluation criterion, we use the quality of the resulting Viterbi alignment compared to a manually produced reference alignment. We evaluate the models on the German-English Verbmobil task and the French-English Hansards task. We perform a detailed analysis of various design decisions of our statistical alignment system and evaluate these on training corpora of various sizes. An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models. In the Appendix, we present an efficient training algorithm for the alignment models presented.

4,402 citations
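
The alignment methods compared above include heuristics based on the Dice coefficient and a symmetrization step that combines the two directed Viterbi alignments. The Python sketch below is a minimal illustration, not the authors' system: the toy bitext, the threshold, and the plain intersection/union combination are assumptions (the paper's refined heuristics grow the intersection toward the union).

```python
from collections import Counter
from itertools import product

def dice_alignments(bitext, threshold=0.3):
    """Heuristic word alignment: link (source, target) positions whose word
    pair has a Dice coefficient over sentence-level co-occurrence counts
    above a threshold."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in bitext:
        for s in set(src_sent):
            src_count[s] += 1
        for t in set(tgt_sent):
            tgt_count[t] += 1
        for s, t in product(set(src_sent), set(tgt_sent)):
            pair_count[(s, t)] += 1
    alignments = []
    for src_sent, tgt_sent in bitext:
        links = set()
        for i, s in enumerate(src_sent):
            for j, t in enumerate(tgt_sent):
                dice = 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])
                if dice >= threshold:
                    links.add((i, j))
        alignments.append(links)
    return alignments

def symmetrize(src_to_tgt, tgt_to_src):
    """Combine two directed alignments of one sentence pair: the intersection
    is high precision, the union high recall; refined heuristics interpolate
    between the two."""
    flipped = {(i, j) for (j, i) in tgt_to_src}
    return src_to_tgt & flipped, src_to_tgt | flipped

# Toy usage (assumed data, for illustration only)
bitext = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"])]
print(dice_alignments(bitext))
print(symmetrize({(0, 0), (1, 1)}, {(0, 0), (1, 1)}))
```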


Journal ArticleDOI
TL;DR: Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
Abstract: This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.

1,956 citations
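
The models described above condition parsing decisions on lexical heads. The sketch below is not any of the three published models; it only illustrates, on assumed toy data, the core idea of estimating a rule-expansion probability conditioned on the parent label and its head word, with a fixed-weight backoff to an unlexicalized PCFG estimate when the head is unseen.

```python
from collections import Counter

def train(lexicalized_rules):
    """lexicalized_rules: iterable of (parent, head_word, expansion) tuples,
    e.g. ("VP", "bought", ("V", "NP")), extracted from a head-annotated treebank."""
    rule_head, parent_head = Counter(), Counter()
    rule_plain, parent_plain = Counter(), Counter()
    for parent, head, expansion in lexicalized_rules:
        rule_head[(parent, head, expansion)] += 1
        parent_head[(parent, head)] += 1
        rule_plain[(parent, expansion)] += 1
        parent_plain[parent] += 1

    def prob(parent, head, expansion, lam=0.7):
        # Interpolate the head-conditioned estimate with an unlexicalized
        # PCFG estimate (simple fixed-weight backoff, an assumption here).
        lex = (rule_head[(parent, head, expansion)] / parent_head[(parent, head)]
               if parent_head[(parent, head)] else 0.0)
        plain = (rule_plain[(parent, expansion)] / parent_plain[parent]
                 if parent_plain[parent] else 0.0)
        return lam * lex + (1 - lam) * plain

    return prob

# Toy usage (assumed data)
p = train([("VP", "bought", ("V", "NP")),
           ("VP", "slept", ("V",)),
           ("VP", "bought", ("V", "NP", "PP"))])
print(p("VP", "bought", ("V", "NP")))   # head-conditioned estimate
print(p("VP", "gave", ("V", "NP")))     # unseen head falls back to the PCFG estimate
```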


Journal ArticleDOI
TL;DR: This special issue of Computational Linguistics explores ways in which the Web, with its vast quantity of freely available language data in all manner of varieties and languages, is being put to use as a linguists' playground.
Abstract: The Web, teeming as it is with language data, of all manner of varieties and languages, in vast quantity and freely available, is a fabulous linguists' playground. This special issue of Computational Linguistics explores ways in which this dream is being realized.

820 citations


Journal ArticleDOI
TL;DR: The use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale are presented.
Abstract: Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.

679 citations
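
One of the enhancements described above is a content-based measure of translational equivalence for candidate document pairs. The sketch below is an assumption-laden simplification of that idea, not the STRAND implementation: it scores a pair by the fraction of tokens that can be greedily linked through a bilingual lexicon, and the lexicon and documents shown are toy data.

```python
def translational_similarity(doc_src, doc_tgt, lexicon):
    """Rough content-based score for a candidate parallel page pair:
    greedily link source tokens to target tokens listed as translations
    in a bilingual lexicon and return the fraction of tokens linked.
    `lexicon` maps a source word to a set of possible translations."""
    remaining = list(doc_tgt)
    linked = 0
    for word in doc_src:
        for candidate in lexicon.get(word, ()):
            if candidate in remaining:
                remaining.remove(candidate)
                linked += 1
                break
    total = len(doc_src) + len(doc_tgt)
    return 2 * linked / total if total else 0.0

# Toy usage (assumed lexicon and documents)
lexicon = {"maison": {"house"}, "verte": {"green"}, "la": {"the"}}
print(translational_similarity(["la", "maison", "verte"],
                               ["the", "green", "house"], lexicon))
```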


Journal ArticleDOI
TL;DR: It is shown that the Web can be employed to obtain frequencies for bigrams that are unseen in a given corpus by querying a search engine.
Abstract: This article shows that the Web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the Web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between Web frequencies and corpus frequencies; (b) a reliable correlation between Web frequencies and plausibility judgments; (c) a reliable correlation between Web frequencies and frequencies recreated using class-based smoothing; (d) a good performance of Web frequencies in a pseudodisambiguation task.

371 citations
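
The method retrieves bigram counts by querying a search engine for the quoted bigram and then correlates log-transformed Web counts with corpus counts and plausibility judgments. The sketch below is a hedged illustration only: hit_count is a placeholder standing in for a search engine's page count (no real API is used), and the counts in the correlation check are toy data.

```python
import math

def hit_count(phrase):
    """Placeholder: in the article this is the page count a search engine
    reports for the quoted phrase (e.g. "hungry cat"). Here it is faked
    with a tiny lookup table purely for illustration."""
    fake_counts = {"hungry cat": 12400, "hungry sky": 37, "fast car": 98100}
    return fake_counts.get(phrase, 0)

def log_web_frequency(adjective, noun, smoothing=1):
    """Retrieve a Web count for an adjective-noun bigram and log-transform it
    (add-one smoothing so unseen bigrams do not map to minus infinity)."""
    return math.log(hit_count(f"{adjective} {noun}") + smoothing)

def pearson(xs, ys):
    """Pearson correlation, used in the article to compare Web frequencies
    with corpus frequencies and plausibility judgments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy comparison of Web log-counts with (assumed) corpus log-counts
bigrams = [("hungry", "cat"), ("hungry", "sky"), ("fast", "car")]
web = [log_web_frequency(a, n) for a, n in bigrams]
corpus = [math.log(c + 1) for c in (85, 0, 410)]
print(pearson(web, corpus))
```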


Journal ArticleDOI
TL;DR: A novel technique to restrict the possible word reorderings between source and target language in order to achieve an efficient search algorithm for statistical machine translation based on dynamic programming (DP).
Abstract: In this article, we describe an efficient beam search algorithm for statistical machine translation based on dynamic programming (DP). The search algorithm uses the translation model presented in Brown et al. (1993). Starting from a DP-based solution to the traveling-salesman problem, we present a novel technique to restrict the possible word reorderings between source and target language in order to achieve an efficient search algorithm. Word reordering restrictions especially useful for the translation direction German to English are presented. The restrictions are generalized, and a set of four parameters to control the word reordering is introduced, which then can easily be adopted to new translation directions. The beam search procedure has been successfully tested on the Verbmobil task (German to English, 8,000-word vocabulary) and on the Canadian Hansards task (French to English, 100,000-word vocabulary). For the medium-sized Verbmobil task, a sentence can be translated in a few seconds, only a small number of search errors occur, and there is no performance degradation as measured by the word error criterion used in this article.

293 citations
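
The search described above organizes hypotheses by the set of covered source positions (the link to the traveling-salesman DP) and prunes with a beam, subject to word-reordering restrictions. The code below is a heavily simplified, assumption-laden caricature of that organization, not the article's algorithm: the lexicon scores, language-model scores, and the max_jump reordering restriction are all stand-ins.

```python
import heapq

def beam_search_translate(src, lexicon, bigram_lm, beam_size=5, max_jump=2):
    """Toy DP/beam search over coverage bitmasks. States are
    (covered-positions bitmask, last target word); a hypothesis is extended
    by translating one uncovered source word whose position lies within
    `max_jump` of the leftmost uncovered position (a crude stand-in for the
    reordering restrictions). All scores are log-probabilities."""
    n = len(src)
    beams = {0: [((0, "<s>"), (0.0, []))]}   # covered count -> [(state, (score, output))]
    for covered_count in range(n):
        next_states = {}
        for (coverage, last), (score, out) in beams.get(covered_count, []):
            first_free = min(i for i in range(n) if not coverage >> i & 1)
            for i in range(n):
                if coverage >> i & 1 or i - first_free > max_jump:
                    continue
                for tgt, lex_score in lexicon.get(src[i], []):
                    new_score = score + lex_score + bigram_lm.get((last, tgt), -5.0)
                    key = (coverage | 1 << i, tgt)
                    best = next_states.get(key)
                    if best is None or new_score > best[0]:
                        next_states[key] = (new_score, out + [tgt])
        # Beam pruning: keep only the highest-scoring hypotheses per cardinality.
        beams[covered_count + 1] = heapq.nlargest(
            beam_size, next_states.items(), key=lambda kv: kv[1][0])
    finals = beams.get(n, [])
    return max(finals, key=lambda kv: kv[1][0])[1] if finals else (float("-inf"), [])

# Toy usage (assumed lexicon and language-model scores)
lexicon = {"das": [("the", -0.1)], "haus": [("house", -0.2)]}
lm = {("<s>", "the"): -0.5, ("the", "house"): -0.3}
print(beam_search_translate(["das", "haus"], lexicon, lm))
```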


Journal ArticleDOI
TL;DR: The article proposes to formalize a scene as a labeled directed graph and to describe content selection for referring expressions as a subgraph construction problem, an approach that also paves the way for integrating rule-based generation techniques with more recent stochastic approaches.
Abstract: This article describes a new approach to the generation of referring expressions. We propose to formalize a scene (consisting of a set of objects with various properties and relations) as a labeled directed graph and describe content selection (which properties to include in a referring expression) as a subgraph construction problem. Cost functions are used to guide the search process and to give preference to some solutions over others. The current approach has four main advantages: (1) Graph structures have been studied extensively, and by moving to a graph perspective we get direct access to the many theories and algorithms for dealing with graphs; (2) many existing generation algorithms can be reformulated in terms of graphs, and this enhances comparison and integration of the various approaches; (3) the graph perspective allows us to solve a number of problems that have plagued earlier algorithms for the generation of referring expressions; and (4) the combined use of graphs and cost functions paves the way for an integration of rule-based generation techniques with more recent stochastic approaches.

228 citations
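
Content selection is cast above as finding a cheapest distinguishing subgraph of the scene graph under a cost function. The sketch below illustrates the cost-guided search only for the simplest special case of property-only scenes (no relational edges), which is an assumption; the scene and costs are toy data and this is not the article's algorithm.

```python
import heapq
from itertools import count

def cheapest_description(target, entities, costs):
    """Find the cheapest set of properties true of `target` and ruling out
    every other entity. `entities` maps entity name -> set of properties;
    `costs` maps property -> cost (the cost function guiding the search).
    Uniform-cost (cost-ordered) search over property sets."""
    distractors = [props for name, props in entities.items() if name != target]
    target_props = sorted(entities[target], key=lambda p: costs.get(p, 1.0))
    tie = count()
    frontier = [(0.0, next(tie), frozenset())]
    seen = set()
    while frontier:
        cost, _, chosen = heapq.heappop(frontier)
        if chosen in seen:
            continue
        seen.add(chosen)
        # Success: every distractor lacks at least one chosen property.
        if distractors and all(not chosen <= d for d in distractors):
            return chosen, cost
        for prop in target_props:
            if prop not in chosen:
                heapq.heappush(frontier, (cost + costs.get(prop, 1.0),
                                          next(tie), chosen | {prop}))
    return None, float("inf")

# Toy scene (assumed): refer to d1 among two dogs and a cat
entities = {"d1": {"dog", "small", "brown"},
            "d2": {"dog", "small", "white"},
            "c1": {"cat", "brown"}}
costs = {"dog": 1.0, "cat": 1.0, "small": 1.5, "brown": 1.0, "white": 1.0}
print(cheapest_description("d1", entities, costs))   # e.g. "the brown dog"
```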


Journal ArticleDOI
TL;DR: The authors argue that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure.
Abstract: We argue in this article that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure. This allows a simpler discourse structure to provide scaffolding for compositional semantics and reveals multiple ways in which the relational meaning conveyed by adverbial connectives can interact with that associated with discourse structure. We conclude by sketching out a lexicalized grammar for discourse that facilitates discourse interpretation as a product of compositional rules, anaphor resolution, and inference.

214 citations


Journal ArticleDOI
TL;DR: Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage.
Abstract: Selectional preferences have been used by word sense disambiguation (WSD) systems as one source of disambiguating information. We evaluate WSD using selectional preferences acquired for English adjective-noun, subject, and direct object grammatical relationships with respect to a standard test corpus. The selectional preferences are specific to verb or adjective classes, rather than individual word forms, so they can be used to disambiguate the co-occurring adjectives and verbs, rather than just the nominal argument heads. We also investigate use of the one-sense-per-discourse heuristic to propagate a sense tag for a word to other occurrences of the same word within the current document in order to increase coverage. Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage. In addition to quantifying performance, we analyze the results to investigate the situations in which the selectional preferences achieve the best precision and in which the one-sense-per-discourse heuristic increases performance.

149 citations
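
One concrete component above is the one-sense-per-discourse heuristic used to raise coverage. The sketch below shows only that propagation step, under the assumption that some occurrences have already been sense-tagged (e.g. by the selectional-preference model, which is not reproduced here); the document and sense labels are toy data.

```python
def propagate_one_sense_per_discourse(tagged_tokens):
    """tagged_tokens: list of (word, sense_or_None) for a single document.
    Where at least one occurrence of a word has received a sense tag, copy
    that tag to the remaining untagged occurrences of the same word (the
    one-sense-per-discourse heuristic used to increase coverage)."""
    chosen = {}
    for word, sense in tagged_tokens:
        if sense is not None and word not in chosen:
            chosen[word] = sense
    return [(word, sense if sense is not None else chosen.get(word))
            for word, sense in tagged_tokens]

# Toy document: "plant" is disambiguated once and propagated to the other mention
doc = [("the", None), ("plant", "plant%factory"), ("closed", None),
       ("the", None), ("plant", None), ("reopened", None)]
print(propagate_one_sense_per_discourse(doc))
```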


Journal ArticleDOI
TL;DR: In this article, the problem of automatically mining parallel texts from the Web and different ways of integrating the resulting translation models within the retrieval process are investigated, and the results show that the Web-based translation models can surpass commercial MT systems in CLIR tasks.
Abstract: Although more and more language pairs are covered by machine translation (MT) services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag of words. The Web provides a vast resource for the automatic construction of parallel corpora that can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this article, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.

128 citations
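
Since the retrieval models are bag-of-words, query translation can be illustrated as replacing each source term with its top-ranked translations from the learned translation model. The sketch below assumes a toy probability table (standing in for a model trained on Web-mined parallel text) and is not the article's retrieval integration.

```python
def translate_query(query_terms, translation_probs, top_k=2, min_prob=0.1):
    """Bag-of-words query translation for CLIR: replace each source term
    with up to `top_k` target-language translations whose translation
    probability exceeds `min_prob`, keeping the probabilities as weights.
    `translation_probs` maps source term -> {target term: p(t|s)}."""
    weighted_query = {}
    for term in query_terms:
        candidates = sorted(translation_probs.get(term, {}).items(),
                            key=lambda kv: kv[1], reverse=True)[:top_k]
        for target, prob in candidates:
            if prob >= min_prob:
                weighted_query[target] = weighted_query.get(target, 0.0) + prob
    return weighted_query

# Toy model (assumed probabilities, e.g. learned from Web-mined parallel text)
probs = {"voiture": {"car": 0.7, "vehicle": 0.2, "wagon": 0.05},
         "rouge": {"red": 0.9, "crimson": 0.05}}
print(translate_query(["voiture", "rouge"], probs))
```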


Journal ArticleDOI
TL;DR: This article acquires the meanings of metonymic verbs and adjectives from a large corpus and proposes a probabilistic model that provides a ranking on the set of possible interpretations and identifies the interpretations automatically by exploiting the consistent correspondences between surface syntactic cues and meaning.
Abstract: In this article we investigate logical metonymy, that is, constructions in which the argument of a word in syntax appears to be different from that argument in logical form (e.g., enjoy the book means enjoy reading the book, and easy problem means a problem that is easy to solve). The systematic variation in the interpretation of such constructions suggests a rich and complex theory of composition on the syntax/semantics interface. Linguistic accounts of logical metonymy typically fail to describe exhaustively all the possible interpretations, or they don't rank those interpretations in terms of their likelihood. In view of this, we acquire the meanings of metonymic verbs and adjectives from a large corpus and propose a probabilistic model that provides a ranking on the set of possible interpretations. We identify the interpretations automatically by exploiting the consistent correspondences between surface syntactic cues and meaning. We evaluate our results against paraphrase judgments elicited experimentally from humans and show that the model's ranking of meanings correlates reliably with human intuitions.
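
A crude, assumption-laden caricature of the corpus-based ranking idea (not the published probabilistic model): score each candidate covert event e for a metonymic verb-object pair such as enjoy the book by combining how often the verb takes e as a complement and how often e takes the noun as its object, then rank. The counts below are toy data standing in for counts harvested from a large corpus.

```python
import math

def rank_interpretations(verb, noun, verb_event_counts, event_object_counts):
    """Rank candidate interpretation events for a metonymic construction such
    as "enjoy the book" ~ "enjoy EVENT-ing the book". Each event e is scored
    by log f(verb, e) + log f(e, noun) over corpus counts and the candidates
    are sorted by that score (a simplification of the article's model)."""
    events = set(verb_event_counts.get(verb, {})) & {
        e for (e, n) in event_object_counts if n == noun}
    scored = [(e, math.log(verb_event_counts[verb][e]) +
                  math.log(event_object_counts[(e, noun)]))
              for e in events]
    return sorted(scored, key=lambda es: es[1], reverse=True)

# Toy counts (assumed)
verb_event = {"enjoy": {"read": 120, "write": 15, "burn": 2}}
event_object = {("read", "book"): 5000, ("write", "book"): 800, ("burn", "book"): 40}
print(rank_interpretations("enjoy", "book", verb_event, event_object))
```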

Journal ArticleDOI
TL;DR: Knuth's generalization of Dijkstra's shortest-path algorithm offers a general method for finding the lowest-weight derivation in weighted deductive parsing, and the approach is modular in the sense that Knuth's algorithm is formulated independently from the weighted deduction system.
Abstract: We discuss weighted deductive parsing and consider the problem of finding the derivation with the lowest weight. We show that Knuth's generalization of Dijkstra's algorithm for the shortest-path problem offers a general method to solve this problem. Our approach is modular in the sense that Knuth's algorithm is formulated independently from the weighted deduction system.
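
The sketch below illustrates Knuth's Dijkstra-style strategy on a weighted deduction system under a simplifying assumption: the weight of a derived item is the sum of its antecedents' weights plus a nonnegative rule weight (a monotone, superior weight function), so the first time an item is popped from the priority queue its weight is optimal. The toy axioms and rules are assumptions, not taken from the article.

```python
import heapq

def best_derivation_weights(axioms, rules):
    """Knuth-style generalization of Dijkstra over a weighted deduction
    system. `axioms` maps item -> weight. Each rule is
    (antecedents, consequent, rule_weight); the derived item's weight is
    assumed to be the sum of antecedent weights plus the rule weight.
    Returns the lowest derivation weight for every derivable item."""
    best = {}
    frontier = [(w, item) for item, w in axioms.items()]
    heapq.heapify(frontier)
    while frontier:
        weight, item = heapq.heappop(frontier)
        if item in best:          # already settled with a better weight
            continue
        best[item] = weight       # Knuth/Dijkstra: the first pop is optimal
        for antecedents, consequent, rule_weight in rules:
            if consequent in best:
                continue
            if all(a in best for a in antecedents):
                new_weight = sum(best[a] for a in antecedents) + rule_weight
                heapq.heappush(frontier, (new_weight, consequent))
    return best

# Toy deduction system (assumed): weighted chart items as opaque labels
axioms = {"A[0,1]": 1.0, "B[1,2]": 2.0, "A[1,2]": 0.5}
rules = [(("A[0,1]", "B[1,2]"), "S[0,2]", 0.3),
         (("A[0,1]", "A[1,2]"), "S[0,2]", 1.0)]
print(best_derivation_weights(axioms, rules))
```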

Journal ArticleDOI
TL;DR: A spatial model for matching semantic values between two languages, French and English, based on semantic similarity links is described, which constructs a map that represents a word in the source language and projects the map values onto a space in the target language.
Abstract: This article describes a spatial model for matching semantic values between two languages, French and English. Based on semantic similarity links, the model constructs a map that represents a word in the source language. Then the algorithm projects the map values onto a space in the target language. The new space abides by the semantic similarity links specific to the second language. Then the two maps are projected onto the same plane in order to detect overlapping values. For instructional purposes, the different steps are presented here using a few examples. The entire set of results is available at the following address: http://dico.isc.cnrs.fr.

Journal ArticleDOI
TL;DR: A dependency parsing scheme using an extended finite-state approach augments the input representation with channels, so that links representing syntactic dependency relations among words can be accommodated, and iterates on the input a number of times to arrive at a fixed point.
Abstract: This article presents a dependency parsing scheme using an extended finite-state approach. The parser augments input representation with "channels" so that links representing syntactic dependency relations among words can be accommodated and iterates on the input a number of times to arrive at a fixed point. Intermediate configurations violating various constraints of projective dependency representations such as no crossing links and no independent items except sentential head are filtered via finite-state filters. We have applied the parser to dependency parsing of Turkish.
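
The parser iterates over the input, adding dependency links and filtering configurations that violate projectivity constraints such as crossing links. The code below is not finite-state and is not the article's parser; it is only a hedged illustration of the iterate-to-a-fixed-point-and-filter idea, with toy link-proposing rules as assumptions.

```python
def crosses(link_a, link_b):
    """True if two dependency links (dependent_idx, head_idx) cross,
    violating projectivity."""
    (a1, a2), (b1, b2) = sorted(link_a), sorted(link_b)
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2

def parse_to_fixed_point(words, propose):
    """Iterate a link-proposing function over the current configuration,
    keeping only proposals that do not cross existing links, until no new
    link is added (a fixed point). `propose(words, links)` yields candidate
    (dependent, head) links."""
    links = set()
    changed = True
    while changed:
        changed = False
        for cand in propose(words, links):
            if cand not in links and all(not crosses(cand, l) for l in links):
                links.add(cand)
                changed = True
    return links

# Toy rule set (assumed): determiners depend on the next noun,
# nouns depend on the nearest following verb
def toy_rules(words, links):
    for i, (w, tag) in enumerate(words):
        if tag == "Det" and i + 1 < len(words) and words[i + 1][1] == "Noun":
            yield (i, i + 1)
        if tag == "Noun":
            for j in range(i + 1, len(words)):
                if words[j][1] == "Verb":
                    yield (i, j)
                    break

sent = [("the", "Det"), ("dog", "Noun"), ("barked", "Verb")]
print(parse_to_fixed_point(sent, toy_rules))
```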

Journal ArticleDOI
TL;DR: Despite the perception that the documents available on the Web are of questionable quality, it is demonstrated that such resources are extremely useful in automatically postediting translation candidates proposed by the EBMT system.
Abstract: We have developed an example-based machine translation (EBMT) system that uses the World Wide Web for two different purposes: First, we populate the system's memory with translations gathered from rule-based MT systems located on the Web. The source strings input to these systems were extracted automatically from an extremely small subset of the rule types in the Penn-II Treebank. In subsequent stages, the 〈source, target〉 translation pairs obtained are automatically transformed into a series of resources that render the translation process more successful. Despite the fact that the output from on-line MT systems is often faulty, we demonstrate in a number of experiments that when used to seed the memories of an EBMT system, they can in fact prove useful in generating translations of high quality in a robust fashion. In addition, we demonstrate the relative gain of EBMT in comparison to on-line systems. Second, despite the perception that the documents available on the Web are of questionable quality, we demonstrate in contrast that such resources are extremely useful in automatically postediting translation candidates proposed by our system.

Journal ArticleDOI
TL;DR: An algorithm that combines lexical information (from WordNet 1.7) with Web directories (from the Open Directory Project) to associate word senses with such directories supports the hypothesis that Web directories are a rich source of lexical information: cleaner, more reliable, and more structured than the full Web as a corpus.
Abstract: We describe an algorithm that combines lexical information (from WordNet 1.7) with Web directories (from the Open Directory Project) to associate word senses with such directories. Such associations can be used as rich characterizations to acquire sense-tagged corpora automatically, cluster topically related senses, and detect sense specializations. The algorithm is evaluated for the 29 nouns (147 senses) used in the Senseval 2 competition, obtaining 148 (word sense, Web directory) associations covering 88% of the domain-specific word senses in the test data with 86% accuracy. The richness of Web directories as sense characterizations is evaluated in a supervised word sense disambiguation task using the Senseval 2 test suite. The results indicate that, when the directory/word sense association is correct, the samples automatically acquired from the Web directories are nearly as valid for training as the original Senseval 2 training instances. The results support our hypothesis that Web directories are a rich source of lexical information: cleaner, more reliable, and more structured than the full Web as a corpus.
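
In the spirit of the association step described above (and only in that spirit; the actual algorithm is richer), a sense can be matched to a directory by lexical overlap between a sense signature (e.g. synset members plus gloss words) and the vocabulary observed under the directory. The signatures and directory word lists below are toy assumptions.

```python
def overlap_score(sense_signature, directory_words):
    """Score how well a Web directory characterizes a word sense by the
    Jaccard overlap between the sense's signature (e.g. synset members plus
    gloss words from WordNet) and the vocabulary observed under the
    directory (e.g. site titles and descriptions in the ODP category)."""
    signature, directory = set(sense_signature), set(directory_words)
    union = signature | directory
    return len(signature & directory) / len(union) if union else 0.0

def best_directory(sense_signature, directories):
    """Pick the directory with the highest overlap for a given sense."""
    return max(directories.items(),
               key=lambda kv: overlap_score(sense_signature, kv[1]))

# Toy example (assumed data): two ODP-like categories for the noun "bank"
directories = {
    "Business/Financial_Services": ["loan", "account", "credit", "deposit"],
    "Science/Environment/Rivers": ["river", "erosion", "flood", "water"],
}
bank_financial = ["bank", "financial", "institution", "deposit", "loan", "credit"]
print(best_directory(bank_financial, directories))
```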

Journal ArticleDOI
TL;DR: Computational aspects of Van der Sandt's binding and accommodation theory (BAT) for presupposition projection and anaphora resolution are presented and discussed and innovative use of first-order theorem provers to carry out consistency checking of discourse representations is investigated.
Abstract: Computational aspects of Van der Sandt's binding and accommodation theory (BAT) for presupposition projection and anaphora resolution are presented and discussed in this article. BAT is reformulated to meet requirements for computational implementation, which include operations on discourse representation structures (renaming and merging), the representation of presuppositions (allowing for selective binding and determining free and bound variables), and a formulation of the acceptability constraints imposed by BAT. An efficient presupposition resolution algorithm is presented, and several further improvements such as preferences for binding and accommodation are discussed and integrated in this algorithm. Finally, innovative use of first-order theorem provers to carry out consistency checking of discourse representations is investigated.

Journal ArticleDOI
TL;DR: The GA model proposed for the study of tone systems uses a Pareto ranking method well suited to optimization problems with multiple criteria, allowing perceptual contrast and markedness complexity to be considered simultaneously.
Abstract: In this study, optimization models using genetic algorithms (GAs) are proposed to study the configuration of vowels and tone systems. As in previous explanatory models that have been used to study vowel systems, certain criteria, which are assumed to be the principles governing the structure of sound systems, are used to predict optimal vowels and tone systems. In most of the earlier studies only one criterion has been considered. When two criteria are considered, they are often combined into one scalar function. The GA model proposed for the study of tone systems uses a Pareto ranking method that is highly applicable for dealing with optimization problems having multiple criteria. For optimization of tone systems, perceptual contrast and markedness complexity are considered simultaneously. Although the consistency between the predicted systems and the observed systems is not as significant as those obtained for vowel systems, further investigation along this line is promising.
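
The Pareto ranking component can be made concrete. In the common variant sketched below (an assumption; the article's GA details are not reproduced), an individual's rank is one plus the number of individuals that dominate it, so rank-1 individuals form the current Pareto front. Candidates are represented only by their two criterion values, with perceptual contrast maximized and markedness complexity minimized.

```python
def dominates(a, b):
    """a dominates b if a is no worse on both criteria and strictly better on
    at least one. Criteria (assumed order): (contrast, complexity), where
    contrast is maximized and complexity is minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_ranks(population):
    """Pareto ranking as used in multiobjective GAs: rank = 1 + number of
    individuals that dominate this one; rank-1 individuals form the
    current Pareto front. `population` is a list of criterion tuples."""
    return [1 + sum(dominates(other, ind) for other in population)
            for ind in population]

# Toy population of candidate tone systems: (perceptual contrast, complexity)
population = [(0.9, 3), (0.7, 2), (0.9, 2), (0.5, 4)]
print(pareto_ranks(population))   # only (0.9, 2) is undominated, so it gets rank 1
```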


Journal ArticleDOI
TL;DR: The article focuses on the issue of determining quantifier scope preferences, which has largely been ignored in theoretical linguistics, and explores different models of the interaction between syntax and quantifier scope.
Abstract: This article describes a corpus-based investigation of quantifier scope preferences. Following recent work on multimodular grammar frameworks in theoretical linguistics and a long history of combining multiple information sources in natural language processing, scope is treated as a distinct module of grammar from syntax. This module incorporates multiple sources of evidence regarding the most likely scope reading for a sentence and is entirely data-driven. The experiments discussed in this article evaluate the performance of our models in predicting the most likely scope reading for a particular sentence, using Penn Treebank data both with and without syntactic annotation. We wish to focus attention on the issue of determining scope preferences, which has largely been ignored in theoretical linguistics, and to explore different models of the interaction between syntax and quantifier scope.



Journal ArticleDOI
TL;DR: A theory for automatic learning of text categorization models that has been repeatedly shown to be very successful and is based on a rather rough linguistic generalization of a language-dependent task: topic text classification (TC).
Abstract: Those trying to make sense of the notion of textual content and semantics within the wild, wild world of information retrieval, categorization, and filtering have to deal often with an overwhelming sea of problems. The really strange story is that most of them (myself included) still believe that developing a linguistically principled approach to text categorization is an interesting research problem. This will also emerge in the discussion of the book that is the focus of this review. Learning to Classify Texts Using Support Vector Machines by Thorsten Joachims proposes a theory for automatic learning of text categorization models that has been repeatedly shown to be very successful. At the same time, the approach proposed is based on a rather rough linguistic generalization of (what apparently is) a language-dependent task: topic text classification (TC). The result is twofold: on the one hand, a learning theory, based on statistical learnability principles and results, that avoids the limitations of the strong empiricism typical of most text classification research; and on the other hand, the application of a naive linguistic model, the bag-of-words representation, to linguistic objects (i.e., the documents) that still achieves impressive accuracy.
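
The bag-of-words representation plus linear SVM setup the review discusses maps, in present-day tooling, onto a few lines of scikit-learn. This is an anachronistic sketch of the general technique, not Joachims's system (which used SVM-light), and the toy corpus is a placeholder.

```python
# Bag-of-words (tf-idf weighted) text representation feeding a linear SVM,
# in the spirit of the classifier discussed in the review. The training
# corpus and labels below are placeholders for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["stocks fell sharply on Wall Street",
               "the midfielder scored a late goal",
               "the central bank raised interest rates",
               "the team won the championship match"]
train_labels = ["finance", "sports", "finance", "sports"]

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_texts, train_labels)
print(classifier.predict(["interest rates and bank stocks"]))
```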

Journal Article
TL;DR: This book describes an investigation of the subset of general language used in definition sentences and the development of a taxonomy of definition types, a grammar of definition sentences, and parsing software that can extract their functional components.
Abstract: Definition is a basic activity of language, of particular importance to linguists because of its use of language to describe itself. Beyond this inherent significance as a crucial element of language study, definitions also provide a rich potential source of the information needed for Natural Language Processing systems. This book describes an investigation of the subset of general language used in definition sentences and the development of a taxonomy of definition types, a grammar of definition sentences and parsing software which can extract their functional components. The work is based on definition sentences used in one of the dictionaries from the Cobuild range, and the book includes a brief history of the development of monolingual English dictionaries, an assessment of the concepts of sublanguages and local grammars and a full exploration of the results of the analysis and of the present and future applications of the taxonomy, grammar and parser.
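
For a flavor of what a local grammar of definition sentences can extract (this is a toy pattern invented for illustration, not the book's grammar), the sketch below matches one common Cobuild-style pattern, "A/An X is a Y that Z", and pulls out the definiendum, genus, and differentia.

```python
import re

# A toy local-grammar rule (an assumption, not the book's grammar) for one
# common definition pattern: "A/An <definiendum> is a/an <genus> that <differentia>."
DEFINITION_PATTERN = re.compile(
    r"^(?:A|An|The)\s+(?P<definiendum>\w+)\s+is\s+(?:a|an)\s+"
    r"(?P<genus>[\w\s]+?)\s+(?:that|which|who)\s+(?P<differentia>.+?)\.?$",
    re.IGNORECASE)

def parse_definition(sentence):
    """Extract the functional components of a definition sentence matching
    the single toy pattern above; return None for non-matching sentences."""
    match = DEFINITION_PATTERN.match(sentence.strip())
    return match.groupdict() if match else None

print(parse_definition("A kettle is a container that is used for boiling water."))
print(parse_definition("The weather was awful."))
```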



