
Showing papers in "Computational Linguistics in 2003"


Journal ArticleDOI
TL;DR: An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models.
Abstract: We present and compare various methods for computing word alignments using statistical or heuristic models. We consider the five alignment models presented in Brown, Della Pietra, Della Pietra, and Mercer (1993), the hidden Markov alignment model, smoothing techniques, and refinements. These statistical models are compared with two heuristic models based on the Dice coefficient. We present different methods for combining word alignments to perform a symmetrization of directed statistical alignment models. As evaluation criterion, we use the quality of the resulting Viterbi alignment compared to a manually produced reference alignment. We evaluate the models on the German-English Verbmobil task and the French-English Hansards task. We perform a detailed analysis of various design decisions of our statistical alignment system and evaluate these on training corpora of various sizes. An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models. In the Appendix, we present an efficient training algorithm for the alignment models presented.

4,402 citations
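
The alignment methods compared above include heuristics based on the Dice coefficient and a symmetrization step that combines the two directed Viterbi alignments. The Python sketch below is a minimal illustration, not the authors' system: the toy bitext, the threshold, and the plain intersection/union combination are assumptions (the paper's refined heuristics grow the intersection toward the union).

```python
from collections import Counter
from itertools import product

def dice_alignments(bitext, threshold=0.3):
    """Heuristic word alignment: link (source, target) positions whose word
    pair has a Dice coefficient over sentence-level co-occurrence counts
    above a threshold."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in bitext:
        for s in set(src_sent):
            src_count[s] += 1
        for t in set(tgt_sent):
            tgt_count[t] += 1
        for s, t in product(set(src_sent), set(tgt_sent)):
            pair_count[(s, t)] += 1
    alignments = []
    for src_sent, tgt_sent in bitext:
        links = set()
        for i, s in enumerate(src_sent):
            for j, t in enumerate(tgt_sent):
                dice = 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])
                if dice >= threshold:
                    links.add((i, j))
        alignments.append(links)
    return alignments

def symmetrize(src_to_tgt, tgt_to_src):
    """Combine two directed alignments of one sentence pair: the intersection
    is high precision, the union high recall; refined heuristics interpolate
    between the two."""
    flipped = {(i, j) for (j, i) in tgt_to_src}
    return src_to_tgt & flipped, src_to_tgt | flipped

# Toy usage (assumed data, for illustration only)
bitext = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"])]
print(dice_alignments(bitext))
print(symmetrize({(0, 0), (1, 1)}, {(0, 0), (1, 1)}))
```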


Journal ArticleDOI
TL;DR: Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
Abstract: This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.

1,956 citations
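
The models described above condition parsing decisions on lexical heads. The sketch below is not any of the three published models; it only illustrates, on assumed toy data, the core idea of estimating a rule-expansion probability conditioned on the parent label and its head word, with a fixed-weight backoff to an unlexicalized PCFG estimate when the head is unseen.

```python
from collections import Counter

def train(lexicalized_rules):
    """lexicalized_rules: iterable of (parent, head_word, expansion) tuples,
    e.g. ("VP", "bought", ("V", "NP")), extracted from a head-annotated treebank."""
    rule_head, parent_head = Counter(), Counter()
    rule_plain, parent_plain = Counter(), Counter()
    for parent, head, expansion in lexicalized_rules:
        rule_head[(parent, head, expansion)] += 1
        parent_head[(parent, head)] += 1
        rule_plain[(parent, expansion)] += 1
        parent_plain[parent] += 1

    def prob(parent, head, expansion, lam=0.7):
        # Interpolate the head-conditioned estimate with an unlexicalized
        # PCFG estimate (simple fixed-weight backoff, an assumption here).
        lex = (rule_head[(parent, head, expansion)] / parent_head[(parent, head)]
               if parent_head[(parent, head)] else 0.0)
        plain = (rule_plain[(parent, expansion)] / parent_plain[parent]
                 if parent_plain[parent] else 0.0)
        return lam * lex + (1 - lam) * plain

    return prob

# Toy usage (assumed data)
p = train([("VP", "bought", ("V", "NP")),
           ("VP", "slept", ("V",)),
           ("VP", "bought", ("V", "NP", "PP"))])
print(p("VP", "bought", ("V", "NP")))   # head-conditioned estimate
print(p("VP", "gave", ("V", "NP")))     # unseen head falls back to the PCFG estimate
```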


Journal ArticleDOI
TL;DR: This special issue of Computational Linguistics explores ways in which the Web, with its vast quantity of freely available language data in all manner of varieties and languages, is being put to use as a linguists' playground.
Abstract: The Web, teeming as it is with language data, of all manner of varieties and languages, in vast quantity and freely available, is a fabulous linguists' playground. This special issue of Computational Linguistics explores ways in which this dream is being realized.

820 citations


Journal ArticleDOI
TL;DR: The use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale are presented.
Abstract: Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.

679 citations
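
One of the enhancements described above is a content-based measure of translational equivalence for candidate document pairs. The sketch below is an assumption-laden simplification of that idea, not the STRAND implementation: it scores a pair by the fraction of tokens that can be greedily linked through a bilingual lexicon, and the lexicon and documents shown are toy data.

```python
def translational_similarity(doc_src, doc_tgt, lexicon):
    """Rough content-based score for a candidate parallel page pair:
    greedily link source tokens to target tokens listed as translations
    in a bilingual lexicon and return the fraction of tokens linked.
    `lexicon` maps a source word to a set of possible translations."""
    remaining = list(doc_tgt)
    linked = 0
    for word in doc_src:
        for candidate in lexicon.get(word, ()):
            if candidate in remaining:
                remaining.remove(candidate)
                linked += 1
                break
    total = len(doc_src) + len(doc_tgt)
    return 2 * linked / total if total else 0.0

# Toy usage (assumed lexicon and documents)
lexicon = {"maison": {"house"}, "verte": {"green"}, "la": {"the"}}
print(translational_similarity(["la", "maison", "verte"],
                               ["the", "green", "house"], lexicon))
```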


Journal ArticleDOI
TL;DR: It is shown that the Web can be employed to obtain frequencies for bigrams that are unseen in a given corpus by querying a search engine.
Abstract: This article shows that the Web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the Web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between Web frequencies and corpus frequencies; (b) a reliable correlation between Web frequencies and plausibility judgments; (c) a reliable correlation between Web frequencies and frequencies recreated using class-based smoothing; (d) a good performance of Web frequencies in a pseudodisambiguation task.

371 citations
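
The method retrieves bigram counts by querying a search engine for the quoted bigram and then correlates log-transformed Web counts with corpus counts and plausibility judgments. The sketch below is a hedged illustration only: hit_count is a placeholder standing in for a search engine's page count (no real API is used), and the counts in the correlation check are toy data.

```python
import math

def hit_count(phrase):
    """Placeholder: in the article this is the page count a search engine
    reports for the quoted phrase (e.g. "hungry cat"). Here it is faked
    with a tiny lookup table purely for illustration."""
    fake_counts = {"hungry cat": 12400, "hungry sky": 37, "fast car": 98100}
    return fake_counts.get(phrase, 0)

def log_web_frequency(adjective, noun, smoothing=1):
    """Retrieve a Web count for an adjective-noun bigram and log-transform it
    (add-one smoothing so unseen bigrams do not map to minus infinity)."""
    return math.log(hit_count(f"{adjective} {noun}") + smoothing)

def pearson(xs, ys):
    """Pearson correlation, used in the article to compare Web frequencies
    with corpus frequencies and plausibility judgments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy comparison of Web log-counts with (assumed) corpus log-counts
bigrams = [("hungry", "cat"), ("hungry", "sky"), ("fast", "car")]
web = [log_web_frequency(a, n) for a, n in bigrams]
corpus = [math.log(c + 1) for c in (85, 0, 410)]
print(pearson(web, corpus))
```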


Journal ArticleDOI
TL;DR: A novel technique to restrict the possible word reorderings between source and target language in order to achieve an efficient search algorithm for statistical machine translation based on dynamic programming (DP).
Abstract: In this article, we describe an efficient beam search algorithm for statistical machine translation based on dynamic programming (DP). The search algorithm uses the translation model presented in Brown et al. (1993). Starting from a DP-based solution to the traveling-salesman problem, we present a novel technique to restrict the possible word reorderings between source and target language in order to achieve an efficient search algorithm. Word reordering restrictions especially useful for the translation direction German to English are presented. The restrictions are generalized, and a set of four parameters to control the word reordering is introduced, which then can easily be adopted to new translation directions. The beam search procedure has been successfully tested on the Verbmobil task (German to English, 8,000-word vocabulary) and on the Canadian Hansards task (French to English, 100,000-word vocabulary). For the medium-sized Verbmobil task, a sentence can be translated in a few seconds, only a small number of search errors occur, and there is no performance degradation as measured by the word error criterion used in this article.

293 citations
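
The search described above organizes hypotheses by the set of covered source positions (the link to the traveling-salesman DP) and prunes with a beam, subject to word-reordering restrictions. The code below is a heavily simplified, assumption-laden caricature of that organization, not the article's algorithm: the lexicon scores, language-model scores, and the max_jump reordering restriction are all stand-ins.

```python
import heapq

def beam_search_translate(src, lexicon, bigram_lm, beam_size=5, max_jump=2):
    """Toy DP/beam search over coverage bitmasks. States are
    (covered-positions bitmask, last target word); a hypothesis is extended
    by translating one uncovered source word whose position lies within
    `max_jump` of the leftmost uncovered position (a crude stand-in for the
    reordering restrictions). All scores are log-probabilities."""
    n = len(src)
    beams = {0: [((0, "<s>"), (0.0, []))]}   # covered count -> [(state, (score, output))]
    for covered_count in range(n):
        next_states = {}
        for (coverage, last), (score, out) in beams.get(covered_count, []):
            first_free = min(i for i in range(n) if not coverage >> i & 1)
            for i in range(n):
                if coverage >> i & 1 or i - first_free > max_jump:
                    continue
                for tgt, lex_score in lexicon.get(src[i], []):
                    new_score = score + lex_score + bigram_lm.get((last, tgt), -5.0)
                    key = (coverage | 1 << i, tgt)
                    best = next_states.get(key)
                    if best is None or new_score > best[0]:
                        next_states[key] = (new_score, out + [tgt])
        # Beam pruning: keep only the highest-scoring hypotheses per cardinality.
        beams[covered_count + 1] = heapq.nlargest(
            beam_size, next_states.items(), key=lambda kv: kv[1][0])
    finals = beams.get(n, [])
    return max(finals, key=lambda kv: kv[1][0])[1] if finals else (float("-inf"), [])

# Toy usage (assumed lexicon and language-model scores)
lexicon = {"das": [("the", -0.1)], "haus": [("house", -0.2)]}
lm = {("<s>", "the"): -0.5, ("the", "house"): -0.3}
print(beam_search_translate(["das", "haus"], lexicon, lm))
```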


Journal ArticleDOI
TL;DR: The article proposes to formalize a scene as a labeled directed graph and to describe content selection for referring expressions as a subgraph construction problem, an approach that also paves the way for integrating rule-based generation techniques with more recent stochastic approaches.
Abstract: This article describes a new approach to the generation of referring expressions. We propose to formalize a scene (consisting of a set of objects with various properties and relations) as a labeled directed graph and describe content selection (which properties to include in a referring expression) as a subgraph construction problem. Cost functions are used to guide the search process and to give preference to some solutions over others. The current approach has four main advantages: (1) Graph structures have been studied extensively, and by moving to a graph perspective we get direct access to the many theories and algorithms for dealing with graphs; (2) many existing generation algorithms can be reformulated in terms of graphs, and this enhances comparison and integration of the various approaches; (3) the graph perspective allows us to solve a number of problems that have plagued earlier algorithms for the generation of referring expressions; and (4) the combined use of graphs and cost functions paves the way for an integration of rule-based generation techniques with more recent stochastic approaches.

228 citations
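
Content selection is cast above as finding a cheapest distinguishing subgraph of the scene graph under a cost function. The sketch below illustrates the cost-guided search only for the simplest special case of property-only scenes (no relational edges), which is an assumption; the scene and costs are toy data and this is not the article's algorithm.

```python
import heapq
from itertools import count

def cheapest_description(target, entities, costs):
    """Find the cheapest set of properties true of `target` and ruling out
    every other entity. `entities` maps entity name -> set of properties;
    `costs` maps property -> cost (the cost function guiding the search).
    Uniform-cost (cost-ordered) search over property sets."""
    distractors = [props for name, props in entities.items() if name != target]
    target_props = sorted(entities[target], key=lambda p: costs.get(p, 1.0))
    tie = count()
    frontier = [(0.0, next(tie), frozenset())]
    seen = set()
    while frontier:
        cost, _, chosen = heapq.heappop(frontier)
        if chosen in seen:
            continue
        seen.add(chosen)
        # Success: every distractor lacks at least one chosen property.
        if distractors and all(not chosen <= d for d in distractors):
            return chosen, cost
        for prop in target_props:
            if prop not in chosen:
                heapq.heappush(frontier, (cost + costs.get(prop, 1.0),
                                          next(tie), chosen | {prop}))
    return None, float("inf")

# Toy scene (assumed): refer to d1 among two dogs and a cat
entities = {"d1": {"dog", "small", "brown"},
            "d2": {"dog", "small", "white"},
            "c1": {"cat", "brown"}}
costs = {"dog": 1.0, "cat": 1.0, "small": 1.5, "brown": 1.0, "white": 1.0}
print(cheapest_description("d1", entities, costs))   # e.g. "the brown dog"
```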


Journal ArticleDOI
TL;DR: The authors argue that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure.
Abstract: We argue in this article that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure. This allows a simpler discourse structure to provide scaffolding for compositional semantics and reveals multiple ways in which the relational meaning conveyed by adverbial connectives can interact with that associated with discourse structure. We conclude by sketching out a lexicalized grammar for discourse that facilitates discourse interpretation as a product of compositional rules, anaphor resolution, and inference.

214 citations


Journal ArticleDOI
TL;DR: Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage.
Abstract: Selectional preferences have been used by word sense disambiguation (WSD) systems as one source of disambiguating information. We evaluate WSD using selectional preferences acquired for English adjective-noun, subject, and direct object grammatical relationships with respect to a standard test corpus. The selectional preferences are specific to verb or adjective classes, rather than individual word forms, so they can be used to disambiguate the co-occurring adjectives and verbs, rather than just the nominal argument heads. We also investigate use of the one-sense-per-discourse heuristic to propagate a sense tag for a word to other occurrences of the same word within the current document in order to increase coverage. Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage. In addition to quantifying performance, we analyze the results to investigate the situations in which the selectional preferences achieve the best precision and in which the one-sense-per-discourse heuristic increases performance.

149 citations
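
One concrete component above is the one-sense-per-discourse heuristic used to raise coverage. The sketch below shows only that propagation step, under the assumption that some occurrences have already been sense-tagged (e.g. by the selectional-preference model, which is not reproduced here); the document and sense labels are toy data.

```python
def propagate_one_sense_per_discourse(tagged_tokens):
    """tagged_tokens: list of (word, sense_or_None) for a single document.
    Where at least one occurrence of a word has received a sense tag, copy
    that tag to the remaining untagged occurrences of the same word (the
    one-sense-per-discourse heuristic used to increase coverage)."""
    chosen = {}
    for word, sense in tagged_tokens:
        if sense is not None and word not in chosen:
            chosen[word] = sense
    return [(word, sense if sense is not None else chosen.get(word))
            for word, sense in tagged_tokens]

# Toy document: "plant" is disambiguated once and propagated to the other mention
doc = [("the", None), ("plant", "plant%factory"), ("closed", None),
       ("the", None), ("plant", None), ("reopened", None)]
print(propagate_one_sense_per_discourse(doc))
```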


Journal ArticleDOI
TL;DR: In this article, the problem of automatically mining parallel texts from the Web and different ways of integrating the resulting translation models within the retrieval process are investigated, and the results show that the Web-based translation models can surpass commercial MT systems in CLIR tasks.
Abstract: Although more and more language pairs are covered by machine translation (MT) services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag of words. The Web provides a vast resource for the automatic construction of parallel corpora that can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this article, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.

128 citations
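
Since the retrieval models are bag-of-words, query translation can be illustrated as replacing each source term with its top-ranked translations from the learned translation model. The sketch below assumes a toy probability table (standing in for a model trained on Web-mined parallel text) and is not the article's retrieval integration.

```python
def translate_query(query_terms, translation_probs, top_k=2, min_prob=0.1):
    """Bag-of-words query translation for CLIR: replace each source term
    with up to `top_k` target-language translations whose translation
    probability exceeds `min_prob`, keeping the probabilities as weights.
    `translation_probs` maps source term -> {target term: p(t|s)}."""
    weighted_query = {}
    for term in query_terms:
        candidates = sorted(translation_probs.get(term, {}).items(),
                            key=lambda kv: kv[1], reverse=True)[:top_k]
        for target, prob in candidates:
            if prob >= min_prob:
                weighted_query[target] = weighted_query.get(target, 0.0) + prob
    return weighted_query

# Toy model (assumed probabilities, e.g. learned from Web-mined parallel text)
probs = {"voiture": {"car": 0.7, "vehicle": 0.2, "wagon": 0.05},
         "rouge": {"red": 0.9, "crimson": 0.05}}
print(translate_query(["voiture", "rouge"], probs))
```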


Journal ArticleDOI
TL;DR: This article acquires the meanings of metonymic verbs and adjectives from a large corpus and proposes a probabilistic model that provides a ranking on the set of possible interpretations and identifies the interpretations automatically by exploiting the consistent correspondences between surface syntactic cues and meaning.
Abstract: In this article we investigate logical metonymy, that is, constructions in which the argument of a word in syntax appears to be different from that argument in logical form (e.g., enjoy the book means enjoy reading the book, and easy problem means a problem that is easy to solve). The systematic variation in the interpretation of such constructions suggests a rich and complex theory of composition on the syntax/semantics interface. Linguistic accounts of logical metonymy typically fail to describe exhaustively all the possible interpretations, or they don't rank those interpretations in terms of their likelihood. In view of this, we acquire the meanings of metonymic verbs and adjectives from a large corpus and propose a probabilistic model that provides a ranking on the set of possible interpretations. We identify the interpretations automatically by exploiting the consistent correspondences between surface syntactic cues and meaning. We evaluate our results against paraphrase judgments elicited experimentally from humans and show that the model's ranking of meanings correlates reliably with human intuitions.
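
A crude, assumption-laden caricature of the corpus-based ranking idea (not the published probabilistic model): score each candidate covert event e for a metonymic verb-object pair such as enjoy the book by combining how often the verb takes e as a complement and how often e takes the noun as its object, then rank. The counts below are toy data standing in for counts harvested from a large corpus.

```python
import math

def rank_interpretations(verb, noun, verb_event_counts, event_object_counts):
    """Rank candidate interpretation events for a metonymic construction such
    as "enjoy the book" ~ "enjoy EVENT-ing the book". Each event e is scored
    by log f(verb, e) + log f(e, noun) over corpus counts and the candidates
    are sorted by that score (a simplification of the article's model)."""
    events = set(verb_event_counts.get(verb, {})) & {
        e for (e, n) in event_object_counts if n == noun}
    scored = [(e, math.log(verb_event_counts[verb][e]) +
                  math.log(event_object_counts[(e, noun)]))
              for e in events]
    return sorted(scored, key=lambda es: es[1], reverse=True)

# Toy counts (assumed)
verb_event = {"enjoy": {"read": 120, "write": 15, "burn": 2}}
event_object = {("read", "book"): 5000, ("write", "book"): 800, ("burn", "book"): 40}
print(rank_interpretations("enjoy", "book", verb_event, event_object))
```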

Journal ArticleDOI
TL;DR: Knuth's generalization of Dijkstra's shortest-path algorithm offers a general method for finding the lowest-weight derivation in weighted deductive parsing, and the approach is modular in the sense that Knuth's algorithm is formulated independently from the weighted deduction system.
Abstract: We discuss weighted deductive parsing and consider the problem of finding the derivation with the lowest weight. We show that Knuth's generalization of Dijkstra's algorithm for the shortest-path problem offers a general method to solve this problem. Our approach is modular in the sense that Knuth's algorithm is formulated independently from the weighted deduction system.
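
The sketch below illustrates Knuth's Dijkstra-style strategy on a weighted deduction system under a simplifying assumption: the weight of a derived item is the sum of its antecedents' weights plus a nonnegative rule weight (a monotone, superior weight function), so the first time an item is popped from the priority queue its weight is optimal. The toy axioms and rules are assumptions, not taken from the article.

```python
import heapq

def best_derivation_weights(axioms, rules):
    """Knuth-style generalization of Dijkstra over a weighted deduction
    system. `axioms` maps item -> weight. Each rule is
    (antecedents, consequent, rule_weight); the derived item's weight is
    assumed to be the sum of antecedent weights plus the rule weight.
    Returns the lowest derivation weight for every derivable item."""
    best = {}
    frontier = [(w, item) for item, w in axioms.items()]
    heapq.heapify(frontier)
    while frontier:
        weight, item = heapq.heappop(frontier)
        if item in best:          # already settled with a better weight
            continue
        best[item] = weight       # Knuth/Dijkstra: the first pop is optimal
        for antecedents, consequent, rule_weight in rules:
            if consequent in best:
                continue
            if all(a in best for a in antecedents):
                new_weight = sum(best[a] for a in antecedents) + rule_weight
                heapq.heappush(frontier, (new_weight, consequent))
    return best

# Toy deduction system (assumed): weighted chart items as opaque labels
axioms = {"A[0,1]": 1.0, "B[1,2]": 2.0, "A[1,2]": 0.5}
rules = [(("A[0,1]", "B[1,2]"), "S[0,2]", 0.3),
         (("A[0,1]", "A[1,2]"), "S[0,2]", 1.0)]
print(best_derivation_weights(axioms, rules))
```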

Journal ArticleDOI
TL;DR: A spatial model for matching semantic values between two languages, French and English, based on semantic similarity links is described, which constructs a map that represents a word in the source language and projects the map values onto a space in the target language.
Abstract: This article describes a spatial model for matching semantic values between two languages, French and English. Based on semantic similarity links, the model constructs a map that represents a word in the source language. Then the algorithm projects the map values onto a space in the target language. The new space abides by the semantic similarity links specific to the second language. Then the two maps are projected onto the same plane in order to detect overlapping values. For instructional purposes, the different steps are presented here using a few examples. The entire set of results is available at the following address: http://dico.isc.cnrs.fr.

Journal ArticleDOI
TL;DR: A dependency parsing scheme using an extended finite-state approach augments the input representation with channels, so that links representing syntactic dependency relations among words can be accommodated, and iterates on the input a number of times to arrive at a fixed point.
Abstract: This article presents a dependency parsing scheme using an extended finite-state approach. The parser augments input representation with "channels" so that links representing syntactic dependency relations among words can be accommodated and iterates on the input a number of times to arrive at a fixed point. Intermediate configurations violating various constraints of projective dependency representations such as no crossing links and no independent items except sentential head are filtered via finite-state filters. We have applied the parser to dependency parsing of Turkish.
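
The parser iterates over the input, adding dependency links and filtering configurations that violate projectivity constraints such as crossing links. The code below is not finite-state and is not the article's parser; it is only a hedged illustration of the iterate-to-a-fixed-point-and-filter idea, with toy link-proposing rules as assumptions.

```python
def crosses(link_a, link_b):
    """True if two dependency links (dependent_idx, head_idx) cross,
    violating projectivity."""
    (a1, a2), (b1, b2) = sorted(link_a), sorted(link_b)
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2

def parse_to_fixed_point(words, propose):
    """Iterate a link-proposing function over the current configuration,
    keeping only proposals that do not cross existing links, until no new
    link is added (a fixed point). `propose(words, links)` yields candidate
    (dependent, head) links."""
    links = set()
    changed = True
    while changed:
        changed = False
        for cand in propose(words, links):
            if cand not in links and all(not crosses(cand, l) for l in links):
                links.add(cand)
                changed = True
    return links

# Toy rule set (assumed): determiners depend on the next noun,
# nouns depend on the nearest following verb
def toy_rules(words, links):
    for i, (w, tag) in enumerate(words):
        if tag == "Det" and i + 1 < len(words) and words[i + 1][1] == "Noun":
            yield (i, i + 1)
        if tag == "Noun":
            for j in range(i + 1, len(words)):
                if words[j][1] == "Verb":
                    yield (i, j)
                    break

sent = [("the", "Det"), ("dog", "Noun"), ("barked", "Verb")]
print(parse_to_fixed_point(sent, toy_rules))
```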

Journal ArticleDOI
TL;DR: Despite the perception that the documents available on the Web are of questionable quality, it is demonstrated that such resources are extremely useful in automatically postediting translation candidates proposed by the EBMT system.
Abstract: We have developed an example-based machine translation (EBMT) system that uses the World Wide Web for two different purposes: First, we populate the system's memory with translations gathered from rule-based MT systems located on the Web. The source strings input to these systems were extracted automatically from an extremely small subset of the rule types in the Penn-II Treebank. In subsequent stages, the 〈source, target〉 translation pairs obtained are automatically transformed into a series of resources that render the translation process more successful. Despite the fact that the output from on-line MT systems is often faulty, we demonstrate in a number of experiments that when used to seed the memories of an EBMT system, they can in fact prove useful in generating translations of high quality in a robust fashion. In addition, we demonstrate the relative gain of EBMT in comparison to on-line systems. Second, despite the perception that the documents available on the Web are of questionable quality, we demonstrate in contrast that such resources are extremely useful in automatically postediting translation candidates proposed by our system.

Journal ArticleDOI
TL;DR: An algorithm that combines lexical information (from WordNet 1.7) with Web directories (from the Open Directory Project) to associate word senses with such directories supports the hypothesis that Web directories are a rich source of lexical information: cleaner, more reliable, and more structured than the full Web as a corpus.
Abstract: We describe an algorithm that combines lexical information (from WordNet 1.7) with Web directories (from the Open Directory Project) to associate word senses with such directories. Such associations can be used as rich characterizations to acquire sense-tagged corpora automatically, cluster topically related senses, and detect sense specializations. The algorithm is evaluated for the 29 nouns (147 senses) used in the Senseval 2 competition, obtaining 148 (word sense, Web directory) associations covering 88% of the domain-specific word senses in the test data with 86% accuracy. The richness of Web directories as sense characterizations is evaluated in a supervised word sense disambiguation task using the Senseval 2 test suite. The results indicate that, when the directory/word sense association is correct, the samples automatically acquired from the Web directories are nearly as valid for training as the original Senseval 2 training instances. The results support our hypothesis that Web directories are a rich source of lexical information: cleaner, more reliable, and more structured than the full Web as a corpus.
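
In the spirit of the association step described above (and only in that spirit; the actual algorithm is richer), a sense can be matched to a directory by lexical overlap between a sense signature (e.g. synset members plus gloss words) and the vocabulary observed under the directory. The signatures and directory word lists below are toy assumptions.

```python
def overlap_score(sense_signature, directory_words):
    """Score how well a Web directory characterizes a word sense by the
    Jaccard overlap between the sense's signature (e.g. synset members plus
    gloss words from WordNet) and the vocabulary observed under the
    directory (e.g. site titles and descriptions in the ODP category)."""
    signature, directory = set(sense_signature), set(directory_words)
    union = signature | directory
    return len(signature & directory) / len(union) if union else 0.0

def best_directory(sense_signature, directories):
    """Pick the directory with the highest overlap for a given sense."""
    return max(directories.items(),
               key=lambda kv: overlap_score(sense_signature, kv[1]))

# Toy example (assumed data): two ODP-like categories for the noun "bank"
directories = {
    "Business/Financial_Services": ["loan", "account", "credit", "deposit"],
    "Science/Environment/Rivers": ["river", "erosion", "flood", "water"],
}
bank_financial = ["bank", "financial", "institution", "deposit", "loan", "credit"]
print(best_directory(bank_financial, directories))
```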

Journal ArticleDOI
TL;DR: Computational aspects of Van der Sandt's binding and accommodation theory (BAT) for presupposition projection and anaphora resolution are presented and discussed and innovative use of first-order theorem provers to carry out consistency checking of discourse representations is investigated.
Abstract: Computational aspects of Van der Sandt's binding and accommodation theory (BAT) for presupposition projection and anaphora resolution are presented and discussed in this article. BAT is reformulated to meet requirements for computational implementation, which include operations on discourse representation structures (renaming and merging), the representation of presuppositions (allowing for selective binding and determining free and bound variables), and a formulation of the acceptability constraints imposed by BAT. An efficient presupposition resolution algorithm is presented, and several further improvements such as preferences for binding and accommodation are discussed and integrated in this algorithm. Finally, innovative use of first-order theorem provers to carry out consistency checking of discourse representations is investigated.

Journal ArticleDOI
TL;DR: The GA model proposed for the study of tone systems uses a Pareto ranking method well suited to optimization problems with multiple criteria, allowing perceptual contrast and markedness complexity to be considered simultaneously.
Abstract: In this study, optimization models using genetic algorithms (GAs) are proposed to study the configuration of vowels and tone systems. As in previous explanatory models that have been used to study vowel systems, certain criteria, which are assumed to be the principles governing the structure of sound systems, are used to predict optimal vowels and tone systems. In most of the earlier studies only one criterion has been considered. When two criteria are considered, they are often combined into one scalar function. The GA model proposed for the study of tone systems uses a Pareto ranking method that is highly applicable for dealing with optimization problems having multiple criteria. For optimization of tone systems, perceptual contrast and markedness complexity are considered simultaneously. Although the consistency between the predicted systems and the observed systems is not as significant as those obtained for vowel systems, further investigation along this line is promising.
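
The Pareto ranking component can be made concrete. In the common variant sketched below (an assumption; the article's GA details are not reproduced), an individual's rank is one plus the number of individuals that dominate it, so rank-1 individuals form the current Pareto front. Candidates are represented only by their two criterion values, with perceptual contrast maximized and markedness complexity minimized.

```python
def dominates(a, b):
    """a dominates b if a is no worse on both criteria and strictly better on
    at least one. Criteria (assumed order): (contrast, complexity), where
    contrast is maximized and complexity is minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_ranks(population):
    """Pareto ranking as used in multiobjective GAs: rank = 1 + number of
    individuals that dominate this one; rank-1 individuals form the
    current Pareto front. `population` is a list of criterion tuples."""
    return [1 + sum(dominates(other, ind) for other in population)
            for ind in population]

# Toy population of candidate tone systems: (perceptual contrast, complexity)
population = [(0.9, 3), (0.7, 2), (0.9, 2), (0.5, 4)]
print(pareto_ranks(population))   # only (0.9, 2) is undominated, so it gets rank 1
```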


Journal ArticleDOI
TL;DR: The article focuses on the issue of determining quantifier scope preferences, which has largely been ignored in theoretical linguistics, and explores different models of the interaction between syntax and quantifier scope.
Abstract: This article describes a corpus-based investigation of quantifier scope preferences. Following recent work on multimodular grammar frameworks in theoretical linguistics and a long history of combining multiple information sources in natural language processing, scope is treated as a distinct module of grammar from syntax. This module incorporates multiple sources of evidence regarding the most likely scope reading for a sentence and is entirely data-driven. The experiments discussed in this article evaluate the performance of our models in predicting the most likely scope reading for a particular sentence, using Penn Treebank data both with and without syntactic annotation. We wish to focus attention on the issue of determining scope preferences, which has largely been ignored in theoretical linguistics, and to explore different models of the interaction between syntax and quantifier scope.



Journal ArticleDOI
TL;DR: A theory for automatic learning of text categorization models that has been repeatedly shown to be very successful and is based on a rather rough linguistic generalization of a language-dependent task: topic text classification (TC).
Abstract: Those trying to make sense of the notion of textual content and semantics within the wild, wild world of information retrieval, categorization, and filtering have to deal often with an overwhelming sea of problems. The really strange story is that most of them (myself included) still believe that developing a linguistically principled approach to text categorization is an interesting research problem. This will also emerge in the discussion of the book that is the focus of this review. Learning to Classify Texts Using Support Vector Machines by Thorsten Joachims proposes a theory for automatic learning of text categorization models that has been repeatedly shown to be very successful. At the same time, the approach proposed is based on a rather rough linguistic generalization of (what apparently is) a language-dependent task: topic text classification (TC). The result is twofold: on the one hand, a learning theory, based on statistical learnability principles and results, that avoids the limitations of the strong empiricism typical of most text classification research; and on the other hand, the application of a naive linguistic model, the bag-of-words representation, to linguistic objects (i.e., the documents) that still achieves impressive accuracy.
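
The bag-of-words representation plus linear SVM setup the review discusses maps, in present-day tooling, onto a few lines of scikit-learn. This is an anachronistic sketch of the general technique, not Joachims's system (which used SVM-light), and the toy corpus is a placeholder.

```python
# Bag-of-words (tf-idf weighted) text representation feeding a linear SVM,
# in the spirit of the classifier discussed in the review. The training
# corpus and labels below are placeholders for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["stocks fell sharply on Wall Street",
               "the midfielder scored a late goal",
               "the central bank raised interest rates",
               "the team won the championship match"]
train_labels = ["finance", "sports", "finance", "sports"]

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_texts, train_labels)
print(classifier.predict(["interest rates and bank stocks"]))
```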

Journal Article
TL;DR: This book describes an investigation of the subset of general language used in definition sentences and the development of a taxonomy of definition types, a grammar of definition sentences, and parsing software that can extract their functional components.
Abstract: Definition is a basic activity of language, of particular importance to linguists because of its use of language to describe itself. Beyond this inherent significance as a crucial element of language study, definitions also provide a rich potential source of the information needed for Natural Language Processing systems. This book describes an investigation of the subset of general language used in definition sentences and the development of a taxonomy of definition types, a grammar of definition sentences and parsing software which can extract their functional components. The work is based on definition sentences used in one of the dictionaries from the Cobuild range, and the book includes a brief history of the development of monolingual English dictionaries, an assessment of the concepts of sublanguages and local grammars and a full exploration of the results of the analysis and of the present and future applications of the taxonomy, grammar and parser.
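
For a flavor of what a local grammar of definition sentences can extract (this is a toy pattern invented for illustration, not the book's grammar), the sketch below matches one common Cobuild-style pattern, "A/An X is a Y that Z", and pulls out the definiendum, genus, and differentia.

```python
import re

# A toy local-grammar rule (an assumption, not the book's grammar) for one
# common definition pattern: "A/An <definiendum> is a/an <genus> that <differentia>."
DEFINITION_PATTERN = re.compile(
    r"^(?:A|An|The)\s+(?P<definiendum>\w+)\s+is\s+(?:a|an)\s+"
    r"(?P<genus>[\w\s]+?)\s+(?:that|which|who)\s+(?P<differentia>.+?)\.?$",
    re.IGNORECASE)

def parse_definition(sentence):
    """Extract the functional components of a definition sentence matching
    the single toy pattern above; return None for non-matching sentences."""
    match = DEFINITION_PATTERN.match(sentence.strip())
    return match.groupdict() if match else None

print(parse_definition("A kettle is a container that is used for boiling water."))
print(parse_definition("The weather was awful."))
```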



