
Showing papers in "Computational Linguistics in 2004"


Journal ArticleDOI
TL;DR: A phrase-based statistical machine translation approach, the alignment template approach, is described, which allows for general many-to-many relations between words and is easier to extend than classical statistical machine translation systems.
Abstract: A phrase-based statistical machine translation approach — the alignment template approach — is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often used source–channel approach. Thereby, the model is easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German–English speech VERBMOBIL task, we analyze the effect of various system components. On the French–English Canadian HANSARDS task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese–English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores than all competing research and commercial translation systems.

1,031 citations
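
For orientation, the log-linear decision rule mentioned in the abstract can be written in generic notation as below; the feature functions and weights are whatever the system defines, and the classical source-channel model falls out as a special case.

```latex
% Log-linear decision rule: pick the target sentence \hat{e} that maximizes a
% weighted sum of M feature functions h_m of the source sentence f and target e.
\[
  \hat{e} \;=\; \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)
\]
% The source-channel model is the special case with M = 2,
% h_1(e,f) = \log p(f \mid e) (translation model),
% h_2(e,f) = \log p(e) (language model), and \lambda_1 = \lambda_2 = 1.
```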


Journal ArticleDOI
TL;DR: This article shows that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and it provides the results of an annotation study assessing the subjectivity of sentences with high-density features.
Abstract: Subjectivity in natural language refers to aspects of language used to express opinions, evaluations, and speculations. There are numerous natural language processing applications for which subjectivity analysis is relevant, including information extraction and text categorization. The goal of this work is learning subjective language from corpora. Clues of subjectivity are generated and tested, including low-frequency words, collocations, and adjectives and verbs identified using distributional similarity. The features are also examined working together in concert. The features, generated from different data sets using different procedures, exhibit consistency in performance in that they all do better and worse on the same data sets. In addition, this article shows that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and it provides the results of an annotation study assessing the subjectivity of sentences with high-density features. Finally, the clues are used to perform opinion piece recognition (a type of text categorization and genre detection) to demonstrate the utility of the knowledge acquired in this article.

734 citations
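
A minimal sketch of the density idea from the abstract, with a made-up clue list, window size, and example sentence rather than the article's actual clues and parameters:

```python
# Toy sketch: score each token position by how many known subjectivity clues
# fall inside a surrounding window. Clue list and window size are placeholders.
SUBJECTIVITY_CLUES = {"terrible", "wonderful", "apparently", "claims", "outrageous"}

def clue_density(tokens, position, window=5):
    """Fraction of tokens near `position` that are subjectivity clues."""
    start = max(0, position - window)
    end = min(len(tokens), position + window + 1)
    context = tokens[start:end]
    hits = sum(1 for t in context if t.lower() in SUBJECTIVITY_CLUES)
    return hits / len(context)

tokens = "critics say the plan is outrageous and apparently doomed".split()
for i, tok in enumerate(tokens):
    print(f"{tok:12s} density={clue_density(tokens, i):.2f}")
```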


Journal ArticleDOI
TL;DR: A method and a tool are presented, aimed at the extraction of domain ontologies from Web sites, and more generally from documents shared among the members of virtual organizations, based on a new word sense disambiguation algorithm called structural semantic interconnections.
Abstract: We present a method and a tool, OntoLearn, aimed at the extraction of domain ontologies from Web sites, and more generally from documents shared among the members of virtual organizations. OntoLearn first extracts a domain terminology from available documents. Then, complex domain terms are semantically interpreted and arranged in a hierarchical fashion. Finally, a general-purpose ontology, WordNet, is trimmed and enriched with the detected domain concepts. The major novel aspect of this approach is semantic interpretation, that is, the association of a complex concept with a complex term. This involves finding the appropriate WordNet concept for each word of a terminological string and the appropriate conceptual relations that hold among the concept components. Semantic interpretation is based on a new word sense disambiguation algorithm, called structural semantic interconnections.

442 citations
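
The first step of semantic interpretation, enumerating candidate WordNet senses for each word of a complex term, can be sketched with NLTK's WordNet interface (using NLTK here is an assumption of this example; OntoLearn's own sense-selection algorithm, structural semantic interconnections, is not reproduced):

```python
# Candidate enumeration only: for each word of a complex domain term, list its
# possible WordNet noun senses. Choosing among them is the hard part OntoLearn
# solves with structural semantic interconnections, which is not shown here.
# Requires: pip install nltk, then nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def sense_candidates(term):
    """Map each word of a multiword term to its candidate WordNet noun synsets."""
    return {word: wn.synsets(word, pos=wn.NOUN) for word in term.split()}

for word, synsets in sense_candidates("credit card").items():
    print(word)
    for s in synsets[:3]:
        print("   ", s.name(), "-", s.definition())
```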


Journal ArticleDOI
TL;DR: In this article, the authors discuss the assumptions underlying different computations of the expected agreement component of the kappa coefficient of agreement, and how prevalence and bias affect the κ measure.
Abstract: In recent years, the kappa coefficient of agreement has become the de facto standard for evaluating intercoder agreement for tagging tasks. In this squib, we highlight issues that affect κ and that the community has largely neglected. First, we discuss the assumptions underlying different computations of the expected agreement component of κ. Second, we discuss how prevalence and bias affect the κ measure.

440 citations
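
The contrast the squib examines can be made concrete: with two coders, expected agreement can be computed from each coder's own label distribution (Cohen-style kappa) or from a single distribution pooled over both coders (Scott / Siegel-and-Castellan style). A small generic illustration, not the squib's own code:

```python
# Two-coder kappa with the two common choices of expected agreement.
from collections import Counter

def kappa(labels_a, labels_b, pooled=False):
    """pooled=False: each coder's own marginals (Cohen's kappa).
    pooled=True: one distribution pooled over both coders (Scott-style)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    if pooled:
        pool = Counter(labels_a) + Counter(labels_b)
        expected = sum((pool[c] / (2 * n)) ** 2 for c in categories)
    else:
        pa, pb = Counter(labels_a), Counter(labels_b)
        expected = sum((pa[c] / n) * (pb[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
b = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no"]
print("Cohen:", round(kappa(a, b), 3), "pooled:", round(kappa(a, b, pooled=True), 3))
```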


Journal ArticleDOI
TL;DR: A large set of heretofore unpublished details Collins used in his parser is documented, such that, along with Collins' (1999) thesis, this article contains all information necessary to duplicate Collins' benchmark results.
Abstract: This article documents a large set of heretofore unpublished details Collins used in his parser, such that, along with Collins' (1999) thesis, this article contains all information necessary to duplicate Collins' benchmark results. Indeed, these as-yet-unpublished details account for an 11% relative increase in error from an implementation including all details to a clean-room implementation of Collins' model. We also show a cleaner and equally well-performing method for the handling of punctuation and conjunction and reveal certain other probabilistic oddities about Collins' parser. We not only analyze the effect of the unpublished details, but also reanalyze the effect of certain well-known details, revealing that bilexical dependencies are barely used by the model and that head choice is not nearly as important to overall parsing performance as once thought. Finally, we perform experiments that show that the true discriminative power of lexicalization appears to lie in the fact that unlexicalized syntactic structures are generated conditioning on the headword and its part of speech.

277 citations


Journal ArticleDOI
TL;DR: CorMet is a corpus-based system for discovering metaphorical mappings between concepts by finding systematic variations in domain-specific selectional preferences, which are inferred from large, dynamically mined Internet corpora.
Abstract: CorMet is a corpus-based system for discovering metaphorical mappings between concepts. It does this by finding systematic variations in domain-specific selectional preferences, which are inferred from large, dynamically mined Internet corpora. Metaphors transfer structure from a source domain to a target domain, making some concepts in the target domain metaphorically equivalent to concepts in the source domain. The verbs that select for a concept in the source domain tend to select for its metaphorical equivalent in the target domain. This regularity, detectable with a shallow linguistic analysis, is used to find the metaphorical interconcept mappings, which can then be used to infer the existence of higher-level conventional metaphors. Most other computational metaphor systems use small, hand-coded semantic knowledge bases and work on a few examples. Although CorMet's only knowledge base is WordNet (Fellbaum 1998), it can find the mappings constituting many conventional metaphors and in some cases recognize sentences instantiating those mappings. CorMet is tested on its ability to find a subset of the Master Metaphor List (Lakoff, Espenson, and Schwartz 1991).

234 citations
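
A heavily simplified illustration of the signal CorMet exploits, not of the system itself: if the verbs that take a source-domain noun as object also take a target-domain noun as object, the two nouns are candidates for a metaphorical mapping. All counts below are fabricated.

```python
# Compare nouns across domains by the verbs that select for them (toy data).
from collections import defaultdict
from math import sqrt

# (domain, verb, object_noun) -> fabricated co-occurrence count
counts = {
    ("finance", "flow", "money"): 30, ("finance", "pour", "money"): 12,
    ("finance", "freeze", "assets"): 20, ("finance", "flow", "capital"): 18,
    ("physics", "flow", "water"): 40, ("physics", "pour", "water"): 25,
    ("physics", "freeze", "water"): 22,
}

def verb_signature(domain, noun):
    """Distribution of verbs observed selecting `noun` in `domain`."""
    sig = defaultdict(float)
    for (d, verb, obj), c in counts.items():
        if d == domain and obj == noun:
            sig[verb] += c
    return sig

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# High similarity between finance "money" and physics "water" is the kind of
# evidence behind a MONEY-AS-LIQUID style mapping.
print(cosine(verb_signature("finance", "money"), verb_signature("physics", "water")))
```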


Journal ArticleDOI
TL;DR: The results suggest that entity coherence (continuous reference to the same entities) must be supplemented at least by an account of relational coherence.
Abstract: Centering theory is the best-known framework for theorizing about local coherence and salience; however, its claims are articulated in terms of notions which are only partially specified, such as "utterance," "realization," or "ranking." A great deal of research has attempted to arrive at more detailed specifications of these parameters of the theory; as a result, the claims of centering can be instantiated in many different ways. We investigated in a systematic fashion the effect on the theory's claims of these different ways of setting the parameters. Doing this required, first of all, clarifying what the theory's claims are (one of our conclusions being that what has become known as "Constraint 1" is actually a central claim of the theory). Secondly, we had to clearly identify these parametric aspects: For example, we argue that the notion of "pronoun" used in Rule 1 should be considered a parameter. Thirdly, we had to find appropriate methods for evaluating these claims. We found that while the theory's main claim about salience and pronominalization, Rule 1—a preference for pronominalizing the backward-looking center (CB)—is verified with most instantiations, Constraint 1–a claim about (entity) coherence and CB uniqueness—is much more instantiation-dependent: It is not verified if the parameters are instantiated according to very mainstream views ("vanilla instantiation"), it holds only if indirect realization is allowed, and is violated by between 20% and 25% of utterances in our corpus even with the most favorable instantiations. We also found a trade-off between Rule 1, on the one hand, and Constraint 1 and Rule 2, on the other: Setting the parameters to minimize the violations of local coherence leads to increased violations of salience, and vice versa. Our results suggest that "entity" coherence—continuous reference to the same entities—must be supplemented at least by an account of relational coherence.

215 citations
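
A toy check of the Rule 1 preference under one hypothetical annotation format; as the abstract stresses, how the CB and realizations are determined is exactly what varies across instantiations, so treat this as an illustration of the claim rather than an instantiation of the theory.

```python
# Rule 1 (as a checkable preference): if any entity in an utterance is
# pronominalized, the backward-looking center (CB) is pronominalized too.
# The annotation format below is hypothetical.
def rule1_violations(discourse):
    violations = []
    for i, utt in enumerate(discourse):
        cb = utt["cb"]
        if cb is None:
            continue
        pronominalized = {e for e, is_pron in utt["realized"].items() if is_pron}
        if pronominalized and cb not in pronominalized:
            violations.append(i)
    return violations

discourse = [
    {"cb": None,   "realized": {"John": False, "store": False}},
    {"cb": "John", "realized": {"John": True,  "milk": False}},   # conforms
    {"cb": "John", "realized": {"John": False, "clerk": True}},   # violates Rule 1
]
print(rule1_violations(discourse))  # -> [2]
```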


Journal ArticleDOI
TL;DR: The construction of hierarchical lexicon models on the basis of equivalence classes of words is proposed, and sentence-level restructuring transformations which aim at the assimilation of word order in related sentences are introduced.
Abstract: In statistical machine translation, correspondences between the words in the source and the target language are learned from parallel corpora, and often little or no linguistic knowledge is used to structure the underlying models. In particular, existing statistical systems for machine translation often treat different inflected forms of the same lemma as if they were independent of one another. The bilingual training data can be better exploited by explicitly taking into account the interdependencies of related inflected forms. We propose the construction of hierarchical lexicon models on the basis of equivalence classes of words. In addition, we introduce sentence-level restructuring transformations which aim at the assimilation of word order in related sentences. We have systematically investigated the amount of bilingual training data required to maintain an acceptable quality of machine translation. The combination of the suggested methods for improving translation quality in frameworks with scarce resources has been successfully tested: We were able to reduce the amount of bilingual training data to less than 10% of the original corpus, while losing only 1.6% in translation quality. The improvement of the translation results is demonstrated on two German-English corpora taken from the Verbmobil task and the Nespole! task.

206 citations


Journal ArticleDOI
TL;DR: This article uses statistical alignment methods to produce a set of conventional strings from which a stochastic rational grammar (e.g., an n-gram) is inferred, which is finally converted into a finite-state transducer.
Abstract: Finite-state transducers are models that are being used in different areas of pattern recognition and computational linguistics. One of these areas is machine translation, in which the approaches that are based on building models automatically from training examples are becoming more and more attractive. Finite-state transducers are very adequate for use in constrained tasks in which training samples of pairs of sentences are available. A technique for inferring finite-state transducers is proposed in this article. This technique is based on formal relations between finite-state transducers and rational grammars. Given a training corpus of source-target pairs of sentences, the proposed approach uses statistical alignment methods to produce a set of conventional strings from which a stochastic rational grammar (e.g., an n -gram) is inferred. This grammar is finally converted into a finite-state transducer. The proposed methods are assessed through a series of machine translation experiments within the framework of the EuTrans project.

173 citations
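
A toy sketch of the string-building step described in the abstract: each word-aligned sentence pair is turned into one string of joint source-target symbols, over which an n-gram model can be estimated (here just raw bigram counts) and later read off as a finite-state transducer. The alignments and the joining convention are simplified illustrations, not the exact EuTrans construction.

```python
# Build joint-symbol strings from word-aligned pairs and count bigrams.
from collections import Counter, defaultdict

def joint_symbols(src, tgt, alignment):
    """alignment: list of (src_index, tgt_index) links, assumed monotonized."""
    tgt_for = defaultdict(list)
    for i, j in alignment:
        tgt_for[i].append(tgt[j])
    return [f"{s}|{'_'.join(tgt_for[i]) or 'NULL'}" for i, s in enumerate(src)]

pairs = [  # (source, target, alignment) -- toy examples
    (["una", "habitacion", "doble"], ["a", "double", "room"], [(0, 0), (1, 2), (2, 1)]),
    (["una", "habitacion"], ["a", "room"], [(0, 0), (1, 1)]),
]

bigrams = Counter()
for src, tgt, al in pairs:
    symbols = ["<s>"] + joint_symbols(src, tgt, al) + ["</s>"]
    bigrams.update(zip(symbols, symbols[1:]))

for (u, v), c in bigrams.items():
    print(f"{u} -> {v}: {c}")   # each bigram becomes a transducer transition
```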


Journal ArticleDOI
TL;DR: A collection on example-based machine translation (EBMT), covering its foundations, run-time approaches, template-driven methods, and approaches based on derivation trees, including the LFG-DOT models of translation.
Abstract (contents):
I. Foundations of EBMT: 1. An Overview of EBMT; 2. What is Example-Based Machine Translation?; 3. Example-Based Machine Translation in a Controlled Environment; 4. EBMT Seen as Case-based Reasoning.
II. Run-time Approaches to EBMT: 5. Formalizing Translation Memory; 6. EBMT Using DP-Matching Between Word Sequences; 7. A Hybrid Rule and Example-Based Method for Machine Translation; 8. EBMT of POS-Tagged Sentences via Inductive Learning.
III. Template-Driven EBMT: 9. Learning Translation Templates from Bilingual Translation Examples; 10. Clustered Transfer Rule Induction for Example-Based Translation; 11. Translation Patterns, Linguistic Knowledge and Complexity in EBMT; 12. Inducing Translation Grammars from Bracketed Alignments.
IV. EBMT and Derivation Trees: 13. Extracting Translation Knowledge from Parallel Corpora; 14. Finding Translation Patterns from Dependency Structures; 15. A Best-First Alignment Algorithm for Extraction of Transfer Mappings; 16. Translating with Examples: The LFG-DOT Models of Translation.

152 citations


Journal ArticleDOI
TL;DR: It is found that sample selection can significantly reduce the size of annotated training corpora and that uncertainty is a robust predictive criterion that can be easily applied to different learning models.
Abstract: Corpus-based statistical parsing relies on using large quantities of annotated text as training examples. Building this kind of resource is expensive and labor-intensive. This work proposes to use sample selection to find helpful training examples and reduce human effort spent on annotating less informative ones. We consider several criteria for predicting whether unlabeled data might be a helpful training example. Experiments are performed across two syntactic learning tasks and within the single task of parsing across two learning models to compare the effect of different predictive criteria. We find that sample selection can significantly reduce the size of annotated training corpora and that uncertainty is a robust predictive criterion that can be easily applied to different learning models.
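
A sketch of uncertainty-based sample selection under a hypothetical model interface; the real experiments use syntactic learners, while here a fake scoring function stands in for the model.

```python
# Rank unlabeled sentences by the entropy of the model's distribution over
# candidate analyses and send the most uncertain ones to the annotators.
from math import log

def entropy(distribution):
    return -sum(p * log(p) for p in distribution if p > 0)

def select_for_annotation(unlabeled, model_distribution, batch_size=10):
    scored = [(entropy(model_distribution(s)), s) for s in unlabeled]
    scored.sort(reverse=True)
    return [s for _, s in scored[:batch_size]]

# Fake stand-in model: longer sentences get flatter (more uncertain) distributions.
def fake_distribution(sentence):
    k = max(2, min(6, len(sentence.split())))
    return [1.0 / k] * k

pool = ["time flies", "the old man the boats",
        "colorless green ideas sleep furiously here"]
print(select_for_annotation(pool, fake_distribution, batch_size=2))
```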

Journal ArticleDOI
TL;DR: A number of variants of the Yarowsky algorithm (though not the original algorithm itself) are shown to optimize either likelihood or a closely related objective function K.
Abstract: Many problems in computational linguistics are well suited for bootstrapping (semisupervised learning) techniques. The Yarowsky algorithm is a well-known bootstrapping algorithm, but it is not mathematically well understood. This article analyzes it as optimizing an objective function. More specifically, a number of variants of the Yarowsky algorithm (though not the original algorithm itself) are shown to optimize either likelihood or a closely related objective function K.
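
A generic Yarowsky-style bootstrapping loop, for orientation only; the article's analysis concerns particular variants with specific smoothing and thresholding choices, not this toy.

```python
# Seed labels -> train a simple per-feature log-odds scorer -> label the pool
# with confident decisions -> repeat until the labeling stops changing.
from collections import defaultdict
from math import log

def train(labeled, smoothing=0.1):
    """Per-feature counts for two senses; returns a log-odds scoring function."""
    counts = {"A": defaultdict(float), "B": defaultdict(float)}
    for features, sense in labeled:
        for f in features:
            counts[sense][f] += 1
    def score(features):
        return sum(log((counts["A"][f] + smoothing) / (counts["B"][f] + smoothing))
                   for f in features)   # > 0 favors sense A, < 0 favors sense B
    return score

def bootstrap(examples, seeds, threshold=1.0, rounds=5):
    """examples: list of feature sets; seeds: {index: sense} initial labels."""
    labels = dict(seeds)
    for _ in range(rounds):
        score = train([(examples[i], s) for i, s in labels.items()])
        new_labels = dict(seeds)                 # seeds are never relabeled
        for i, feats in enumerate(examples):
            s = score(feats)
            if abs(s) >= threshold:
                new_labels.setdefault(i, "A" if s > 0 else "B")
        if new_labels == labels:
            break
        labels = new_labels
    return labels

examples = [{"plant", "factory"}, {"plant", "leaf"}, {"factory", "workers"},
            {"leaf", "green"}, {"factory", "leaf"}]
print(bootstrap(examples, seeds={0: "A", 1: "B"}))
```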

Journal ArticleDOI
TL;DR: This work defines a word to be a meaningful string composed of several Chinese characters, considers the number of distinct predecessors and successors of a string in a large corpus (TREC 5 and TREC 6 documents), and uses them as the measurement of the context independency of a string from the rest of the sentences in the document.
Abstract: We are interested in the problem of word extraction from Chinese text collections. We define a word to be a meaningful string composed of several Chinese characters. For example, the strings meaning 'percent' and 'more and more' are not recognized as traditional Chinese words from the viewpoint of some people. However, in our work, they are words because they are very widely used and have specific meanings. We start with the viewpoint that a word is a distinguished linguistic entity that can be used in many different language environments. We consider the characters that are directly before a string (predecessors) and the characters that are directly after a string (successors) as important factors for determining the independence of the string. We call such characters accessors of the string, consider the number of distinct predecessors and successors of a string in a large corpus (TREC 5 and TREC 6 documents), and use them as the measurement of the context independency of a string from the rest of the sentences in the document. Our experiments confirm our hypothesis and show that this simple rule gives quite good results for Chinese word extraction and is comparable to, and for long words outperforms, other iterative methods.
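
The predecessor/successor idea can be illustrated in a few lines; the corpus string and candidate below are made-up examples, and a real system would also apply frequency and length thresholds.

```python
# Count the distinct characters appearing immediately before and after a
# candidate string; high counts suggest the string is used independently.
def accessor_counts(corpus, candidate):
    predecessors, successors = set(), set()
    start = corpus.find(candidate)
    while start != -1:
        if start > 0:
            predecessors.add(corpus[start - 1])
        end = start + len(candidate)
        if end < len(corpus):
            successors.add(corpus[end])
        start = corpus.find(candidate, start + 1)
    return len(predecessors), len(successors)

corpus = "百分之十上升百分之五下降百分之二十"   # toy text
print(accessor_counts(corpus, "百分之"))      # (distinct predecessors, successors)
```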

Journal ArticleDOI
TL;DR: A collection on discourse and dialogue, including the development of a discourse-tagged corpus in the framework of Rhetorical Structure Theory and a comparison of several aspects of human-computer and human-human dialogues.
Abstract (contents): Preface; Acknowledgements; Annotations and Tools for an Activity Based Spoken Language Corpus; Using Direct Variant Transduction for Rapid Development of Natural Spoken Interfaces; An Interface for Annotating Natural Interactivity; Managing Communicative Intentions with Collaborative Problem Solving; Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory; An Empirical Study of Speech Recognition Errors in Human Computer Dialogue; Comparing Several Aspects of Human-Computer and Human-Human Dialogues; Full Paraphrase Generation for Fragments in Dialogue; Disentangling Public from Non-public Meaning; Adaptivity and Response Generation in a Spoken Dialogue System; On the Means for Clarification in Dialogue; Plug and Play Spoken Dialogue Processing; Conversational Implicatures and Communication Theory; Reconciling Control and Discourse Structure; The Information State Approach to Dialogue Management; Visualizing Spoken Discourse; References; Appendix.

Journal ArticleDOI
Hang Li, Cong Li
TL;DR: Experimental results indicate that word translation disambiguation based on bilingual bootstrapping consistently and significantly outperforms existing methods that are based on monolingual bootstrapping.
Abstract: This article proposes a new method for word translation disambiguation, one that uses a machine-learning technique called bilingual bootstrapping. In learning to disambiguate words to be translated, bilingual bootstrapping makes use of a small amount of classified data and a large amount of unclassified data in both the source and the target languages. It repeatedly constructs classifiers in the two languages in parallel and boosts the performance of the classifiers by classifying unclassified data in the two languages and by exchanging information regarding classified data between the two languages. Experimental results indicate that word translation disambiguation based on bilingual bootstrapping consistently and significantly outperforms existing methods that are based on monolingual bootstrapping.

Journal ArticleDOI
TL;DR: This article describes methods for efficiently selecting a natural set of candidates for correcting an erroneous input P, the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k.
Abstract: The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k. In this article we describe methods for efficiently selecting such candidate sets. After introducing as a starting point a basic correction method based on the concept of a "universal Levenshtein automaton," we show how two filtering methods known from the field of approximate text search can be used to improve the basic procedure in a significant way. The first method, which uses standard dictionaries plus dictionaries with reversed words, leads to very short correction times for most classes of input strings. Our evaluation results demonstrate that correction times for fixed-distance bounds depend on the expected number of correction candidates, which decreases for longer input words. Similarly the choice of an optimal filtering method depends on the length of the input words.
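
For orientation, the candidate set itself can be defined with a naive scan plus a bounded edit-distance test; the article's contribution is computing that set far faster with a universal Levenshtein automaton and dictionary-based filtering, which this sketch does not attempt.

```python
# Naive baseline: keep every dictionary word within Levenshtein distance k.
def within_distance(a, b, k):
    """True iff Levenshtein(a, b) <= k, with a row-wise early cutoff."""
    if abs(len(a) - len(b)) > k:
        return False
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                  # deletion
                               current[j - 1] + 1,               # insertion
                               previous[j - 1] + (ca != cb)))    # substitution
        if min(current) > k:          # every cell already exceeds the bound
            return False
        previous = current
    return previous[-1] <= k

def candidates(word, dictionary, k=1):
    return [w for w in dictionary if within_distance(word, w, k)]

dictionary = ["garble", "garbled", "gamble", "marble", "garbles"]
print(candidates("garbel", dictionary, k=2))
```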

Journal ArticleDOI
TL;DR: It is argued that text and sentence planning need to be driven in part by the goal of maintaining referential continuity and thereby facilitating pronoun resolution: Obtaining a favorable ordering of clauses, and of arguments within clauses, is likely to increase opportunities for nonambiguous pronoun use.
Abstract: This article describes an implemented system which uses centering theory for planning of coherent texts and choice of referring expressions. We argue that text and sentence planning need to be driven in part by the goal of maintaining referential continuity and thereby facilitating pronoun resolution: Obtaining a favorable ordering of clauses, and of arguments within clauses, is likely to increase opportunities for nonambiguous pronoun use. Centering theory provides the basis for such an integrated approach. Generating coherent texts according to centering theory is treated as a constraint satisfaction problem. The well-known Rule 2 of centering theory is reformulated in terms of a set of constraints—cohesion, salience, cheapness, and continuity—and we show sample outputs obtained under a particular weighting of these constraints. This framework facilitates detailed research into evaluation metrics and will therefore provide a productive research tool in addition to the immediate practical benefit of improving the fluency and readability of generated texts. The technique is generally applicable to natural language generation systems, which perform hierarchical text structuring based on a theory of coherence relations with certain additional assumptions.
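
The constraint-based view can be illustrated by scoring candidate clause orderings with a weighted count of constraint violations; the clause representation, the two violation functions, and the weights below are placeholders, not the system's actual definitions (which cover cohesion, salience, cheapness, and continuity).

```python
# Pick the ordering of clauses that minimizes weighted constraint violations.
from itertools import permutations

# Hypothetical input: each clause lists the entities it mentions, subject first.
CLAUSES = {
    "c1": ["Mary", "book"],
    "c2": ["Mary", "shop"],
    "c3": ["book", "shelf"],
}

def cohesion_violations(order):
    """Adjacent clause pairs that share no entity."""
    return sum(1 for a, b in zip(order, order[1:])
               if not set(CLAUSES[a]) & set(CLAUSES[b]))

def salience_violations(order):
    """Adjacent pairs whose shared entity is not the subject of the second clause."""
    violations = 0
    for a, b in zip(order, order[1:]):
        shared = set(CLAUSES[a]) & set(CLAUSES[b])
        if shared and CLAUSES[b][0] not in shared:
            violations += 1
    return violations

WEIGHTS = {"cohesion": 2.0, "salience": 1.0}

def score(order):
    return (WEIGHTS["cohesion"] * cohesion_violations(order)
            + WEIGHTS["salience"] * salience_violations(order))

best = min(permutations(CLAUSES), key=score)
print(best, score(best))
```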

Journal ArticleDOI
TL;DR: Levin's inventory is extended to a simple statistical model of verb class ambiguity, which is able to generate preferences for ambiguous verbs without the use of a disambiguated corpus, and it is shown that these preferences are useful as priors for a verb sense disambiguator.
Abstract: Levin's (1993) study of verb classes is a widely used resource for lexical semantics. In her framework, some verbs, such as give, exhibit no class ambiguity. But other verbs, such as write, have several alternative classes. We extend Levin's inventory to a simple statistical model of verb class ambiguity. Using this model we are able to generate preferences for ambiguous verbs without the use of a disambiguated corpus. We additionally show that these preferences are useful as priors for a verb sense disambiguator.


Journal ArticleDOI
TL;DR: Lexical cohesion is proposed as a well-defined notion to replace the intuitions captured by the use of inferable centers in this corpus of Japanese e-mail, and two new transitions, based on lexical relatedness instead of identity, supplement the standard definitions and more adequately characterize coherence in this corpus.
Abstract: A centering analysis of the corpus of Japanese e-mail that is examined in this article relies heavily on the inclusion of inferable centers. However, utilizing this type of center results in a high level of indeterminacy in labeling transitions and thus in characterizing the coherence of the corpus. The difficulty lies in the requirement of identity of discourse entities in the definitions of transition states. Lexical cohesion is proposed as a well-defined notion to replace the intuitions captured by the use of inferable centers. Two new transitions, based on lexical relatedness instead of identity, supplement the standard definitions and more adequately characterize coherence in this corpus. Implications and extensions of the proposal are discussed.


Journal ArticleDOI
TL;DR: Carrasco and Forcada (2002) presented two algorithms: one for incremental addition of strings to the language of a minimal, deterministic, cyclic automaton, and one for incremental removal of strings from the automaton. This squib shows that the older "algorithm for sorted data" of Daciuk et al. (2000) can be generalized in a similar way, yielding a faster addition algorithm that handles each state only once.
Abstract: In a recent article, Carrasco and Forcada (June 2002) presented two algorithms: one for incremental addition of strings to the language of a minimal, deterministic, cyclic automaton, and one for incremental removal of strings from the automaton. The first algorithm is a generalization of the "algorithm for unsorted data," the second of the two incremental algorithms for construction of minimal, deterministic, acyclic automata presented in Daciuk et al. (2000). We show that the other algorithm in the older article, the "algorithm for sorted data," can be generalized in a similar way. The new algorithm is faster than the algorithm for addition of strings presented in Carrasco and Forcada's article, as it handles each state only once.
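
For background, here is a compact sketch of the sorted-data incremental construction from Daciuk et al. (2000), i.e., the acyclic baseline that the squib generalizes; the cyclic-automaton algorithms themselves are not reproduced here.

```python
# Incremental construction of a minimal acyclic automaton from a
# lexicographically sorted word list (sorted-data algorithm, Daciuk et al. 2000).
class State:
    def __init__(self):
        self.final = False
        self.edges = {}                      # label -> State

    def signature(self):
        # States are equivalent iff they agree on finality and on transitions.
        return (self.final, tuple(sorted((c, id(t)) for c, t in self.edges.items())))

class SortedDAWG:
    def __init__(self):
        self.root = State()
        self.register = {}                   # signature -> registered State
        self.previous = ""

    def add(self, word):
        assert word > self.previous, "words must arrive in lexicographic order"
        cp = 0                               # longest common prefix with previous word
        while cp < min(len(word), len(self.previous)) and word[cp] == self.previous[cp]:
            cp += 1
        node = self.root
        for ch in word[:cp]:
            node = node.edges[ch]
        if node.edges:                       # the previous word's tail below `node`
            self._replace_or_register(node)  # will never grow again: minimize it
        for ch in word[cp:]:
            nxt = State()
            node.edges[ch] = nxt
            node = nxt
        node.final = True
        self.previous = word

    def finish(self):                        # minimize the last word's tail
        if self.root.edges:
            self._replace_or_register(self.root)

    def _replace_or_register(self, state):
        label = max(state.edges)             # most recently added outgoing edge
        child = state.edges[label]
        if child.edges:
            self._replace_or_register(child)
        sig = child.signature()
        if sig in self.register:
            state.edges[label] = self.register[sig]   # reuse an equivalent state
        else:
            self.register[sig] = child

    def accepts(self, word):
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final

dawg = SortedDAWG()
for w in ["tap", "taps", "top", "tops"]:
    dawg.add(w)
dawg.finish()
print(dawg.accepts("tops"), dawg.accepts("tap"), dawg.accepts("to"))  # True True False
```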