
Showing papers in "Computational Linguistics in 2009"


Journal ArticleDOI
TL;DR: The goal of this work is to automatically distinguish between prior and contextual polarity, with a focus on understanding which features are important for this task, and it is shown that the presence of neutral instances greatly degrades the performance of features for distinguishing between positive and negative polarity.
Abstract: Many approaches to automatic sentiment analysis begin with a large lexicon of words marked with their prior polarity (also called semantic orientation). However, the contextual polarity of the phrase in which a particular instance of a word appears may be quite different from the word's prior polarity. Positive words are used in phrases expressing negative sentiments, or vice versa. Also, quite often words that are positive or negative out of context are neutral in context, meaning they are not even being used to express a sentiment. The goal of this work is to automatically distinguish between prior and contextual polarity, with a focus on understanding which features are important for this task. Because an important aspect of the problem is identifying when polar terms are being used in neutral contexts, features for distinguishing between neutral and polar instances are evaluated, as well as features for distinguishing between positive and negative contextual polarity. The evaluation includes assessing the performance of features across multiple machine learning algorithms. For all learning algorithms except one, the combination of all features together gives the best performance. Another facet of the evaluation considers how the presence of neutral instances affects the performance of features for distinguishing between positive and negative polarity. These experiments show that the presence of neutral instances greatly degrades the performance of these features, and that perhaps the best way to improve performance across all polarity classes is to improve the system's ability to identify when an instance is neutral.
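
To make the two-step setup concrete, here is a minimal, hedged sketch of the pipeline the abstract describes: one classifier separates neutral from polar instances, and a second assigns positive or negative contextual polarity to the instances judged polar. The toy lexicon, the feature names, and the helper extract_features are illustrative assumptions, not the paper's actual feature set.

```python
# Hedged sketch of the two-step setup: neutral vs. polar first, then
# positive vs. negative on the polar instances only. All features and data
# below are illustrative placeholders.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

PRIOR_LEXICON = {"brilliant": "positive", "awful": "negative", "trust": "positive"}

def extract_features(instance):
    """Toy contextual features for one clue word in its sentence."""
    word, sentence = instance
    tokens = sentence.lower().split()
    return {
        "prior_polarity": PRIOR_LEXICON.get(word, "neutral"),
        "negation_in_sentence": any(t in {"not", "never", "no"} for t in tokens),
        "intensifier_in_sentence": any(t in {"very", "really"} for t in tokens),
    }

def train_two_step(instances, gold):
    """gold labels are 'neutral', 'positive', or 'negative' (contextual polarity)."""
    vec = DictVectorizer()
    X = vec.fit_transform([extract_features(i) for i in instances])
    # step 1: neutral vs. polar
    neutral_clf = LogisticRegression().fit(X, [g == "neutral" for g in gold])
    # step 2: positive vs. negative, trained on polar instances only
    polar_idx = [i for i, g in enumerate(gold) if g != "neutral"]
    polarity_clf = LogisticRegression().fit(X[polar_idx], [gold[i] for i in polar_idx])
    return vec, neutral_clf, polarity_clf
```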

677 citations


Journal ArticleDOI
TL;DR: Alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems in a European project in two real tasks.
Abstract: Current machine translation (MT) systems are still not perfect. In practice, the output from these systems needs to be edited to correct errors. A way of increasing the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities within the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process in which the human translator activity is included in the loop: In each iteration, a prefix of the translation is validated (accepted or amended) by the human and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, the adaptation of MT systems to the interactive scenario affects mainly the search process, allowing a great reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) in two real tasks: the translation of printer manuals and the translation of the Bulletin of the European Union. In each task, the following three pairs of languages were involved (in both translation directions): English-Spanish, English-German, and English-French.
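
The interactive loop the abstract describes can be sketched as follows. Here best_suffix is only a placeholder for the system's constrained search for the most probable completion of the validated prefix, and get_user_correction stands in for the human in the loop; both names are assumptions for illustration.

```python
# Minimal sketch of the interactive prefix-completion protocol. `best_suffix`
# is a stub for the system's search over the translation models; it is not an
# implementation of any particular model from the article.
def best_suffix(source, prefix):
    """Placeholder for constrained search: best suffix completing `prefix`."""
    raise NotImplementedError

def interactive_translate(source, get_user_correction):
    prefix = ""
    while True:
        suffix = best_suffix(source, prefix)            # system proposes a completion
        hypothesis = prefix + suffix
        accepted, correction = get_user_correction(hypothesis, prefix)
        if accepted:                                    # full hypothesis validated
            return hypothesis
        # the user amends the first wrong word; everything up to and including
        # the correction becomes the new validated prefix for the next iteration
        prefix = correction
```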

238 citations


Journal ArticleDOI
TL;DR: The results of two studies of how well some metrics which are popular in other areas of NLP correlate with human judgments in the domain of computer-generated weather forecasts suggest that, at least in this domain, metrics may provide a useful measure of language quality, although the evidence for this is not as strong as one would ideally like to see.
Abstract: There is growing interest in using automatically computed corpus-based evaluation metrics to evaluate Natural Language Generation (NLG) systems, because these are often considerably cheaper than the human-based evaluations which have traditionally been used in NLG. We review previous work on NLG evaluation and on validation of automatic metrics in NLP, and then present the results of two studies of how well some metrics which are popular in other areas of NLP (notably BLEU and ROUGE) correlate with human judgments in the domain of computer-generated weather forecasts. Our results suggest that, at least in this domain, metrics may provide a useful measure of language quality, although the evidence for this is not as strong as we would ideally like to see; however, they do not provide a useful measure of content quality. We also discuss a number of caveats which must be kept in mind when interpreting this and other validation studies.
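
The core of such a validation study is a correlation between automatic metric scores and human judgments over the same generated texts. A minimal sketch, with invented placeholder scores:

```python
# Correlate an automatic metric's scores with human ratings over the same
# items. The two score lists are illustrative placeholders only.
from scipy.stats import pearsonr, spearmanr

metric_scores = [0.42, 0.55, 0.31, 0.60, 0.48]   # e.g., BLEU or ROUGE per forecast
human_scores  = [3.1, 4.0, 2.5, 4.2, 3.6]        # e.g., mean human quality ratings

r, r_p = pearsonr(metric_scores, human_scores)
rho, rho_p = spearmanr(metric_scores, human_scores)
print(f"Pearson r = {r:.2f} (p = {r_p:.3f}), Spearman rho = {rho:.2f} (p = {rho_p:.3f})")
```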

194 citations


Journal ArticleDOI
TL;DR: This article develops statistical measures that each model a specific property of idiomatic expressions by looking at their actual usage patterns in text, and uses some of the measures in a token identification task where they distinguish idiomatic and literal usages of potentially idiomatic expressions in context.
Abstract: Idiomatic expressions are plentiful in everyday language, yet they remain mysterious, as it is not clear exactly how people learn and understand them. They are of special interest to linguists, psycholinguists, and lexicographers, mainly because of their syntactic and semantic idiosyncrasies as well as their unclear lexical status. Despite a great deal of research on the properties of idioms in the linguistics literature, there is not much agreement on which properties are characteristic of these expressions. Because of their peculiarities, idiomatic expressions have mostly been overlooked by researchers in computational linguistics. In this article, we look into the usefulness of some of the identified linguistic properties of idioms for their automatic recognition. Specifically, we develop statistical measures that each model a specific property of idiomatic expressions by looking at their actual usage patterns in text. We use these statistical measures in a type-based classification task where we automatically separate idiomatic expressions (expressions with a possible idiomatic interpretation) from similar-on-the-surface literal phrases (for which no idiomatic interpretation is possible). In addition, we use some of the measures in a token identification task where we distinguish idiomatic and literal usages of potentially idiomatic expressions in context.
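
As one hedged illustration of this kind of statistical measure (not the article's exact formulation), a lexical fixedness score can compare the association strength of a verb-noun pair with that of variants in which the noun is replaced by near-synonyms. The helpers below assume co-occurrence counts are available from a corpus.

```python
# Illustrative lexical-fixedness sketch: an idiomatic pair such as "spill the
# beans" should have a PMI that stands out from its lexical variants
# ("spill the peas", "spill the lentils", ...). Counts are assumed inputs.
import math

def pmi(pair_count, verb_count, noun_count, total_pairs):
    """Pointwise mutual information of a verb-noun co-occurrence."""
    p_pair = pair_count / total_pairs
    p_verb = verb_count / total_pairs
    p_noun = noun_count / total_pairs
    return math.log(p_pair / (p_verb * p_noun))

def lexical_fixedness(target_pmi, variant_pmis):
    """How far the target pair's PMI deviates from its variants (z-score style)."""
    mean = sum(variant_pmis) / len(variant_pmis)
    var = sum((x - mean) ** 2 for x in variant_pmis) / len(variant_pmis)
    std = math.sqrt(var) or 1.0
    return (target_pmi - mean) / std
```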

188 citations


Journal ArticleDOI
TL;DR: A Chinese word segmentation model learned from punctuation marks, which are perfect word delimiters, is presented; it is considerably more effective than previous methods in unknown word recognition.
Abstract: We present a Chinese word segmentation model learned from punctuation marks which are perfect word delimiters. The learning is aided by a manually segmented corpus. Our method is considerably more effective than previous methods in unknown word recognition. This is a step toward addressing one of the toughest problems in Chinese word segmentation.
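
The core idea can be sketched as follows: since punctuation is a perfect word delimiter, the characters just before and just after a punctuation mark yield reliable word-end and word-start examples on which a boundary classifier can be trained. The window size, punctuation set, and labels below are illustrative assumptions.

```python
# Harvest word-boundary training examples from punctuation positions: the
# character context before a punctuation mark must end a word, and the context
# after it must start one. Features and labels here are illustrative only.
PUNCT = set("，。！？；：、（）")

def harvest_boundary_examples(text, window=2):
    examples = []                       # (character_window, label) pairs
    for i, ch in enumerate(text):
        if ch in PUNCT:
            before = text[max(0, i - window):i]
            after = text[i + 1:i + 1 + window]
            if before:
                examples.append((before, "ends_word"))    # chars before punct end a word
            if after:
                examples.append((after, "starts_word"))   # chars after punct start a word
    return examples

print(harvest_boundary_examples("我爱北京，天安门。"))
```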

113 citations


Journal ArticleDOI
TL;DR: A generic architecture for a visually situated dialog system is described and the interactions between the spatial cognition module, which provides the interface to the models of prepositional semantics, and the other components in the architecture are highlighted.
Abstract: This article describes the application of computational models of spatial prepositions to visually situated dialog systems. In these dialogs, spatial prepositions are important because people often use them to refer to entities in the visual context of a dialog. We first describe a generic architecture for a visually situated dialog system and highlight the interactions between the spatial cognition module, which provides the interface to the models of prepositional semantics, and the other components in the architecture. Following this, we present two new computational models of topological and projective spatial prepositions. The main novelty within these models is the fact that they account for the contextual effect which other distractor objects in a visual scene can have on the region described by a given preposition. We next present psycholinguistic tests evaluating our approach to distractor interference on prepositional semantics, and illustrate how these models are used for both interpretation and generation of prepositional expressions.

91 citations


Journal ArticleDOI
TL;DR: A novel bootstrapping approach for improving the quality of feature vector weighting in distributional word similarity, motivated by attempts to utilize distributional similarity for identifying the concrete semantic relationship of lexical entailment.
Abstract: This article presents a novel bootstrapping approach for improving the quality of feature vector weighting in distributional word similarity. The method was motivated by attempts to utilize distributional similarity for identifying the concrete semantic relationship of lexical entailment. Our analysis revealed that a major reason for the rather loose semantic similarity obtained by distributional similarity methods is insufficient quality of the word feature vectors, caused by deficient feature weighting. This observation led to the definition of a bootstrapping scheme which yields improved feature weights, and hence higher quality feature vectors. The underlying idea of our approach is that features which are common to similar words are also most characteristic for their meanings, and thus should be promoted. This idea is realized via a bootstrapping step applied to an initial standard approximation of the similarity space. The superior performance of the bootstrapping method was assessed in two different experiments, one based on direct human gold-standard annotation and the other based on an automatically created disambiguation dataset. These results are further supported by applying a novel quantitative measurement of the quality of feature weighting functions. Improved feature weighting also allows massive feature reduction, which indicates that the most characteristic features for a word are indeed concentrated at the top ranks of its vector. Finally, experiments with three prominent similarity measures and two feature weighting functions showed that the bootstrapping scheme is robust and is independent of the original functions over which it is applied.
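
A minimal sketch of the bootstrapping idea follows, under simplifying assumptions: vectors are plain feature-weight dictionaries, similarity is cosine, and a feature's weight is promoted in proportion to how many of the word's nearest neighbors share it. The article's actual re-weighting scheme differs; this only illustrates the principle.

```python
# Bootstrapping sketch: compute an initial similarity space from standard
# weights, then promote features shared with a word's most similar words.
import math

def cosine(u, v):
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def bootstrap_weights(vectors, k=10):
    new_vectors = {}
    for word, vec in vectors.items():
        neighbours = sorted((w for w in vectors if w != word),
                            key=lambda w: cosine(vec, vectors[w]), reverse=True)[:k]
        new_vec = {}
        for feat, weight in vec.items():
            # promote features that also appear in the neighbours' vectors
            support = sum(1 for n in neighbours if feat in vectors[n])
            new_vec[feat] = weight * (1 + support / k)
        new_vectors[word] = new_vec
    return new_vectors
```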

77 citations


Journal ArticleDOI
TL;DR: The transition from annotated data to a gold standard, that is, a subset that is sufficiently noise-free with high confidence, is discussed, together with a mathematical framework for estimating the noise level of the agreed subset, which helps promote cautious benchmarking.
Abstract: This article discusses the transition from annotated data to a gold standard, that is, a subset that is sufficiently noise-free with high confidence. Unless appropriately reinterpreted, agreement coefficients do not indicate the quality of the data set as a benchmarking resource: High overall agreement is neither sufficient nor necessary to distill some amount of highly reliable data from the annotated material. A mathematical framework is developed that allows estimation of the noise level of the agreed subset of annotated data, which helps promote cautious benchmarking.
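
For orientation only, the sketch below computes raw agreement and Cohen's kappa for two annotators and extracts the agreed subset from which a gold standard would be distilled. This is standard agreement bookkeeping, not the article's framework for estimating the noise level of that subset.

```python
# Raw agreement, Cohen's kappa, and the agreed subset for two annotators.
from collections import Counter

def kappa_and_agreed(ann1, ann2):
    n = len(ann1)
    agreed = [(i, a) for i, (a, b) in enumerate(zip(ann1, ann2)) if a == b]
    p_o = len(agreed) / n                                  # observed agreement
    c1, c2 = Counter(ann1), Counter(ann2)
    p_e = sum((c1[l] / n) * (c2[l] / n) for l in set(ann1) | set(ann2))
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0    # chance-corrected agreement
    return kappa, agreed

ann1 = ["pos", "neg", "pos", "neu", "pos", "neg"]
ann2 = ["pos", "neg", "neu", "neu", "pos", "pos"]
print(kappa_and_agreed(ann1, ann2))
```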

76 citations


Journal ArticleDOI
TL;DR: Although NLP in general has benefitted from advances in those areas where prepositions have received attention, there are still many issues to be addressed and accurate models of preposition usage are essential to avoid repeatedly making errors.
Abstract: Prepositions—as well as prepositional phrases (PPs) and markers of various sorts—have a mixed history in computational linguistics (CL), as well as related fields such as artificial intelligence, information retrieval (IR), and computational psycholinguistics: On the one hand they have been championed as being vital to precise language understanding (e.g., in information extraction), and on the other they have been ignored on the grounds of being syntactically promiscuous and semantically vacuous, and relegated to the ignominious rank of “stop word” (e.g., in text classification and IR). Although NLP in general has benefitted from advances in those areas where prepositions have received attention, there are still many issues to be addressed. For example, in machine translation, generating a preposition (or “case marker” in languages such as Japanese) incorrectly in the target language can lead to critical semantic divergences over the source language string. Equivalently in information retrieval and information extraction, it would seem desirable to be able to predict that book on NLP and book about NLP mean largely the same thing, but paranoid about drugs and paranoid on drugs suggest very different things. Prepositions are often among the most frequent words in a language. For example, based on the British National Corpus (BNC; Burnard 2000), four out of the top-ten most-frequent words in English are prepositions (of, to, in, and for). In terms of both parsing and generation, therefore, accurate models of preposition usage are essential to avoid repeatedly making errors. Despite their frequency, however, they are notoriously difficult to master, even for humans (Chodorow, Tetreault, and Han 2007). For example, Lindstromberg (2001) estimates that less than 10% of upper-level English as a Second

62 citations


Journal ArticleDOI
TL;DR: In large-scale experiments, it is found that almost all rules are binarizable and the resulting binarized rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system.
Abstract: Systems based on synchronous grammars and tree transducers promise to improve the quality of statistical machine translation output, but are often very computationally intensive. The complexity is exponential in the size of individual grammar rules due to arbitrary re-orderings between the two languages. We develop a theory of binarization for synchronous context-free grammars and present a linear-time algorithm for binarizing synchronous rules when possible. In our large-scale experiments, we found that almost all rules are binarizable and the resulting binarized rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system. We also discuss the more general, and computationally more difficult, problem of finding good parsing strategies for non-binarizable rules, and present an approximate polynomial-time algorithm for this problem.
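
A simple way to see binarizability is through the permutation of a rule's nonterminals on the target side: push spans onto a stack and merge the top two whenever they cover a contiguous range; the rule is binarizable if everything reduces to a single span. The sketch below illustrates this shift-reduce check and is not the article's algorithm verbatim.

```python
# Shift-reduce binarizability check over the target-side permutation of a
# synchronous rule's nonterminals. Illustrative sketch only.
def binarizable(permutation):
    """permutation: target-side positions of the nonterminals, e.g. [2, 4, 1, 3]."""
    stack = []
    for v in permutation:
        stack.append((v, v))
        # reduce while the two topmost spans form a contiguous range
        while len(stack) >= 2:
            lo2, hi2 = stack[-1]
            lo1, hi1 = stack[-2]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1

print(binarizable([2, 1, 4, 3]))   # True
print(binarizable([2, 4, 1, 3]))   # False: the classic non-binarizable pattern
```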

61 citations


Journal ArticleDOI
TL;DR: The main result is that the simplest metric (which relies exclusively on NOCB transitions) sets a robust baseline that cannot be outperformed by other metrics which make use of additional centering-based features.
Abstract: In this article we discuss several metrics of coherence defined using centering theory and investigate the usefulness of such metrics for information ordering in automatic text generation. We estimate empirically which is the most promising metric and how useful this metric is using a general methodology applied on several corpora. Our main result is that the simplest metric (which relies exclusively on NOCB transitions) sets a robust baseline that cannot be outperformed by other metrics which make use of additional centering-based features. This baseline can be used for the development of both text-to-text and concept-to-text generation systems.
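
The baseline metric can be sketched directly: count NOCB transitions, i.e. adjacent sentence pairs that share no entity, and prefer the ordering that minimizes them. Entity extraction is assumed to happen upstream; sentences are represented here simply as sets of entity mentions.

```python
# Count NOCB transitions for a candidate ordering and pick the ordering that
# minimizes them (brute force over a toy example).
from itertools import permutations

def nocb_count(ordering):
    return sum(1 for a, b in zip(ordering, ordering[1:]) if not (a & b))

sentences = [{"museum", "city"}, {"museum", "exhibit"}, {"river"}, {"river", "city"}]

best = min(permutations(sentences), key=nocb_count)
print(nocb_count(sentences), nocb_count(best))
```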

Journal ArticleDOI
TL;DR: A combination of basic kernel functions is used to independently estimate syntagmatic and domain similarity, building a set of word-expert classifiers that share a common domain model acquired from a large corpus of unlabeled data.
Abstract: We present a semi-supervised technique for word sense disambiguation that exploits external knowledge acquired in an unsupervised manner. In particular, we use a combination of basic kernel functions to independently estimate syntagmatic and domain similarity, building a set of word-expert classifiers that share a common domain model acquired from a large corpus of unlabeled data. The results show that the proposed approach achieves state-of-the-art performance on a wide range of lexical sample tasks and on the English all-words task of Senseval-3, although it uses a considerably smaller number of training examples than other methods.
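
A hedged sketch of the kernel-combination idea: two basic kernels, one standing in for syntagmatic (local-context) similarity and one for domain similarity, are summed into a single Gram matrix and handed to an SVM. The random toy matrices are placeholders for the real representations.

```python
# Combine two basic kernels by summing their Gram matrices and train an SVM
# on the precomputed kernel. Toy data only.
import numpy as np
from sklearn.svm import SVC

def linear_gram(X):
    return X @ X.T

X_syntagmatic = np.random.rand(20, 50)    # stand-in for local collocation features
X_domain = np.random.rand(20, 10)         # stand-in for domain features from unlabeled data
y = np.random.randint(0, 2, size=20)      # toy sense labels

K = linear_gram(X_syntagmatic) + linear_gram(X_domain)   # kernel combination
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K[:5]))                 # rows of K against the training set
```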

Journal ArticleDOI
TL;DR: In this paper, semantic role annotations provided by the Penn Treebank and FrameNet tagged corpora are used for preposition disambiguation and a common inventory is derived from these in support of definition analysis, which is the motivation for this work.
Abstract: This article describes how semantic role resources can be exploited for preposition disambiguation. The main resources include the semantic role annotations provided by the Penn Treebank and FrameNet tagged corpora. The resources also include the assertions contained in the Factotum knowledge base, as well as information from Cyc and Conceptual Graphs. A common inventory is derived from these in support of definition analysis, which is the motivation for this work. The disambiguation concentrates on relations indicated by prepositional phrases, and is framed as word-sense disambiguation for the preposition in question. A new type of feature for word-sense disambiguation is introduced, using WordNet hypernyms as collocations rather than just words. Various experiments over the Penn Treebank and FrameNet data are presented, including prepositions classified separately versus together, and illustrating the effects of filtering. Similar experimentation is done over the Factotum data, including a method for inferring likely preposition usage from corpora, as knowledge bases do not generally indicate how relationships are expressed in English (in contrast to the explicit annotations on this in the Penn Treebank and FrameNet). Other experiments are included with the FrameNet data mapped into the common relation inventory developed for definition analysis, illustrating how preposition disambiguation might be applied in lexical acquisition.
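
The "hypernyms as collocations" feature type can be sketched as follows: besides the head noun of the preposition's object, all of its WordNet hypernyms are emitted as features, so that, say, "on the table" and "on the desk" can share a furniture-level feature. This assumes NLTK with the WordNet data installed; the feature naming is illustrative.

```python
# Emit the object noun plus its WordNet hypernyms as features for preposition
# word-sense disambiguation. Requires the NLTK WordNet corpus.
from nltk.corpus import wordnet as wn

def hypernym_features(noun):
    feats = {f"obj={noun}"}
    for synset in wn.synsets(noun, pos=wn.NOUN)[:1]:          # most frequent sense only
        for hyper in synset.closure(lambda s: s.hypernyms()):
            feats.add(f"obj_hyper={hyper.name()}")
    return feats

print(sorted(hypernym_features("table"))[:5])
```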

Journal ArticleDOI
TL;DR: This article shows how the finite-state approach to multimodal language processing can be extended to support multimodal applications combining speech with complex freehand pen input, and evaluates the approach in the context of a multimodal conversational system (MATCH).
Abstract: Multimodal grammars provide an effective mechanism for quickly creating integration and understanding capabilities for interactive systems supporting simultaneous use of multiple input modalities. However, like other approaches based on hand-crafted grammars, multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent input. In this article, we show how the finite-state approach to multimodal language processing can be extended to support multimodal applications combining speech with complex freehand pen input, and evaluate the approach in the context of a multimodal conversational system (MATCH). We explore a range of different techniques for improving the robustness of multimodal integration and understanding. These include techniques for building effective language models for speech recognition when little or no multimodal training data is available, and techniques for robust multimodal understanding that draw on classification, machine translation, and sequence edit methods. We also explore the use of edit-based methods to overcome mismatches between the gesture stream and the speech stream.

Journal ArticleDOI
TL;DR: The language documentation community uses technology to process language, but is largely ignorant of the field of natural language processing.
Abstract: March 2009 marked an important milestone: the First International Conference on Language Documentation and Conservation, held at the University of Hawai‘i. The scale of the event was striking, with five parallel tracks running over three days. The organizers coped magnificently with three times the expected participation (over 300). The buzz among the participants was that we were at the start of something big, that we were already part of a significant and growing community dedicated to supporting small languages together, the conference subtitle. The event was full of computation and linguistics, yet devoid of computational linguistics. The language documentation community uses technology to process language, but is largely ignorant of the field of natural language processing. I pondered what we have to offer this community: “Send us your 10 million words of Nahuatl-English bitext and we’ll do you a machine translation system!” “Show us your Bambara WordNet and we’ll use it to train a word sense disambiguation tool!” “Write up the word-formation rules of Inuktitut in this arcane format and we’ll give you a morphological analyzer!” Is there not some more immediate contribution we could offer?

Journal ArticleDOI
TL;DR: A single unified referential semantic probability model is described which brings several kinds of context to bear in speech decoding, and performs accurate recognition in real time on large domains in the absence of example in-domain training sentences.
Abstract: This article describes a framework for incorporating referential semantic information from a world model or ontology directly into a probabilistic language model of the sort commonly used in speech recognition, where it can be probabilistically weighted together with phonological and syntactic factors as an integral part of the decoding process. Introducing world model referents into the decoding search greatly increases the search space, but by using a single integrated phonological, syntactic, and referential semantic language model, the decoder is able to incrementally prune this search based on probabilities associated with these combined contexts. The result is a single unified referential semantic probability model which brings several kinds of context to bear in speech decoding, and performs accurate recognition in real time on large domains in the absence of example in-domain training sentences.

Journal ArticleDOI
TL;DR: In the following pages, a computational linguist calls for the return of linguistics to computational linguistics.
Abstract: One of the most thought-provoking proposals I have heard recently came from Lori Levin during the discussion that concluded the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics. Lori proposed that we should form an ACL Special Interest Group on Linguistics. At first blush, I found the idea weird: Isn’t it a little like the American Academy of Pediatrics forming a SIG on Medicine (or on Children)? Second thoughts, however, revealed the appropriateness of the idea: In essence, linguistics is altogether missing in contemporary natural language engineering research. In the following pages I want to call for the return of linguistics to computational linguistics. The last two decades were marked by a complete paradigm shift in computational linguistics. Frustrated by the inability of applications based on explicit linguistic knowledge to scale up to real-world needs, and, perhaps more deeply, frustrated with the dominating theories in formal linguistics, we looked instead to corpora that reflect language use as our sources of (implicit) knowledge. With the shift in methodology came a subtle change in the goals of our entire enterprise. Two decades ago, a computational linguist could be interested in developing NLP applications; or in formalizing (and reasoning about) linguistic processes. These days, it is the former only. A superficial look at the papers presented in our main conferences reveals that the vast majority of them are engineering papers, discussing engineering solutions to practical problems. Virtually none addresses fundamental issues in linguistics. There’s nothing wrong with engineering work, of course. Every school of technology has departments of engineering in areas as diverse as Chemical Engineering, Mechanical Engineering, Aeronautical Engineering, or Biomedical Engineering; there’s no reason why there shouldn’t also be a discipline of Natural Language Engineering. But in the more established disciplines, engineering departments conduct research that is informed by some well-defined branch of science. Chemical engineers study chemistry; electrical engineers study physics; aeronautical engineers study dynamics; and biomedical engineers study biology, physiology, medical sciences, and so on. The success of engineering is also in part due to the choice of the “right” mathematics. The theoretical development of several scientific areas, notably physics, went alongside mathematical developments. Physics could not have accounted for natural phenomena without such mathematical infrastructure. For example, the development of (partial) differential equations went hand in hand with some of the greatest achievements in physics, and this branch of mathematics later turned out to be applicable also to chemistry, electrical engineering, and economics, among many other scientific fields.

Journal ArticleDOI
Anja Belz
TL;DR: The talk included progress reports on the current size of the artificial brain, its structure, update rate, and power consumption, and explained how intelligent behavior was going to develop by mechanisms simulating biological evolution.

Abstract: A regular fixture on the mid 1990s international research seminar circuit was the billion-neuron artificial brain talk. The idea behind this project was simple: in order to create artificial intelligence, what was needed first of all was a very large artificial brain; if a big enough set of interconnected modules of neurons could be implemented, then it would be possible to evolve mammalian-level behavior with current computational-neuron technology. The talk included progress reports on the current size of the artificial brain, its structure, update rate, and power consumption, and explained how intelligent behavior was going to develop by mechanisms simulating biological evolution. What the talk didn't mention was what kind of functionality the team had so far managed to evolve, and so the first comment at the end of the talk was inevitably “nice work, but have you actually done anything with the brain yet?” In human language technology (HLT) research, we currently report a range of evaluation scores that measure and assess various aspects of systems, in particular the similarity of their outputs to samples of human language or to human-produced gold-standard annotations, but are we leaving ourselves open to the same question as the billion-neuron artificial brain researchers?

Journal ArticleDOI
TL;DR: Given a training set of English nominal phrases and compounds along with their translations in the five Romance languages, the algorithm automatically learns classification rules and applies them to unseen test instances for semantic interpretation; results are compared against two state-of-the-art models reported in the literature.
Abstract: In this article we explore the syntactic and semantic properties of prepositions in the context of the semantic interpretation of nominal phrases and compounds. We investigate the problem based on cross-linguistic evidence from a set of six languages: English, Spanish, Italian, French, Portuguese, and Romanian. The focus on English and Romance languages is well motivated. Most of the time, English nominal phrases and compounds translate into constructions of the form N P N in Romance languages, where the P (preposition) may vary in ways that correlate with the semantics. Thus, we present empirical observations on the distribution of nominal phrases and compounds and the distribution of their meanings on two different corpora, based on two state-of-the-art classification tag sets: Lauer's set of eight prepositions and our list of 22 semantic relations. A mapping between the two tag sets is also provided. Furthermore, given a training set of English nominal phrases and compounds along with their translations in the five Romance languages, our algorithm automatically learns classification rules and applies them to unseen test instances for semantic interpretation. Experimental results are compared against two state-of-the-art models reported in the literature.
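
A hedged sketch of the cross-linguistic learning setup: each English nominal phrase or compound is represented by the prepositions observed in its Romance translations, and a classifier maps that evidence to a semantic relation. The tiny training set and the choice of a decision tree are illustrative assumptions, not the article's algorithm.

```python
# Learn to map cross-linguistic preposition evidence to a semantic relation.
# The training examples below are invented purely for illustration.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

train = [
    ({"es": "de", "fr": "de", "it": "di", "pt": "de", "ro": "de"}, "POSSESSION"),
    ({"es": "para", "fr": "pour", "it": "per", "pt": "para", "ro": "pentru"}, "PURPOSE"),
    ({"es": "de", "fr": "en", "it": "di", "pt": "de", "ro": "din"}, "SOURCE"),
]
X_dicts, y = zip(*train)

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict(vec.transform([{"es": "para", "fr": "pour", "it": "per",
                                  "pt": "para", "ro": "pentru"}])))
```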

Journal ArticleDOI
TL;DR: Introduction to Information Retrieval is the first textbook with a coherent treatment of classical and web information retrieval, including web search and the related areas of text classification and text clustering.
Abstract: Introduction to Information Retrieval is the first textbook with a coherent treatment of classical and web information retrieval, including web search and the related areas of text classification and text clustering. Written from a computer science perspective, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents and of methods for evaluating systems, along with an introduction to the use of machine learning methods on text collections. Designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also interest researchers and professionals. A complete set of lecture slides and exercises that accompany the book are available on the web.

Journal ArticleDOI
TL;DR: In this acceptance speech, the author recounts his beginnings and motivations and the contributions of the three innovative groups he headed over the last 47 years, at Cornell, IBM, and now at Johns Hopkins, with a focus on his IBM team.
Abstract: I am very grateful for the award you have bestowed on me. To understand your generosity I have to assume that you are honoring the leadership of three innovative groups that I headed in the last 47 years: at Cornell, IBM, and now at Johns Hopkins. You know my co-workers in the last two teams. The Cornell group was in Information Theory and included Toby Berger, Terrence Fine, and Neil J. A. Sloane (earlier my Ph.D. student), all of whom earned their own laurels. I was told that I should give an acceptance speech and was furnished with example texts by previous recipients. They wrote about the development and impact of their ideas. So I will tell you about my beginnings and motivations and then focus on the contributions of my IBM team. In this way the text will have some historical value and may clear up certain widely held misconceptions.

Journal ArticleDOI
TL;DR: This work uses the three-valued logic of Elementary Ranking Conditions to show that the VCD of Optimality Theory with k constraints is k-1 and establishes that the complexity of OT is a well-behaved function of k and that the hardness of learning in OT is linear in k for a variety of frameworks that employ probabilistic definitions of learnability.
Abstract: Given a constraint set with k constraints in the framework of Optimality Theory (OT), what is its capacity as a classification scheme for linguistic data? One useful measure of this capacity is the size of the largest data set of which each subset is consistent with a different grammar hypothesis. This measure is known as the Vapnik-Chervonenkis dimension (VCD) and is a standard complexity measure for concept classes in computational learnability theory. In this work, I use the three-valued logic of Elementary Ranking Conditions to show that the VCD of Optimality Theory with k constraints is k-1. Analysis of OT in terms of the VCD establishes that the complexity of OT is a well-behaved function of k and that the 'hardness' of learning in OT is linear in k for a variety of frameworks that employ probabilistic definitions of learnability.
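
For readers unfamiliar with the notation, the result can be restated compactly (a restatement, not a proof):

```latex
% H_k denotes the family of OT languages generated by the k! rankings of a
% set of k constraints; VCD is the Vapnik-Chervonenkis dimension.
\[
  \mathrm{VCD}(\mathcal{H}) = \max\bigl\{\, |S| : \mathcal{H} \text{ shatters } S \,\bigr\},
  \qquad
  \mathcal{H} \text{ shatters } S \iff \bigl|\{\, h \cap S : h \in \mathcal{H} \,\}\bigr| = 2^{|S|}.
\]
\[
  \mathrm{VCD}(\mathcal{H}_k) = k - 1,
\]
% so standard PAC-style sample-complexity bounds for learning a constraint
% ranking grow only linearly in the number of constraints k.
```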

Journal ArticleDOI
TL;DR: An investigation of corpus-based methods for the automation of help-desk e-mail responses along two operational dimensions: information-gathering technique, and granularity of the information.
Abstract: This article presents an investigation of corpus-based methods for the automation of help-desk e-mail responses. Specifically, we investigate this problem along two operational dimensions: (1) information-gathering technique, and (2) granularity of the information. We consider two information-gathering techniques (retrieval and prediction) applied to information represented at two levels of granularity (document-level and sentence-level). Document-level methods correspond to the reuse of an existing response e-mail to address new requests. Sentence-level methods correspond to applying extractive multi-document summarization techniques to collate units of information from more than one e-mail. Evaluation of the performance of the different methods shows that in combination they are able to successfully automate the generation of responses for a substantial portion of e-mail requests in our corpus. We also investigate a meta-selection process that learns to choose one method to address a new inquiry e-mail, thus providing a unified response automation solution.
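
The document-level retrieval method can be sketched as nearest-neighbour reuse of past responses; the TF-IDF representation and the tiny corpus below are illustrative assumptions, not the article's exact system.

```python
# Retrieve the past request most similar to a new inquiry and reuse its
# stored response (document-level retrieval). Toy corpus only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_requests = ["printer driver will not install",
                 "cannot connect scanner over usb",
                 "how do I reset my password"]
past_responses = ["Please download the latest driver from ...",
                  "Check that the USB cable ...",
                  "Use the self-service portal to ..."]

vectorizer = TfidfVectorizer()
R = vectorizer.fit_transform(past_requests)

def answer(new_request):
    q = vectorizer.transform([new_request])
    best = cosine_similarity(q, R).argmax()
    return past_responses[best]

print(answer("my printer driver won't install on windows"))
```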


Journal ArticleDOI
TL;DR: The article investigates the distinction between static and directional locatives, and between different types of directional locative PPs, and shows how this analysis can be incorporated into Minimal Recursion Semantics (MRS) (Copestake et al. 2005).
Abstract: The article describes a pilot implementation of a grammar containing different types of locative PPs. In particular, we investigate the distinction between static and directional locatives, and between different types of directional locatives. Locatives may act as modifiers as well as referring expressions depending on the syntactic context. We handle this with a single lexical entry. The implementation is of Norwegian locatives, but English locatives are both discussed and compared to Norwegian locatives. The semantic analysis is based on a proposal by Markus Kracht (2002), and we show how this analysis can be incorporated into Minimal Recursion Semantics (MRS) (Copestake et al. 2005). We discuss how the resulting system may be applied in a transfer-based machine translation system, and how we can map from a shallow MRS representation to a deeper semantic representation.

Journal ArticleDOI
TL;DR: Learning Machine Translation is a book focused on the application of machine learning to SMT, presenting a number of approaches applying discriminative machine learning techniques within an SMT decoder.
Abstract: Attending recent computational linguistics conferences, it is hard to ignore the phenomenal amount of research devoted to statistical machine translation (SMT). Driven by the wide availability of open-source translation systems, corpora, and evaluation tools, a research area that was once the preserve of large research groups has become accessible to those of more modest resources. Although the current state-of-the-art SMT systems have matured into robust commercial systems, capable of providing reasonable quality translations for a variety of domains, they remain limited by naive modeling assumptions and a heavy reliance on heuristics. These limitations have led researchers to ask the question of whether the adoption of techniques from the machine learning literature could allow more complex translations to be modeled effectively. As such, this book, focused on the application of machine learning to SMT, is particularly timely in capturing the current interest of the machine translation community. Learning Machine Translation is presented in two parts. The first, titled “Enabling Technologies,” focuses on research peripheral to machine translation. Topics covered include the acquisition of parallel corpora, cross-language named-entity processing, and language modeling. The second part covers core machine translation system building, presenting a number of approaches applying discriminative machine learning techniques within an SMT decoder. Much of the content of the book arose from the Machine Learning for Multilingual Access Workshop held at the Neural Information Processing conference in 2006. As SMT is not a frequent topic at that conference, the bridging of research from the mainstream machine learning community with research on MT is particularly promising. A fine example of this cross-over is Chapter 9, “Kernel-Based Machine Translation,” in which a novel approach to estimating translation models is presented. However, this promise is not entirely fulfilled, as some contributions either fail to make use of machine learning or are somewhat obscure, unlikely to impact on the mainstream SMT community.


Journal ArticleDOI
TL;DR: The authors proposed two methodological approaches that combine the quantitative and qualitative views into a corpus-based description of discourse organization, which provides detailed analyses of individual texts and the generalization of these analyses across all the texts of a genre-specific corpus.
Abstract: The study of discourse can be undertaken from different perspectives (e.g., linguistic, cognitive, or computational) with differing purposes in mind (e.g., to study language use or to analyze social practices). The aim of Discourse on the Move is to show that it is possible and profitable to join quantitative and qualitative analyses to study discourse structures. Whereas corpus-based quantitative discourse analysis focuses on the distributional discourse patterns of a corpus as a whole with no indication of how patterns are distributed in individual texts, manual qualitative analysis is always carried out on a small number of texts and does not support large generalizations of the findings. The book proposes two methodological approaches—top-down and bottom-up—that combine the quantitative and qualitative views into a corpus-based description of discourse organization. Such a description provides detailed analyses of individual texts and the generalization of these analyses across all the texts of a genre-specific corpus. Top-down is the more traditional (not necessarily corpus-based) approach in which researchers establish functional–qualitative methods to develop an analytical framework capable of describing the types of discourse units in a target corpus. In this approach, linguistic–quantitative analyses come as a later step to facilitate the interpretation of discourse types. In contrast, the bottom-up approach begins with a linguistic–quantitative analysis based on the automatic segmentation of texts into discourse units on the basis of vocabulary distributional patterns. In this approach, the functional–qualitative analysis that provides an interpretation of the discourse types is performed as a later step. Both top-down and bottom-up analyses can be broken down into seven procedural steps, but the order of the steps in the two approaches is not the same. The steps to be followed in top-down methods are these: