
Showing papers in "Computational Linguistics in 2012"


Journal ArticleDOI
TL;DR: The REG problem is introduced and early work in this area is described, discussing what basic assumptions lie behind it, and showing how its remit has widened in recent years.
Abstract: This article offers a survey of computational research on referring expression generation (REG). It introduces the REG problem and describes early work in this area, discussing what basic assumptions lie behind it, and showing how its remit has widened in recent years. We discuss computational frameworks underlying REG, and demonstrate a recent trend that seeks to link REG algorithms with well-established Knowledge Representation techniques. Considerable attention is given to recent efforts at evaluating REG algorithms and the lessons that they allow us to learn. The article concludes with a discussion of the way forward in REG, focusing on references in larger and more realistic settings.

352 citations


Journal ArticleDOI
TL;DR: A linguistic-oriented computational model is put forward which has at its core an algorithm articulating the effect of factuality relations across levels of syntactic embedding, implemented in De Facto, a factuality profiler for eventualities mentioned in text and tested against a corpus built specifically for the task.
Abstract: Identifying the veracity, or factuality, of event mentions in text is fundamental for reasoning about eventualities in discourse. Inferences derived from events judged as not having happened, or as being only possible, are different from those derived from events evaluated as factual. Event factuality involves two separate levels of information. On the one hand, it deals with polarity, which distinguishes between positive and negative instantiations of events. On the other, it has to do with degrees of certainty (e.g., possible, probable), an information level generally subsumed under the category of epistemic modality. This article aims at contributing to a better understanding of how event factuality is articulated in natural language. For that purpose, we put forward a linguistic-oriented computational model which has at its core an algorithm articulating the effect of factuality relations across levels of syntactic embedding. As a proof of concept, this model has been implemented in De Facto, a factuality profiler for eventualities mentioned in text, and tested against a corpus built specifically for the task, yielding an F1 of 0.70 (macro-averaging) and 0.80 (micro-averaging). These two measures mutually compensate for each other's over-emphasis (on the less and the more populated categories, respectively), and can therefore be interpreted as the lower and upper bounds of De Facto's performance.
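
As a reminder of how the two averaging schemes behave (standard definitions, not taken from the article itself), macro-averaged F1 weights every factuality category equally, so sparsely populated categories pull it down, while micro-averaged F1 pools the counts and is dominated by the frequent categories:

```latex
% Standard macro- vs. micro-averaged F1 over k categories (illustrative only;
% the article reports 0.70 macro-averaged and 0.80 micro-averaged).
\[
F_1^{\mathrm{macro}} = \frac{1}{k}\sum_{c=1}^{k}\frac{2\,P_c R_c}{P_c + R_c},
\qquad
F_1^{\mathrm{micro}} = \frac{2\,P_\mu R_\mu}{P_\mu + R_\mu},
\]
\[
P_\mu = \frac{\sum_c \mathit{TP}_c}{\sum_c(\mathit{TP}_c + \mathit{FP}_c)},
\qquad
R_\mu = \frac{\sum_c \mathit{TP}_c}{\sum_c(\mathit{TP}_c + \mathit{FN}_c)}.
\]
```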

150 citations


Journal ArticleDOI
TL;DR: This work extends the FactBank corpus, which contains semantically driven veridicality annotations, with pragmatically informed ones and shows that context and world knowledge play a significant role in shaping veridicality.
Abstract: Natural language understanding depends heavily on assessing veridicality (whether events mentioned in a text are viewed as happening or not), but little consideration is given to this property in current relation and event extraction systems. Furthermore, the work that has been done has generally assumed that veridicality can be captured by lexical semantic properties, whereas we show that context and world knowledge play a significant role in shaping veridicality. We extend the FactBank corpus, which contains semantically driven veridicality annotations, with pragmatically informed ones. Our annotations are more complex than the lexical assumption predicts but systematic enough to be included in computational work on textual understanding. They also indicate that veridicality judgments are not always categorical, and should therefore be modeled as distributions. We build a classifier to automatically assign event veridicality distributions based on our new annotations. The classifier relies not only on lexical features like hedges or negations, but also on structural features and approximations of world knowledge, thereby providing a nuanced picture of the diverse factors that shape veridicality. "All I know is what I read in the papers" (Will Rogers)

140 citations


Journal ArticleDOI
TL;DR: An overview of how modality and negation have been modeled in computational linguistics is provided.
Abstract: Traditionally, most research in NLP has focused on propositional aspects of meaning. To truly understand language, however, extra-propositional aspects are equally important. Modality and negation typically contribute significantly to these extra-propositional meaning aspects. Although modality and negation have often been neglected by mainstream computational linguistics, interest has grown in recent years, as evidenced by several annotation projects dedicated to these phenomena. Researchers have started to work on modeling factuality, belief and certainty, detecting speculative sentences and hedging, identifying contradictions, and determining the scope of expressions of modality and negation. In this article, we will provide an overview of how modality and negation have been modeled in computational linguistics.

139 citations


Journal ArticleDOI
TL;DR: This article explores a combination of deep and shallow approaches to the problem of resolving the scope of speculation and negation within a sentence, specifically in the domain of biomedical research literature and shows that although both approaches perform well in isolation, even better results can be obtained by combining them.
Abstract: This article explores a combination of deep and shallow approaches to the problem of resolving the scope of speculation and negation within a sentence, specifically in the domain of biomedical research literature. The first part of the article focuses on speculation. After first showing how speculation cues can be accurately identified using a very simple classifier informed only by local lexical context, we go on to explore two different syntactic approaches to resolving the in-sentence scopes of these cues. Whereas one uses manually crafted rules operating over dependency structures, the other automatically learns a discriminative ranking function over nodes in constituent trees. We provide an in-depth error analysis and discussion of various linguistic properties characterizing the problem, and show that although both approaches perform well in isolation, even better results can be obtained by combining them, yielding the best published results to date on the CoNLL-2010 Shared Task data. The last part of the article describes how our speculation system is ported to also resolve the scope of negation. With only modest modifications to the initial design, the system obtains state-of-the-art results on this task also.

92 citations


Journal ArticleDOI
TL;DR: It is found that contextual information and final intonation figure as the most salient cues to automatic disambiguation of affirmative cue words, a family of cue words that speakers use frequently in conversation.
Abstract: We present a series of studies of affirmative cue words-a family of cue words such as "okay" or "alright" that speakers use frequently in conversation. These words pose a challenge for spoken dialogue systems because of their ambiguity: They may be used for agreeing with what the interlocutor has said, indicating continued attention, or for cueing the start of a new topic, among other meanings. We describe differences in the acoustic/prosodic realization of such functions in a corpus of spontaneous, task-oriented dialogues in Standard American English. These results are important both for interpretation and for production in spoken language applications. We also assess the predictive power of computational methods for the automatic disambiguation of these words. We find that contextual information and final intonation figure as the most salient cues to automatic disambiguation.

78 citations


Journal ArticleDOI
TL;DR: A unified subcategorization of semantic uncertainty is introduced, since different domain applications can apply different uncertainty categories, and domain adaptation for training the models offers an efficient solution for cross-domain and cross-genre semantic uncertainty recognition.
Abstract: Uncertainty is an important linguistic phenomenon that is relevant in various Natural Language Processing applications, in diverse genres from medical to community-generated, newswire or scientific discourse, and domains from science to humanities. The semantic uncertainty of a proposition can be identified in most cases by using a finite dictionary (i.e., lexical cues) and the key steps of uncertainty detection in an application include the steps of locating the (genre- and domain-specific) lexical cues, disambiguating them, and linking them with the units of interest for the particular application (e.g., identified events in information extraction). In this study, we focus on the genre and domain differences of the context-dependent semantic uncertainty cue recognition task. We introduce a unified subcategorization of semantic uncertainty as different domain applications can apply different uncertainty categories. Based on this categorization, we normalized the annotation of three corpora and present results with a state-of-the-art uncertainty cue recognition model for four fine-grained categories of semantic uncertainty. Our results reveal the domain and genre dependence of the problem; nevertheless, we also show that even a distant source domain data set can contribute to the recognition and disambiguation of uncertainty cues, efficiently reducing the annotation costs needed to cover a new domain. Thus, the unified subcategorization and domain adaptation for training the models offer an efficient solution for cross-domain and cross-genre semantic uncertainty recognition.

77 citations


Journal ArticleDOI
TL;DR: Using a corpus of implicit arguments for ten predicates from NomBank, a discriminative model is trained that is able to identify implicit arguments with an F1 score of 50%, significantly outperforming an informed baseline model.
Abstract: Nominal predicates often carry implicit arguments. Recent work on semantic role labeling has focused on identifying arguments within the local context of a predicate; implicit arguments, however, have not been systematically examined. To address this limitation, we have manually annotated a corpus of implicit arguments for ten predicates from NomBank. Through analysis of this corpus, we find that implicit arguments add 71% to the argument structures that are present in NomBank. Using the corpus, we train a discriminative model that is able to identify implicit arguments with an F1 score of 50%, significantly outperforming an informed baseline model. This article describes our investigation, explores a wide variety of features important for the task, and discusses future directions for work on implicit argument identification.

77 citations


Journal ArticleDOI
TL;DR: In this article, a vector lattice ordering is used to represent textual entailment, inspired by a strengthened form of the distributional hypothesis, and a degree of entailment is defined in the form of a conditional probability.
Abstract: Formalizing “meaning as context” mathematically leads to a new, algebraic theory of meaning, in which composition is bilinear and associative. These properties are shared by other methods that have been proposed in the literature, including the tensor product, vector addition, point-wise multiplication, and matrix multiplication. Entailment can be represented by a vector lattice ordering, inspired by a strengthened form of the distributional hypothesis, and a degree of entailment is defined in the form of a conditional probability. Approaches to the task of recognizing textual entailment, including the use of subsequence matching, lexical entailment probability, and latent Dirichlet allocation, can be described within our framework.
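
One plausible way to instantiate the lattice view sketched above (our illustration, with the component-wise minimum assumed as the lattice meet; the article's exact definitions may differ) is:

```latex
% Hedged sketch: entailment as a lattice ordering over non-negative context
% vectors u, v, with a degree of entailment read as a conditional probability.
\[
(u \wedge v)_i = \min(u_i, v_i),
\qquad
u \sqsubseteq v \;\iff\; u \wedge v = u,
\]
\[
\mathrm{Ent}(u \Rightarrow v) \;=\; \frac{\lVert u \wedge v \rVert_1}{\lVert u \rVert_1}
\;\approx\; P(v \mid u).
\]
```

Under this reading, u entails v to degree 1 exactly when every context supporting u also supports v at least as strongly.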

61 citations


Journal ArticleDOI
TL;DR: The resulting system significantly outperformed a linguistically naive baseline model, and reached the highest scores yet reported on the NIST 2009 Urdu–English test set, which supports the hypothesis that both syntactic and semantic information can improve translation quality.
Abstract: This article describes the resource-and system-building efforts of an 8-week Johns Hopkins University Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically Informed Machine Translation (SIMT). We describe a new modality/negation (MN) annotation scheme, the creation of a (publicly available) MN lexicon, and two automated MN taggers that we built using the annotation scheme and lexicon. Our annotation scheme isolates three components of modality and negation: a trigger (a word that conveys modality or negation), a target (an action associated with modality or negation), and a holder (an experiencer of modality). We describe how our MN lexicon was semi-automatically produced and we demonstrate that a structure-based MN tagger results in precision around 86% (depending on genre) for tagging of a standard LDC data set. We apply our MN annotation scheme to statistical machine translation using a syntactic framework that supports the inclusion of semantic annotations. Syntactic tags enriched with semantic annotations are assigned to parse trees in the target-language training texts through a process of tree grafting. Although the focus of our work is modality and negation, the tree grafting procedure is general and supports other types of semantic information. We exploit this capability by including named entities, produced by a pre-existing tagger, in addition to the MN elements produced by the taggers described here. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reported on the NIST 2009 Urdu-English test set. This finding supports the hypothesis that both syntactic and semantic information can improve translation quality.

50 citations


Journal ArticleDOI
TL;DR: This work brings together insights obtained from empirical studies in order to determine what should be contained in the summaries of this form of non-linguistic input data, and how the information required for realizing the selected content can be extracted from the visual image and the textual components of the graphic.
Abstract: Information graphics (such as bar charts and line graphs) play a vital role in many multimodal documents. The majority of information graphics that appear in popular media are intended to convey a message and the graphic designer uses deliberate communicative signals, such as highlighting certain aspects of the graphic, in order to bring that message out. The graphic, whose communicative goal (intended message) is often not captured by the document’s accompanying text, contributes to the overall purpose of the document and cannot be ignored. This article presents our approach to providing the high-level content of a non-scientific information graphic via a brief textual summary which includes the intended message and the salient features of the graphic. This work brings together insights obtained from empirical studies in order to determine what should be contained in the summaries of this form of non-linguistic input data, and how the information required for realizing the selected content can be extracted from the visual image and the textual components of the graphic. This work also presents a novel bottom–up generation approach to simultaneously construct the discourse and sentence structures of textual summaries by leveraging different discourse related considerations such as the syntactic complexity of realized sentences and clause embeddings. The effectiveness of our work was validated by different evaluation studies.

Journal ArticleDOI
TL;DR: This work aims to reduce the annotation effort involved in creating resources for semantic role labeling via semi-supervised learning by formalizing the detection of similar sentences and the projection of role annotations as a graph alignment problem, which it solves exactly using integer linear programming.
Abstract: Large-scale annotated corpora are a prerequisite to developing high-performance semantic role labeling systems. Unfortunately, such corpora are expensive to produce, limited in size, and may not be representative. Our work aims to reduce the annotation effort involved in creating resources for semantic role labeling via semi-supervised learning. The key idea of our approach is to find novel instances for classifier training based on their similarity to manually labeled seed instances. The underlying assumption is that sentences that are similar in their lexical material and syntactic structure are likely to share a frame semantic analysis. We formalize the detection of similar sentences and the projection of role annotations as a graph alignment problem, which we solve exactly using integer linear programming. Experimental results on semantic role labeling show that the automatic annotations produced by our method improve performance over using hand-labeled instances alone.
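
To make the graph-alignment formulation concrete, the following is a generic integer linear program of the kind the abstract describes (our notation and simplifications, not the authors' exact program): binary variables align nodes of a labeled seed graph g to nodes of an unlabeled graph g', maximizing lexical-syntactic similarity while using each node at most once.

```latex
% Hypothetical sketch: x_{uv} = 1 iff node u of the labeled graph is aligned
% to node v of the unlabeled graph; sim(u, v) is a similarity score.
\[
\max_{x}\; \sum_{u \in g}\sum_{v \in g'} \mathrm{sim}(u, v)\, x_{uv}
\quad\text{s.t.}\quad
\sum_{v \in g'} x_{uv} \le 1 \;\;\forall u,
\qquad
\sum_{u \in g} x_{uv} \le 1 \;\;\forall v,
\qquad
x_{uv} \in \{0,1\}.
\]
```

Role annotations are then projected from each aligned seed node to its counterpart in the unlabeled sentence.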

Journal ArticleDOI
TL;DR: A global inference algorithm is proposed that, given a target concept, learns on the fly all entailment rules between predicates that co-occur with that concept, using a global transitivity constraint on the graph of predicates to learn the optimal set of edges.
Abstract: Identifying entailment relations between predicates is an important part of applied semantic inference. In this article we propose a global inference algorithm that learns such entailment rules. First, we define a graph structure over predicates that represents entailment relations as directed edges. Then, we use a global transitivity constraint on the graph to learn the optimal set of edges, formulating the optimization problem as an Integer Linear Program. The algorithm is applied in a setting where, given a target concept, the algorithm learns on the fly all entailment rules between predicates that co-occur with this concept. Results show that our global algorithm improves performance over baseline algorithms by more than 10%.
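
The transitivity requirement mentioned above can be encoded with a compact set of linear constraints; the following is a standard formulation of such a program (edge weights w_{ij} assumed to come from a local entailment classifier), offered as a sketch rather than as the authors' exact objective.

```latex
% x_{ij} = 1 iff the edge "predicate i entails predicate j" is kept.
\[
\max_{x}\; \sum_{i \ne j} w_{ij}\, x_{ij}
\quad\text{s.t.}\quad
x_{ij} + x_{jk} - x_{ik} \le 1 \;\;\forall i, j, k,
\qquad
x_{ij} \in \{0, 1\}.
\]
% The constraint forces the edge i -> k to be selected whenever i -> j and
% j -> k both are, so the resulting edge set is transitively closed.
```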

Journal ArticleDOI
TL;DR: A study on the automatic acquisition of semantic classes for Catalan adjectives from distributional and morphological information is presented, with particular emphasis on polysemous adjectives; it argues that the second model, which captures regular polysemy in terms of simultaneous membership in multiple basic classes, is both theoretically and empirically more adequate.
Abstract: We present a study on the automatic acquisition of semantic classes for Catalan adjectives from distributional and morphological information, with particular emphasis on polysemous adjectives. The aim is to distinguish and characterize broad classes, such as qualitative (gran ‘big’) and relational (pulmonar ‘pulmonary’) adjectives, as well as to identify polysemous adjectives such as economic (‘economic ∣ cheap’). We specifically aim at modeling regular polysemy, that is, types of sense alternations that are shared across lemmata. To date, both semantic classes for adjectives and regular polysemy have only been sparsely addressed in empirical computational linguistics. Two main specific questions are tackled in this article. First, what is an adequate broad semantic classification for adjectives? We provide empirical support for the qualitative and relational classes as defined in theoretical work, and uncover one type of adjective that has not received enough attention, namely, the event-related class. Se...

Journal ArticleDOI
TL;DR: A computational model is described for planning phrases like “more than a quarter” and “25.9 per cent”, which describe proportions at different levels of precision; the task is modeled as a constraint satisfaction problem, with solutions subsequently ranked by preferences.
Abstract: We describe a computational model for planning phrases like "more than a quarter" and "25.9 per cent" which describe proportions at different levels of precision. The model lays out the key choices in planning a numerical description, using formal definitions of mathematical form (e.g., the distinction between fractions and percentages) and roundness adapted from earlier studies. The task is modeled as a constraint satisfaction problem, with solutions subsequently ranked by preferences (e.g., for roundness). Detailed constraints are based on a corpus of numerical expressions collected in the NumGen project, and evaluated through empirical studies in which subjects were asked to produce (or complete) numerical expressions in specified contexts.
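
A minimal Python sketch of the constrain-then-rank idea (our own illustration, not the NumGen model or its constraint set): candidate descriptions of a proportion are generated under a simple precision constraint, and the survivors are ranked by a toy roundness preference.

```python
from fractions import Fraction

# Hypothetical inventory of "round" fractions, used only for illustration.
ROUND_FRACTIONS = [Fraction(1, d) for d in (2, 3, 4, 5, 10)] + [Fraction(3, 4)]

def candidates(p, tolerance=0.03):
    """Generate candidate descriptions of proportion p (0 <= p <= 1) whose
    approximation error satisfies the tolerance constraint."""
    out = []
    for frac in ROUND_FRACTIONS:
        diff = p - float(frac)
        if abs(diff) <= tolerance:
            hedge = "about" if abs(diff) < 0.005 else ("more than" if diff > 0 else "less than")
            out.append((f"{hedge} {frac.numerator} in {frac.denominator}", abs(diff), True))
    out.append((f"{round(p * 100, 1)} per cent", 0.0, False))  # exact percentage
    return out

def rank(cands):
    """Preference ranking: round fractions first, then smaller error."""
    return sorted(cands, key=lambda c: (not c[2], c[1]))

if __name__ == "__main__":
    for text, err, _ in rank(candidates(0.259)):
        print(f"{text:>20}  (error={err:.3f})")
    # "more than 1 in 4" is preferred here over the exact "25.9 per cent"
```

The real model distinguishes mathematical forms (fractions vs. percentages) and uses corpus-derived constraints and preferences; the sketch only shows the two-stage constrain-then-rank architecture.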

Journal ArticleDOI
TL;DR: The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the Bleu score and “readability” of translations when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
Abstract: This paper presents an attempt at building a large scale distributed composite language model that is formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm and a follow-up EM algorithm to improve word prediction power on corpora with up to a billion tokens and stored on a supercomputer. The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the Bleu score and "readability" of translations when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
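
As a rough orientation only (not the paper's exact parameterization of the directed Markov random field), a composite model of this kind can be pictured as a weighted product of the three component predictions, each contributing its own view of the history h of word w:

```latex
% Hedged sketch of a product-style composite language model.
\[
p(w \mid h) \;\propto\;
p_{\text{ngram}}(w \mid h)^{\lambda_1}\,
p_{\text{SLM}}(w \mid h)^{\lambda_2}\,
p_{\text{PLSA}}(w \mid h)^{\lambda_3},
\qquad \lambda_i \ge 0,
\]
% where the n-gram component captures local lexical information, the
% structured language model (SLM) mid-range syntactic structure, and
% PLSA long-span document-level semantic content.
```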

Journal ArticleDOI
TL;DR: This work presents methods for reducing the worst-case and typical-case complexity of a context-free parsing pipeline via hard constraints derived from finite-state pre-processing and demonstrates that this method generalizes across multiple grammars and is complementary to other pruning techniques by presenting empirical results for both exact and approximate inference.
Abstract: We present methods for reducing the worst-case and typical-case complexity of a context-free parsing pipeline via hard constraints derived from finite-state pre-processing. We perform O(n) predictions to determine if each word in the input sentence may begin or end a multi-word constituent in chart cells spanning two or more words, or allow single-word constituents in chart cells spanning the word itself. These pre-processing constraints prune the search space for any chart-based parsing algorithm and significantly decrease decoding time. In many cases cell population is reduced to zero, which we term chart cell "closing." We present methods for closing a sufficient number of chart cells to ensure provably quadratic or even linear worst-case complexity of context-free inference. In addition, we apply high precision constraints to achieve large typical-case speedups and combine both high precision and worst-case bound constraints to achieve superior performance on both short and long strings. These bounds on processing are achieved without reducing the parsing accuracy, and in some cases accuracy improves. We demonstrate that our method generalizes across multiple grammars and is complementary to other pruning techniques by presenting empirical results for both exact and approximate inference using the exhaustive CKY algorithm, the Charniak parser, and the Berkeley parser. We also report results parsing Chinese, where we achieve the best reported results for an individual model on the commonly reported data set.
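
The effect of cell closing on a chart parser can be seen in a small sketch (ours, not the authors' implementation, and using a toy CNF grammar): a CKY-style loop simply skips any cell spanning two or more words whose first word is predicted not to begin, or whose last word is predicted not to end, a multi-word constituent.

```python
from collections import defaultdict

# Toy CNF grammar: binary rules (B, C) -> A and a word-to-tag lexicon.
BINARY = {("NP", "VP"): "S", ("DT", "NN"): "NP", ("VB", "NP"): "VP"}
LEXICON = {"the": "DT", "dog": "NN", "bit": "VB", "man": "NN"}

def cky_with_cell_closing(words, may_begin, may_end):
    """may_begin[i] / may_end[j] are finite-state predictions saying whether
    word i may begin / word j may end a multi-word constituent; cells spanning
    two or more words are 'closed' (skipped entirely) when either is False."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):                       # width-1 cells stay open
        chart[(i, i + 1)].add(LEXICON[w])
    for width in range(2, n + 1):                       # multi-word cells
        for i in range(n - width + 1):
            j = i + width
            if not (may_begin[i] and may_end[j - 1]):   # closed cell: prune
                continue
            for k in range(i + 1, j):
                for b in chart[(i, k)]:
                    for c in chart[(k, j)]:
                        if (b, c) in BINARY:
                            chart[(i, j)].add(BINARY[(b, c)])
    return chart

words = ["the", "dog", "bit", "the", "man"]
chart = cky_with_cell_closing(words, may_begin=[True] * 5, may_end=[True] * 5)
print("S" in chart[(0, 5)])   # True: with no cells closed, the full parse survives
```

Closing a cell removes all work for that span and for any larger span that would have combined with it, which is where the quadratic or linear worst-case bounds come from once enough cells are provably closed.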

Journal ArticleDOI
TL;DR: This work presents a framework for empirical risk minimization of probabilistic grammars using the log-loss, and derives sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting.
Abstract: Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk.
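
For orientation, the empirical risk being minimized has the usual log-loss form (standard definitions, with p_theta a probabilistic grammar over derivations; the abstract's two settings differ only in whether derivations are observed):

```latex
% Supervised setting: derivations z_i are observed.
\[
\hat{R}_n(\theta) \;=\; -\frac{1}{n}\sum_{i=1}^{n}\log p_\theta(z_i).
\]
% Unsupervised setting: only the strings x_i are observed, so the grammar's
% probability is marginalized over all derivations yielding x_i.
\[
\hat{R}_n(\theta) \;=\; -\frac{1}{n}\sum_{i=1}^{n}\log \sum_{z\,:\,\mathrm{yield}(z)=x_i} p_\theta(z).
\]
```

Minimizing the second objective exactly is the NP-hard problem mentioned above, which motivates the EM-like approximation.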

Journal ArticleDOI
TL;DR: A record of my encounters with language and my changing views of what one ought to believe about language and how one might represent its properties is offered in this essay.
Abstract: First of all, I am overwhelmed and humbled by the honor the ACL Executive Committee has shown me, an honor that should be shared by the colleagues and students I’ve been lucky enough to have around me this past decade-and-a-half while I’ve been engaged in the FrameNet Project at the International Computer Science Institute in Berkeley. I’ve been asked to say something about the evolution of the ideas behind the work with which I’ve been associated, so my remarks will be a bit more autobiographical than I might like. I’d like to comment on my changing views of what language is like, and how the facts of language can be represented. As I am sure the ACL Executive Committee knows, I have never been a direct participant in efforts in language engineering, but I have been a witness to, a neighbor of, and an indirect participant in some parts of it, and I have been pleased to learn that some of the resources my colleagues and I are building have been found by some researchers to be useful. I offer a record of my encounters with language and my changing views of what one ought to believe about language and how one might represent its properties. In the course of the narrative I will take note of changes I have observed over the past seven decades or so in both technical and conceptual tools in linguistics and language engineering. One theme in this essay is how these tools, and the representations they support, obscure or reveal the properties of language and therefore affect what one might believe about language. The time frame my life occupies has presented many opportunities to ponder this complex relationship.

Journal ArticleDOI
TL;DR: It has been claimed in the literature that for every tree-adjoining grammar, one can construct a strongly equivalent lexicalized version, but it is shown that such a procedure does not exist: Tree-adjoining grammars are not closed under strong lexicalization.
Abstract: A lexicalized tree-adjoining grammar is a tree-adjoining grammar where each elementary tree contains some overt lexical item. Such grammars are being used to give lexical accounts of syntactic phenomena, where an elementary tree defines the domain of locality of the syntactic and semantic dependencies of its lexical items. It has been claimed in the literature that for every tree-adjoining grammar, one can construct a strongly equivalent lexicalized version. We show that such a procedure does not exist: Tree-adjoining grammars are not closed under strong lexicalization.

Journal ArticleDOI
TL;DR: An algorithm is presented that, for a given LFG grammar and acyclic f-structure, produces a context-free grammar describing exactly the set of strings that the grammar associates with that f-structure; this context-free grammar serves as a compact representation of all generation results that the LFG grammar assigns to the input.
Abstract: This article describes an approach to Lexical-Functional Grammar (LFG) generation that is based on the fact that the set of strings that an LFG grammar relates to a particular acyclic f-structure is a context-free language. We present an algorithm that produces for an arbitrary LFG grammar and an arbitrary acyclic input f-structure a context-free grammar describing exactly the set of strings that the given LFG grammar associates with that f-structure. The individual sentences are then available through a standard context-free generator operating on that grammar. The context-free grammar is constructed by specializing the context-free backbone of the LFG grammar for the given f-structure and serves as a compact representation of all generation results that the LFG grammar assigns to the input. This approach extends to other grammatical formalisms with explicit context-free backbones, such as PATR, and also to formalisms that permit a context-free skeleton to be extracted from richer specifications. It provides a general mathematical framework for understanding and improving the operation of a family of chart-based generation algorithms.

Journal ArticleDOI
TL;DR: The formal power of Multi Bottom-Up Tree Transducers is examined from the point of view of syntax-based machine translation.
Abstract: Tree transducers are defined as relations between trees, but in syntax-based machine translation, we are ultimately concerned with the relations between the strings at the yields of the input and output trees. We examine the formal power of Multi Bottom-Up Tree Transducers from this point of view.

Journal ArticleDOI
TL;DR: It is discussed how well the Fruit Carts domain meets four desired features: unscripted, context-constrained, controllable difficulty, and separability into semi-independent subdialogues.
Abstract: We describe a novel domain, Fruit Carts, aimed at eliciting human language production for the twin purposes of (a) dialogue system research and development and (b) psycholinguistic research. Fruit Carts contains five tasks: choosing a cart, placing it on a map, painting the cart, rotating the cart, and filling the cart with fruit. Fruit Carts has been used for research in psycholinguistics and in dialogue systems. Based on these experiences, we discuss how well the Fruit Carts domain meets four desired features: unscripted, context-constrained, controllable difficulty, and separability into semi-independent subdialogues. We describe the domain in sufficient detail to allow others to replicate it; researchers interested in using the corpora themselves are encouraged to contact the authors directly.

Reference EntryDOI
TL;DR: The survey introduces the REG problem and describes early work in this area, discusses some computational frameworks underlying REG, and demonstrates a recent trend that seeks to link REG algorithms with well-established Knowledge Representation techniques.
Abstract: This article offers a survey of computational research on referring expression generation (REG). It introduces the REG problem and describes early work in this area, discussing what basic assumptio...

Journal ArticleDOI
TL;DR: The goals of this work are to detect the most relevant features for this denotative distinction between event and result nominalizations, and to build an automatic classification system of deverbal nominalizations according to their denotation.
Abstract: This article deals with deverbal nominalizations in Spanish; concretely, we focus on the denotative distinction between event and result nominalizations. The goal of this work is twofold: first, to detect the most relevant features for this denotative distinction; and, second, to build an automatic classification system of deverbal nominalizations according to their denotation. We have based our study on theoretical hypotheses dealing with this semantic distinction and we have analyzed them empirically by means of Machine Learning techniques which are the basis of the ADN-Classifier. This is the first tool that aims to automatically classify deverbal nominalizations into event, result, or underspecified denotation types in Spanish. The ADN-Classifier has helped us to quantitatively evaluate the validity of our claims regarding deverbal nominalizations. We set up a series of experiments in order to test the ADN-Classifier with different models and in different realistic scenarios depending on the knowledge resources and natural language processors available. The ADN-Classifier achieved good results (87.20% accuracy).

Journal ArticleDOI
TL;DR: Quantitative Syntax Analysis is a recent work on QL by Reinhard Köhler that not only provides a comprehensive introduction to the work of QL on the syntactic level, but also sketches the theoretical grounds, the research paradigm, and the ultimate goals of quantitative linguistics in general.
Abstract: Quantitative linguistics (QL) is a discipline of linguistics that, using real texts, studies languages with quantitative mathematical approaches, aiming to precisely describe and explain, with a system of mathematical laws, the operation and development of language systems. Later in this review, we will address the relationship between QL and computational linguistics. Quantitative Syntax Analysis is a recent work on QL by Reinhard Köhler that not only provides a comprehensive introduction to the work of QL on the syntactic level, but also sketches the theoretical grounds, the research paradigm, and the ultimate goals of quantitative linguistics in general.


Journal ArticleDOI
TL;DR: Victor Yngve was a major contributor in a number of fields within computational linguistics: as the leading researcher in machine translation at the Massachusetts Institute of Technology (MIT), as editor of its first journal, as designer and developer of the first non-numerical programming language (COMIT), and as an influential contributor to linguistic theory.
Abstract: Victor Yngve (5 July 1920 to 15 January 2012) was a major contributor in a number of fields within computational linguistics: as the leading researcher in machine translation (MT) at the Massachusetts Institute of Technology (MIT), as editor of its first journal, as designer and developer of the first non-numerical programming language (COMIT), and as an influential contributor to linguistic theory. While still completing his Ph.D. on cosmic ray physics at the University of Chicago during 1950–1953, Yngve had an idea for using the newly invented computers to translate languages. He contemplated building a translation machine based on simple dictionary lookup. At this time he knew nothing of the earlier speculations of Warren Weaver and others (Hutchins 1997). Then during a visit to Claude Shannon at Bell Telephone Laboratories in early 1952 he heard about a conference on machine translation to be held at MIT in June of that year. He attended the opening public meeting and participated in conference discussions, and then, after Bar-Hillel’s departure from MIT, he was appointed in July 1953 by Jerome Wiesner at the Research Laboratory for Electronics (RLE) to lead the MT research effort there. (For a retrospective survey of his MT research activities see Yngve [2000].) Yngve, along with many others at the time, deprecated the premature publicity around the Georgetown–IBM system demonstrated in January 1954. Yngve was appalled to see research of such a limited nature reported in newspapers; his background in physics required experiments to be carefully planned, with their assumptions made plain, and properly tested and reviewed by other researchers. He was determined to set the new field of MT on a proper scientific course. The first step was a journal for the field, to be named Mechanical Translation (the field became "machine translation" in later years). He found a collaborator for the journal in William N. Locke of the MIT Modern Languages department. The aim was to provide a forum for information about what research was going on in the form of abstracts, and then for peer-reviewed articles. The first issue appeared in March 1954. Yngve's first experiments at MIT in October 1953 were an implementation of his earlier ideas on word-for-word translation. The results of translating from German were published in the collection edited by Locke and Booth (Yngve 1955b). One example of output began: