
Showing papers on "Context-free grammar published in 2010"


Book
25 Aug 2010
TL;DR: This book provides an extensive overview of the formal language landscape between CFG and PTIME, moving from Tree Adjoining Grammars to Multiple Context-Free Grammars and then to Range Concatenation Grammars while explaining available parsing techniques for these formalisms.
Abstract: Given that context-free grammars (CFG) cannot adequately describe natural languages, grammar formalisms beyond CFG that are still computationally tractable are of central interest for computational linguists. This book provides an extensive overview of the formal language landscape between CFG and PTIME, moving from Tree Adjoining Grammars to Multiple Context-Free Grammars and then to Range Concatenation Grammars while explaining available parsing techniques for these formalisms. Although familiarity with the basic notions of parsing and formal languages is helpful when reading this book, it is not a strict requirement. The presentation is supported with many illustrations and examples relating to the different formalisms and algorithms, and chapter summaries, problems and solutions. The book will be useful for students and researchers in computational linguistics and in formal language theory.

134 citations


Journal Article
TL;DR: This work proposes a novel compromise by inferring a probabilistic tree substitution grammar, a formalism which allows for arbitrarily large tree fragments and thereby better represents complex linguistic structures, and demonstrates the model's efficacy on supervised phrase-structure parsing and unsupervised dependency grammar induction.
Abstract: Inducing a grammar from text has proven to be a notoriously challenging learning task despite decades of research. The primary reason for its difficulty is that in order to induce plausible grammars, the underlying model must be capable of representing the intricacies of language while also ensuring that it can be readily learned from data. The majority of existing work on grammar induction has favoured model simplicity (and thus learnability) over representational capacity by using context-free grammars and first-order dependency grammars, which are not sufficiently expressive to model many common linguistic constructions. We propose a novel compromise by inferring a probabilistic tree substitution grammar, a formalism which allows for arbitrarily large tree fragments and thereby better represents complex linguistic structures. To limit the model's complexity we employ a Bayesian non-parametric prior which biases the model towards a sparse grammar with shallow productions. We demonstrate the model's efficacy on supervised phrase-structure parsing, where we induce a latent segmentation of the training treebank, and on unsupervised dependency grammar induction. In both cases the model uncovers interesting latent linguistic structures while producing competitive results.

98 citations


Proceedings Article
11 Jul 2010
TL;DR: A novel approach to authorship attribution, the task of identifying the author of a document, that builds a probabilistic context-free grammar for each author and uses it as a language model for classification.
Abstract: In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach involves building a probabilistic context-free grammar for each author and using this grammar as a language model for classification. We evaluate the performance of our method on a wide range of datasets to demonstrate its efficacy.

88 citations
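The approach summarized above, one probabilistic context-free grammar per author used as a language model, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the helper names and toy productions are invented, and a real system would first parse raw text with a treebank-derived parser to obtain each document's productions.

```python
from collections import Counter
from math import log

def train_pcfg(productions):
    """Estimate rule and left-hand-side counts from (lhs, rhs) pairs."""
    rule_counts = Counter(productions)
    lhs_counts = Counter(lhs for lhs, _ in productions)
    return rule_counts, lhs_counts

def log_score(productions, model, alpha=1.0, rules_per_lhs=100):
    """Add-alpha smoothed log-probability of a document's productions
    under one author's PCFG (rules_per_lhs is a crude smoothing constant)."""
    rule_counts, lhs_counts = model
    total = 0.0
    for lhs, rhs in productions:
        num = rule_counts[(lhs, rhs)] + alpha
        den = lhs_counts[lhs] + alpha * rules_per_lhs
        total += log(num / den)
    return total

def classify(doc_productions, models):
    """Attribute the document to the author whose grammar scores it highest."""
    return max(models, key=lambda a: log_score(doc_productions, models[a]))

# Toy treebank productions per author (entirely made up)
author_a = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP"))] * 3
author_b = [("S", ("VP",)), ("VP", ("V",)), ("NP", ("N",))] * 3
models = {"A": train_pcfg(author_a), "B": train_pcfg(author_b)}
doc = [("S", ("NP", "VP")), ("NP", ("Det", "N"))]
print(classify(doc, models))  # prints A
```

The per-author grammar acts exactly like an n-gram language model in classical attribution work, only over syntactic productions instead of word sequences.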


Proceedings ArticleDOI
17 Jan 2010
TL;DR: The design and theory of a new parsing engine, YAKKER, capable of satisfying the many needs of modern programmers and modern data processing applications is presented and its use on examples ranging from difficult programming language grammars to web server logs to binary data specification is illustrated.
Abstract: We present the design and theory of a new parsing engine, YAKKER, capable of satisfying the many needs of modern programmers and modern data processing applications. In particular, our new parsing engine handles (1) full scannerless context-free grammars with (2) regular expressions as right-hand sides for defining nonterminals. YAKKER also includes (3) facilities for binding variables to intermediate parse results and (4) using such bindings within arbitrary constraints to control parsing. These facilities allow the kind of data-dependent parsing commonly needed in systems applications, particularly those that operate over binary data. In addition, (5) nonterminals may be parameterized by arbitrary values, which gives the system good modularity and abstraction properties in the presence of data-dependent parsing. Finally, (6) legacy parsing libraries, such as sophisticated libraries for dates and times, may be directly incorporated into parser specifications. We illustrate the importance and utility of this rich collection of features by presenting its use on examples ranging from difficult programming language grammars to web server logs to binary data specification. We also show that our grammars have important compositionality properties and explain why such properties are important in modern applications such as automatic grammar induction. In terms of technical contributions, we provide a traditional high-level semantics for our new grammar formalization and show how to compile grammars into nondeterministic automata. These automata are stack-based, somewhat like conventional push-down automata, but are also equipped with environments to track data-dependent parsing state. We prove the correctness of our translation of data-dependent grammars into these new automata and then show how to implement the automata efficiently using a variation of Earley's parsing algorithm.

61 citations


DOI
01 Jan 2010
TL;DR: PetitParser combines ideas from scannerless parsing, parser combinators, parsing expression grammars and packrat parsers to model grammars and parsers as objects that can be reconfigured dynamically.
Abstract: Grammars for programming languages are traditionally specified statically. They are hard to compose and reuse due to ambiguities that inevitably arise. PetitParser combines ideas from scannerless parsing, parser combinators, parsing expression grammars and packrat parsers to model grammars and parsers as objects that can be reconfigured dynamically. Through examples and benchmarks we demonstrate that dynamic grammars are not only flexible but highly practical.

61 citations


Book ChapterDOI
13 Sep 2010
TL;DR: It is shown that there is a natural class of context-free languages, including the class of regular languages, that can be polynomially learned from a MAT, using an algorithm that extends Angluin's LSTAR algorithm.
Abstract: Angluin showed that the class of regular languages could be learned from a Minimally Adequate Teacher (MAT) providing membership and equivalence queries. Clark and Eyraud (2007) showed that some context-free grammars can be identified in the limit from positive data alone by identifying the congruence classes of the language. In this paper we consider learnability of context-free languages using a MAT. We show that there is a natural class of context-free languages, which includes the class of regular languages, that can be polynomially learned from a MAT, using an algorithm that is an extension of Angluin's LSTAR algorithm.

54 citations


Journal ArticleDOI
TL;DR: A language for specifying strategies for solving exercises is introduced, which makes it easier to automatically calculate feedback, for example when a user makes an erroneous step in a calculation.
Abstract: Strategies specify how a wide range of exercises can be solved incrementally, such as bringing a logic proposition to disjunctive normal form, reducing a matrix, or calculating with fractions. In this paper we introduce a language for specifying strategies for solving exercises. This language makes it easier to automatically calculate feedback, for example when a user makes an erroneous step in a calculation. We can automatically generate worked-out examples, track the progress of a student by inspecting submitted intermediate answers, and report back suggestions in case the student deviates from the strategy. Thus it becomes less labor-intensive and less ad-hoc to specify new exercise domains and exercises within that domain. A strategy describes valid sequences of rewrite rules, which turns tracking intermediate steps into a parsing problem. This is a promising view at interactive exercises because it allows us to take advantage of many years of experience in parsing sentences of context-free languages, and transfer this knowledge and technology to the domain of stepwise solving exercises. In this paper we work out the similarities between parsing and solving exercises incrementally, we discuss generating feedback on strategies, and the implementation of a strategy recognizer.

46 citations


Proceedings ArticleDOI
06 May 2010
TL;DR: Methods are proposed to automatically insert cut operators into some practical grammars without changing the accepted languages, enabling packrat parsers to handle grammars including the Java grammar in mostly constant space without requiring any extra annotations.
Abstract: Packrat parsing is a powerful parsing algorithm presented by Ford in 2002. Packrat parsers can handle complicated grammars and recursive structures in lexical elements more easily than the traditional LL(k) or LR(1) parsing algorithms. However, packrat parsers require O(n) space for memoization, where n is the length of the input. This space inefficiency makes packrat parsers impractical in some applications. In our earlier work, we had proposed a packrat parser generator that accepts grammars extended with cut operators, which enable the generated parsers to reduce the amount of storage required. Experiments showed that parsers generated from cut-inserted grammars can parse Java programs and subset XML files in bounded space. In this study, we propose methods to automatically insert cut operators into some practical grammars without changing the accepted languages. Our experimental evaluations indicated that using our methods, packrat parsers can handle some practical grammars including the Java grammar in mostly constant space without requiring any extra annotations.

37 citations
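The space cost described above is easy to see in code: a packrat parser memoises every (rule, position) result, so the memo table grows linearly with the input. The recogniser below, for the hypothetical toy grammar Expr <- Num ('+' Num)*, is a minimal sketch of that bookkeeping (not the authors' generator); a cut operator marks points after which entries for earlier positions can never be revisited and may safely be discarded, which is how the paper bounds the table.

```python
# Toy packrat recogniser for  Expr <- Num ('+' Num)*  over a digit string.
def make_parser(text):
    memo = {}  # (rule_name, pos) -> end position of the match, or None

    def num(i):
        # Num <- [0-9]+  (not memoised here, for brevity)
        j = i
        while j < len(text) and text[j].isdigit():
            j += 1
        return j if j > i else None

    def expr(i):
        key = ("expr", i)
        if key in memo:          # every result is remembered ...
            return memo[key]
        j = num(i)
        while j is not None and j < len(text) and text[j] == "+":
            k = num(j + 1)
            if k is None:
                break
            j = k
        memo[key] = j            # ... so the table grows with the input
        return j

    return expr, memo

expr, memo = make_parser("1+23+4")
print(expr(0), len(memo))  # 6 1: whole input matched, one memo entry
```

With backtracking alternatives in the grammar, the same table is what turns exponential reparsing into linear time, at the price of the O(n) space the paper attacks.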


Book ChapterDOI
01 Jan 2010
TL;DR: This chapter defines context-free grammars and presents a general polynomial parsing algorithm, which is not fast enough to be practical, then considers faster, linear-time algorithms, recursive-descent parsing and LL-parsing, that can be used for some classes of context-free grammars.
Abstract: In chapter 5 we use finite automata for text parsing. As noted, there are rather simple structures (e.g., nested comments) that cannot be parsed with finite automata. There is a more powerful formalism called context-free grammars that is often used when finite automata are not enough. In section 15.1 we define context-free grammars and consider a general polynomial parsing algorithm. However, this algorithm is not fast enough to be practical, and in the next two sections we consider faster (linear time) algorithms that can be used for some classes of context-free grammars, called recursive-descent parsing (section 15.2) and LL-parsing (section 15.3).

36 citations
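The motivating example above, nested comments, is a natural fit for recursive descent: the parser simply calls itself at each inner comment, using the call stack for the unbounded counting a finite automaton lacks. A minimal sketch, assuming a hypothetical (* ... *) comment syntax:

```python
# Recursive-descent recogniser for nested comments (* ... (* ... *) ... *).
def parse_comment(s, i=0):
    """If s[i:] starts with a balanced comment, return the index just past
    it; otherwise return None."""
    if s[i:i+2] != "(*":
        return None
    i += 2
    while i < len(s):
        if s[i:i+2] == "*)":
            return i + 2
        if s[i:i+2] == "(*":
            i = parse_comment(s, i)   # recurse into the nested comment
            if i is None:
                return None
        else:
            i += 1
    return None  # unterminated comment

print(parse_comment("(* a (* nested *) b *)"))  # 22: whole string consumed
```

Each recursive call corresponds to one application of a grammar rule like Comment -> "(*" Body "*)", which is exactly the correspondence the chapter develops for recursive-descent and LL parsing.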


Journal ArticleDOI
TL;DR: It is observed that there is a simple linguistic characterization of the grammar ambiguity problem, and it is shown how to exploit this by presenting an ambiguity analysis framework based on conservative language approximations.

36 citations


Journal ArticleDOI
TL;DR: This paper gives a concise description of PGF, covering syntax, semantics, and parser generation, and discusses the technique of embedded grammars, where language processing tasks defined by PGF grammars are integrated in larger systems.
Abstract: Portable Grammar Format (PGF) is a core language for type-theoretical grammars. It is the target language to which grammars written in the high-level formalism Grammatical Framework (GF) are compiled. Low-level and simple, PGF is easy to reason about, so that its language-theoretic properties can be established. It is also easy to write interpreters that perform parsing and generation with PGF grammars, and compilers converting PGF to other formats. This paper gives a concise description of PGF, covering syntax, semantics, and parser generation. It also discusses the technique of embedded grammars, where language processing tasks defined by PGF grammars are integrated in larger systems.

Book ChapterDOI
13 Sep 2010
TL;DR: This work presents a learning algorithm for context free grammars which uses positive data and membership queries, and proves its correctness under the identification in the limit paradigm.
Abstract: The Syntactic Concept Lattice is a residuated lattice based on the distributional structure of a language; the natural representation based on this is a context sensitive formalism. Here we examine the possibility of basing a context free grammar (CFG) on the structure of this lattice; in particular by choosing non-terminals to correspond to concepts in this lattice. We present a learning algorithm for context free grammars which uses positive data and membership queries, and prove its correctness under the identification in the limit paradigm. Since the lattice itself may be infinite, we consider only a polynomially bounded subset of the set of concepts, in order to get an efficient algorithm. We compare this on the one hand to learning algorithms for context free grammars, where the non-terminals correspond to congruence classes, and on the other hand to the use of context sensitive techniques such as Binary Feature Grammars and Distributional Lattice Grammars. The class of CFGs that can be learned in this way includes inherently ambiguous and thus non-deterministic languages; this approach therefore breaks through an important barrier in CFG inference.

Journal ArticleDOI
TL;DR: The aim of this paper is to experiment, on several grammars of domain-specific languages and of general-purpose languages, with existing grammar metrics together with new metrics based on the grammar's LR automaton and on the language recognized.
Abstract: Grammar metrics have been introduced to measure the quality and the complexity of formal grammars. The aim of this paper is to explore the meaning of these notions and to experiment, on several grammars of domain-specific languages and of general-purpose languages, with existing grammar metrics together with new metrics that are based on the grammar's LR automaton and on the language recognized. We discuss the results of this experiment, focusing on the comparison between grammars of domain-specific languages and of general-purpose languages, and on the evolution of the metrics between several versions of the same language.

Proceedings Article
02 Jun 2010
TL;DR: A variational inference algorithm for adaptor grammars is described, providing an alternative to Markov chain Monte Carlo methods, and a significant speed-up is shown when parallelizing the algorithm.
Abstract: Adaptor grammars extend probabilistic context-free grammars to define prior distributions over trees with "rich get richer" dynamics. Inference for adaptor grammars seeks to find parse trees for raw text. This paper describes a variational inference algorithm for adaptor grammars, providing an alternative to Markov chain Monte Carlo methods. To derive this method, we develop a stick-breaking representation of adaptor grammars, a representation that enables us to define adaptor grammars with recursion. We report experimental results on a word segmentation task, showing that variational inference performs comparably to MCMC. Further, we show a significant speed-up when parallelizing the algorithm. Finally, we report promising results for a new application for adaptor grammars, dependency grammar induction.

01 Jan 2010
TL;DR: The grammar in the grammar-based Genetic Programming (GP) approach of Grammatical Evolution (GE) is explored, and a meta-grammar GE, which allows a larger grammar with a different bias, is studied by adopting a divide-and-conquer strategy.
Abstract: The grammar in the grammar-based Genetic Programming (GP) approach of Grammatical Evolution (GE) is explored. The GE algorithm solves problems by using a grammar representation and an automated and parallel trial-and-error approach, Evolutionary Computation (EC). The search for solutions in EC is driven by evaluating each solution, selecting the fittest and replacing these into a population of solutions which are modified to further guide the search. Representations have a strong impact on the efficiency of search and by using a generative grammar domain knowledge is encoded into the population of solutions. The grammar in GE biases the search for solutions, and in combination with a linear representation this is what distinguishes GE from other GP-systems. After a review of grammars in EC and a description of GE, several different constructions of grammars and operators for manipulating the grammars and the evolutionary algorithm are studied. The thesis goes on to study a meta-grammar GE, which allows a larger grammar with different bias. By adopting a divide-and-conquer strategy the goal is to investigate how a modular GE approach solves problems of increasing size and in dynamically changing environments. The results show some benefit from using meta-grammars in GE, for the meta-grammar Genetic Algorithm (mGGA) and they re-emphasize the grammar’s impact on GE’s performance. In addition, GE and meta-grammars are more formally described. The bias, both declarative and search, arising from the use of a Context-Free Grammar representation and the constraints of GE and the mGGA are analyzed and their implications are examined. This is done by studying the effects of the mapping and operations on the input, single and multiple changes in input, as well as the preservation of output after a change. 
Furthermore, a matrix view of a grammar and different suggestions for measurements of grammars are investigated, in order to allow the practitioner to get an alternative view of the mapping process and of how operations work.

Proceedings ArticleDOI
01 Dec 2010
TL;DR: This paper describes a context-free grammar for the Bangla language and develops a Bangla parser based on it; the parser uses the top-down parsing method, and the idea of left factoring is adopted to avoid left recursion.
Abstract: Parsing is a process of transforming natural language into an internal system representation, which can be trees, dependency graphs, frames or some other structural representations. If a natural language can be successfully parsed, then grammar checking for that language becomes easy. In this paper we describe a context-free grammar for the Bangla language and develop a Bangla parser based on the grammar. Our approach is general enough to apply to Bangla sentences, and the method is a well-accepted one for parsing the language of a grammar. The scheme is based on the top-down parsing method, and the idea of left factoring is adopted to avoid left recursion.
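A top-down parser cannot handle a left-recursive rule such as E -> E + T | T directly, because it would recurse forever without consuming input. The standard grammar transformation that such parsers rely on can be written out mechanically; the function below is a generic sketch (not code from the paper) of eliminating immediate left recursion, a transformation closely related to the left factoring the authors adopt:

```python
def eliminate_left_recursion(nonterminal, rules):
    """Remove immediate left recursion:
        A -> A a1 | ... | A am | b1 | ... | bn
    becomes
        A  -> b1 A' | ... | bn A'
        A' -> a1 A' | ... | am A' | []      ([] stands for epsilon)
    Each rule is a list of symbols; returns a dict of new rules."""
    recursive = [r[1:] for r in rules if r and r[0] == nonterminal]
    other = [r for r in rules if not r or r[0] != nonterminal]
    if not recursive:
        return {nonterminal: rules}   # nothing to do
    aprime = nonterminal + "'"
    return {
        nonterminal: [r + [aprime] for r in other],
        aprime: [r + [aprime] for r in recursive] + [[]],
    }

# Classic example: E -> E + T | T
new = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(new)
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

The transformed grammar accepts the same language but is safe for a recursive-descent parser, at the cost of changing the shape of the parse trees (the recursion becomes right-branching).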

Journal ArticleDOI
TL;DR: A simple, direct proof of the fact that second-order ACGs are simulated by hyperedge replacement grammars is given, which implies that the string and tree generating power of the former is included in that of the latter.
Abstract: Second-order abstract categorial grammars (de Groote in Association for Computational Linguistics, 39th annual meeting and 10th conference of the European chapter, proceedings of the conference, pp. 148–155, 2001) and hyperedge replacement grammars (Bauderon and Courcelle in Math Syst Theory 20:83–127, 1987; Habel and Kreowski in STACS 87: 4th Annual symposium on theoretical aspects of computer science. Lecture notes in computer science, vol 247, Springer, Berlin, pp 207–219, 1987) are two natural ways of generalizing "context-free" grammar formalisms for string and tree languages. It is known that the string generating power of both formalisms is equivalent to (non-erasing) multiple context-free grammars (Seki et al. in Theor Comput Sci 88:191–229, 1991) or linear context-free rewriting systems (Weir in Characterizing mildly context-sensitive grammar formalisms, University of Pennsylvania, 1988). In this paper, we give a simple, direct proof of the fact that second-order ACGs are simulated by hyperedge replacement grammars, which implies that the string and tree generating power of the former is included in that of the latter. The normal form for tree-generating hyperedge replacement grammars given by Engelfriet and Maneth (Graph transformation. Lecture notes in computer science, vol 1764. Springer, Berlin, pp 15–29, 2000) can then be used to show that the tree generating power of second-order ACGs is exactly the same as that of hyperedge replacement grammars.

Book ChapterDOI
07 Apr 2010
TL;DR: A novel evolutionary engine for the evolution of context-free grammars is presented; it relies on specially designed graph-based crossover and mutation operators and is able to create diverse and interesting families of shapes even when the initial population is composed of minimal grammars.
Abstract: We present a novel evolutionary engine for the evolution of context-free grammars. The system relies on specially designed graph-based crossover and mutation operators. While in most evolutionary art systems each individual corresponds to a single artwork, in our approach each individual is a context-free grammar that specifies a family of shapes following the same production rules. To assess the adequacy and completeness of the system we perform experiments using automated fitness assignment and user-guided evolution. The experimental results show that the system is able to create diverse and interesting families of shapes even when the initial population is composed of minimal grammars.

Journal ArticleDOI
TL;DR: In adaptive star grammars, rules are actually schemata which, via the cloning of so-called multiple nodes, may adapt to potentially infinitely many contexts when they are applied, and they turn out to be restricted enough to share some of the basic characteristics of context-free devices.

Proceedings ArticleDOI
27 Sep 2010
TL;DR: An analytic comparison of the performance of both setups, i.e., grammatical evolution and tree-adjunct grammatical evolution, across a number of classic genetic programming benchmarking problems indicates that tree-adjunct grammatical evolution has a better overall performance (measured in terms of finding the global optima).
Abstract: In this paper we investigate the application of tree-adjunct grammars to grammatical evolution. The standard type of grammar used by grammatical evolution, context-free grammars, produce a subset of the languages that tree-adjunct grammars can produce, making tree-adjunct grammars, expressively, more powerful. In this study we shed some light on the effects of tree-adjunct grammars on grammatical evolution, or tree-adjunct grammatical evolution. We perform an analytic comparison of the performance of both setups, i.e., grammatical evolution and tree-adjunct grammatical evolution, across a number of classic genetic programming benchmarking problems. The results firmly indicate that tree-adjunct grammatical evolution has a better overall performance (measured in terms of finding the global optima).

Journal ArticleDOI
TL;DR: If it is furthermore required that each rule of the general form A->w has a nonempty w, then a substantial subfamily of conjunctive languages can be generated, yet it remains unknown whether such grammars are as powerful as conjunctive grammars of the general form.

Book ChapterDOI
06 Jul 2010
TL;DR: Three open questions in the theory of regulated rewriting are addressed: whether every permitting random context grammar has a non-erasing equivalent, whether the same is true for matrix grammars without appearance checking, and whether permitting random context grammars have the same generative capacity as matrix grammars without appearance checking.
Abstract: Three open questions in the theory of regulated rewriting are addressed. The first is whether every permitting random context grammar has a non-erasing equivalent. The second asks whether the same is true for matrix grammars without appearance checking. The third concerns whether permitting random context grammars have the same generative capacity as matrix grammars without appearance checking. The main result is a positive answer to the first question. For the other two, conjectures are presented. It is then deduced from the main result that at least one of the two holds.

Book ChapterDOI
01 Sep 2010
TL;DR: It is shown how the approximative Noncanonical Unambiguity Test by Schmitz can be extended to conservatively identify production rules that do not contribute to the ambiguity of a grammar.
Abstract: Context-free grammars are widely used but still hindered by ambiguity. This stresses the need for detailed detection methods that point out the sources of ambiguity in a grammar. In this paper we show how the approximative Noncanonical Unambiguity Test by Schmitz can be extended to conservatively identify production rules that do not contribute to the ambiguity of a grammar. We prove the correctness of our approach and consider its practical applicability.

Proceedings ArticleDOI
09 Jun 2010
TL;DR: A grammar-based mutation testing framework is proposed, together with effective mutation operators, coverage concepts and algorithms for test sequence generation, which enables complementary or alternative use of regular grammars, depending on the preferences of the test engineer.
Abstract: Model-based approaches, especially based on directed graphs (DG), are becoming popular for mutation testing as they enable definition of simple, nevertheless powerful, mutation operators and effective coverage criteria. However, these models easily become intractable if the system under consideration is too complex or large. Moreover, existing DG-based algorithms for test generation and optimization are rare and rather in an initial stage. Finally, DG models fail to represent languages beyond type-3 (regular). This paper proposes a grammar-based mutation testing framework, together with effective mutation operators, coverage concepts and algorithms for test sequence generation. The objective is to establish a formal framework for model-based mutation testing which enables complementary or alternative use of regular grammars, depending on the preferences of the test engineer. A case study validates the approach and analyzes its characteristic issues.

01 Oct 2010
TL;DR: This paper shows how the approach proposed by Warth et al. for direct left-recursive packrat parsing can be adapted for ‘pure’ PEGs, and outlines a restrictive subset of left-recursive PEGs which can safely work with this algorithm.
Abstract: Parsing Expression Grammars (PEGs) are specifications of unambiguous recursive-descent style parsers. PEGs incorporate both lexing and parsing phases and have valuable properties, such as being closed under composition. In common with most recursive-descent systems, raw PEGs cannot handle left-recursion; traditional approaches to left-recursion elimination lead to incorrect parses. In this paper, I show how the approach proposed for direct left-recursive Packrat parsing by Warth et al. can be adapted for ‘pure’ PEGs. I then demonstrate that this approach results in incorrect parses for some PEGs, before outlining a restrictive subset of left-recursive PEGs which can safely work with this algorithm. Finally I suggest an alteration to Warth et al.’s algorithm that can correctly parse a less restrictive subset of directly recursive PEGs.
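The Warth et al. approach discussed above can be illustrated for a single directly left-recursive rule: plant a failing "seed" in the memo table, then repeatedly re-evaluate the rule body, letting each pass consume more input, until the parse stops growing. The following is a simplified sketch of that idea under an assumed toy grammar, expr <- expr '-' num / num; the function names and grammar are hypothetical, not code from the paper:

```python
memo = {}  # (text, pos) -> end position of the expr match, or None

def num(text, i):
    # num <- [0-9]  (single digit, to keep the sketch short)
    return i + 1 if i < len(text) and text[i].isdigit() else None

def expr_body(text, i):
    # expr '-' num
    j = expr(text, i)
    if j is not None and j < len(text) and text[j] == "-":
        k = num(text, j + 1)
        if k is not None:
            return k
    # / num
    return num(text, i)

def expr(text, i):
    key = (text, i)
    if key in memo:
        return memo[key]
    memo[key] = None              # the seed: the recursive call first fails
    result = None
    while True:
        new = expr_body(text, i)
        if new is None or (result is not None and new <= result):
            break                 # the parse stopped growing
        result = new
        memo[key] = result        # grow the seed and try again
    memo[key] = result
    return result

print(expr("1-2-3", 0))  # 5: the whole left-associative expression
```

The paper's contribution is to show that this scheme misparses some PEGs and to carve out a subset of left-recursive grammars for which it (or a corrected variant) is safe.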

Proceedings Article
11 Jul 2010
TL;DR: A novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method's local Gibbs sampler, which enables efficient blocked inference for training and also improves the parsing algorithm.
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method's local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.

Journal ArticleDOI
TL;DR: VisualLISA, a new visual language for attribute grammars (AGs), is presented, together with a solution for rapid development of its editor using DEViL and of the associated programming environment.
Abstract: The focus of this paper is on crafting a new visual language for attribute grammars (AGs), and on the development of the associated programming environment. We present a solution for rapid development of the VisualLISA editor using DEViL. DEViL uses traditional attribute grammars to specify the language's syntax and semantics, extended by visual representations associated with grammar symbols. From these specifications a visual programming environment is automatically generated. In our case, the environment allows us to edit a visual description of an AG that is automatically translated into textual notations, including an XML-based representation for attribute grammars (XAGra), and is intended to be helpful for beginners and for rapid development of small AGs. XAGra allows us to use VisualLISA with other compiler-compiler tools.

Proceedings ArticleDOI
13 Sep 2010
TL;DR: In this article, the authors present a toolkit for context-free grammars, which mainly consists of several algorithms for sentence generation or enumeration and for coverage analysis of context-free grammars.
Abstract: Producing sentences from a grammar, according to various criteria, is required in many applications. It is also a basic building block for grammar engineering. This paper presents a toolkit for context-free grammars, which mainly consists of several algorithms for sentence generation or enumeration and for coverage analysis for context-free grammars. The toolkit deals with general context-free grammars. Besides providing implementations of algorithms, the toolkit also provides a simple graphical user interface, through which the user can use the toolkit directly. The toolkit is implemented in Java and is available at http://lcs.ios.ac.cn/hiwu/toolkit.php. The paper presents an overview of the toolkit and a description of the GUI, and also reports experimental results and preliminary applications of the toolkit.
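The basic building block above, producing sentences from a general CFG, fits in a few lines. The following random-derivation sketch is illustrative only (the toolkit itself is in Java and also offers enumeration and coverage-directed generation); the toy grammar is invented:

```python
import random

def generate(grammar, symbol, rng):
    """Randomly derive a terminal string from a CFG given as a dict
    {nonterminal: list of right-hand sides}; symbols without an entry
    are terminals. (No depth bound: heavily recursive grammars may loop.)"""
    if symbol not in grammar:
        return [symbol]
    rhs = rng.choice(grammar[symbol])
    return [tok for part in rhs for tok in generate(grammar, part, rng)]

# A toy grammar (hypothetical, not from the toolkit)
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}
print(" ".join(generate(grammar, "S", random.Random(0))))
```

Practical generators refine exactly this loop: weighting rule choices, bounding derivation depth, or steering choices to cover every production at least once.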

Proceedings ArticleDOI
23 Mar 2010
TL;DR: It is shown that query results have rational probabilities with a polynomial-size bit representation and, more importantly, an efficient query-evaluation algorithm is presented.
Abstract: Stochastic context-free grammars (SCFGs) have long been recognized as useful for a large variety of tasks including natural language processing, morphological parsing, speech recognition, information extraction, Web-page wrapping and even analysis of RNA. A string and an SCFG jointly represent a probabilistic interpretation of the meaning of the string, in the form of a (possibly infinite) probability space of parse trees. The problem of evaluating a query over this probability space is considered under the conventional semantics of querying a probabilistic database. For general SCFGs, extremely simple queries may have results that include irrational probabilities. But, for a large subclass of SCFGs (that includes all the standard studied subclasses of SCFGs) and the language of tree-pattern queries with projection (and child/descendant edges), it is shown that query results have rational probabilities with a polynomial-size bit representation and, more importantly, an efficient query-evaluation algorithm is presented.

Journal ArticleDOI
TL;DR: In this paper, contextual star grammars are proposed as a graph grammar approach that allows for simple parsing and that is powerful enough for specifying non-trivial software models, such as program graphs, a language-independent model of object-oriented programs.
Abstract: The precise specification of software models is a major concern in model-driven design of object-oriented software. Metamodelling and graph grammars are apparent choices for such specifications. Metamodelling has several advantages: it is easy to use, and provides procedures that check automatically whether a model is valid or not. However, it is less suited for proving properties of models, or for generating large sets of example models. Graph grammars, in contrast, offer a natural procedure - the derivation process - for generating example models, and they support proofs because they define a graph language inductively. However, not all graph grammars that allow to specify practically relevant models are easily parseable. In this paper, we propose contextual star grammars as a graph grammar approach that allows for simple parsing and that is powerful enough for specifying non-trivial software models. This is demonstrated by defining program graphs, a language-independent model of object-oriented programs, with a focus on shape (static structure) rather than behavior.