
Showing papers on "Tree-adjoining grammar published in 2011"


Journal ArticleDOI
TL;DR: The strong equivalence of non-strict tree adjoining grammars and monadic linear context-free grammars is proved, and the resulting tree language class is characterised logically: a tree language belongs to it iff it is the two-dimensional yield of an MSO-definable three-dimensional tree language.
Abstract: The equivalence of leaf languages of tree adjoining grammars and monadic linear context-free grammars was shown about a decade ago. This paper presents a proof of the strong equivalence of these grammar formalisms. Non-strict tree adjoining grammars and monadic linear context-free grammars define the same class of tree languages. We also present a logical characterisation of this tree language class showing that a tree language is a member of this class iff it is the two-dimensional yield of an MSO-definable three-dimensional tree language.

35 citations


Journal ArticleDOI
TL;DR: This work describes a refined method for grammar convergence and uses it in a major study that recovers the relationships between all the grammars occurring in the different versions of the Java Language Specification.
Abstract: Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.

35 citations


Journal ArticleDOI
TL;DR: A grammatical formalism called DepPattern is described for writing dependency grammars using patterns of Part-of-Speech tags augmented with lexical and morphological information; it inherits ideas from Sinclair's work and Pattern Grammar.
Abstract: In this paper, we describe a grammatical formalism, called DepPattern, to write dependency grammars using patterns of Part of Speech (PoS) tags augmented with lexical and morphological information. The formalism inherits ideas from Sinclair’s work and Pattern Grammar. To properly analyze semi-fixed idiomatic expressions, DepPattern distinguishes between open-choice and idiomatic rules. A grammar is defined as a set of lexical-syntactic rules at different levels of abstraction. In addition, a compiler was implemented so as to generate deterministic and robust parsers from DepPattern grammars. These parsers identify dependencies which can be used to improve corpus-based applications such as information extraction. At the end of this article, we describe an experiment which evaluates the efficiency of a dependency parser generated from a simple DepPattern grammar. In particular, we evaluated the precision of a semantic extraction method making use of a DepPattern-based parser.

34 citations


Journal ArticleDOI
TL;DR: This approach puts the creation and use of 3-D spatial grammars on a more general level and supports designers in defining and applying their own rules in a familiar computer-aided design environment without requiring programming.
Abstract: Spatial grammars are rule based, generative systems for the specification of formal languages. Set and shape grammar formulations of spatial grammars enable the definition of spatial design languages and the creation of alternative designs. Since the introduction of the underlying formalism, they have been successfully applied to different domains including visual arts, architecture, and engineering. Although many spatial grammars exist on paper, only a few, limited spatial grammar systems have been computationally implemented to date; this is especially true for three-dimensional (3-D) systems. Most spatial grammars are hard-coded, that is, once implemented, the vocabulary and rules cannot be changed without reprogramming. This article presents a new approach and prototype implementation for a 3-D spatial grammar interpreter that enables interactive, visual development and application of grammar rules. The method is based on a set grammar that uses a set of parameterized primitives and includes the definition of nonparametric and parametric rules, as well as their automatic application. A method for the automatic matching of the left hand side of a rule in a current working shape, including defining parametric relations, is outlined. A prototype implementation is presented and used to illustrate the approach through three examples: the "kindergarten grammar," vehicle wheel rims, and cylinder cooling fins. This approach puts the creation and use of 3-D spatial grammars on a more general level and supports designers with facilitated definition and application of their own rules in a familiar computer-aided design environment without requiring programming.

34 citations


Dissertation
15 Feb 2011
TL;DR: It is proved that the number of smallest grammars can be exponential in the size of the sequence, and the stability of the discovered structures between minimal grammars is then analysed for real-life examples.
Abstract: Motivated by the goal of discovering hierarchical structures inside DNA sequences, we address the Smallest Grammar Problem, the problem of finding a smallest context-free grammar that generates exactly one sequence. This NP-hard problem has been widely studied for applications like Data Compression, Structure Discovery and Algorithmic Information Theory. From the theoretical point of view, our contribution to this problem is a new formalisation of the Smallest Grammar Problem based on two complementary optimisation problems: the choice of constituents of the final grammar and the choice of how to parse the sequence with these constituents. We give a polynomial time solution for this last problem, which we named the "Minimal Grammar Parsing" problem. This decomposition allows us to define a new complete and correct search space for the Smallest Grammar Problem. Based on this search space, we propose new algorithms able to return grammars 10% smaller than the state of the art on complete genomes. Regarding efficiency, we study different equivalence classes of repeats and introduce an efficient in-place schema to update the suffix array data structure used to compute these words. We conclude this thesis by analysing the applications. For Structure Discovery, we consider the impact of the non-uniqueness of smallest grammars. We prove that the number of smallest grammars can be exponential in the size of the sequence and then analyse the stability of the discovered structures between minimal grammars for real-life examples. With respect to Data Compression, we extend our algorithms to use rigid patterns as words and achieve compression rates up to 25% better than the previous best DNA grammar-based coder.

33 citations


Book ChapterDOI
26 May 2011
TL;DR: A normal form for hyperedge replacement grammars is introduced as a generalisation of the Greibach Normal Form for string grammars, together with the adapted construction needed to support the required concretisations.
Abstract: Heap-based data structures play an important role in modern programming concepts. However, standard verification algorithms cannot cope with the infinite state spaces induced by these structures. A common approach to solve this problem is to apply abstraction techniques. Hyperedge replacement grammars provide a promising technique for heap abstraction, as their production rules can be used to partially abstract and concretise heap structures. To support the required concretisations, we introduce a normal form for hyperedge replacement grammars as a generalisation of the Greibach Normal Form for string grammars, together with the adapted construction.

31 citations
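The Greibach Normal Form that the chapter generalises requires, for string grammars, that every production consist of a single leading terminal followed only by nonterminals (and that there be no erasing rules). A minimal sketch of that string-level condition; the grammar encoding (uppercase symbols as nonterminals, lowercase as terminals, productions as a dict of symbol lists) is a hypothetical simplification:

```python
# Check whether a context-free grammar is in Greibach Normal Form:
# every production must have the shape A -> a B1 ... Bk, i.e. a single
# terminal followed by zero or more nonterminals. The representation is
# a hypothetical simplification: uppercase = nonterminal, lowercase =
# terminal, and each right-hand side is a list of one-character symbols.

def is_nonterminal(sym):
    return sym.isupper()

def in_gnf(productions):
    for lhs, rhss in productions.items():
        for rhs in rhss:
            if not rhs:                      # erasing rule: not allowed
                return False
            head, tail = rhs[0], rhs[1:]
            if is_nonterminal(head):         # must start with a terminal
                return False
            if any(not is_nonterminal(s) for s in tail):
                return False
    return True

# S -> aSB | b ;  B -> b   is in GNF
gnf = {"S": [["a", "S", "B"], ["b"]], "B": [["b"]]}
# S -> Sa | a   has a leading nonterminal, hence is not in GNF
not_gnf = {"S": [["S", "a"], ["a"]]}
```

The leading-terminal shape is what makes GNF useful for concretisation: expanding a nonterminal always exposes at least one concrete symbol, which is the property the chapter lifts from strings to hyperedge replacement.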


Book ChapterDOI
03 Jul 2011
TL;DR: An automated approach is developed that is practically useful in revealing evidence of nonequivalence of grammars and discovering correspondence mappings for grammar nonterminals; two studies are discussed that show how the approach is used in comparing grammars of open source Java parsers as well as grammars from the course work for a compiler construction class.
Abstract: There exist a number of software engineering scenarios that essentially involve equivalence or correspondence assertions for some of the context-free grammars in the scenarios. For instance, when applying grammar transformations during parser development--be it for the sake of disambiguation or grammar-class compliance--one would like to preserve the generated language. Even though equivalence is generally undecidable for context-free grammars, we have developed an automated approach that is practically useful in revealing evidence of nonequivalence of grammars and discovering correspondence mappings for grammar nonterminals. Our approach is based on systematic test data generation and parsing. We discuss two studies that show how the approach is used in comparing grammars of open source Java parsers as well as grammars from the course work for a compiler construction class.

30 citations


Journal ArticleDOI
TL;DR: Picture grammars that rewrite arrays of pixels can be unified and extended by formalizing the right part of a rule as a finite set of permitted tiles; the authors focus on a simple type of tiling, named regional, and define the corresponding regional tile grammars.
Abstract: Several old and recent classes of picture grammars, that variously extend context-free string grammars in two dimensions, are based on rules that rewrite arrays of pixels. Such grammars can be unified and extended using an approach, whereby the right part of a rule is formalized by means of a finite set of permitted tiles. We focus on a simple type of tiling, named regional, and define the corresponding regional tile grammars. They include both Siromoney's (or Matz's) Kolam grammars and their generalization by Průsa, as well as Drewes's grid grammars. Regionally defined pictures can be recognized with polynomial-time complexity by an algorithm extending the CKY one for strings. Regional tile grammars and languages are strictly included into our previous tile grammars and languages, and are incomparable with Giammarresi–Restivo tiling systems (or Wang systems).

29 citations
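The polynomial-time recognition result above extends the CKY algorithm from strings to pictures. As background, here is a minimal string-level CKY recogniser for a grammar in Chomsky normal form; the grammar encoding below is a hypothetical illustration, not the paper's two-dimensional algorithm:

```python
from itertools import product

# Classic CKY recognition for a context-free grammar in Chomsky normal
# form; the picture-recognition algorithm extends this tabulation idea
# to two dimensions. The encoding is hypothetical: `unary` maps a
# terminal to the nonterminals deriving it, `binary` maps a pair (B, C)
# to the nonterminals A with rule A -> B C.

def cky(word, unary, binary, start="S"):
    n = len(word)
    if n == 0:
        return False                      # CNF grammars derive no empty word here
    # table[i][j] holds the nonterminals deriving word[i:j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(word):
        table[i][i] = set(unary.get(a, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):         # split point between i..k and k+1..j
                for B, C in product(table[i][k], table[k + 1][j]):
                    table[i][j].update(binary.get((B, C), ()))
    return start in table[0][n - 1]

# CNF grammar for {a^n b^n : n >= 1}:
# S -> A T | A B,  T -> S B,  A -> a,  B -> b
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "T"): {"S"}, ("A", "B"): {"S"}, ("S", "B"): {"T"}}
```

The cubic-time tabulation over substrings is exactly what the regional variant generalises to subpictures, which is why recognition stays polynomial.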


Book ChapterDOI
01 Jan 2011
TL;DR: A general model for various mechanisms of regulated rewriting based on the applicability of rules is introduced, covering in particular graph-controlled, programmed, matrix, random context, and ordered grammars as well as some basic variants of grammar systems.
Abstract: We introduce a general model for various mechanisms of regulated rewriting based on the applicability of rules, especially we consider graph-controlled, programmed, matrix, random context, and ordered grammars as well as some basic variants of grammar systems. Most of the general relations between graph-controlled grammars, matrix grammars, random-context grammars, and ordered grammars established in this paper are independent from the objects and the kind of rules and only based on the notion of applicability of rules within the different regulating mechanisms and their specific structure in allowing sequences of rules to be applied. For example, graph-controlled grammars are always at least as powerful as programmed and matrix grammars. For the simulation of random context and ordered grammars by matrix and graph-controlled grammars, some specific requirements have to be fulfilled by the types of rules.

23 citations


Book ChapterDOI
05 Oct 2011
TL;DR: This paper demonstrates how existing distributional learning techniques for context-free grammars can be adapted to simple context-free tree grammars in a straightforward manner once the necessary notions and properties for string languages have been redefined for trees.
Abstract: This paper demonstrates how existing distributional learning techniques for context-free grammars can be adapted to simple context-free tree grammars in a straightforward manner once the necessary notions and properties for string languages have been redefined for trees. Distributional learning is based on the decomposition of an object into a substructure and the remaining structure, and on their interrelations. A corresponding learning algorithm can emulate those relations in order to determine a correct grammar for the target language.

20 citations


Journal ArticleDOI
TL;DR: The compressed membership problem for one-nonterminal conjunctive grammars over {a} is proved to be EXPTIME-complete; the same problem for context-free grammars is decidable in NLOGSPACE, but becomes NP-complete if the grammar is compressed as well.
Abstract: Conjunctive grammars over an alphabet Σ={a} are studied, with the focus on the special case with a unique nonterminal symbol. Such a grammar is equivalent to an equation X=ϕ(X) over sets of natural numbers, using union, intersection and addition. It is shown that every grammar with multiple nonterminals can be encoded into a grammar with a single nonterminal, with a slight modification of the language. Based on this construction, the compressed membership problem for one-nonterminal conjunctive grammars over {a} is proved to be EXPTIME-complete; the same problem for the context-free grammars is decidable in NLOGSPACE, but becomes NP-complete if the grammar is compressed as well. The equivalence problem for these grammars is shown to be co-r.e.-complete, both finiteness and co-finiteness are r.e.-complete, while equivalence to a fixed unary language with a regular positional notation is decidable.
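The correspondence above, between a one-nonterminal conjunctive grammar over {a} and an equation X = ϕ(X) over sets of natural numbers with union, intersection and addition, can be made concrete by bounded fixpoint iteration. A sketch with a made-up example equation, not one from the paper:

```python
# A one-nonterminal conjunctive grammar over {a} corresponds to an
# equation X = phi(X) over sets of natural numbers, built from union,
# intersection and elementwise addition. This sketch computes the least
# solution restricted to numbers below a bound; the concrete phi is a
# hypothetical illustration.

def add(s, t, bound):
    # elementwise addition of two sets, truncated at the bound
    return {x + y for x in s for y in t if x + y < bound}

def least_solution(phi, bound, iterations=1000):
    # Kleene iteration from the empty set; phi is assumed monotone
    x = set()
    for _ in range(iterations):
        nxt = {n for n in phi(x) if n < bound}
        if nxt == x:
            return x
        x = nxt
    return x

BOUND = 64
# Example equation: X = {2} ∪ (X + X); its least solution is the set of
# even numbers >= 2 (every even n >= 4 splits as 2 + (n - 2)).
phi = lambda x: {2} | add(x, x, BOUND)
evens = least_solution(phi, BOUND)
```

Truncating at a bound is only a device for finite exploration; the paper's decidability and hardness results concern the full, unbounded sets.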

Journal ArticleDOI
TL;DR: Ellul, Krawetz, Shallit and Wang prove an exponential lower bound on the size of any context-free grammar generating the language of all permutations over some alphabet, and obtain exponential lower bounds for many other languages.

Proceedings ArticleDOI
01 Jan 2011
TL;DR: This work presents an algorithm that uses indexed linear tree grammars (ILTGs) both to describe the input set and to compute the set that approximates the collecting semantics, enabling a more precise binding analysis than afforded by regular grammars.
Abstract: The collecting semantics of a program defines the strongest static property of interest. We study the analysis of the collecting semantics of higher-order functional programs, cast as left-linear term rewriting systems. The analysis generalises functional flow analysis and the reachability problem for term rewriting systems, which are both undecidable. We present an algorithm that uses indexed linear tree grammars (ILTGs) both to describe the input set and compute the set that approximates the collecting semantics. ILTGs are equi-expressive with pushdown tree automata, and so, strictly more expressive than regular tree grammars. Our result can be seen as a refinement of Jones and Andersen's procedure, which uses regular tree grammars. The main technical innovation of our algorithm is the use of indices to capture (sets of) substitutions, thus enabling a more precise binding analysis than afforded by regular grammars. We give a simple proof of termination and soundness, and demonstrate that our method is more accurate than other approaches to functional flow and reachability analyses in the literature.

Book ChapterDOI
03 Jul 2011
TL;DR: This work integrates rich static types (including parametric polymorphism, typed distinctions between decorated and undecorated trees, limited type inference, and generalized algebraic data-types) and pattern-matching into attribute grammars, while maintaining familiar and convenient attribute grammar notations and especially their highly extensible nature.
Abstract: While attribute grammars have several features making them advantageous for specifying language processing tools, functional programming languages offer a myriad of features also well-suited for such tasks. Much other work shows the close relationship between these two approaches, often in the form of embedding attribute grammars into lazy functional languages. This paper continues in this tradition, but in the other direction, by integrating various functional language features into attribute grammars. Specifically we integrate rich static types (including parametric polymorphism, typed distinctions between decorated and undecorated trees, limited type inference, and generalized algebraic data-types) and pattern-matching, all in a manner that maintains familiar and convenient attribute grammar notations and especially their highly extensible nature.

Journal ArticleDOI
TL;DR: A new perspective on the smallest grammar problem is proposed by splitting it into two tasks: choosing which words will be the constituents of the grammar and searching for the smallest grammar given this set of constituents.
Abstract: The smallest grammar problem—namely, finding a smallest context-free grammar that generates exactly one sequence—is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching for the smallest grammar given this set of constituents. We show how to solve the second task in polynomial time, parsing longer constituents with smaller ones. We propose new algorithms based on classical practical algorithms that use this optimization to find small grammars. Our algorithms consistently find smaller grammars on a classical benchmark, reducing the size by 10% in some cases. Moreover, our formulation allows us to define interesting bounds on the number of small grammars and to empirically compare different grammars of small size.
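The second task, finding a smallest parse of the sequence once the constituent set is fixed, can be illustrated with a simple dynamic program over positions (a shortest path in the parse graph). This is a simplified illustration of the idea, not the authors' algorithm, and the constituent set below is hypothetical:

```python
# Given a sequence and a fixed set of constituent words, find a parse
# of the sequence that uses the fewest constituent occurrences, by
# dynamic programming over positions. A simplified illustration of the
# "parse the sequence with a chosen set of constituents" subproblem.

def minimal_parse(sequence, constituents):
    n = len(sequence)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[i]: fewest words covering sequence[:i]
    back = [None] * (n + 1)  # back[i]: last word of an optimal cover
    best[0] = 0
    for i in range(n):
        if best[i] == INF:
            continue
        for w in constituents:
            if sequence.startswith(w, i) and best[i] + 1 < best[i + len(w)]:
                best[i + len(w)] = best[i] + 1
                back[i + len(w)] = w
    if best[n] == INF:
        return None          # sequence cannot be covered at all
    parse, i = [], n
    while i > 0:             # recover the parse from the back-pointers
        parse.append(back[i])
        i -= len(back[i])
    return parse[::-1]

# Hypothetical constituents; the single letters guarantee a parse exists
# for any word over {a, b}.
words = {"a", "b", "ab", "aba", "ba"}
```

With all constituents of length at most L, the loop does O(n·|constituents|) prefix checks, which is the polynomial behaviour the abstract refers to.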

Proceedings ArticleDOI
22 Sep 2011
TL;DR: This work presents a novel method of embedding context-free grammars in Haskell and automatically generating parsers and pretty-printers from them; it supports adding anti-quotation to the generated quasi-quoters, which allows users of the defined language to mix concrete and abstract syntax almost seamlessly.
Abstract: We present a novel method of embedding context-free grammars in Haskell and automatically generating parsers and pretty-printers from them. We have implemented this method in a library called BNFC-meta (from the BNF Converter, which it is built on). The library builds compiler front ends using metaprogramming instead of conventional code generation. Parsers are built from labelled BNF grammars that are defined directly in Haskell modules. Our solution combines features of parser generators (static grammar checks, a highly specialised grammar DSL) and adds several features that are otherwise exclusive to combinator libraries, such as the ability to reuse, parameterise and generate grammars inside Haskell. To allow writing grammars in concrete syntax, BNFC-meta provides a quasi-quoter that can parse grammars (embedded in Haskell files) at compile time and use metaprogramming to replace them with their abstract syntax. We also generate quasi-quoters so that the languages we define with BNFC-meta can be embedded in the same way. With a minimal change to the grammar, we support adding anti-quotation to the generated quasi-quoters, which allows users of the defined language to mix concrete and abstract syntax almost seamlessly. Unlike previous methods of achieving anti-quotation, the method used by BNFC-meta is simple, efficient and avoids polluting the abstract syntax types.

Journal ArticleDOI
TL;DR: It is demonstrated that without erasing rules, one-sided random context grammars characterize the family of context-sensitive languages, and with erasing rules, these grammars characterize the family of recursively enumerable languages.
Abstract: The notion of a one-sided random context grammar is defined as a context-free-based regulated grammar, in which a set of permitting symbols and a set of forbidding symbols are attached to every rule, and its set of rules is divided into the set of left random context rules and the set of right random context rules. A left random context rule can rewrite a nonterminal if each of its permitting symbols occurs to the left of the rewritten symbol in the current sentential form while each of its forbidding symbols does not occur there. A right random context rule is applied analogically except that the symbols are examined to the right of the rewritten symbol. The paper demonstrates that without erasing rules, one-sided random context grammars characterize the family of context-sensitive languages, and with erasing rules, these grammars characterize the family of recursively enumerable languages. In fact, these characterization results hold even if the set of left random context rules coincides with the set of right random context rules. Several special cases of these grammars are considered, and their generative power is established. In its conclusion, some important open problems are suggested to study in the future.
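The applicability condition of a left random context rule described above can be sketched directly; the encoding of a sentential form as a string of one-character symbols is a hypothetical simplification:

```python
# Applicability of a left random context rule: the rule may rewrite the
# nonterminal `lhs` at position i only if every permitting symbol occurs
# to the left of position i in the sentential form, and no forbidding
# symbol occurs there. A right random context rule would examine the
# suffix sentential[i+1:] instead.

def left_rule_applicable(sentential, i, lhs, permitting, forbidding):
    if sentential[i] != lhs:
        return False
    left_context = set(sentential[:i])
    return permitting <= left_context and not (forbidding & left_context)

# Hypothetical rule: rewrite B, permitting {A}, forbidding {C}.
# In "AABC" at position 2 the rule applies (A occurs left, C does not);
# in "CAB" at position 2 it does not (C occurs to the left).
```

Note that only the presence of symbols in the context is examined, not their order or multiplicity, which matches the definition in the abstract.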

Book ChapterDOI
26 Mar 2011
TL;DR: This paper proposes a point-free language of dependent grammars, believed to correspond closely to existing context-free parsing algorithms, and gives a novel transformation from conventional dependent grammars to point-free ones.
Abstract: Dependent grammars extend context-free grammars by allowing semantic values to be bound to variables and used to constrain parsing. Dependent grammars can cleanly specify common features that cannot be handled by context-free grammars, such as length fields in data formats and significant indentation in programming languages. Few parser generators support dependent parsing, however. To address this shortcoming, we have developed a new method for implementing dependent parsers by extending existing parsing algorithms. Our method proposes a point-free language of dependent grammars, which we believe closely corresponds to existing context-free parsing algorithms, and gives a novel transformation from conventional dependent grammars to point-free ones. To validate our technique, we have specified the semantics of both source and target dependent grammar languages, and proven our transformation sound and complete with respect to those semantics. Furthermore, we have empirically validated the suitability of our point-free language by adapting four parsing engines to support it: an Earley parsing engine; a GLR parsing engine; memoizing, arrow-style parser combinators; and PEG parser combinators.
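The length-field example above is exactly what dependent parsing buys over context-free rules: a semantic value produced earlier in the parse constrains what may be parsed next. A tiny hand-rolled sketch of that binding (not the paper's point-free language or any of the adapted engines):

```python
# Dependent parsing in miniature: a parsed value (a length field n)
# constrains the rest of the parse, which no context-free rule can
# express. The combinators are a hypothetical sketch.

def digit(s, i):
    # parse one decimal digit at position i; return (value, next
    # position) on success, None on failure
    return (int(s[i]), i + 1) if i < len(s) and s[i].isdigit() else None

def length_prefixed(s, i=0):
    # parse a digit n, then exactly n further characters as the payload
    head = digit(s, i)
    if head is None:
        return None
    n, j = head              # n is bound and used to drive the parse
    if j + n > len(s):
        return None          # payload shorter than the length field
    return s[j:j + n], j + n

# "3abc" parses to the payload "abc"; "3ab" fails because only two
# payload characters follow the length field.
```

In the paper's terms, binding `n` and using it downstream is what the point-free transformation has to express without named variables.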

Proceedings ArticleDOI
03 Apr 2011
TL;DR: It is shown how alternative representations from graph theory, including graphs, overcomplete graphs and hyperedge graphs, can support some of the intuitions handled in shape grammars by direct visual computations with shapes.
Abstract: An implementation of a shape grammar interpreter is described. The underlying graph-theoretic framework is briefly discussed to show how alternative representations from graph theory including graphs, overcomplete graphs and hyperedge graphs can support some of the intuitions handled in shape grammars by direct visual computations with shapes. The resulting plugin implemented in Rhino, code-named GRAPE, is briefly described in the end.

Book ChapterDOI
11 Jul 2011
TL;DR: A correction framework is proposed that involves structural repairs of elements with respect to single type tree grammars, together with an efficient algorithm and a prototype implementation.
Abstract: XML documents and related technologies represent a widely accepted standard for managing semi-structured data. However, a surprisingly high number of XML documents is affected by well-formedness errors, structural invalidity or data inconsistencies. The aim of this paper is the proposal of a correction framework involving structural repairs of elements with respect to single type tree grammars. Via the inspection of the state space of a finite automaton recognising regular expressions, we are always able to find all minimal repairs against a defined cost function. These repairs are compactly represented by shortest paths in recursively nested multigraphs, which can be translated to particular sequences of edit operations altering XML trees. We have proposed an efficient algorithm and provided a prototype implementation.

Journal ArticleDOI
TL;DR: This paper studies the complexity of the classical problem of deciding whether a string belongs to the language generated by any attribute grammar from a given class C, and shows that even in the most general case the problem is in polynomial space.

Journal ArticleDOI
TL;DR: It is found that there are NP-hard grammars among non-local MCTAGs even if any or all of the following restrictions are imposed: lexicalization and dominance links.
Abstract: An NP-hardness proof for non-local Multicomponent Tree Adjoining Grammar (MCTAG) by Rambow and Satta (1st International Workshop on Tree Adjoining Grammers 1992), based on Dahlhaus and Warmuth (in J Comput Syst Sci 33:456---472, 1986), is extended to some linguistically relevant restrictions of that formalism. It is found that there are NP-hard grammars among non-local MCTAGs even if any or all of the following restrictions are imposed: (i) lexicalization: every tree in the grammar contains a terminal; (ii) dominance links: every tree set contains at most two trees, and in every such tree set, there is a link between the foot node of one tree and the root node of the other tree, indicating that the former node must dominate the latter in the derived tree. This is the version of MCTAG proposed in Becker et al. (Proceedings of the 5th conference of the European chapter of the Association for Computational Linguistics 1991) to account for German long-distance scrambling. This result restricts the field of possible candidates for an extension of Tree Adjoining Grammar that would be both mildly context-sensitive and linguistically adequate.


Proceedings Article
05 Oct 2011
TL;DR: This work introduces a formulation of synchronous tree-adjoining grammars which is effectively closed under input and output restrictions to regular tree languages, i.e., the restricted translations can again be represented by grammars.
Abstract: Restricting the input or the output of a grammar-induced translation to a given set of trees plays an important role in statistical machine translation. The problem for practical systems is to find a compact (and in particular, finite) representation of said restriction. For the class of synchronous tree-adjoining grammars, partial solutions to this problem have been described, some being restricted to the unweighted case, some to the monolingual case. We introduce a formulation of this class of grammars which is effectively closed under input and output restrictions to regular tree languages, i.e., the restricted translations can again be represented by grammars. Moreover, we present an algorithm that constructs these grammars for input and output restriction, which is inspired by Earley's algorithm.

Book ChapterDOI
23 May 2011
TL;DR: This work refines the relationships among the classes of languages generated by (regular) pure 2D context-free grammars, regional tile grammars, Průsa grammars and Local languages, and states some considerations about closure properties of (regular) pure 2D context-free languages.
Abstract: Many formal models have been proposed to recognize or to generate two-dimensional words. In this paper, we focus our analysis on (regular) pure 2D context-free grammars, regional tile grammars and Průsa grammars, showing that, although they have been proposed as generalizations of string context-free grammars, their expressiveness is different. This work refines the relationship among the classes of languages generated by the above grammars and Local languages and states some considerations about closure properties of (regular) pure 2D context-free languages.

Journal ArticleDOI
TL;DR: It is proved that bounding the number of nonterminals in tree controlled grammars without erasing rules leads to an infinite hierarchy of families of tree controlled languages, while every recursively enumerable language can be generated by a tree controlled grammar with erasing rules and at most nine nonterminals.

Proceedings Article
19 Jun 2011
TL;DR: A new approach to checking treebank consistency is introduced, based on a variant of Tree Adjoining Grammar; it overcomes the problems of earlier approaches that used strings of words rather than tree structure to identify the appropriate contexts for comparison.
Abstract: This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection.

Proceedings Article
15 Nov 2011
TL;DR: A straightforward and structure-preserving coding pattern is described for encoding arbitrary non-circular attribute grammars as syntax-directed translation schemes for bottom-up parser generation tools, making possible the direct implementation of attribute grammar-based specifications using widely used translation scheme-driven tools for the development of bottom-up language translators.
Abstract: This article describes a straightforward and structure-preserving coding pattern to encode arbitrary non-circular attribute grammars as syntax-directed translation schemes for bottom-up parser generation tools. According to this pattern, a bottom-up oriented translation scheme is systematically derived from the original attribute grammar. Semantic actions attached to each syntax rule are written in terms of a small repertory of primitive attribution operations. By providing alternative implementations for these attribution operations, it is possible to plug in different semantic evaluation strategies in a seamless way (e.g., a demand-driven strategy, or a data-driven one). The pattern makes possible the direct implementation of attribute grammar-based specifications using widely used translation scheme-driven tools for the development of bottom-up language translators (e.g. YACC, BISON, CUP, etc.). As a consequence, this initial coding can be subsequently refined to yield final efficient implementations. Since these implementations still preserve the ability to be extended with new features described at the attribute grammar level, the advantages from the point of view of development and maintenance become apparent.
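The pluggable demand-driven strategy mentioned above can be sketched as attribute memoisation on tree nodes; the expression AST and the `value` attribute below are hypothetical illustrations, not the article's coding pattern or its attribution operations:

```python
# Demand-driven evaluation of a synthesized attribute: each attribute
# instance is computed only when first requested and cached afterwards.
# Swapping the body of `attr` (e.g. for an eager, data-driven pass over
# the tree) is the kind of strategy substitution the abstract describes.

class Node:
    def __init__(self, op=None, left=None, right=None, const=None):
        self.op, self.left, self.right, self.const = op, left, right, const
        self._cache = {}

    def attr(self, name):
        # demand-driven: compute on first request, then reuse the cache
        if name not in self._cache:
            self._cache[name] = ATTRIBUTION[name](self)
        return self._cache[name]

def value(node):
    # synthesized attribute: a leaf yields its constant, an inner node
    # combines the values demanded from its children
    if node.const is not None:
        return node.const
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    return ops[node.op](node.left.attr("value"), node.right.attr("value"))

ATTRIBUTION = {"value": value}

# the expression (2 + 3) * 4
tree = Node("*", Node("+", Node(const=2), Node(const=3)), Node(const=4))
```

Because the attribute grammar is non-circular, every demand bottoms out at a leaf, so the memoised recursion terminates.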

Journal ArticleDOI
TL;DR: This work presents an alternative first-order functional interpretation of attribute grammars where the input tree is replaced with an extended cyclic tree each node of which is aware of its context viewed as an additional child tree.

Book ChapterDOI
Pierre Bourreau, Sylvain Salvati
06 Sep 2011
TL;DR: An efficient algorithm based on Datalog programming was presented in [Kan07] for context-free grammars of almost linear λ-terms, which are linear λ-terms augmented with a restricted form of copy.
Abstract: The recent emergence of linguistic formalisms exclusively based on the simply-typed λ-calculus to represent both syntax and semantics led to the presentation of innovative techniques which apply to both the problems of parsing and generating natural languages. A common feature of these techniques consists in using strong relations between typing properties and syntactic structures of families of simply-typed λ-terms. Among significant results, an efficient algorithm based on Datalog programming is presented in [Kan07] for context-free grammars of almost linear λ-terms, which are linear λ-terms augmented with a restricted form of copy. We present an extension of this method to terms for which deletion is allowed.