
Showing papers on "Context-sensitive grammar published in 2015"


Book ChapterDOI
26 Mar 2015
TL;DR: This article reviews the main classes of probabilistic grammars and points to some active areas of research.
Abstract: Formal grammars are widely used in speech recognition, language translation, and language understanding systems. Grammars rich enough to accommodate natural language generate multiple interpretations of typical sentences. These ambiguities are a fundamental challenge to practical application. Grammars can be equipped with probability distributions, and the various parameters of these distributions can be estimated from data (e.g., acoustic representations of spoken words or a corpus of hand-parsed sentences). The resulting probabilistic grammars help to interpret spoken or written language unambiguously. This article reviews the main classes of probabilistic grammars and points to some active areas of research.
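The core idea of probabilistic grammars can be made concrete with a minimal sketch (invented for illustration, not taken from the article): each rule of a nonterminal carries a probability, the probabilities of a nonterminal's alternatives sum to 1, and a derivation is scored by the product of the probabilities of the rules it uses.

```python
# Toy probabilistic CFG: rule alternatives carry probabilities summing to 1.
# (Hypothetical grammar and numbers, purely for illustration.)
PCFG = {
    "S":  [(0.7, ["NP", "VP"]), (0.3, ["VP"])],
    "NP": [(1.0, ["she"])],
    "VP": [(0.6, ["runs"]), (0.4, ["sleeps"])],
}

def derivation_probability(rules_used):
    """Multiply the probabilities of the rules appearing in a derivation."""
    p = 1.0
    for lhs, rhs in rules_used:
        for prob, alternative in PCFG[lhs]:
            if alternative == rhs:
                p *= prob
                break
    return p

# "she runs" via S -> NP VP, NP -> she, VP -> runs: 0.7 * 1.0 * 0.6
p = derivation_probability([("S", ["NP", "VP"]),
                            ("NP", ["she"]),
                            ("VP", ["runs"])])
```

When a sentence has several parses, comparing their derivation probabilities is what lets a probabilistic grammar prefer one interpretation over another.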

38 citations


Proceedings ArticleDOI
17 Oct 2015
TL;DR: This article gives the first truly sub-cubic algorithm that computes language edit distance almost optimally, and further shows that the language edit distances of all substrings of a given string can be estimated near-optimally in the same time with high probability.
Abstract: Given a context-free grammar G over alphabet Σ and a string s, the language edit distance problem seeks the minimum number of edits (insertions, deletions and substitutions) required to convert s into a valid member of the language L(G). The well-known dynamic programming algorithm solves this problem in cubic time in the string length [Aho, Peterson 1972; Myers 1985]. Despite its numerous applications, to date there exists no algorithm that computes the exact or approximate language edit distance in truly sub-cubic time. In this paper we give the first such truly sub-cubic algorithm that computes language edit distance almost optimally. We further solve the local alignment problem: for all substrings of s, we can estimate their language edit distance near-optimally in the same time with high probability. Next, we design the very first sub-cubic algorithm that, given an arbitrary stochastic context-free grammar and a string, returns a nearly-optimal maximum-likelihood parsing of that string. Stochastic context-free grammars significantly generalize hidden Markov models; they lie at the foundation of statistical natural language processing and have found widespread applications in many other fields. To complement our upper bound result, we show that exact computation of maximum-likelihood parsing of stochastic grammars or of language edit distance in truly sub-cubic time would imply a truly sub-cubic algorithm for all-pairs shortest paths, a long-standing open question. This would result in a breakthrough for a large range of problems in graphs and matrices due to sub-cubic equivalence. By a known lower bound result [Lee 2002] and a recent development [Abboud et al. 2015], even the much simpler problem of parsing a context-free grammar requires fast matrix multiplication time. Therefore any nontrivial multiplicative approximation algorithms for either of the two problems in time less than matrix multiplication are unlikely to exist.
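The general algorithms above are involved, but the problem itself can be illustrated on the simplest context-free language, balanced parentheses. The sketch below (illustrative only, specialized to this one language rather than an arbitrary grammar) is the classical cubic-time dynamic program computing the minimum number of deletions and substitutions turning a string into a member of the Dyck language:

```python
from functools import lru_cache

def dyck_edit_distance(s):
    """Minimum number of deletions/substitutions over {'(', ')'} turning s
    into a balanced-parenthesis string -- a tiny, cubic-time instance of the
    language edit distance problem the paper attacks for general grammars."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i >= j:
            return 0                           # empty substring is balanced
        best = 1 + d(i + 1, j)                 # delete s[i]
        best = min(best, 1 + d(i, j - 1))      # delete s[j-1]
        if j - i >= 2:
            # pair s[i] with s[j-1], substituting whichever endpoints mismatch
            cost = (s[i] != '(') + (s[j - 1] != ')')
            best = min(best, cost + d(i + 1, j - 1))
        for k in range(i + 1, j):              # try every split point
            best = min(best, d(i, k) + d(k, j))
        return best
    return d(0, len(s))
```

The three cases (delete an endpoint, pair the endpoints, split) visit O(n^2) substrings with O(n) work each, which is exactly the cubic behaviour the paper improves upon for general grammars.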

27 citations


Journal ArticleDOI
TL;DR: CDGs represent a class of completely lexicalized dependency grammars that express both projective and non-projective dependencies and generate non-context-free languages.

26 citations


01 Apr 2015
TL;DR: S-graph grammars are introduced, a new grammar formalism for computing graph-based semantic representations that uses graphs as semantic representations in a way that is consistent with more classical views on semantic construction.
Abstract: We introduce s-graph grammars, a new grammar formalism for computing graph-based semantic representations. Semantically annotated corpora which use graphs as semantic representations have recently become available, and there have been a number of data-driven systems for semantic parsing that can be trained on these corpora. However, it is hard to map the linguistic assumptions of these systems onto more classical insights on semantic construction. S-graph grammars use graphs as semantic representations, in a way that is consistent with more classical views on semantic construction. We illustrate this with a number of hand-written toy grammars, sketch the use of s-graph grammars for data-driven semantic parsing, and discuss formal aspects.

25 citations


Posted Content
TL;DR: It is proved that the SyGuS problem is undecidable for the theory of equality with uninterpreted functions (EUF) and for a very simple bit-vector theory with concatenation, both for context-free grammars and for tree grammars.
Abstract: Syntax-guided synthesis (SyGuS) is a recently proposed framework for program synthesis problems. The SyGuS problem is to find an expression or program generated by a given grammar that meets a correctness specification. Correctness specifications are given as formulas in suitable logical theories, typically amongst those studied in satisfiability modulo theories (SMT). In this work, we analyze the decidability of the SyGuS problem for different classes of grammars and correctness specifications. We prove that the SyGuS problem is undecidable for the theory of equality with uninterpreted functions (EUF). We identify a fragment of EUF, which we call regular-EUF, for which the SyGuS problem is decidable. We prove that this restricted problem is EXPTIME-complete and that the sets of solution expressions are precisely the regular tree languages. For theories that admit a unique, finite domain, we give a general algorithm to solve the SyGuS problem on tree grammars. Finite-domain theories include the bit-vector theory without concatenation. We prove SyGuS undecidable for a very simple bit-vector theory with concatenation, both for context-free grammars and for tree grammars. Finally, we give some additional results for linear arithmetic and bit-vector arithmetic along with a discussion of the implications of these results.
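The spirit of syntax-guided synthesis can be sketched with a naive enumerator: generate expressions from a small grammar and return the first one meeting the specification. The grammar, depth bound, and example-based "specification" below are all invented for illustration; real SyGuS solvers check candidates against logical specifications via an SMT solver rather than input/output examples.

```python
import itertools

def expressions(depth):
    """Yield (description, function) pairs for the toy grammar
    E ::= x | 1 | E + E | E * E, up to the given nesting depth."""
    if depth == 0:
        yield ("x", lambda x: x)
        yield ("1", lambda x: 1)
        return
    yield from expressions(depth - 1)
    subs = list(expressions(depth - 1))
    for (ld, lf), (rd, rf) in itertools.product(subs, subs):
        yield (f"({ld} + {rd})", lambda x, lf=lf, rf=rf: lf(x) + rf(x))
        yield (f"({ld} * {rd})", lambda x, lf=lf, rf=rf: lf(x) * rf(x))

def synthesize(examples, max_depth=2):
    """Return the first enumerated expression consistent with all examples."""
    for desc, f in expressions(max_depth):
        if all(f(x) == y for x, y in examples):
            return desc
    return None

# Ask for a term computing 2*x + 1 on the given examples:
result = synthesize([(0, 1), (1, 3), (5, 11)])
```

Even this brute-force loop shows why the grammar matters: it is the grammar, not the specification, that delimits the search space of candidate programs.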

18 citations


Journal ArticleDOI
TL;DR: This work provides a fully worked frameshift-aware, semiglobal DNA-protein alignment algorithm whose grammar is composed of products of small, atomic grammars; an embedding in Haskell as a domain-specific language makes the theory directly accessible for writing and using grammar products without the detour of an external compiler.
Abstract: We develop a theory of algebraic operations over linear and context-free grammars that makes it possible to combine simple “atomic” grammars operating on single sequences into complex, multi-dimensional grammars. We demonstrate the utility of this framework by constructing the search spaces of complex alignment problems on multiple input sequences explicitly as algebraic expressions of very simple one-dimensional grammars. In particular, we provide a fully worked frameshift-aware, semiglobal DNA-protein alignment algorithm whose grammar is composed of products of small, atomic grammars. The compiler accompanying our theory makes it easy to experiment with the combination of multiple grammars and different operations. Composite grammars can be written out in LaTeX for documentation and as a guide to implementation of dynamic programming algorithms. An embedding in Haskell as a domain-specific language makes the theory directly accessible to writing and using grammar products without the detour of an external compiler. Software and supplemental files available here: http://www.bioinf.uni-leipzig.de/Software/gramprod/

18 citations


Proceedings ArticleDOI
23 Oct 2015
TL;DR: This work considers from a practical perspective the problem of checking equivalence of context-free grammars, and proposes an algorithm for proving equivalence that is complete for LL grammars, yet can be invoked on any context-free grammar, including ambiguous grammars.
Abstract: We consider from a practical perspective the problem of checking equivalence of context-free grammars. We present techniques for proving equivalence, as well as techniques for finding counter-examples that establish non-equivalence. Among the key building blocks of our approach is a novel algorithm for efficiently enumerating and sampling words and parse trees from arbitrary context-free grammars; the algorithm supports polynomial time random access to words belonging to the grammar. Furthermore, we propose an algorithm for proving equivalence of context-free grammars that is complete for LL grammars, yet can be invoked on any context-free grammar, including ambiguous grammars. Our techniques successfully find discrepancies between different syntax specifications of several real-world languages, and are capable of detecting fine-grained incremental modifications performed on grammars. Our evaluation shows that our tool improves significantly on the available state-of-the-art tools. In addition, we used these algorithms to develop an online tutoring system for grammars that we then used in an undergraduate course on computer language processing. On questions involving grammar constructions, our system was able to automatically evaluate the correctness of 95% of the solutions submitted by students: it disproved 74% of cases and proved 21% of them.
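The counting idea that underlies polynomial-time random access to a grammar's words can be sketched for grammars in Chomsky normal form: count derivations by length for each nonterminal, then walk the counts to sample. The toy grammar below is hypothetical, and the sketch samples uniformly over derivation trees rather than words (the two coincide only for unambiguous grammars):

```python
from functools import lru_cache
import random

# Hypothetical toy grammar in Chomsky normal form: A -> B C or A -> 'a'.
CNF = {
    "S": [("A", "B"), ("B", "A")],
    "A": [("a",)],
    "B": [("b",), ("S", "B")],
}

@lru_cache(maxsize=None)
def count(nt, n):
    """Number of derivation trees of nt whose yield has length n."""
    total = 0
    for rule in CNF[nt]:
        if len(rule) == 1:                      # terminal rule
            total += (n == 1)
        else:
            b, c = rule
            for k in range(1, n):               # split the length
                total += count(b, k) * count(c, n - k)
    return total

def sample(nt, n, rng):
    """Sample a length-n word, uniformly over derivation trees of nt."""
    target = rng.randrange(count(nt, n))
    for rule in CNF[nt]:
        if len(rule) == 1:
            if n == 1:
                if target == 0:
                    return rule[0]
                target -= 1
        else:
            b, c = rule
            for k in range(1, n):
                ways = count(b, k) * count(c, n - k)
                if target < ways:
                    return sample(b, k, rng) + sample(c, n - k, rng)
                target -= ways
    raise AssertionError("unreachable")

word = sample("S", 4, random.Random(0))
```

Replacing the random `target` by a fixed index gives exactly the random-access primitive the abstract mentions: the i-th word of each length is addressable without enumerating its predecessors.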

18 citations


Journal ArticleDOI
TL;DR: This paper proposes a graph-grammar-based approach to the semantic analysis of model graphs and uses Rekers and Schürr’s Layered Graph Grammars, which may be regarded as a pure generalization of standard context-sensitive string grammars.
Abstract: In this paper, we present a method to convert a metamodel in the form of a UML class diagram into a context-sensitive graph grammar whose language comprises precisely the set of model graphs (UML object diagrams) that conform to the input metamodel. Compared to other approaches that deal with the same problem, we use a graph grammar formalism that does not employ any advanced graph grammar features, such as application conditions, precedence rules, and production schemes. Specifically, we use Rekers and Schürr's Layered Graph Grammars, which may be regarded as a pure generalization of standard context-sensitive string grammars. We show that elementary grammatical features, i.e., grammar labels and context-sensitive graph rewrite rules, suffice to represent metamodels with arbitrary multiplicities and inheritance. Inspired by attribute string grammars, we also propose a graph-grammar-based approach to the semantic analysis of model graphs.

17 citations


Journal ArticleDOI
TL;DR: Even though contextual hyperedge-replacement grammars are not context-free, they inherit several of the nice properties of hyperedge replacement grammars; in particular, their membership problem is in NP.
Abstract: Contextual hyperedge-replacement grammars (contextual grammars, for short) are an extension of hyperedge replacement grammars. They have recently been proposed as a grammatical method for capturing the structure of object-oriented programs, thus serving as an alternative to the use of meta-models like UML class diagrams in model-driven software design. In this paper, we study the properties of contextual grammars. Even though these grammars are not context-free, one can show that they inherit several of the nice properties of hyperedge replacement grammars. In particular, they possess useful normal forms and their membership problem is in NP.

16 citations


Journal ArticleDOI
TL;DR: It is shown that the mechanism for on-the-fly modification of syntax rules can be useful for defining grammars in a modular way, implementing almost all types of language composition in the context of specifying extensible languages.

14 citations


Journal ArticleDOI
TL;DR: This paper proposes a more general model, in which context specifications may be two-sided, that is, both the left and the right contexts can be specified by the corresponding operators.

Journal ArticleDOI
TL;DR: A new approach to model transformation development is proposed that simplifies the developed transformations and improves their quality by exploiting the languages' structures; it is shown that such transformations have important properties: they terminate and are sound, complete, and deterministic.

Posted Content
TL;DR: This paper investigates several milestones between the two extremes of language equivalence and grammar identity, and proposes a methodology for inconsistency management in grammar engineering.
Abstract: Relating formal grammars is a hard problem that balances between language equivalence (which is known to be undecidable) and grammar identity (which is trivial). In this paper, we investigate several milestones between those two extremes and propose a methodology for inconsistency management in grammar engineering. While conventional grammar convergence is a practical approach relying on human experts to encode differences as transformation steps, guided grammar convergence is a more narrowly applicable technique that infers such transformation steps automatically by normalising the grammars and establishing a structural equivalence relation between them. This allows us to perform a case study with automatically inferring bidirectional transformations between 11 grammars (in a broad sense) of the same artificial functional language: parser specifications with different combinator libraries, definite clause grammars, concrete syntax definitions, algebraic data types, metamodels, XML schemata, object models.

Proceedings ArticleDOI
13 Jan 2015
TL;DR: This work on the formalization of language theory proves formally, in the Agda dependently typed programming language, that each of the four transformations taking a context-free grammar to Chomsky normal form is correct in the sense of making progress toward normality and preserving the language of the given grammar.
Abstract: Every context-free grammar can be transformed into an equivalent one in the Chomsky normal form by a sequence of four transformations. In this work on formalization of language theory, we prove formally in the Agda dependently typed programming language that each of these transformations is correct in the sense of making progress toward normality and preserving the language of the given grammar. Also, we show that the right sequence of these transformations leads to a grammar in the Chomsky normal form (since each next transformation preserves the normality properties established by the previous ones) that accepts the same language as the given grammar. As we work in a constructive setting, soundness and completeness proofs are functions converting between parse trees in the normalized and original grammars.
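One of the four classical transformations, elimination of unit rules A → B, can be rendered in a few lines (an illustrative, unverified re-implementation with an invented toy grammar; the paper's contribution is the machine-checked Agda proof, not the algorithm itself):

```python
def eliminate_unit_rules(grammar):
    """grammar: dict mapping nonterminal -> list of right-hand sides (tuples).
    Returns an equivalent grammar with no rules of the form A -> B."""
    # unit_pairs[A] = all nonterminals reachable from A via chains of unit rules
    unit_pairs = {a: {a} for a in grammar}
    changed = True
    while changed:
        changed = False
        for a in grammar:
            for b in list(unit_pairs[a]):
                for rhs in grammar[b]:
                    if len(rhs) == 1 and rhs[0] in grammar and rhs[0] not in unit_pairs[a]:
                        unit_pairs[a].add(rhs[0])
                        changed = True
    # Copy every non-unit rule of B up to each A with A =>* B by unit rules.
    result = {}
    for a in grammar:
        result[a] = []
        for b in unit_pairs[a]:
            for rhs in grammar[b]:
                if not (len(rhs) == 1 and rhs[0] in grammar) and rhs not in result[a]:
                    result[a].append(rhs)
    return result

# Hypothetical grammar: S -> A | aS, A -> B | b, B -> c
G2 = eliminate_unit_rules({"S": [("A",), ("a", "S")],
                           "A": [("B",), ("b",)],
                           "B": [("c",)]})
```

The "progress toward normality" the abstract mentions is visible here: the output generates the same language but contains no unit rule, so the next transformation can rely on that invariant.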

Journal ArticleDOI
TL;DR: A family of formal grammars with an operator for referring to the left context of the substring being defined, as well as with a conjunction operation, is considered and proved to be computationally equivalent to an extension of one-way real-time cellular automata with an extra data channel.
Abstract: The paper considers a family of formal grammars that extends linear context-free grammars with an operator for referring to the left context of a substring being defined, as well as with a conjunction operation (as in linear conjunctive grammars). These grammars are proved to be computationally equivalent to an extension of one-way real-time cellular automata with an extra data channel. The main result is the undecidability of the emptiness problem for grammars restricted to a one-symbol alphabet, which is proved by simulating a Turing machine by a cellular automaton with feedback. The same construction proves the $\Sigma_2^0$-completeness of the finiteness problem for these grammars and automata.

Proceedings ArticleDOI
14 Dec 2015
TL;DR: It is established that the Watson-Crick regular grammars are closed under almost all of the main closure operations, while the differences between other Watson-Crick grammars and their corresponding Chomsky grammars depend on the computational power of the Watson-Crick grammars, which still needs to be studied.
Abstract: In this paper, we define Watson-Crick context-free grammars, as an extension of Watson-Crick regular grammars and Watson-Crick linear grammars with context-free grammar rules. We show the relation of Watson-Crick (regular and linear) grammars to the sticker systems, and study some of the important closure properties of the Watson-Crick grammars. We establish that the Watson-Crick regular grammars are closed under almost all of the main closure operations, while the differences between other Watson-Crick grammars and their corresponding Chomsky grammars depend on the computational power of the Watson-Crick grammars, which still needs to be studied.

Proceedings ArticleDOI
13 Jan 2015
TL;DR: This paper examines the class of Linearly Ordered Attribute Grammars (LOAGs), for which strict, bounded-size evaluators can be generated, and applies an augmenting dependency selection algorithm that allows determining membership in the class LOAG.
Abstract: Attribute Grammars (AGs) extend Context-Free Grammars with attributes: information gathered on the syntax tree that adds semantics to the syntax. AGs are very well suited for describing static analyses, code generation and other phases incorporated in a compiler. AGs are divided into classes based on the nature of the dependencies between the attributes. In this paper we examine the class of Linearly Ordered Attribute Grammars (LOAGs), for which strict, bounded size evaluators can be generated. Deciding whether an Attribute Grammar is linearly ordered is an NP-hard problem. The Ordered Attribute Grammars form a subclass of LOAG for which membership is tested in polynomial time by Kastens' algorithm (1980). On top of this algorithm we apply an augmenting dependency selection algorithm, allowing it to determine membership for the class LOAG. Although the worst-case complexity of our algorithm is exponential, the algorithm turns out to be efficient for practical full-sized AGs. As a result, we can compile the main AG of the Utrecht Haskell Compiler without the manual addition of augmenting dependencies. The reader is provided with insight in the difficulty of deciding whether an AG is linearly ordered, what optimistic choice is made by Kastens' algorithm and how augmenting dependencies can resolve these difficulties.
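The attribute flow itself is easy to sketch: synthesized attributes travel up the tree, inherited attributes travel down. The toy evaluator below is hypothetical (real AG systems generate such evaluators from declarative attribute equations, and scheduling those equations is exactly the ordering problem the paper studies); it computes one attribute of each kind on a small syntax tree:

```python
class Node:
    """A syntax-tree node carrying one inherited and one synthesized attribute."""
    def __init__(self, label, *children):
        self.label, self.children = label, children
        self.depth = None    # inherited: flows from parent to child
        self.size = None     # synthesized: flows from children to parent

def evaluate(node, depth=0):
    node.depth = depth                                   # inherited equation
    if not node.children:
        node.size = 1                                    # leaf base case
    else:
        for child in node.children:
            evaluate(child, depth + 1)
        node.size = sum(c.size for c in node.children)   # synthesized equation
    return node

tree = evaluate(Node("S", Node("NP", Node("she")),
                          Node("VP", Node("runs"))))
```

Here the evaluation order is trivial (one top-down pass suffices); the classes discussed in the paper characterize grammars whose attribute dependencies still admit such a statically fixed visit order.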

Journal ArticleDOI
TL;DR: A normal form is established for formal grammars equipped with operators for specifying the form of the context of a substring, in which extended left contexts are never used, whereas left contexts may be applied only to individual symbols, so that all rules are of the form A → …

Journal ArticleDOI
TL;DR: Using linear algebra and a branching analog of the classic Euler theorem, it is shown that, under the assumption that the terminal alphabet is fixed, the membership problem for regular grammars is in P, and that the equivalence problem for context-free grammars is in $\mathrm{\Pi_2^P}$.
Abstract: We consider commutative regular and context-free grammars, or, in other words, Parikh images of regular and context-free languages. By using linear algebra and a branching analog of the classic Euler theorem, we show that, under an assumption that the terminal alphabet is fixed, the membership problem for regular grammars (given v in binary and a regular commutative grammar G, does G generate v?) is in P, and that the equivalence problem for context-free grammars (do G_1 and G_2 generate the same language?) is in $\mathrm{\Pi_2^P}$.
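The commutative membership problem is very concrete: only the letter counts of a word matter, not their order. The brute-force check below is a toy, exponential-time illustration of the problem statement (the paper's point is that linear algebra solves it in polynomial time for a fixed alphabet); the language (ab)*, whose Parikh image is {(n, n) : n ≥ 0}, is an invented example:

```python
from collections import Counter
from itertools import product

def in_parikh_image(vector, alphabet, accepts):
    """Does some word with exactly these letter counts belong to the language?
    Naive check: enumerate every word of the right length."""
    n = sum(vector.values())
    return any(
        Counter(w) == vector and accepts("".join(w))
        for w in product(alphabet, repeat=n)
    )

def accepts_ab_star(w):
    """Membership in the regular language (ab)*."""
    return len(w) % 2 == 0 and all(w[i] == "ab"[i % 2] for i in range(len(w)))

ok  = in_parikh_image(Counter({"a": 2, "b": 2}), "ab", accepts_ab_star)
bad = in_parikh_image(Counter({"a": 2, "b": 1}), "ab", accepts_ab_star)
```

The exponential blow-up of this enumeration is precisely what makes the paper's polynomial-time bound for the fixed-alphabet case non-obvious.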

Book ChapterDOI
30 Jun 2015
TL;DR: In this paper, the authors describe a tool for intersecting context-free grammars for safety verification of recursive multi-threaded programs, using a refinement-based approach.
Abstract: This paper describes a tool for intersecting context-free grammars. Since this problem is undecidable the tool follows a refinement-based approach and implements a novel refinement which is complete for regularly separable grammars. We show its effectiveness for safety verification of recursive multi-threaded programs.

Journal ArticleDOI
TL;DR: It is described how Boolean grammars can improve programming language expressiveness and be used for agile parsing, and their potential for source transformation systems is discussed.

Book ChapterDOI
Ryo Yoshinaka1
02 Mar 2015
TL;DR: It is shown that conjunctive grammars are also learnable by a distributional learning technique; the resulting learner is stronger than that of Clark et al. and, in particular, learns every exact contextual binary feature grammar, which theirs does not.
Abstract: Approaches based on the idea generically called distributional learning have been making great success in the algorithmic learning of context-free languages and their extensions. We in this paper show that conjunctive grammars are also learnable by a distributional learning technique. Conjunctive grammars are context-free grammars enhanced with conjunctive rules to extract the intersection of two languages. We also compare our result with the closely related work by Clark et al. (JMLR 2010) on contextual binary feature grammars (cbfgs). Our learner is stronger than theirs. In particular our learner learns every exact cbfg, while theirs does not. Clark et al. emphasized the importance of exact cbfgs but they only conjectured there should be a learning algorithm for exact cbfgs. This paper shows that their conjecture is true.

Proceedings ArticleDOI
27 Jul 2015
TL;DR: 2D regular expressions and array grammars are introduced as generators; the authors reason about the theoretical capability of these constructs, develop practical use cases for their application in procedural content generation for games, and show that 2D regular expressions can be used for enumeration of all possible tilings that can be generated.
Abstract: Procedural content generation for games often uses tile sets. Tilings generated with tile sets are equivalent to pictures generated from a fixed alphabet of characters such as those explored in the area of vision. Formal languages over pictures and their methods of definition such as 2D regular expressions, automata, and array grammars are directly applicable to generation of tilings using finite tile sets. Though grammars such as string grammars, L-systems, and graph grammars have been explored and found useful for the definition of certain content, formal methods have mostly been ignored. We introduce 2D regular expressions and array grammars as generators. We reason about the theoretical capability of these constructs and develop some practical use cases for their application in procedural content generation for games. One capability lacking in search-based approaches to procedural content generation is an enumeration of all possible tilings that can be generated. We show that 2D regular expressions can be used for enumeration.
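The enumeration use case can be sketched directly: exhaustively generate every tiling of a small grid that satisfies local adjacency constraints. The tile set and constraint below are invented for illustration and stand in for the formally defined picture languages discussed in the paper:

```python
from itertools import product

def tilings(rows, cols, tiles, h_ok):
    """All row-major tilings of a rows x cols grid in which every pair of
    horizontal neighbours satisfies the predicate h_ok."""
    result = []
    for cells in product(tiles, repeat=rows * cols):
        grid = [cells[r * cols:(r + 1) * cols] for r in range(rows)]
        if all(h_ok(row[i], row[i + 1]) for row in grid for i in range(cols - 1)):
            result.append(grid)
    return result

# Hypothetical tile set {'G'rass, 'W'ater} where horizontal neighbours
# must match, so each row is uniform while rows vary independently.
same = lambda a, b: a == b
all_2x2 = tilings(2, 2, "GW", same)
```

Enumerating the language like this is exactly what a search-based generator cannot do on its own, which is the gap the paper fills with 2D regular expressions.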

Proceedings ArticleDOI
01 Dec 2015
TL;DR: It is shown that the family of arbitrary sticker languages, generated from arbitrary sticker systems, is included in the family of Watson-Crick linear languages, generated from Watson-Crick linear grammars.
Abstract: Deoxyribonucleic acid, popularly known as DNA, continues to inspire many theoretical computing models, such as sticker systems and Watson-Crick grammars. Sticker systems are the abstraction of ligation processes performed on DNA, while Watson-Crick grammars are models motivated by Watson-Crick finite automata and Chomsky grammars. Both of these theoretical models benefit from the Watson-Crick complementarity rule. In this paper, we establish results on the relationship between Watson-Crick linear grammars, which are included in Watson-Crick context-free grammars, and sticker systems. We show that the family of arbitrary sticker languages, generated from arbitrary sticker systems, is included in the family of Watson-Crick linear languages, generated from Watson-Crick linear grammars.

Proceedings ArticleDOI
10 Jan 2015
TL;DR: This position paper proposes to detect unspecified information with appropriate ontologies and exploits the descriptive power of constraints both for defining sentence acceptability and for inferring lexical knowledge from a word’s sentential context, even when foreign.
Abstract: Womb Grammars are a recently introduced constraint-based methodology for acquiring linguistic information on a given language from that of another, implemented in CHRG (Constraint Handling Rule Grammars). This is a position paper that discusses their possible adaptation to multilingual text parsing. In particular, we propose to detect unspecified information with appropriate ontologies. Our proposed methodology exploits the descriptive power of constraints both for defining sentence acceptability and for inferring lexical knowledge from a word’s sentential context, even when foreign.

Journal ArticleDOI
TL;DR: This paper describes a method of grammar regularization based on an algorithm for eliminating the left/right-hand side recursion of nonterminals, which ultimately converts a context-free grammar into a regular one.
Abstract: Regularization of translational context-free grammar via equivalent transformations is a mandatory step in developing a reliable processor of a formal language defined by this grammar. In the 1970s, multi-component oriented graphs with basic equivalent transformations were proposed to represent a formal grammar of ALGOL-68 in a compiler for IBM/360 compatibles. This paper describes a method of grammar regularization with the help of an algorithm of eliminating the left/right-hand side recursion of nonterminals which ultimately converts a context-free grammar into a regular one. The algorithm is based on special equivalent transformations of the grammar syntactic graph: elimination of recursions and insertion of iterations. When implemented in the system SynGT, it has demonstrated over 25% reduction of the memory size required to store the respective intermediate control tables, compared to the algorithm used in Flex/Bison parsers.
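The classical string-level version of the central step, eliminating immediate left recursion, is easy to state: rules A → Aα | β become A → βA′ and A′ → αA′ | ε. A minimal sketch of that textbook transformation (SynGT's graph-based transformations are more elaborate than this):

```python
def eliminate_left_recursion(nt, rules):
    """rules: list of right-hand sides (tuples of symbols) for nonterminal nt.
    Returns an equivalent rule set with no immediate left recursion."""
    recursive = [r[1:] for r in rules if r and r[0] == nt]   # the alphas
    other     = [r for r in rules if not r or r[0] != nt]    # the betas
    if not recursive:
        return {nt: rules}
    fresh = nt + "'"                                         # new nonterminal A'
    return {
        nt:    [beta + (fresh,) for beta in other],
        fresh: [alpha + (fresh,) for alpha in recursive] + [()],  # () = epsilon
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
Gnew = eliminate_left_recursion("E", [("E", "+", "T"), ("T",)])
```

Repeating this (together with the dual right-recursion case and iteration insertion, as the abstract describes) is what drives a grammar toward a regular form.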

Posted Content
TL;DR: This paper presents a formalization, using the Coq proof assistant, of the fact that general context-free grammars generate languages that can also be generated by simpler and equivalent context-free grammars.
Abstract: Context-free grammar simplification is a subject of high importance in computer language processing technology as well as in formal language theory. This paper presents a formalization, using the Coq proof assistant, of the fact that general context-free grammars generate languages that can be also generated by simpler and equivalent context-free grammars. Namely, useless symbol elimination, inaccessible symbol elimination, unit rules elimination and empty rules elimination operations were described and proven correct with respect to the preservation of the language generated by the original grammar.
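Two of the formalized simplification steps, removing non-productive symbols and then inaccessible ones, can be rendered executably. This is an illustrative Python sketch of the standard algorithms on an invented grammar; the paper's contribution is the machine-checked Coq proof that such operations preserve the generated language, not the algorithms themselves:

```python
def simplify(grammar, start, terminals):
    """Remove non-productive symbols, then symbols inaccessible from start."""
    # 1. Productive symbols: those that can derive a terminal string.
    productive = set(terminals)
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            if a not in productive and any(all(s in productive for s in r) for r in rhss):
                productive.add(a)
                changed = True
    g = {a: [r for r in rhss if all(s in productive for s in r)]
         for a, rhss in grammar.items() if a in productive}
    # 2. Accessible symbols: those reachable from the start symbol.
    accessible, frontier = {start}, [start]
    while frontier:
        a = frontier.pop()
        for r in g.get(a, []):
            for s in r:
                if s in g and s not in accessible:
                    accessible.add(s)
                    frontier.append(s)
    return {a: rhss for a, rhss in g.items() if a in accessible}

G2 = simplify({"S": [("a", "A"), ("B",)],
               "A": [("b",)],
               "B": [("B", "b")],    # non-productive: never derives a string
               "C": [("c",)]},       # productive but inaccessible from S
              "S", {"a", "b", "c"})
```

The order matters: removing non-productive symbols first may make further symbols inaccessible, which is why the formalization treats the operations separately and composes them.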

Proceedings ArticleDOI
21 May 2015
TL;DR: This paper investigates an alternative approach to inferring grammars via pattern languages and elementary formal system frameworks, summarizes inferability results for subclasses of both frameworks, and discusses how they map to the Chomsky hierarchy.
Abstract: Formal Language Theory for Security (LANGSEC) has proposed that formal language theory and grammars be used to define and secure protocols and parsers. The assumption is that by restricting languages to lower levels of the Chomsky hierarchy, it is easier to control and verify parser code. In this paper, we investigate an alternative approach to inferring grammars via pattern languages and elementary formal system frameworks. We summarize inferability results for subclasses of both frameworks and discuss how they map to the Chomsky hierarchy. Finally, we present initial results of pattern language learning on logged HTTP sessions and suggest future areas of research.
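Pattern languages are simple to demonstrate: a pattern mixes constant tokens with variables, and a string belongs to the language if some substitution of non-empty strings for the variables (the same string at every occurrence of a variable) yields it. Below is a small membership checker with an invented token syntax where variables start with `$` (a brute-force sketch; membership for pattern languages is hard in general, and the paper is concerned with *inferring* such patterns from data):

```python
def matches(pattern, s, bindings=None):
    """pattern: list of tokens; tokens starting with '$' are variables that
    must bind to the same non-empty string at every occurrence."""
    bindings = bindings or {}
    if not pattern:
        return s == ""
    head, rest = pattern[0], pattern[1:]
    if not head.startswith("$"):                  # constant token
        return s.startswith(head) and matches(rest, s[len(head):], bindings)
    if head in bindings:                          # already-bound variable
        v = bindings[head]
        return s.startswith(v) and matches(rest, s[len(v):], bindings)
    return any(                                   # try every non-empty binding
        matches(rest, s[i:], {**bindings, head: s[:i]})
        for i in range(1, len(s) + 1)
    )

# Pattern $x "-" $x : the same string repeated around a separator.
ok  = matches(["$x", "-", "$x"], "ab-ab")
bad = matches(["$x", "-", "$x"], "ab-ba")
```

The repeated-variable constraint is what lets patterns capture non-regular structure (such as repeated fields in a protocol message) while remaining far simpler than a full grammar.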

Posted Content
TL;DR: This paper presents a formalization, using the Coq proof assistant, of fundamental results related to context-free grammars and languages, including closure properties, grammar simplification, and the existence of a Chomsky Normal Form.
Abstract: Context-free language theory is a subject of high importance in computer language processing technology as well as in formal language theory. This paper presents a formalization, using the Coq proof assistant, of fundamental results related to context-free grammars and languages. These include closure properties (union, concatenation and Kleene star), grammar simplification (elimination of useless symbols, inaccessible symbols, empty rules and unit rules) and the existence of a Chomsky Normal Form for context-free grammars.

Book ChapterDOI
24 Nov 2015
TL;DR: This paper introduces another variant of P2DCFG that corresponds to "rightmost" rewriting in string context-free grammars and examines the effect of regulating the rewriting in an (r/d)P2DCFG by suitably adapting two well-known control mechanisms in string grammars, namely, control words and matrix control.
Abstract: Pure two-dimensional context-free grammar (P2DCFG) is a simple but effective non-isometric 2D grammar model to generate picture arrays. This 2D grammar uses only one kind of symbol as in a pure string grammar and rewrites in parallel all the symbols in a column or row by a set of context-free type rules. P2DCFG and a variant called (l/u)P2DCFG, which was recently introduced motivated by the "leftmost" rewriting mode in string context-free grammars, have been investigated for different properties. In this paper we introduce another variant of P2DCFG that corresponds to "rightmost" rewriting in string context-free grammars. The resulting grammar is called (r/d)P2DCFG and rewrites in parallel all the symbols only in the rightmost column or the lowermost row of a picture array by a set of context-free type rules. Unlike the case of string context-free grammars, the picture language families of P2DCFG and the two variants (l/u)P2DCFG and (r/d)P2DCFG are mutually incomparable, although they are not disjoint. We also examine the effect of regulating the rewriting in an (r/d)P2DCFG by suitably adapting two well-known control mechanisms in string grammars, namely, control words and matrix control.