
Showing papers on "Context-sensitive grammar published in 2009"


Journal ArticleDOI
TL;DR: The results imply undecidability of a number of decision problems of unary conjunctive grammars, as well as non-existence of a recursive function bounding the growth rate of the generated languages.
Abstract: It has recently been proved (Jez, DLT 2007) that conjunctive grammars (that is, context-free grammars augmented by conjunction) generate some non-regular languages over a one-letter alphabet. The present paper improves this result by constructing conjunctive grammars for a larger class of unary languages. The results imply undecidability of a number of decision problems of unary conjunctive grammars, as well as non-existence of a recursive function bounding the growth rate of the generated languages. An essential step of the argument is a simulation of a cellular automaton recognizing positional notation of numbers using language equations.
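For a feel of the construction, here is a sketch of ours: the four-nonterminal conjunctive grammar from Jez's DLT 2007 paper for the non-regular unary language { a^(4^n) : n >= 0 }, checked numerically through its language equations. Over a one-letter alphabet concatenation adds string lengths, so each L(A_i) is a set of natural numbers, conjunction is set intersection, and the least solution can be approached by fixpoint iteration (Python; the bound and variable names are ours).

BOUND = 300  # explore string lengths up to this bound only

def plus(x, y):
    # sumset = concatenation over a one-letter alphabet, truncated at BOUND
    return {i + j for i in x for j in y if i + j <= BOUND}

# Intended least solution: L(A_i) = { i * 4^n : n >= 0 }
A = {1: {1}, 2: {2}, 3: {3}, 6: {6}}  # base rules A_i -> a^i
while True:
    old = {i: set(s) for i, s in A.items()}
    A[1] |= plus(A[1], A[3]) & plus(A[2], A[2])  # A1 -> A1 A3 & A2 A2 | a
    A[2] |= plus(A[1], A[1]) & plus(A[2], A[6])  # A2 -> A1 A1 & A2 A6 | a^2
    A[3] |= plus(A[1], A[2]) & plus(A[6], A[6])  # A3 -> A1 A2 & A6 A6 | a^3
    A[6] |= plus(A[1], A[2]) & plus(A[3], A[3])  # A6 -> A1 A2 & A3 A3 | a^6
    if A == old:
        break

assert A[1] == {4 ** n for n in range(5)}  # lengths 1, 4, 16, 64, 256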

65 citations


Proceedings ArticleDOI
31 May 2009
TL;DR: This work presents a theoretically principled model which learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far out-performs a standard PCFG.
Abstract: Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and over-fitting. We present a theoretically principled model which solves these problems using a Bayesian non-parametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far out-performs a standard PCFG.

63 citations


Book ChapterDOI
01 Jul 2009
TL;DR: A pumping lemma of the usual universal form is proved for the subclass consisting of well-nested multiple context-free languages, which is the same class of languages generated by non-duplicating macro grammars and by coupled-context-free grammars.
Abstract: Seki et al. (1991) proved a rather weak pumping lemma for multiple context-free languages, which says that any infinite m-multiple context-free language contains a string that is pumpable at some 2m substrings. We prove a pumping lemma of the usual universal form for the subclass consisting of well-nested multiple context-free languages. This is the same class of languages generated by non-duplicating macro grammars and by coupled-context-free grammars.
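Schematically, and in our paraphrase that omits the paper's technical side conditions, the universal form states: for every well-nested m-multiple context-free language L there is a constant p such that every z \in L with |z| \ge p can be written as

z = u_0 v_1 u_1 v_2 u_2 \cdots v_k u_k, \qquad 1 \le k \le 2m, \qquad v_1 v_2 \cdots v_k \ne \varepsilon,

with u_0 v_1^i u_1 v_2^i u_2 \cdots v_k^i u_k \in L for all i \ge 0. The weak lemma of Seki et al. guarantees only the existence of some pumpable string, whereas the universal form quantifies over every sufficiently long string, exactly as the classical context-free pumping lemma does in the case m = 1.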

46 citations


Proceedings ArticleDOI
30 Mar 2009
TL;DR: A new parsing algorithm is described that is incremental, supports PMCFG directly rather than the weaker MCFG formalism, and is top-down, which allows it to be used for grammar-based word prediction.
Abstract: Parallel Multiple Context-Free Grammar (PMCFG) is an extension of context-free grammar for which the recognition problem is still solvable in polynomial time. We describe a new parsing algorithm that has the advantage of being incremental and of supporting PMCFG directly rather than the weaker MCFG formalism. The algorithm is also top-down, which allows it to be used for grammar-based word prediction.
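To recall what the parallel extension buys, here is a toy example of ours, not the paper's algorithm: a PMCFG rule may use its argument more than once, which a (non-parallel) MCFG rule may not, so a single copying rule already captures the non-context-free copy language.

# Toy PMCFG-style copying rule: S -> f(A) with f(x) = x x, where the
# nonterminal A derives any string over {a, b}; S then derives { w w }.
def f(x: str) -> str:
    return x + x

assert f("abb") == "abbabb"  # the copy language is not context-free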

36 citations


Journal ArticleDOI
TL;DR: It is demonstrated that every Boolean grammar can be transformed into an equivalent (under the new semantics) grammar in normal form, and an O(n^3) algorithm for parsing that applies to any such normalized Boolean grammar is proposed.
Abstract: Boolean grammars [A. Okhotin, Boolean grammars, Information and Computation 194 (1) (2004) 19-48] are a promising extension of context-free grammars that supports conjunction and negation in rule bodies. In this paper, we give a novel semantics for Boolean grammars which applies to all such grammars, independently of their syntax. The key idea of our proposal comes from the area of negation in logic programming, and in particular from the so-called well-founded semantics which is widely accepted in this area to be the "correct" approach to negation. We show that for every Boolean grammar there exists a distinguished (three-valued) interpretation of the non-terminal symbols, which satisfies all the rules of the grammar and at the same time is the least fixed-point of an operator associated with the grammar. Then, we demonstrate that every Boolean grammar can be transformed into an equivalent (under the new semantics) grammar in normal form. Based on this normal form, we propose an O(n^3) algorithm for parsing that applies to any such normalized Boolean grammar. In summary, the main contribution of this paper is to provide a semantics which applies to all Boolean grammars while at the same time retaining the complexity of parsing associated with this type of grammar.
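A standard one-rule illustration of the semantic difficulty, ours and borrowed in spirit from negation in logic programming:

S -> ¬S

No two-valued language model exists for this rule, since it would require L(S) = Σ* \ L(S). Under the well-founded approach the grammar is instead assigned the distinguished three-valued interpretation in which every string receives the intermediate truth value, so every grammar gets a meaning without syntactic restrictions.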

35 citations


Journal ArticleDOI
TL;DR: A new syntactic model, called pure two-dimensional (2D) context-free grammar (P2DCFG), is introduced based on the notion of pure context-free string grammar, and the rectangular picture generative power of this 2D grammar model is investigated.

34 citations


Journal ArticleDOI
TL;DR: The pumping lemma for L-valued regular languages (L-RLs), recently established by the second author, is generalized.

33 citations


Journal ArticleDOI
TL;DR: This work investigates the problem of computing the partition function of a probabilistic context-free grammar and considers a number of applicable methods, with particular attention to PCFGs that result from the intersection of another PCFG and a finite automaton.
Abstract: We investigate the problem of computing the partition function of a probabilistic context-free grammar, and consider a number of applicable methods. Particular attention is devoted to PCFGs that result from the intersection of another PCFG and a finite automaton. We report experiments involving the Wall Street Journal corpus.
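For intuition, a minimal sketch of ours, not one of the paper's methods: for a PCFG in Chomsky normal form the partition function satisfies Z(A) = sum over rules A -> B C of p * Z(B) * Z(C), plus sum over rules A -> a of p, and its least solution can be approached by fixed-point iteration from zero.

# Hypothetical one-nonterminal CNF PCFG: S -> S S (0.6) | a (0.4).
rules = {"S": [(0.6, ("S", "S")), (0.4, None)]}  # None marks a terminal rule

Z = {A: 0.0 for A in rules}
for _ in range(1000):  # Kleene iteration toward the least fixed point
    new = {}
    for A, prods in rules.items():
        total = 0.0
        for p, rhs in prods:
            if rhs is None:       # terminal rule contributes p
                total += p
            else:                 # binary rule contributes p * Z(B) * Z(C)
                B, C = rhs
                total += p * Z[B] * Z[C]
        new[A] = total
    Z = new

# Z(S) solves z = 0.6 z^2 + 0.4, whose least root is 2/3 < 1: this grammar
# loses probability mass to derivations that never terminate.
assert abs(Z["S"] - 2 / 3) < 1e-6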

31 citations


Book ChapterDOI
01 Jan 2009

30 citations


Proceedings ArticleDOI
31 May 2009
TL;DR: This work presents a pair of grammar transformations that admit an efficient cubic-time CKY-style parsing algorithm despite leaving most of the grammar in n-ary form and describes a two-pass coarse-to-fine parsing approach that prunes the search space using predictions from a subset of the original grammar.
Abstract: The tree-transducer grammars that arise in current syntactic machine translation systems are large, flat, and highly lexicalized. We address the problem of parsing efficiently with such grammars in three ways. First, we present a pair of grammar transformations that admit an efficient cubic-time CKY-style parsing algorithm despite leaving most of the grammar in n-ary form. Second, we show how the number of intermediate symbols generated by this transformation can be substantially reduced through binarization choices. Finally, we describe a two-pass coarse-to-fine parsing approach that prunes the search space using predictions from a subset of the original grammar. In all, parsing time reduces by 81%. We also describe a coarse-to-fine pruning scheme for forest-based language model reranking that allows a 100-fold increase in beam size while reducing decoding time. The resulting translations improve by 1.3 BLEU.
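For the flavor of such transformations, a simplified sketch of ours, not the paper's: plain right-binarization of an n-ary rule. The paper's point is that smarter binarization choices substantially reduce the number of intermediate symbols this process introduces.

# Right-binarize A -> X1 X2 ... Xn into a chain of binary rules.
def binarize(lhs, rhs):
    rules, cur = [], lhs
    while len(rhs) > 2:
        fresh = cur + "_" + rhs[0]   # hypothetical intermediate-symbol naming
        rules.append((cur, [rhs[0], fresh]))
        cur, rhs = fresh, rhs[1:]
    rules.append((cur, list(rhs)))
    return rules

print(binarize("A", ["X1", "X2", "X3", "X4"]))
# [('A', ['X1', 'A_X1']), ('A_X1', ['X2', 'A_X1_X2']), ('A_X1_X2', ['X3', 'X4'])]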

27 citations



Book ChapterDOI
07 Sep 2009
TL;DR: The unfolding semantics is generalized to the abstract setting of (single pushout) rewriting over adhesive categories; thanks to a refined notion of grammar morphism, the results apply to a wider class of systems than previous work.
Abstract: We generalize the unfolding semantics, previously developed for concrete formalisms such as Petri nets and graph grammars, to the abstract setting of (single pushout) rewriting over adhesive categories. The unfolding construction is characterized as a coreflection, i.e. the unfolding functor arises as the right adjoint to the embedding of the category of occurrence grammars into the category of grammars. As the unfolding represents potentially infinite computations, we need to work in adhesive categories with "well-behaved" colimits of ω-chains of monomorphisms. Compared to previous work on the unfolding of Petri nets and graph grammars, our results apply to a wider class of systems, which is due to the use of a refined notion of grammar morphism.

Proceedings Article
01 Jan 2009
TL;DR: The results demonstrate robust implicit learning of recursively embedded structures (context-free grammar) and recursive structures with cross-dependencies ( context-sensitive grammar) in an artificial grammar learning task spanning 9 days.
Abstract: [Paper: A Matter of Time: Implicit Acquisition of Recursive Sequence Structures, by Julia Udden, Susana Araujo, Christian Forkstam, Martin Ingvar, Peter Hagoort and Karl Magnus Petersson (Max Planck Institute for Psycholinguistics; Karolinska Institutet; Donders Institute for Brain, Cognition and Behaviour; Universidade do Algarve).] A dominant hypothesis in empirical research on the evolution of language is the following: the fundamental difference between animal and human communication systems is captured by the distinction between regular and more complex non-regular grammars. Studies reporting successful artificial grammar learning of nested recursive structures, and imaging studies of the same, have methodological shortcomings since they typically allow explicit problem-solving strategies, which has been shown to account for the learning effect in subsequent behavioral studies. The present study overcomes these shortcomings by using subtle violations of agreement structure in a preference classification task. In contrast to the studies conducted so far, we use an implicit learning paradigm, allowing the time needed for both abstraction processes and consolidation to take place. Our results demonstrate robust implicit learning of recursively embedded structures (context-free grammar) and recursive structures with cross-dependencies (context-sensitive grammar) in an artificial grammar learning task spanning 9 days.
Keywords: Implicit artificial grammar learning; centre embedded; cross-dependency; implicit learning; context-sensitive grammar; context-free grammar; regular grammar; non-regular grammar
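The two structure types under study can be made concrete (our sketch, following the A1 A2 A3 B3 B2 B1 pattern from the abstract): nested dependencies pair the B's with the A's in reverse order, which a context-free grammar can generate, while cross-serial dependencies pair them in the same order, which requires a non-context-free (context-sensitive) grammar.

pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
nested = [a for a, _ in pairs] + [b for _, b in reversed(pairs)]
crossed = [a for a, _ in pairs] + [b for _, b in pairs]
print(" ".join(nested))   # a1 a2 a3 b3 b2 b1  (centre-embedded)
print(" ".join(crossed))  # a1 a2 a3 b1 b2 b3  (cross-dependency)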

Book ChapterDOI
06 Nov 2009
TL;DR: A parsing algorithm for arbitrary context-free and probabilistic context-free grammars is presented, based on a representation of such grammars as a combination of a regular grammar and a grammar of balanced parentheses, similar to the representation used in the Chomsky-Schützenberger theorem.
Abstract: We present a parsing algorithm for arbitrary context-free and probabilistic context-free grammars based on a representation of such grammars as a combination of a regular grammar and a grammar of balanced parentheses, similar to the representation used in the Chomsky-Schützenberger theorem. The basic algorithm has the same worst-case complexity as the popular CKY and Earley parsing algorithms frequently employed in natural language processing tasks.
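A toy instance of this kind of representation, ours rather than the paper's construction: by the Chomsky-Schützenberger theorem, { a^n b^n } is the image of the intersection of a Dyck language with a regular set under a renaming homomorphism.

import re

def dyck(s):  # membership in the Dyck language over one bracket pair
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

R = re.compile(r"\(*\)*\Z")    # regular set: all opens before all closes
h = str.maketrans("()", "ab")  # renaming homomorphism

def in_anbn(s):                # s is a bracket word; test s in D & R
    return dyck(s) and bool(R.match(s))

assert in_anbn("((()))") and "((()))".translate(h) == "aaabbb"
assert not in_anbn("()()")     # balanced, but fails R; maps to "abab"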

Book ChapterDOI
12 Aug 2009
TL;DR: The compressed membership problem for one-nonterminal conjunctive grammars over {a} is proved to be EXPTIME-complete, while the equivalence, finiteness and emptiness problems for these grammars are shown to be undecidable.
Abstract: Conjunctive grammars over an alphabet Σ = {a} are studied, with the focus on the special case with a unique nonterminal symbol. Such a grammar is equivalent to an equation X = φ(X) over sets of natural numbers, using union, intersection and addition. It is shown that every grammar with multiple nonterminals can be encoded into a grammar with a single nonterminal, with a slight modification of the language. Based on this construction, the compressed membership problem for one-nonterminal conjunctive grammars over {a} is proved to be EXPTIME-complete, while the equivalence, finiteness and emptiness problems for these grammars are shown to be undecidable.

Book ChapterDOI
27 Mar 2009
TL;DR: It is shown that if tree grammars are nondeterministic or non-linear, then reducing their number of parameters cannot be done without an exponential blow-up in grammar size.
Abstract: Trees can be conveniently compressed with linear straight-line context-free tree grammars. Such grammars generalize straight-line context-free string grammars which are widely used in the development of algorithms that execute directly on compressed structures (without prior decompression). It is shown that every linear straight-line context-free tree grammar can be transformed in polynomial time into a monadic (and linear) one. A tree grammar is monadic if each nonterminal uses at most one context parameter. Based on this result, a polynomial time algorithm is presented for testing whether a given nondeterministic tree automaton with sibling constraints accepts a tree given by a linear straight-line context-free tree grammar. It is shown that if tree grammars are nondeterministic or non-linear, then reducing their number of parameters cannot be done without an exponential blow-up in grammar size.
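To illustrate the objects involved, a toy example of ours: in a straight-line tree grammar every nonterminal has exactly one rule, and the monadic restriction allows each nonterminal at most one context parameter x.

# Monadic, linear straight-line context-free tree grammar (toy example):
#   A_0(x) -> f(x)
#   A_i(x) -> A_{i-1}( A_{i-1}(x) )  for i = 1..n  (x used exactly once)
#   S      -> A_n(leaf)
# S derives the chain tree f^(2^n)(leaf): an exponential tree in n + 2 rules.
def f_nodes(i: int) -> int:
    # number of f-nodes in the tree derived from A_i(x)
    return 1 if i == 0 else 2 * f_nodes(i - 1)

assert f_nodes(10) == 1024  # exponential compression, as for string SLPs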

Book ChapterDOI
31 Mar 2009
TL;DR: Some results are presented on the generative capacity of Petri net controlled grammars when the Petri nets are restricted to some known structural subclasses of Petri nets.
Abstract: A Petri net controlled grammar is a context-free grammar controlled by a Petri net whose transitions are labeled with rules of the grammar or the empty string; the associated language consists of all terminal strings that can be derived in the grammar such that the sequence of rules used in the derivation is the image of a successful occurrence sequence of transitions of the net. We present some results on the generative capacity of such grammars when the Petri nets are restricted to some known structural subclasses of Petri nets.
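A minimal sketch of ours, not the paper's construction, showing how net control adds power: the rules below form an ordinary context-free grammar, yet forcing their application order with a net whose token walks through a cycle yields the non-context-free language { a^n b^n c^n }. For simplicity the marking is collapsed to a single token position, making the net a state machine, itself one of the classical structural subclasses.

# CFG rules (over-generating on their own):
#   r0: S -> ABC   r1: A -> aA   r2: B -> bB   r3: C -> cC
#   r4: A -> ''    r5: B -> ''   r6: C -> ''
# The net admits exactly the firing sequences r0 (r1 r2 r3)^n r4 r5 r6,
# whose derivations produce a^n b^n c^n.
PRE_POST = {  # transition label: (input place, output place)
    "r0": ("start", "p"),
    "r1": ("p", "q1"), "r2": ("q1", "q2"), "r3": ("q2", "p"),
    "r4": ("p", "f1"), "r5": ("f1", "f2"), "r6": ("f2", "done"),
}

def fires(seq, initial="start", final="done"):
    # check that seq labels a firing sequence from initial to final marking
    marking = initial
    for rule in seq:
        pre, post = PRE_POST[rule]
        if marking != pre:
            return False
        marking = post
    return marking == final

assert fires(["r0"] + ["r1", "r2", "r3"] * 2 + ["r4", "r5", "r6"])
assert not fires(["r0", "r1", "r1", "r2", "r3", "r4", "r5", "r6"])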

Journal Article
TL;DR: In this paper, the authors studied cooperating distributed grammar systems working in the terminal derivation mode where the components are variants of permitting grammars, and proved that the families of random context languages and languages generated by permitting components coincide.
Abstract: This paper studies cooperating distributed grammar systems working in the terminal derivation mode where the components are variants of permitting grammars. It proves that although the family of permitting languages is strictly included in the family of random context languages, the families of random context languages and languages generated by permitting cooperating distributed grammar systems in the above mentioned derivation mode coincide. Moreover, if the components are so-called left-permitting grammars, then cooperating distributed grammar systems in the terminal mode characterize the class of context-sensitive languages, or if erasing rules are allowed, the class of recursively enumerable languages. Descriptional complexity results are also presented. It is shown that the number of permitting components can be bounded, in the case of left-permitting components with erasing rules even together with the number of nonterminals.

01 Jan 2009
TL;DR: This thesis addresses the design of appropriate formalisms and algorithms to be used for natural language processing and focuses on the Tree-Adjoining Grammar formalism as a base and on the mechanism of grammar synchronization for managing relationships between the input and output of a natural language processing system.
Abstract: This thesis addresses the design of appropriate formalisms and algorithms to be used for natural language processing. This entails a delicate balance between the ability of a formalism to capture the linguistic generalizations required by natural language processing applications and the ability of a natural language processing application based on the formalism to process the formalism efficiently enough to be useful. I focus on the Tree-Adjoining Grammar formalism as a base and on the mechanism of grammar synchronization for managing relationships between the input and output of a natural language processing system. Grammar synchronization is a formal concept by which the derivations of two distinct grammars occur in tandem so that a single isomorphic derivation produces distinct derived structures in each of the synchronized grammars. Using synchronization implies a strong assumption—one that I seek to justify in the second part of the thesis—namely that certain critical relationships in natural language applications, such as the relationship between the syntax and semantics of a language or the relationship between the syntax of two natural languages, are close enough to be expressed with grammars that share a derivational structure. The extent of the isomorphism between the derived structures of the related languages is determined only in part by the synchronization. The base formalism chosen can offer greater or lesser opportunity for divergence in the derived structures. My choice of a base formalism is motivated directly by research into applications of synchronous TAG-based grammars to two natural language applications: semantic interpretation and natural language translations. I first examine a range of TAG variants that have not previously been studied in this level of detail to determine their computational properties and to develop algorithms that can be used to process them. Original results on the complexity of these formalisms are presented as well as novel algorithms for factorizing grammars to reduce the time required to process them. In Part II, I develop applications of synchronous Limited Delay Tree-Local Multicomponent TAG to semantic interpretation and probabilistic synchronous Tree Insertion Grammar to statistical natural language translation.

Book ChapterDOI
20 Sep 2009
TL;DR: In this article, the global GRAMMAR constraint is investigated over restricted classes of context-free grammars, such as deterministic and unambiguous context-free grammars, and it is shown that detecting disentailment for the GRAMMAR constraint in these cases is as hard as parsing an unrestricted context-free grammar.
Abstract: We investigate the global GRAMMAR constraint over restricted classes of context-free grammars like deterministic and unambiguous context-free grammars. We show that detecting disentailment for the GRAMMAR constraint in these cases is as hard as parsing an unrestricted context-free grammar. We also consider the class of linear grammars and give a propagator that runs in quadratic time. Finally, to demonstrate the use of linear grammars, we show that a weighted linear GRAMMAR constraint can efficiently encode the EDITDISTANCE constraint, and a conjunction of the EDITDISTANCE constraint and the REGULAR constraint.
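For reference, the relation being encoded, written as the standard dynamic program (our illustration, not anything from the paper); note it is quadratic, matching the quadratic-time propagator for linear grammars.

def edit_distance(s: str, t: str) -> int:
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                     # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                     # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1,                           # insertion
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))  # substitution
    return d[m][n]

assert edit_distance("kitten", "sitting") == 3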

Book ChapterDOI
01 Jul 2009
TL;DR: This work gives an exposition of strongly regular grammars and a transformation by Mohri and Nederhof on sets of mutually recursive nonterminals, and uses it as a subprocedure to obtain tighter regular approximations to a given context-free grammar.
Abstract: We consider algorithms for approximating context-free grammars by regular grammars, making use of Chomsky's characterization of non-self-embedding grammars as generating regular languages and a transformation by Mohri and Nederhof on sets of mutually recursive nonterminals. We give an exposition of strongly regular grammars and this transformation, and use it as a subprocedure to obtain tighter regular approximations to a given context-free grammar. In another direction, the generalization by a 1-lookahead extends Mohri and Nederhof's transformation by incorporating more context into the regular approximation at the expense of a larger grammar.
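The textbook motivating example, coded up by us: the self-embedding grammar S -> a S b | ε generates { a^n b^n }, which is not regular; a Mohri-Nederhof-style approximation relaxes the matched nesting to the regular superset a* b*.

import re

def exact(s):  # membership in { a^n b^n }
    n = s.count("a")
    return s == "a" * n + "b" * n

approx = re.compile(r"a*b*\Z").match  # the regular over-approximation

assert exact("aabb") and approx("aabb")
assert not exact("aab") and approx("aab")  # superset: accepts strings outside L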

Journal ArticleDOI
TL;DR: This paper discusses the terminating derivation mode in cooperating distributed grammar systems where components are forbidding grammars instead of context-free grammars, and demonstrates that the number of their components can be reduced to two without changing the generative power.
Abstract: This paper discusses the terminating derivation mode in cooperating distributed grammar systems where components are forbidding grammars instead of context-free grammars. Such systems are called forbidding cooperating distributed grammar systems, and it is demonstrated that the number of their components can be reduced to two without changing the generative power and that these systems are computationally complete. Without erasing productions, however, these systems are less powerful than context-sensitive grammars.

Book ChapterDOI
01 Jul 2009
TL;DR: It is shown that applying linear erasing to a Petri net language yields a language generated by a non-erasing matrix grammar, which yields a reformulation of the problem of whether erasing rules in matrix grammars can be eliminated.
Abstract: It is shown that applying linear erasing to a Petri net language yields a language generated by a non-erasing matrix grammar. The proof uses Petri net controlled grammars. These are context-free grammars, where the application of productions has to comply with a firing sequence in a Petri net. Petri net controlled grammars are equivalent to arbitrary matrix grammars (without appearance checking), but a certain restriction on them (linear Petri net controlled grammars) leads to the class of languages generated by non-erasing matrix grammars. It is also shown that in Petri net controlled grammars (with final markings and arbitrary labeling), erasing rules can be eliminated, which yields a reformulation of the problem of whether erasing rules in matrix grammars can be eliminated.


Book ChapterDOI
31 Mar 2009
TL;DR: This paper answers three open questions concerning the generative power of some simple variants of context-free grammars regulated by context conditions and presents some normal form results, an overview of known results, and unsolved problems.
Abstract: This paper answers three open questions concerning the generative power of some simple variants of context-free grammars regulated by context conditions. Specifically, it discusses the generative power of so-called context-free semi-conditional grammars (which are random context grammars where permitting and forbidding sets are replaced with permitting and forbidding strings) where permitting and forbidding strings of each production are of length no more than one, and of simple semi-conditional grammars where, in addition, no production has attached both a permitting and a forbidding string. Finally, this paper also presents some normal form results, an overview of known results, and unsolved problems.

01 Jan 2009
TL;DR: Generalized random context picture grammars are a method of syntactic picture generation that involves the replacement of variables and the building of functions that will eventually be applied to terminals.
Abstract: We present a summary of results on random context picture grammars (rcpgs), which are a method of syntactic picture generation. The productions of such a grammar are context-free, but their application is regulated (permitted or forbidden) by context randomly distributed in the developing picture. Thus far we have investigated three important subclasses of rcpgs, namely random permitting context picture grammars, random forbidding context picture grammars and table-driven context-free picture grammars. For each subclass we have proven characterization theorems and shown that it is properly contained in the class of rcpgs. We have also developed a characterization theorem for all picture sets generated by rcpgs, and used it to find a set that cannot be generated by any rcpg.

Journal ArticleDOI
TL;DR: The authors prove that every recursively enumerable language is generated by a scattered context grammar with no more than four nonterminals and three non-context-free productions, and give an overview of the results and open problems concerning scattered context grammars and languages.
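For readers unfamiliar with the formalism, a sketch of one derivation step (ours; real scattered context productions rewrite each selected nonterminal to a string and may choose any ordered occurrences, while this simplified version rewrites single symbols at the leftmost ordered occurrences):

def scattered_step(sform, lhs, rhs):
    # apply (A, B, ...) -> (x, y, ...) to ordered occurrences in sform
    out, pos = list(sform), 0
    for A, x in zip(lhs, rhs):
        pos = out.index(A, pos)  # next occurrence at or after pos
        out[pos] = x
        pos += 1
    return out

# (A, B) -> (a, b) rewrites the first A and a later B simultaneously:
assert scattered_step(list("ACBB"), ("A", "B"), ("a", "b")) == list("aCbB")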

01 Jan 2009
TL;DR: A framework for computation-friendly parametric shape grammar interpreters is proposed and further detailed by a sub-framework over parametric two-dimensional rectangular shapes; both the proof of NP-hardness and the rectangular sub-framework invoke elements of graph theory.
Abstract: NP-hardness of parametric subshape recognition for an arbitrary number of open terms is proven. Guided by this understanding of the complexity of subshape recognition, a framework for computation-friendly parametric shape grammar interpreters is proposed, which is further detailed by a sub-framework over parametric two-dimensional rectangular shapes. As both the proof of NP-hardness and the rectangular sub-framework invoke elements of graph theory, the relationship between shape and graph grammars is also explored.

Book ChapterDOI
25 Jul 2009
TL;DR: A new cubic parsing algorithm for ambiguous pregroup grammars is presented that modifies the recognition algorithm of Savateev for categorial Grammars based on L\.
Abstract: We present a new cubic parsing algorithm for ambiguous pregroup grammars. It modifies the recognition algorithm of Savateev [10] for categorial grammars based on L\. We show the correctness of the algorithm and give some examples. We compare our algorithm with the algorithm of Oehrle [8] for pregroup grammars and the algorithm of Savateev [10].
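To fix intuitions about what is being parsed, a simplified illustration of ours, not the paper's algorithm: pregroup parsing rests on contracting adjacent adjoint pairs, x x^r -> 1 and x^l x -> 1. A greedy stack suffices for this flat example, though full pregroup parsing with iterated adjoints is subtler.

def reduce_types(types):
    stack = []
    for t in types:
        if stack and (t == stack[-1] + "^r" or stack[-1] == t + "^l"):
            stack.pop()          # contract the adjoint pair
        else:
            stack.append(t)
    return stack

# "John sleeps": the type string n (n^r s) reduces to the sentence type s
assert reduce_types(["n", "n^r", "s"]) == ["s"]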