
Showing papers on "Context-sensitive grammar published in 2013"


Journal ArticleDOI
TL;DR: This paper surveys the results on conjunctive and Boolean grammars obtained over the last decade, comparing them to the corresponding results for ordinary context-free grammars and their main subfamilies.

63 citations


Journal ArticleDOI
TL;DR: This article presents a formalism for non-projective dependency grammar in the framework of linear context-free rewriting systems, shows that parsing with unrestricted grammars is intractable, and defines a class of "mildly" non-projective dependency grammars that can be parsed in polynomial time.
Abstract: Syntactic representations based on word-to-word dependencies have a long-standing tradition in descriptive linguistics, and receive considerable interest in many applications. Nevertheless, dependency syntax has remained something of an island from a formal point of view. Moreover, most formalisms available for dependency grammar are restricted to projective analyses, and thus not able to support natural accounts of phenomena such as wh-movement and cross-serial dependencies. In this article we present a formalism for non-projective dependency grammar in the framework of linear context-free rewriting systems. A characteristic property of our formalism is a close correspondence between the non-projectivity of the dependency trees admitted by a grammar on the one hand, and the parsing complexity of the grammar on the other. We show that parsing with unrestricted grammars is intractable. We therefore study two constraints on non-projectivity, block-degree and well-nestedness. Jointly, these two constraints define a class of "mildly" non-projective dependency grammars that can be parsed in polynomial time. An evaluation on five dependency treebanks shows that these grammars have a good coverage of empirical data.
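The two constraints can be made concrete with a short sketch (our illustration, not the paper's implementation): given a head function encoding a dependency tree, block-degree counts the contiguous blocks in each subtree's yield, and well-nestedness forbids two disjoint yields from interleaving.

```python
# Nodes are 1..n; heads[i-1] is the head of node i (0 marks the root).

def yields(heads):
    """Compute the yield (node plus all descendants) of every node."""
    n = len(heads)
    kids = {i: [] for i in range(n + 1)}
    for i, h in enumerate(heads, start=1):
        kids[h].append(i)
    out = {}
    def collect(v):
        ys = {v}
        for c in kids[v]:
            ys |= collect(c)
        out[v] = ys
        return ys
    for r in kids[0]:
        collect(r)
    return out

def block_degree(heads):
    """Maximum number of contiguous blocks in any subtree yield;
    projective trees have block-degree 1."""
    best = 1
    for ys in yields(heads).values():
        s = sorted(ys)
        blocks = 1 + sum(1 for a, b in zip(s, s[1:]) if b != a + 1)
        best = max(best, blocks)
    return best

def well_nested(heads):
    """No two disjoint yields may interleave in an a < b < c < d pattern."""
    def interleaves(a, b):
        return any(x < y < z < w for x in a for z in a for y in b for w in b)
    ys = list(yields(heads).values())
    for i in range(len(ys)):
        for j in range(i + 1, len(ys)):
            a, b = ys[i], ys[j]
            if not (a & b) and (interleaves(a, b) or interleaves(b, a)):
                return False
    return True
```

For example, a simple chain is projective and well-nested, a tree whose subtree yield is {2, 4} has block-degree 2, and two crossing yields {2, 4} and {3, 5} make a tree ill-nested.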

53 citations


Proceedings ArticleDOI
23 Jan 2013
TL;DR: This paper presents a simple extension to context-free grammars that can express these layout rules, and derives GLR and LR(k) algorithms for parsing these grammars.
Abstract: Several popular languages, such as Haskell, Python, and F#, use the indentation and layout of code as part of their syntax. Because context-free grammars cannot express the rules of indentation, parsers for these languages currently use ad hoc techniques to handle layout. These techniques tend to be low-level and operational in nature and forgo the advantages of more declarative specifications like context-free grammars. For example, they are often coded by hand instead of being generated by a parser generator. This paper presents a simple extension to context-free grammars that can express these layout rules, and derives GLR and LR(k) algorithms for parsing these grammars. These grammars are easy to write and can be parsed efficiently. Examples for several languages are presented, as are benchmarks showing the practical efficiency of these algorithms.
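The "ad hoc techniques" mentioned above typically amount to a hand-coded pre-pass like the following sketch (ours, assuming a whitespace-indented language): indentation changes become explicit INDENT/DEDENT tokens so that an ordinary context-free parser can take over — exactly the kind of operational workaround the paper's grammar extension aims to replace.

```python
def layout_tokens(src):
    """Turn leading indentation into INDENT/DEDENT pseudo-tokens.
    A deliberately minimal sketch: real layout handlers also deal
    with tabs, mismatched dedents, and continuation lines."""
    stack = [0]          # indentation levels currently open
    toks = []
    for line in src.splitlines():
        if not line.strip():
            continue     # blank lines carry no layout information
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            toks.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            toks.append("DEDENT")
        toks.append(line.strip())
    while len(stack) > 1:  # close any blocks still open at end of input
        stack.pop()
        toks.append("DEDENT")
    return toks
```

After this pass, a conventional CFG over the token stream can describe block structure, which is what hand-written parsers for layout-sensitive languages usually rely on.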

30 citations


Journal Article
TL;DR: This work takes as its starting point a simple learning algorithm for substitutable context-free languages, based on principles of distributional learning, and modifies it so that it will converge to a canonical grammar for each language.
Abstract: Standard models of language learning are concerned with weak learning: the learner, receiving as input only information about the strings in the language, must learn to generalise and to generate the correct, potentially infinite, set of strings generated by some target grammar. Here we define the corresponding notion of strong learning: the learner, again only receiving strings as input, must learn a grammar that generates the correct set of structures or parse trees. We formalise this using a modification of Gold's identification in the limit model, requiring convergence to a grammar that is isomorphic to the target grammar. We take as our starting point a simple learning algorithm for substitutable context-free languages, based on principles of distributional learning, and modify it so that it will converge to a canonical grammar for each language. We prove a corresponding strong learning result for a subclass of context-free grammars.
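The distributional principle behind the algorithm can be illustrated in a few lines (a toy sketch, not the authors' learner): in a substitutable language, two substrings that share even a single context are fully interchangeable, so a learner may safely merge them into one nonterminal.

```python
def contexts(word, sample):
    """All contexts (l, r) such that l + word + r occurs in the sample."""
    ctxs = set()
    for s in sample:
        i = s.find(word)
        while i != -1:
            ctxs.add((s[:i], s[i + len(word):]))
            i = s.find(word, i + 1)
    return ctxs

def substitutable(u, v, sample):
    """In a substitutable language, sharing ONE context implies full
    interchangeability; distributional learners exploit exactly this."""
    return bool(contexts(u, sample) & contexts(v, sample))

# A sample from the substitutable language {a^n b^n : n >= 1}.
sample = {"ab", "aabb", "aaabbb"}
```

Here "ab" and "aabb" share the context ("a", "b"), so they would be grouped together, while "a" and "b" never occur in a common context.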

19 citations


Book ChapterDOI
06 Jan 2013
TL;DR: It is shown that every language without the empty word generated by a displacement context-free grammar can also be generated by displacement Lambek grammars.
Abstract: We introduce a new grammar formalism, the displacement context-free grammars, which is equivalent to well-nested multiple context-free grammars. We generalize the notions of Chomsky and Greibach normal forms for these grammars and show that every language without the empty word generated by a displacement context-free grammar can also be generated by displacement Lambek grammars.

17 citations


Journal ArticleDOI
TL;DR: The theoretical basis for a concept of ‘computation-friendly’ shape grammars is explored through a formal examination of the tractability of the grammar formalism, and parametric subshape recognition is shown to be NP-hard.
Abstract: In this paper we explore the theoretical basis for a concept of ‘computation-friendly’ shape grammars, through a formal examination of tractability of the grammar formalism. Although a variety of shape grammar definitions have evolved over time, it is possible to unify these to be backwards compatible. Under this unified definition, a shape grammar can be constructed to simulate any Turing machine, from which it follows that: a shape grammar may not halt; its language space can be exponentially large; and in general, its membership problem is unsolvable. Moreover, parametric subshape recognition is shown to be NP-hard. This implies that it is unlikely, in general, to find a polynomial-time algorithm to interpret parametric shape grammars, and that more pragmatic approaches need to be sought. Factors that influence the tractability of shape grammars are identified and discussed.

16 citations


Book ChapterDOI
03 Nov 2013
TL;DR: A theory of algebraic operations over linear grammars is developed that makes it possible to combine simple "atomic" grammars operating on single sequences into complex, multi-dimensional grammars; an embedding in Haskell as a domain-specific language makes the theory directly accessible for writing and using grammar products without the detour of an external compiler.
Abstract: We develop a theory of algebraic operations over linear grammars that makes it possible to combine simple "atomic" grammars operating on single sequences into complex, multi-dimensional grammars. We demonstrate the utility of this framework by constructing the search spaces of complex alignment problems on multiple input sequences explicitly as algebraic expressions of very simple 1-dimensional grammars. The compiler accompanying our theory makes it easy to experiment with the combination of multiple grammars and different operations. Composite grammars can be written out in LaTeX for documentation and as a guide to implementation of dynamic programming algorithms. An embedding in Haskell as a domain-specific language makes the theory directly accessible to writing and using grammar products without the detour of an external compiler. http://www.bioinf.uni-leipzig.de/Software/gramprod/

11 citations


Book ChapterDOI
08 Aug 2013
TL;DR: This work proposes an extension of the class of languages captured by these formalisms that is arguably mildly context-sensitive, based on a mild use of a copying operation the authors call IO-substitution.
Abstract: The class of mildly context-sensitive languages is commonly regarded as sufficiently rich to capture most aspects of the syntax of natural languages. Many formalisms are known to generate families of languages which belong to this class. Among them are tree-adjoining grammars, multiple context-free grammars and abstract categorial grammars. All these formalisms have in common that they are based on operations which do not copy already derived material in the course of a derivation. We propose an extension of the class of languages captured by these formalisms that is arguably mildly context-sensitive. This extension is based on a mild use of a copying operation we call IO-substitution.

10 citations


Journal ArticleDOI
TL;DR: It is proved that left random context ET0L grammars characterize the family of recursively enumerable languages, and without erasing rules, they characterize the family of context-sensitive languages.
Abstract: Consider ET0L grammars. Modify them such that a set of permitting symbols and a set of forbidding symbols are attached to each of their rules, just like in random context grammars. A rule like this can rewrite a symbol if each of its permitting symbols occurs to the left of the symbol to be rewritten in the current sentential form while each of its forbidding symbols does not occur there. ET0L grammars modified in this way are referred to as left random context ET0L grammars, and they represent the principal subject of the investigation in this paper. We prove that these grammars characterize the family of recursively enumerable languages, and without erasing rules, they characterize the family of context-sensitive languages. We also introduce a variety of special cases of these grammars and establish their generative power. In the conclusion, we put all the achieved results into the context of formal language theory as a whole and formulate several open questions.
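The applicability condition described above is easy to state as code (an illustrative check of our own, not taken from the paper): a rule may rewrite a symbol only if all of its permitting symbols, and none of its forbidding symbols, occur strictly to the left of that symbol in the current sentential form.

```python
def applicable(sform, pos, permitting, forbidding):
    """Left random context condition: the rule may rewrite the symbol
    at index `pos` of sentential form `sform` iff every permitting
    symbol occurs to its left and no forbidding symbol does."""
    left = set(sform[:pos])
    return permitting <= left and not (forbidding & left)
```

For instance, in the sentential form "ABAC", a rule rewriting the "C" at position 3 with permitting set {A} succeeds, but fails if "B" is forbidden, since "B" occurs to the left.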

9 citations


Proceedings Article
01 Nov 2013
TL;DR: A new method for machine learning-based optimization of linguist-written Constraint Grammars is presented, and the effect of rule ordering/sorting, grammar sectioning and systematic rule changes is discussed and quantitatively evaluated.
Abstract: In this paper we present a new method for machine learning-based optimization of linguist-written Constraint Grammars. The effect of rule ordering/sorting, grammar sectioning and systematic rule changes is discussed and quantitatively evaluated. The F-score improvement was 0.41 percentage points for a mature (Danish) tagging grammar, and 1.36 percentage points for a half-size grammar, translating into a 7-15% error reduction relative to the performance of the untuned grammars.

8 citations



Journal ArticleDOI
TL;DR: Some closure properties of (regularly controlled) pure two-dimensional context-free grammars are studied, for which it is proved that the parsing is NP-hard.
Abstract: Picture languages generalize classical string languages to two-dimensional arrays. Several approaches have been proposed during the years; consequently, a general classification and a detailed comparison of the proposed classes turn to be necessary. In this paper, we study some closure properties of regularly controlled pure two-dimensional context-free grammars, for which we also prove that the parsing is NP-hard. Moreover, we draw some comparisons with other interesting picture grammars like regional tile grammars, Prusa grammars and local languages, clarifying, in some cases, their mutual relationship with respect to expressiveness.

Book ChapterDOI
08 Aug 2013
TL;DR: It is shown that the class of string-meaning relations definable by the following two types of grammars coincides: (i) Lambek Grammars where each lexical item is assigned a (suitably typed) lambda term as a representation of its meaning, and the meaning of a sentence is computed according to the lambda-term corresponding to its derivation.
Abstract: We show that the class of string-meaning relations definable by the following two types of grammars coincides: (i) Lambek grammars where each lexical item is assigned a (suitably typed) lambda term as a representation of its meaning, and the meaning of a sentence is computed according to the lambda-term corresponding to its derivation; and (ii) cycle-free context-free grammars that do not generate the empty string where each rule is associated with a (suitably typed) lambda term that specifies how the meaning of a phrase is determined by the meanings of its immediate constituents.
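The flavour of the context-free side of this equivalence can be shown with a toy fragment (our example; the rule set and meaning representation are invented for illustration): each rule carries a lambda term, and the meaning of a derivation is computed by composing those terms bottom-up over the derivation tree.

```python
# Each rule maps a nonterminal to (right-hand side, semantic lambda term).
# The lambda term builds a phrase's meaning from its constituents' meanings.
RULES = {
    "S":  (["NP", "VP"], lambda np, vp: vp(np)),   # apply VP meaning to NP
    "NP": (["John"],     lambda: "john"),
    "VP": (["sleeps"],   lambda: lambda x: ("sleep", x)),
}

def meaning(tree):
    """tree = (nonterminal, [children]); leaf children are terminal strings.
    The meaning of a phrase is the rule's lambda term applied to the
    meanings of its immediate constituents, as in the abstract."""
    nt, kids = tree
    _, sem = RULES[nt]
    args = [meaning(k) for k in kids if isinstance(k, tuple)]
    return sem(*args)

derivation = ("S", [("NP", ["John"]), ("VP", ["sleeps"])])
```

Evaluating the derivation of "John sleeps" composes the two lexical meanings through the S rule, yielding the logical-form tuple ("sleep", "john").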


Journal ArticleDOI
TL;DR: It is proved that unrestricted Szilard languages and certain leftmost Szilard languages of context-free matrix grammars, without appearance checking, can be accepted by indexing alternating Turing machines in logarithmic time and space.
Abstract: The regulated rewriting mechanism is one of the most efficient methods to augment the Chomsky hierarchy with a large variety of language classes. In this paper we investigate the derivation mechanism in regulated rewriting grammars such as matrix grammars, by studying their Szilard languages. We focus on the complexity of Szilard languages associated with unrestricted and leftmost-like derivations in matrix grammars, with or without appearance checking. The reason is twofold: first, to relate these classes of languages to parallel complexity classes such as NC1 and AC1, and, second, to improve some previous results. We prove that unrestricted Szilard languages and certain leftmost Szilard languages of context-free matrix grammars, without appearance checking, can be accepted by indexing alternating Turing machines in logarithmic time and space. Consequently, these classes are included in UE*-uniform NC1. Unrestricted Szilard languages of matrix grammars with appearance checking can be accepted by deterministic Turing machines in O(n log n) time and O(log n) space. Leftmost-like Szilard languages of context-free matrix grammars, with appearance checking, can be recognized by nondeterministic Turing machines using the same time and space resources. Hence, all these classes are included in AC1.

Book ChapterDOI
01 Jan 2013
TL;DR: A new approach to simulating language evolution is proposed and a Type-2 Fuzzy Grammar is introduced, which is able to gradually adopt a foreign language by adjusting the grades of membership of their grammar.
Abstract: This paper proposes a new approach to simulating language evolution; it expands on the original work done by Lee and Zadeh on Fuzzy Grammars and introduces a Type-2 Fuzzy Grammar. Ants in an Ant Colony Optimization algorithm are given the ability of embedding a message on the pheromone using a Type-2 Fuzzy Grammar. These ants are able to gradually adopt a foreign language by adjusting the grades of membership of their grammar. Results that show the effect of uncertainty in a language are given.

Journal Article
TL;DR: It is shown that resolved systems of equations over sets of natural numbers can have non-ultimately periodic sets as their least solutions, and that conjunctive grammars over a single-letter alphabet can generate non-regular languages, as opposed to context-free grammars.
Abstract: Systems of equations ψ(X) = φ(X), where X is a vector of variables ranging over sets of natural numbers and the operations of union, intersection and addition are allowed, are studied in this thesis. Such systems can be equally viewed as systems of language equations over a single-letter alphabet with the operations of union, intersection and concatenation. The first subclass to be considered consists of systems in the resolved form X = φ(X). Their counterparts among language equations are the resolved systems of language equations over a single-letter alphabet, which can also be seen as conjunctive grammars over a single-letter alphabet. It is shown that resolved systems of equations over sets of natural numbers can have non-ultimately periodic sets as their least solutions. Equivalently, conjunctive grammars over a single-letter alphabet can generate non-regular languages, as opposed to context-free grammars. To this end, an explicit construction of a resolved system with a given set of numbers as its least solution is presented, provided that the base-k positional notations of the numbers in this set are recognised by a certain type of real-time cellular automaton. In the general case of systems of equations, it is shown that the class of unique (least, greatest) solutions of such systems coincides with the class of recursive (recursively enumerable, co-recursively enumerable, respectively) sets. This result holds even when only union and addition (or only intersection and addition) are allowed in the system. This generalises the known result for systems of language equations over a multiple-letter alphabet. Systems with addition as the only allowed operation are also considered, and it is shown that the obtained class of sets is computationally universal, in the sense that their unique (least, greatest) solutions can represent encodings of all recursive (recursively enumerable, co-recursively enumerable, respectively) sets.
The computational complexity of decision problems for both formalisms is investigated. It is shown that the membership problem for the resolved systems of equations is EXPTIME-hard. Many other decision problems for both types of systems are proved to be undecidable, and their exact undecidability level is settled. Most of these results hold even when the systems are restricted to the use of one equation with one variable.
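The non-periodicity result can be checked numerically with the well-known four-variable resolved system due to Jeż, whose least solution for X1 is {4^n : n >= 0}. The bounded fixpoint iteration below is our own sketch; truncating sums at a bound is safe here because all constants are positive, so every element's derivation uses only strictly smaller elements.

```python
def plus(a, b, bound):
    """Elementwise addition of two sets of naturals, truncated at bound."""
    return {x + y for x in a for y in b if x + y <= bound}

def least_solution(bound=300):
    """Bounded least-fixpoint iteration for Jez's resolved system
        X1 = (X1+X3 & X2+X2) | {1},   X2 = (X1+X1 & X2+X6) | {2},
        X3 = (X1+X2 & X6+X6) | {3},   X6 = (X1+X2 & X3+X3),
    whose least solution assigns X1 = {4^n : n >= 0} -- a non-ultimately
    periodic set, i.e. a non-regular unary language."""
    x1, x2, x3, x6 = set(), set(), set(), set()
    while True:
        n1 = (plus(x1, x3, bound) & plus(x2, x2, bound)) | {1}
        n2 = (plus(x1, x1, bound) & plus(x2, x6, bound)) | {2}
        n3 = (plus(x1, x2, bound) & plus(x6, x6, bound)) | {3}
        n6 = plus(x1, x2, bound) & plus(x3, x3, bound)
        if (n1, n2, n3, n6) == (x1, x2, x3, x6):
            return x1  # fixpoint reached; monotone iteration must terminate
        x1, x2, x3, x6 = n1, n2, n3, n6
```

Within a bound of 300 the least solution for X1 contains exactly the powers 1, 4, 16, 64, 256, matching the base-4 characterisation sketched in the abstract.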

Book ChapterDOI
02 Apr 2013
TL;DR: It is shown that the class of languages of PI-LIGs is incomparable with that of PDAs, which is the class of context-free languages (CFLs), and a simple bottom-up parsing method for LIGs is given, in which the stack symbols are eliminated at the first step of the parsing.
Abstract: This paper investigates two subjects in push-down automata (PDAs) and linear indexed grammars (LIGs), which are extended PDAs, focusing on eliminating the stack symbols. One of the subjects concerns PI- (push-input-) PDAs and PI-LIGs without ε-transition rules, in which only input symbols are pushed onto the stack. It is shown that the class of languages of PI-LIGs is incomparable with that of PDAs, which is the class of context-free languages (CFLs). The other subject is a simple bottom-up parsing method for LIGs, in which the stack symbols are eliminated at the first step of the parsing. The paper shows several PI-LIGs, including PI-PDAs, for fundamental context-free and context-sensitive languages, which are synthesized by a grammatical inference system, LIG Learner.

Proceedings ArticleDOI
15 Oct 2013
TL;DR: A visual representation of the proposed theorem-proving tactics is introduced, in order to make them more intuitive and user-friendly.
Abstract: Graph grammar is a formal language suitable for the specification of distributed and concurrent systems. Theorem proving is a technique that allows the verification of systems with huge (and infinite) state spaces. A previous approach has proposed proof strategies to help the developer in the verification process through theorem proving, when adopting graph grammar as the specification language. This paper proposes a visual representation for the proposed tactics, in order to make them more intuitive and user-friendly.

Journal ArticleDOI
TL;DR: For any context-free grammar, the authors build a transition diagram, that is, a finite directed graph with labeled arcs, which describes the work of the grammar; they define the concept of a proper walk in this transition diagram and prove that a word belongs to a given context-free language if and only if this word can be obtained with the help of a proper walk.
Abstract: For any context-free grammar, we build a transition diagram, that is, a finite directed graph with labeled arcs, which describes the work of the grammar. This approach is new, and it is different from previously known graph models. We define the concept of proper walk in this transition diagram and we prove that a word belongs to a given context-free language if and only if this word can be obtained with the help of a proper walk.

Journal ArticleDOI
TL;DR: A mathematical apparatus for the description of context-dependent grammars of artificial languages, viz., their syntax, as well as their logical and generating semantics, is presented.
Abstract: This paper presents a mathematical apparatus for the description of context-dependent grammars of artificial languages, viz., their syntax, as well as their logical and generating semantics. Examples that demonstrate the applicability of the suggested apparatus are given; the model of formal languages (Chomsky generating grammars) is compared with the model of artificial languages introduced in this work.

Journal ArticleDOI
TL;DR: A mathematical apparatus for describing the context-free grammars of artificial languages is presented and the syntax of the language to describe them is defined, as well as logical and generative semantics.
Abstract: A mathematical apparatus for describing the context-free grammars of artificial languages is presented: the syntax of the language to describe them is defined, as well as logical and generative semantics, and examples are provided to show how the suggested apparatus can be used.

Journal ArticleDOI
TL;DR: This work investigates sequential derivation languages associated with graph grammars, as a loose generalisation of free-labeled Petri nets and Szilard languages, which is quite large and endowed with many closure properties.
Abstract: We investigate sequential derivation languages associated with graph grammars, as a loose generalisation of free-labeled Petri nets and Szilard languages. The grammars are used to output strings of rule labels, and the applicability of a special rule determines the acceptance of a preceding derivation. Due to the great power of such grammars, this family of languages is quite large and endowed with many closure properties. All derivation languages are decidable in nondeterministic polynomial time and space O(n log n), by simulation of the graph grammar on a Turing machine.

Posted Content
TL;DR: A new statistical model for computational linguistics that defines a Markov chain on finite sets of sentences with many finite recurrent communicating classes and defines the language model as the invariant probability measures of the chain on each recurrent communicating class.
Abstract: We propose a new statistical model for computational linguistics. Rather than trying to estimate directly the probability distribution of a random sentence of the language, we define a Markov chain on finite sets of sentences with many finite recurrent communicating classes and define our language model as the invariant probability measures of the chain on each recurrent communicating class. This Markov chain, that we call a communication model, recombines at each step randomly the set of sentences forming its current state, using some grammar rules. When the grammar rules are fixed and known in advance instead of being estimated on the fly, we can prove supplementary mathematical properties. In particular, we can prove in this case that all states are recurrent states, so that the chain defines a partition of its state space into finite recurrent communicating classes. We show that our approach is a decisive departure from Markov models at the sentence level and discuss its relationships with Context Free Grammars. Although the toric grammars we use are closely related to Context Free Grammars, the way we generate the language from the grammar is qualitatively different. Our communication model has two purposes. On the one hand, it is used to define indirectly the probability distribution of a random sentence of the language. On the other hand it can serve as a (crude) model of language transmission from one speaker to another speaker through the communication of a (large) set of sentences.

Posted Content
TL;DR: It is shown that Lambek grammars, possibly with product, are learnable from proof frames, that is, incomplete proof nets; convergence is proved for 1-valued (rigid) Lambek grammars with product.
Abstract: In addition to their limpid interface with semantics, categorial grammars enjoy another important property: learnability. This was first noticed by Buszkowski and Penn and further studied by Kanazawa for Bar-Hillel categorial grammars. What about Lambek categorial grammars? In a previous paper we showed that product-free Lambek grammars were learnable from structured sentences, the structures being incomplete natural deductions. These grammars were shown to be unlearnable from strings by Foret and Le Nir. In the present paper we show that Lambek grammars, possibly with product, are learnable from proof frames, that is, incomplete proof nets. After a short reminder on grammatical inference a la Gold, we provide an algorithm that learns Lambek grammars with product from proof frames and we prove its convergence. We do so for 1-valued, also known as rigid, Lambek grammars with product, since standard techniques can extend our result to k-valued grammars. Because of the correspondence between cut-free proof nets and normal natural deductions, our initial result on product-free Lambek grammars can be recovered. We are sad to dedicate the present paper to Philippe Darondeau, with whom we started to study such questions in Rennes at the beginning of the millennium, and who passed away prematurely. We are glad to dedicate the present paper to Jim Lambek for his 90th birthday: he is the living proof that research is an eternal learning process.

Posted Content
TL;DR: Grammars with prohibition provide more powerful tools for natural language generation and better describe processes of language learning than conventional formal grammars; it is demonstrated that they have essentially higher computational power and expressive possibilities in comparison with conventional formal grammars.
Abstract: A practical tool for natural language modeling and development of human-machine interaction is developed in the context of formal grammars and languages. A new type of formal grammars, called grammars with prohibition, is introduced. Grammars with prohibition provide more powerful tools for natural language generation and better describe processes of language learning than the conventional formal grammars. Here we study relations between languages generated by different grammars with prohibition based on conventional types of formal grammars such as context-free or context sensitive grammars. Besides, we compare languages generated by different grammars with prohibition and languages generated by conventional formal grammars. In particular, it is demonstrated that they have essentially higher computational power and expressive possibilities in comparison with the conventional formal grammars. Thus, while conventional formal grammars are recursive and subrecursive algorithms, many classes of grammars with prohibition are superrecursive algorithms. Results presented in this work are aimed at the development of human-machine interaction, modeling natural languages, empowerment of programming languages, computer simulation, better software systems, and theory of recursion.

Posted Content
TL;DR: Formal grammars are extensively used in computer science and related fields to study the rules which govern the production of a language; one possibility is to view them as logical machines, similar to automata, which can be modified to compute or help in computation, while also performing the basic task of language production.
Abstract: Formal grammars are extensively used in Computer Science and related fields to study the rules which govern the production of a language. The use of these grammars can be extended beyond mere language production. One possibility is to view these grammars as logical machines, similar to automata, which can be modified to compute or help in computation, while also performing the basic task of language production. The difference between such a modified grammar and an automaton will then lie in the semantics of the computation performed. It is even possible for such a grammar to appear non-functional (when no language is produced as a result of its productions), but in reality, it might be carrying out important tasks. Such grammars have been named Functional Grammars (including a special sub-category, called Virtual Grammars), and their properties are studied in the paper.

Journal ArticleDOI
TL;DR: The notion of a pseudo inherently ambiguous language with respect to two complexity measures is introduced and investigated and an open problem from [15] is solved in this framework.
Abstract: Contextual grammars are introduced by Solomon Marcus in 1969 based on the fundamental concept of descriptive linguistics of insertion of strings in given contexts. Internal contextual grammars are introduced by Paun and Nguyen in 1980. For contextual grammars several descriptional complexity measures and levels of ambiguity have been defined. In this paper, we analyze the trade-off between ambiguity and complexity of languages generated by internal contextual grammars. The notion of a pseudo inherently ambiguous language with respect to two complexity measures is introduced and investigated. These languages can be generated by unambiguous grammars which are minimal with respect to one measure and ambiguous if they are minimal with respect to the other measure. An open problem from [15] is solved in this framework.

17 Jun 2013
TL;DR: An algorithm is presented that aims at foreseeing all elementary trees attached to words which can come between two given words of a sentence whose associated elementary trees are companions, that is, trees that will necessarily interact in the syntactic composition of the sentence.
Abstract: Static Analysis of Interactions between Elementary Structures of a Grammar. We are interested in the semi-automatic construction of computational grammars and in their use for parsing. We consider lexicalized grammars whose elementary structures are trees, underspecified or not. We present an algorithm that aims at foreseeing all elementary trees attached to words which can come between two given words of a sentence whose associated elementary trees are companions, that is, trees that will necessarily interact in the syntactic composition of the sentence.