Showing papers on "Context-sensitive grammar published in 2014"
••
01 Apr 2014
TL;DR: This work presents a synchronous grammar formalism in which it is easy to write rules by hand and also acquire them automatically from dependency parses of aligned English and Simple English sentences, which is optimised for monolingual translation.
Abstract: We present an approach to text simplification based on synchronous dependency grammars. The higher level of abstraction afforded by dependency representations allows for a linguistically sound treatment of complex constructs requiring reordering and morphological change, such as conversion of passive voice to active. We present a synchronous grammar formalism in which it is easy to write rules by hand and also acquire them automatically from dependency parses of aligned English and Simple English sentences. The grammar formalism is optimised for monolingual translation in that it reuses ordering information from the source sentence where appropriate. We demonstrate the superiority of our approach over a leading contemporary system based on quasi-synchronous tree substitution grammars, both in terms of expressivity and performance.
87 citations
••
TL;DR: A new formalism for CFGs is presented that borrows from PEGs the use of parsing expressions and the recognition-based semantics, and it is shown how one way of removing non-determinism from this formalism yields a formalism with the semantics of PEGs.
32 citations
••
TL;DR: A family of distributional learning algorithms for context-free grammars is extended to the class of Parallel Multiple Context-Free Grammars (pmcfgs), which are capable of representing all of the syntactic phenomena that have been claimed to exist in natural language.
Abstract: Natural languages require grammars beyond context-free for their description. Here we extend a family of distributional learning algorithms for context-free grammars to the class of Parallel Multiple Context-Free Grammars (pmcfgs). These grammars have two additional operations beyond the simple context-free operation of concatenation: the ability to interleave strings of symbols, and the ability to copy or duplicate strings. This allows the grammars to generate some non-semilinear languages, which are outside the class of mildly context-sensitive grammars. These grammars, if augmented with a suitable feature mechanism, are capable of representing all of the syntactic phenomena that have been claimed to exist in natural language.
We present a learning algorithm for a large subclass of these grammars that includes all regular languages but not all context-free languages. This algorithm relies on a generalisation of the notion of a distribution as a function from tuples of strings to entire sentences; we define nonterminals using finite sets of these functions. Our learning algorithm uses a nonprobabilistic learning paradigm which allows for membership queries as well as positive samples; it runs in polynomial time.
28 citations
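The interleaving and copy operations described above are what take pmcfgs beyond context-freeness. A minimal sketch in Python (the rule S → f(N) with f(x) = xx is our illustrative notation, not taken from the paper) of how the copy operation alone generates the non-context-free copy language {ww}:

```python
from itertools import product

def copy_language(max_w_len):
    """Enumerate {ww : w in {a,b}*, |w| <= max_w_len}, the language of a
    hypothetical PMCFG rule S -> f(N) with f(x) = x x: the string derived
    by N is duplicated, which no context-free grammar can express."""
    out = set()
    for n in range(max_w_len + 1):
        for w in product("ab", repeat=n):
            s = "".join(w)
            out.add(s + s)  # the copy operation f(x) = x x
    return out

print(sorted(copy_language(2), key=len))
```

Every string in the result splits into two identical halves, the hallmark of copying.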
••
TL;DR: Two equivalent definitions of grammars with left contexts are given and their basic properties are established, including a transformation to a normal form and a cubic-time parsing algorithm, with a square-time version for unambiguous grammars.
Abstract: The paper introduces an extension of context-free grammars equipped with an operator for referring to the left context of the substring being defined. For example, a rule A → a & ◁B defines a symbol a, as long as it is preceded by a string defined by B. The conjunction operator in this example is taken from conjunctive grammars (Okhotin, 2001), which are an extension of ordinary context-free grammars that maintains most of their practical properties, including many parsing algorithms. This paper gives two equivalent definitions of grammars with left contexts, by logical deduction and by language equations, and establishes their basic properties, including a transformation to a normal form and a cubic-time parsing algorithm, with a square-time version for unambiguous grammars.
26 citations
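To make the left-context operator concrete, here is a minimal Python sketch of the licensing condition expressed by a rule A → a & ◁B. The function name and the choice of context language L(B) = {bⁿ} are illustrative assumptions, not taken from the paper:

```python
def licensed_by_left_context(s, pos, in_B):
    """Rule A -> a & ◁B: the terminal 'a' at position pos of s is
    derivable as A only if the entire left context s[:pos] is in L(B)."""
    return s[pos] == "a" and in_B(s[:pos])

# Hypothetical context language L(B) = {b^n : n >= 0}
in_B = lambda w: all(c == "b" for c in w)

print(licensed_by_left_context("bba", 2, in_B))  # True: left context "bb" is in L(B)
print(licensed_by_left_context("aba", 2, in_B))  # False: left context "ab" is not
```

The point is that the rule constrains the whole prefix to the left of the symbol, something an ordinary context-free rule cannot express.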
•
TL;DR: The context-free S languages can be obtained from the deterministic one-way S automaton languages by way of the delta operations on languages, introduced in this paper.
Abstract: Context-free S grammars are introduced, for arbitrary (storage) type S, as a uniform framework for recursion-based grammars, automata, and transducers, viewed as programs. To each occurrence of a nonterminal of a context-free S grammar an object of type S is associated, which can be acted upon by tests and operations, as indicated in the rules of the grammar. Taking particular storage types gives particular formalisms, such as indexed grammars, top-down tree transducers, attribute grammars, etc. Context-free S grammars are equivalent to pushdown S automata. The context-free S languages can be obtained from the deterministic one-way S automaton languages by way of the delta operations on languages, introduced in this paper.
24 citations
••
TL;DR: The approach to Object Grammars is implemented as one of the foundations of the Ensō system.
23 citations
••
TL;DR: This research presents a grammar rule analysis method to provide a more systematic development process for grammar rules and aims to improve the quality of the rules and in turn have a major impact on the quality of the designs generated.
Abstract: The use of generative design grammars for computational design synthesis has been shown to be successful in many application areas. The development of advanced search and optimization strategies to guide the computational synthesis process is an active research area with great improvements in the last decades. The development of the grammar rules, however, often resembles an art rather than a science. Poor grammars drive the need for problem specific and sophisticated search and optimization algorithms that guide the synthesis process toward valid and optimized designs in a reasonable amount of time. Instead of tuning search algorithms for inferior grammars, this research focuses on designing better grammars to not unnecessarily burden the search process. It presents a grammar rule analysis method to provide a more systematic development process for grammar rules. The goal of the grammar rule analysis method is to improve the quality of the rules and in turn have a major impact on the quality of the designs generated. Four different grammars for automated gearbox synthesis are used as a case study to validate the developed method and show its potential.
20 citations
••
TL;DR: Methodological considerations on crucial issues in areas of string and graph grammar-based syntactic methods are made and recommendations concerning an enhancement of context-free grammars as well as constructing parsable and inducible classes of graph grammars are formulated.
Abstract: Fundamental open problems, which are frontiers of syntactic pattern recognition are discussed in the paper. Methodological considerations on crucial issues in areas of string and graph grammar-based syntactic methods are made. As a result, recommendations concerning an enhancement of context-free grammars as well as constructing parsable and inducible classes of graph grammars are formulated.
19 citations
•
TL;DR: This embedding result has several important consequences: it not only provides a simple new proof theory for the calculus, thereby clarifying the proof-theoretic foundations of hybrid type-logical grammars, but, since the translation is simple and direct, it also provides several new parsing strategies for hybridType-Logical Grammars.
Abstract: In this article we show that hybrid type-logical grammars are a fragment of first-order linear logic. This embedding result has several important consequences: it not only provides a simple new proof theory for the calculus, thereby clarifying the proof-theoretic foundations of hybrid type-logical grammars, but, since the translation is simple and direct, it also provides several new parsing strategies for hybrid type-logical grammars. Furthermore, NP-completeness of hybrid type-logical grammars follows immediately. The main embedding result also sheds new light on problems with lambda grammars/abstract categorial grammars, and shows that lambda grammars/abstract categorial grammars suffer from problems of over-generation and from problems at the syntax-semantics interface unlike any other categorial grammar.
15 citations
••
05 Apr 2014
TL;DR: It is shown that all tree languages generated by order-2 unsafe grammars are context-sensitive, which also implies that all unsafe order-3 word languages are context-sensitive.
Abstract: Higher-order grammars were extensively studied in the 1980s, and interest in them has revived recently in the context of higher-order model checking and program verification, where higher-order grammars are used as models of higher-order functional programs. A lot of theoretical questions remain open, however, for unsafe higher-order grammars (grammars without the so-called safety condition). In this paper, we show that all tree languages generated by order-2 unsafe grammars are context-sensitive. This also implies that all unsafe order-3 word languages are context-sensitive. The proof involves novel techniques based on the typed lambda-calculus, such as type-based grammar transformation.
14 citations
••
TL;DR: A semantics for building grammars from a modularised specification, in which modules are able to delete productions from imported nonterminals, is established, allowing a precise answer to the question ‘what character-level language does this grammar generate?’ in the face of difficult issues.
••
01 Sep 2014
TL;DR: The results of the paper lead to a natural generalization of the model intersection theorem for definite logic programs to the more general class of normal logic programs.
Abstract: We derive two novel theorems regarding pre-fixed points of non-monotonic functions and demonstrate that they have immediate applications in logic programming and formal grammars. In particular, the results of the paper lead to a natural generalization of the model intersection theorem for definite logic programs to the more general class of normal logic programs. Moreover, the obtained results also offer the first, to our knowledge, model intersection result for Boolean grammars.
••
01 Apr 2014
TL;DR: The notion of a phenominator is introduced as a way to encode the term structure of a functor separately from its “string support”, then employed to analyze a range of coordination phenomena typically left unaddressed by Linear Logic-based Curryesque frameworks.
Abstract: Linear Categorial Grammar (LinCG) is a sign-based, Curryesque, relational, logical categorial grammar (CG) whose central architecture is based on linear logic. Curryesque grammars separate the abstract combinatorics (tectogrammar) of linguistic expressions from their concrete, audible representations (phenogrammar). Most of these grammars encode linear order in string-based lambda terms, in which there is no obvious way to distinguish right from left. Without some notion of directionality, grammars are unable to differentiate, say, subject and object for purposes of building functorial coordinate structures. We introduce the notion of a phenominator as a way to encode the term structure of a functor separately from its “string support”. This technology is then employed to analyze a range of coordination phenomena typically left unaddressed by Linear Logic-based Curryesque frameworks.
•
TL;DR: It is argued that even those newer AnBn grammars cannot test the learning of syntactic hierarchy; this knowledge nonetheless serves to interpret recent animal studies, which make surprising claims about animals’ ability to handle center embedding.
Abstract: Recent artificial-grammar learning (AGL) paradigms driven by the Chomsky hierarchy paved the way for direct comparisons between humans and animals in the learning of center embedding ([A[AB]B]). The AnBn grammars used by the first generation of such research lacked a crucial property of center embedding, where the pairs of elements are explicitly matched ([A1 [A2 B2] B1]). This type of indexing is implemented in the second-generation AnBn grammars. This paper reviews recent studies using such grammars. Against the premises of these studies, we argue that even those newer AnBn grammars cannot test the learning of syntactic hierarchy. These studies nonetheless provide detailed information about the conditions under which human adults can learn an AnBn grammar with indexing. This knowledge serves to interpret recent animal studies, which make surprising claims about animals’ ability to handle center embedding.
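The difference between the two generations of AnBn stimuli can be made concrete in a few lines of Python (a sketch; the A1/B1 token notation is ours, mirroring the paper's bracketing):

```python
def first_gen(n):
    """First-generation AnBn string: n A's then n B's, with no
    pairing between individual A's and B's."""
    return ["A"] * n + ["B"] * n

def second_gen(n):
    """Second-generation AnBn string with explicit indexing,
    A1 A2 ... An Bn ... B2 B1, so each Ai is matched with its Bi
    as in center embedding [A1 [A2 B2] B1]."""
    return [f"A{i}" for i in range(1, n + 1)] + [f"B{i}" for i in range(n, 0, -1)]

print(second_gen(3))  # ['A1', 'A2', 'A3', 'B3', 'B2', 'B1']
```

Only the second form carries the nested dependencies whose learning is at issue; a learner could accept first-generation strings simply by counting.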
•
01 Aug 2014
TL;DR: This work formalizes and generalizes some existing mechanisms for dealing with discontinuous phrase structures and non-projective dependency structures, and introduces the concept of hybrid grammars, which are extensions of synchronous grammars obtained by coupling of lexical elements.
Abstract: We introduce the concept of hybrid grammars, which are extensions of synchronous grammars, obtained by coupling of lexical elements. One part of a hybrid grammar generates linear structures, another generates hierarchical structures, and together they generate discontinuous structures. This formalizes and generalizes some existing mechanisms for dealing with discontinuous phrase structures and non-projective dependency structures. Moreover, it allows us to separate the degree of discontinuity from the time complexity of parsing.
••
TL;DR: This paper focuses on parallel communicating grammar systems (PCGSs) with context-free components, and it is proved that the class of Szilard languages of centralized (returning or non-returning) PCGSs is included in NC1.
••
14 Jul 2014
TL;DR: An online judge for context-free grammars is implemented, and methods based on hashing, SAT, and automata that perform well in practice are designed and implemented.
Abstract: We implement an online judge for context-free grammars. Our system contains a list of problems describing formal languages, and asking for grammars generating them. A submitted proposal grammar receives a verdict of acceptance or rejection depending on whether the judge determines that it is equivalent to the reference solution grammar provided by the problem setter. Since equivalence of context-free grammars is an undecidable problem, we consider a maximum length l and only test equivalence of the generated languages up to words of length l. This length restriction is very often sufficient for the well-meant submissions. Since this restricted problem is still NP-complete, we design and implement methods based on hashing, SAT, and automata that perform well in practice.
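The bounded equivalence test described above can be illustrated with a naive fixed-point enumeration; this is a plain-Python sketch of the idea only, not the hashing/SAT/automata machinery the authors actually use, and the two grammars for {aⁿbⁿ} are our own examples:

```python
def language_up_to(grammar, start, l):
    """All strings of length <= l generated by a CFG, via fixed-point
    iteration. grammar maps each nonterminal to a list of right-hand
    sides (tuples of symbols); symbols not in grammar are terminals."""
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                combos = {""}
                for sym in rhs:
                    pool = lang[sym] if sym in grammar else {sym}
                    combos = {u + v for u in combos for v in pool
                              if len(u) + len(v) <= l}
                if not combos <= lang[nt]:
                    lang[nt] |= combos
                    changed = True
    return lang[start]

# Two hypothetical grammars for {a^n b^n : n >= 0}
g1 = {"S": [("a", "S", "b"), ()]}
g2 = {"S": [("A",)], "A": [("a", "A", "b"), ()]}
print(language_up_to(g1, "S", 6) == language_up_to(g2, "S", 6))  # True
```

Comparing the two generated sets decides equivalence up to length l; as the paper notes, even this restricted problem is NP-complete in general, so this brute-force version only works for tiny grammars.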
••
28 May 2014
TL;DR: A new variant of P2DCFGs that generates picture arrays in a leftmost way is introduced, and the generative power of these generators, also when rewriting is regulated by control languages, is examined.
Abstract: Considering a large variety of approaches in generating picture languages, the notion of pure two-dimensional context-free grammar P2DCFG represents a simple yet expressive non-isometric language generator of picture arrays. In the present paper, we introduce a new variant of P2DCFGs that generates picture arrays in a leftmost way. We concentrate our attention on determining their generative power by comparing it with the power of other picture generators. We also examine the power of these generators that regulate rewriting by control languages.
••
16 Aug 2014
TL;DR: A generalization of linear indexed grammars is defined that is equivalent to simple context-free tree grammars in the same way that linear indexed grammars are equivalent to tree-adjoining grammars.
Abstract: I define a generalization of linear indexed grammars that is equivalent to simple context-free tree grammars in the same way that linear indexed grammars are equivalent to tree-adjoining grammars.
••
16 Aug 2014
TL;DR: It is shown by example that Clark's algorithm may converge to a grammar that does not define the input language; a non-obvious modification of the algorithm is provided and proved correct.
Abstract: A. Clark [2] has shown that the class of languages which have a context-free grammar whose nonterminals can be defined by a finite set of contexts can be identified in the limit, given an enumeration of the language and a test for membership. We show by example that Clark's algorithm may converge to a grammar that does not define the input language. We review the theoretical background, provide a non-obvious modification of the algorithm and prove its correctness.
••
07 Jul 2014
TL;DR: This paper provides a synthesis and extension of work that unifies two approaches to such language relations: the automata-theoretic approach based on tree transducers that transform trees to their counterparts in the relation, and the grammatical approach based on synchronous grammars that derive pairs of trees in the relation.
Abstract: We tend to think of the study of language as proceeding by characterizing the strings and structures of a language, and we think of natural-language processing as using those structures to build systems of utility in manipulating the language. But many language-related problems are more fruitfully viewed as requiring the specification of a relation between two languages, rather than the specification of a single language. In this paper, we provide a synthesis and extension of work that unifies two approaches to such language relations: the automata-theoretic approach based on tree transducers that transform trees to their counterparts in the relation, and the grammatical approach based on synchronous grammars that derive pairs of trees in the relation. In particular, we characterize synchronous tree-substitution grammars and synchronous tree-adjoining grammars in terms of bimorphisms, which have previously been used to characterize tree transducers. In the process, we provide new approaches to formalizing the various concepts: a metanotation for describing varieties of tree automata and transducers in equational terms; a rigorous formalization of tree-adjoining and tree-substitution grammars and their synchronous counterparts, using trees over ranked alphabets; and generalizations of tree-adjoining grammar allowing multiple adjunction.
••
TL;DR: This work investigates how to transfer the concept of synchronisation to grammars by defining grammar teams that agree on the generation of shared terminal symbols based on a novel notion of competence.
Abstract: In CD grammar systems, the rewriting process is distributed over component grammars that take turns in the derivation of new symbols. Team automata however collaborate by synchronising their actions. Here we investigate how to transfer this concept of synchronisation to grammars by defining grammar teams that agree on the generation of shared terminal symbols based on a novel notion of competence. We first illustrate this idea for the case of regular grammars and next propose an extension to the case of context-free grammars.
••
TL;DR: It is demonstrated that any recursively enumerable language can be generated by one-sided random context grammars with no more than two right random context rules.
••
31 Mar 2014
TL;DR: The main result is the undecidability of the emptiness problem for grammars restricted to a one-symbol alphabet, which is proved by simulating a Turing machine by a cellular automaton with feedback.
Abstract: The paper considers a family of formal grammars that extends linear context-free grammars with an operator for referring to the left context of a substring being defined, as well as with a conjunction operation (as in linear conjunctive grammars). These grammars are proved to be computationally equivalent to an extension of one-way real-time cellular automata with an extra data channel. The main result is the undecidability of the emptiness problem for grammars restricted to a one-symbol alphabet, which is proved by simulating a Turing machine by a cellular automaton with feedback. The same construction proves the \(\Sigma^0_2\)-completeness of the finiteness problem for these grammars.
••
TL;DR: Permutation grammars with rules of length at most n are said to be of order n and generate the family of permutation languages $Perm_n$; this paper extends Nagy's strict inclusion $Perm_2 \subsetneq Perm_3$ by proving that $Perm_{4n-2} \subsetneq Perm_{4n-1}$ for n ≥ 1.
Abstract: Permutation grammars are an extension of context-free grammars with rules having the same symbols on both sides but possibly in a different order. An example of a permutation rule of length 3 is ABC → CBA. If these non-context-free rules are of length at most n, then we say that the permutation grammar is of order n, and all such grammars generate a family of permutation languages $Perm_n$. In 2010 Nagy showed that there exists a language that cannot be generated by a grammar of order 2, but rules of length 3 are enough; in other words, a strict inclusion $Perm_2 \subsetneq Perm_3$ was obtained. We extend this result by proving that $Perm_{4n-2} \subsetneq Perm_{4n-1}$ for n ≥ 1.
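A single derivation step under a permutation rule such as ABC → CBA is easy to sketch in Python (function and variable names are ours, for illustration only):

```python
def apply_permutation_rule(form, lhs, rhs):
    """Apply a permutation rule (e.g. ABC -> CBA) at the leftmost
    occurrence of lhs in the sentential form. rhs must be a
    permutation of lhs, so only the order of symbols changes."""
    assert sorted(lhs) == sorted(rhs), "not a permutation rule"
    i = form.find(lhs)
    if i < 0:
        return None  # rule not applicable
    return form[:i] + rhs + form[i + len(lhs):]

print(apply_permutation_rule("xABCy", "ABC", "CBA"))  # xCBAy
```

Because both sides contain the same multiset of symbols, such rules never change the length of a sentential form, only the order of its symbols.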
••
30 Jul 2014
TL;DR: A new weighted tree transducer formalism is suggested and it is proved that the transformations of the restricted grammars are precisely those of the linear and nondeleting instances of these transducers.
Abstract: In this paper, we consider weighted synchronous context-free tree grammars and identify a certain syntactic restriction of these grammars. We suggest a new weighted tree transducer formalism and prove that the transformations of the restricted grammars are precisely those of the linear and nondeleting instances of these transducers.
•
21 Jul 2014
TL;DR: The experience of using Triple Graph Grammars (TGG) to synchronize models of the rich and complex Architecture Analysis and Design Language (AADL), an aerospace standard of the Society of Automotive Engineers, provides a validation of the TGG approach for synchronizing models of large meta-models, but shows that model synchronization remains a challenging task.
Abstract: We report our experience of using Triple Graph Grammars (TGG) to synchronize models of the rich and complex Architecture Analysis and Design Language (AADL), an aerospace standard of the Society of Automotive Engineers. A synchronization layer has been developed between the OSATE (Open Source AADL Tool Environment) textual editor and the Adele graphical editor in order to improve their integration. Adele has been designed to support editing AADL models in a way that does not necessarily follow the structure of the language, but is adapted to the way designers think. For this reason, it operates on a different meta-model than OSATE. As a result, changes on the graphical model must be propagated automatically to the textual model to ensure consistency of the models. Since Adele does not cover the complete AADL language, this must be done without re-instantiation of the objects to avoid losing the information not represented in the graphical part. The TGG language implemented in the MoTE tool has been used to synchronize the tools. Our results provide a validation of the TGG approach for synchronizing models of large meta-models, but also show that model synchronization remains a challenging task, since several improvements of the TGG language and its tool were required to succeed.
•
01 May 2014
TL;DR: How a Constraint Grammar with linguist-written rules can be optimized and ported to another language using a Machine Learning technique and the effects of rule movements, sorting, grammar-sectioning and systematic rule modifications are discussed and quantitatively evaluated.
Abstract: In this paper, we describe how a Constraint Grammar with linguist-written rules can be optimized and ported to another language using a Machine Learning technique. The effects of rule movements, sorting, grammar-sectioning and systematic rule modifications are discussed and quantitatively evaluated. Statistical information is used to provide a baseline and to enhance the core of manual rules. The best-performing parameter combinations achieved part-of-speech F-scores of over 92 for a grammar ported from English to Danish, a considerable advance over both the statistical baseline (85.7), and the raw ported grammar (86.1). When the same technique was applied to an existing native Danish CG, error reduction was 10% (F=96.94).
•
TL;DR: It is proved that computing distances corresponds to solving undecidable questions: this is the case for the L1, L2 norm, the variation distance and the Kullback-Leibler divergence.
Abstract: Probabilistic context-free grammars (PCFGs) are used to define distributions over strings, and are powerful modelling tools in a number of areas, including natural language processing, software engineering, model checking, bio-informatics, and pattern recognition. A common important question is that of comparing the distributions generated or modelled by these grammars: this is done through checking language equivalence and computing distances. Two PCFGs are language equivalent if every string has identical probability with both grammars. This also means that the distance (whichever norm is used) is null. It is known that the language equivalence problem is interreducible with that of multiple ambiguity for context-free grammars, a long-standing open question. In this work, we prove that computing distances corresponds to solving undecidable questions: this is the case for the L1 and L2 norms, the variation distance and the Kullback-Leibler divergence. Two more results are less negative: (1) the most probable string can be computed, and (2) the Chebyshev distance (where the distance between two distributions is the maximum difference of probabilities over all strings) is interreducible with the language equivalence problem.
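While the exact distances are undecidable, over a truncated support they are of course computable. A small Python sketch with a hypothetical one-rule PCFG, S → a S (probability p) | a (probability 1 − p), whose string distribution is geometric in the string length:

```python
def string_probs(p, max_n):
    """P(a^n) under the toy PCFG  S -> a S (prob p) | a (prob 1 - p):
    a geometric distribution over string lengths."""
    return {"a" * n: p ** (n - 1) * (1 - p) for n in range(1, max_n + 1)}

def l1_distance(d1, d2):
    """L1 norm over the union of the two (truncated) supports."""
    keys = d1.keys() | d2.keys()
    return sum(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in keys)

def chebyshev_distance(d1, d2):
    """Maximum difference of probabilities over all (truncated) strings."""
    keys = d1.keys() | d2.keys()
    return max(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in keys)

d1, d2 = string_probs(0.5, 12), string_probs(0.25, 12)
print(chebyshev_distance(d1, d2))  # 0.25, attained at the string "a"
```

On this toy pair the Chebyshev distance is attained at the shortest string, where P("a") is 0.5 under one grammar and 0.75 under the other; the truncation is an illustration only and says nothing about the undecidable exact problem.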
••
01 Aug 2014
TL;DR: This work generalizes Solomonoff’s stochastic context-free grammar induction method to context-sensitive grammars, and applies it to transfer learning problem by means of an efficient update algorithm.
Abstract: We generalize Solomonoff’s stochastic context-free grammar induction method to context-sensitive grammars, and apply it to the transfer learning problem by means of an efficient update algorithm. The stochastic grammar serves as a guiding program distribution which improves future probabilistic induction approximations by learning about the training sequence of problems. The stochastic grammar is updated by extrapolating from the initial grammar and the solution corpus. We introduce a data structure to represent derivations and efficient algorithms to compute an updated grammar, which modify production probabilities and add new productions that represent past solutions.