Showing papers on "L-attributed grammar" published in 2017


Proceedings ArticleDOI
01 Apr 2017
TL;DR: By training grammars without nonterminal labels, it is found that phrasal representations depend minimally on nonterminals, providing support for the endocentricity hypothesis.
Abstract: Recurrent neural network grammars (RNNG) are a recently proposed probabilistic generative modeling family for natural language. They show state-of-the-art language modeling and parsing performance. We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection. We find that explicit modeling of composition is crucial for achieving the best performance. Through the attention mechanism, we find that headedness plays a central role in phrasal representation (with the model’s latent attention largely agreeing with predictions made by hand-crafted head rules, albeit with some important differences). By training grammars without nonterminal labels, we find that phrasal representations depend minimally on nonterminals, providing support for the endocentricity hypothesis.
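
The GA-RNNG's attention over a phrase's children is what exposes headedness in the analysis above. As a toy illustration of that mechanism only (the vectors and values are made up, and this is not the authors' model), a soft attention step over child representations looks like this:

from math import exp

def attend(children, query):
    # Score each child vector against the query, softmax the scores, and
    # return the weights plus the weighted sum as the phrase representation.
    scores = [sum(q * c for q, c in zip(query, child)) for child in children]
    weights = [exp(s) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(children[0])
    phrase = [sum(w * child[i] for w, child in zip(weights, children))
              for i in range(dim)]
    return weights, phrase

# A determiner-like and a noun-like child under a noun-phrase query:
weights, phrase = attend([[1.0, 0.0], [0.0, 1.0]], [0.2, 1.5])
print(weights)  # the head-like second child receives most of the weight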

166 citations


Journal ArticleDOI
TL;DR: This grammar induction algorithm has two goals: first, to show that construction grammars are learnable without highly specified innate structure; second, to develop a model of which units do or do not constitute constructions in a given dataset.
Abstract: This paper presents an algorithm for learning the construction grammar of a language from a large corpus. This grammar induction algorithm has two goals: first, to show that construction grammars are learnable without highly specified innate structure; second, to develop a model of which units do or do not constitute constructions in a given dataset. The basic task of construction grammar induction is to identify the minimum set of constructions that represents the language in question with maximum descriptive adequacy. These constructions must (1) generalize across an unspecified number of units while (2) containing mixed levels of representation internally (e.g., both item-specific and schematized representations), and (3) allowing for unfilled and partially filled slots. Additionally, these constructions may (4) contain recursive structure within a given slot that needs to be reduced in order to produce a sufficiently schematic representation. In other words, these constructions are multi-length, multi-level, possibly discontinuous co-occurrences which generalize across internal recursive structures. These co-occurrences are modeled using frequency and the ΔP measure of association, expanded in novel ways to cover multi-unit sequences. This work provides important new evidence for the learnability of construction grammars as well as a tool for the automated corpus analysis of constructions.
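
The ΔP measure the paper builds on is P(outcome | cue) − P(outcome | ¬cue), computed from a 2x2 contingency table. A minimal sketch for adjacent word pairs (the corpus and function name are invented for illustration; the paper extends the measure to multi-unit sequences):

from collections import Counter

def delta_p(bigram_counts, cue, outcome):
    # 2x2 table: a = cue+outcome, b = cue+other, c = other+outcome, d = rest.
    a = bigram_counts.get((cue, outcome), 0)
    b = sum(n for (w1, _), n in bigram_counts.items() if w1 == cue) - a
    c = sum(n for (_, w2), n in bigram_counts.items() if w2 == outcome) - a
    d = sum(bigram_counts.values()) - a - b - c
    p_given_cue = a / (a + b) if (a + b) else 0.0
    p_given_other = c / (c + d) if (c + d) else 0.0
    return p_given_cue - p_given_other

tokens = "the dog chased the cat and the dog barked".split()
counts = Counter(zip(tokens, tokens[1:]))
print(delta_p(counts, "the", "dog"))  # forward association of "the" -> "dog"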

39 citations


Proceedings ArticleDOI
01 Sep 2017
TL;DR: The article describes the process of introducing new restrictions on these grammar classes through the introduction of new rules, and reveals the features of synthesizing sentences of different languages with the use of generative grammars.
Abstract: The article presents the use of generative grammars in linguistic modeling. A description of sentence syntax modeling is used to automate the analysis and synthesis of natural language texts. The article reveals the features of synthesizing sentences of different languages with the use of generative grammars, and examines the influence of a language's norms and rules on the process of constructing grammars. The use of generative grammars has great potential in the development of automated systems for text content processing, linguistic support for computer systems, etc. In natural languages there are situations where notions that depend on the context are described as independent of context, i.e. in terms of context-free grammars; this description is complicated by the formation of new categories and rules. The article describes the process of introducing new restrictions on these grammar classes through the introduction of new rules. Noncontracting grammars are obtained when the right-hand side of each rule contains no fewer symbols than the left-hand side. Restricting rules further so that only a single symbol is replaced yields a context-sensitive grammar, and a grammar with only a single symbol on the left-hand side of each rule is a context-free grammar; no further natural restrictions can be applied to the left-hand side of a rule. Given the importance of automatic processing of text content in modern information media (e.g., information retrieval systems, machine translation, semantic, statistical, optical and acoustic analysis and speech synthesis, automated editing, extracting knowledge from text content, abstracting, annotating and indexing text content, teaching and didactics, management of linguistic corpora, various tools for lexicography, etc.), specialists are actively looking for new models, ways of describing them, and methods of automatic text processing. One such method lies in developing general principles for the formation of syntactic lexicographical systems and then building such text-processing systems for specific languages on these principles. Any parsing tool consists of two parts: a knowledge base for a concrete natural language and a parsing algorithm, i.e. a set of standard operators for processing text content based on this knowledge. The sources of grammatical knowledge are morphological analysis data and various tables filled with concepts and linguistic units, the result of experts' empirical study of natural-language text content aimed at identifying the basic regularities for parsing.
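
The ladder of restrictions described above is easy to state mechanically. A small sketch (the symbol representation is assumed, not from the article): each rule is classified by the strongest restriction it satisfies.

def classify_rule(lhs, rhs):
    # lhs and rhs are tuples of symbols; nonterminals are uppercase here.
    if len(lhs) == 1 and lhs[0].isupper():
        return "context-free"      # a single nonterminal on the left
    if len(rhs) >= len(lhs):
        return "noncontracting"    # right side never shorter than the left
    return "unrestricted"

print(classify_rule(("S",), ("a", "S", "b")))      # context-free
print(classify_rule(("A", "B"), ("A", "b", "B")))  # noncontracting
print(classify_rule(("A", "B"), ("a",)))           # unrestricted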

36 citations


Book ChapterDOI
TL;DR: This chapter gives a high-level description of a family of theorem provers designed for grammar development in a variety of modern type-logical grammars, discussed from the perspective of proof nets, a graph-theoretic way to represent (partial) proofs during proof search.
Abstract: Type-logical grammars use a foundation of logic and type theory to model natural language. These grammars have been particularly successful giving an account of several well-known phenomena on the syntax-semantics interface, such as quantifier scope and its interaction with other phenomena. This chapter gives a high-level description of a family of theorem provers designed for grammar development in a variety of modern type-logical grammars. We discuss automated theorem proving for type-logical grammars from the perspective of proof nets, a graph-theoretic way to represent (partial) proofs during proof search.

21 citations


Book ChapterDOI
18 Jul 2017
TL;DR: Predictive shift-reduce (PSR) parsing for a subclass of hyperedge replacement grammars, which generalizes the concepts of SLR(1) string parsing to graphs, is studied.
Abstract: Graph languages defined by hyperedge replacement grammars can be NP-complete. We study predictive shift-reduce (PSR) parsing for a subclass of these grammars, which generalizes the concepts of SLR(1) string parsing to graphs. PSR parsers run in linear space and time. In comparison to the predictive top-down (PTD) parsers recently developed by the authors, PSR parsing is more efficient and more general, while the required grammar analysis is easier than for PTD parsing.
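
PSR parsing lifts the familiar shift-reduce cycle from strings to graphs. As a reminder of the string-level mechanism being generalized, here is a hand-written recognizer for the toy grammar S -> a S b | c, with greedy reduction standing in for a real SLR(1) table (illustrative only, not the paper's algorithm):

def shift_reduce(tokens):
    stack, pos = [], 0
    while True:
        # Reduce whenever a rule's right-hand side sits on top of the stack.
        if stack[-1:] == ["c"]:
            stack[-1:] = ["S"]
        elif stack[-3:] == ["a", "S", "b"]:
            stack[-3:] = ["S"]
        elif pos < len(tokens):
            stack.append(tokens[pos])  # shift the next input symbol
            pos += 1
        else:
            return stack == ["S"]

print(shift_reduce(list("aacbb")))  # True
print(shift_reduce(list("acbb")))   # False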

16 citations


Journal ArticleDOI
TL;DR: The Generalized LR parsing algorithm is extended to the case of “grammars with left contexts” and has the same worst-case cubic-time performance as in the case of context-free grammars.
Abstract: The Generalized LR parsing algorithm for context-free grammars is notable for having a decent worst-case running time (cubic in the length of the input string, if implemented efficiently), as well as much better performance on “good” grammars. This paper extends the Generalized LR algorithm to the case of “grammars with left contexts” (M. Barash, A. Okhotin, “An extension of context-free grammars with one-sided context specifications”, Inform. Comput., 2014), which augment the context-free grammars with special operators for referring to the left context of the current substring, along with a conjunction operator (as in conjunctive grammars) for combining syntactical conditions. All usual components of the LR algorithm, such as the parsing table, shift and reduce actions, etc., are extended to handle the context operators. The resulting algorithm is applicable to any grammar with left contexts and has the same worst-case cubic-time performance as in the case of context-free grammars.

14 citations


Journal ArticleDOI
TL;DR: It is proposed that an evaluation measure that guides a learner's choice of grammar when more than one is compatible with available input reflects biases of the sentence processing mechanism.
Abstract: An evaluation measure (EM) guides a learner’s choice of grammar when more than one is compatible with available input. EM must be universal, so children receiving comparable input acquire comparable...

12 citations


Book ChapterDOI
18 Jul 2017
TL;DR: The notion of fusion grammars as a novel device for the generation of (hyper)graph languages is introduced, and it is shown that fusion grammars can simulate hyperedge replacement grammars that generate connected hypergraphs, that the membership problem is decidable, and that fusion grammars are more powerful than hyperedge replacement grammars.
Abstract: In this paper, we introduce the notion of fusion grammars as a novel device for the generation of (hyper)graph languages. Fusion grammars are motivated by the observation that many large and complex structures can be seen as compositions of a large number of small basic pieces. A fusion grammar is a hypergraph grammar that provides the small pieces as connected components of the start hypergraph. To get arbitrary large numbers of them, they can be copied multiple times. To get large connected hypergraphs, they can be fused by the application of fusion rules. As the first main results, we show that fusion grammars can simulate hyperedge replacement grammars that generate connected hypergraphs, that the membership problem is decidable, and that fusion grammars are more powerful than hyperedge replacement grammars.
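
A single fusion step can be pictured concretely: two hyperedges with complementary labels are removed and their attachment nodes are identified pairwise, joining the components that carried them. A loose sketch (the edge representation and the "~" complement convention are invented here):

def fuse(edges, i, j):
    # edges: list of (label, attached-node-list) pairs; fuse edges i and j.
    label1, nodes1 = edges[i]
    label2, nodes2 = edges[j]
    assert label2 == label1 + "~" and len(nodes1) == len(nodes2)
    merge = dict(zip(nodes2, nodes1))  # identify attachment points pairwise
    return [(lab, [merge.get(n, n) for n in nodes])
            for k, (lab, nodes) in enumerate(edges) if k not in (i, j)]

# An 'A' edge in one component, its complement 'A~' in another:
edges = [("A", ["u1", "u2"]), ("A~", ["v1", "v2"]), ("road", ["v2", "v3"])]
print(fuse(edges, 0, 1))  # [('road', ['u2', 'v3'])] -- components now joined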

10 citations


Journal ArticleDOI
TL;DR: The main advantage over existing frameworks is the ability of hybrid grammars to separate discontinuity of the desired structures from time complexity of parsing, which permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties.
Abstract: We explore the concept of hybrid grammars, which formalize and generalize a range of existing frameworks for dealing with discontinuous syntactic structures. Covered are both discontinuous phrase structures and non-projective dependency structures. Technically, hybrid grammars are related to synchronous grammars, where one grammar component generates linear structures and another generates hierarchical structures. By coupling lexical elements of both components together, discontinuous structures result. Several types of hybrid grammars are characterized. We also discuss grammar induction from treebanks. The main advantage over existing frameworks is the ability of hybrid grammars to separate discontinuity of the desired structures from time complexity of parsing. This permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties. This is confirmed by the reported experimental results, which show a wide variety of running time, accuracy, and frequency ...

10 citations


Journal ArticleDOI
TL;DR: The computational issues involved in learning hierarchically structured grammars from strings of symbols alone are discussed, and methods based on an abstract notion of the derivational context of a syntactic category lead to learning algorithms based on a form of traditional distributional analysis.
Abstract: Learnability has traditionally been considered to be a crucial constraint on theoretical syntax; however, the issues involved have been poorly understood, partly as a result of the lack of simple learning algorithms for various types of formal grammars. Here I discuss the computational issues involved in learning hierarchically structured grammars from strings of symbols alone. The methods involved are based on an abstract notion of the derivational context of a syntactic category, which in the most elementary case of context-free grammars leads to learning algorithms based on a form of traditional distributional analysis. Crucially, these techniques can be extended to work with mildly context-sensitive grammars (and beyond), thus leading to learning methods that can in principle learn classes of grammars that are powerful enough to represent all natural languages. These learning methods require that the syntactic categories of the grammars be visible in a certain technical sense: They must be well characterized...
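
The distributional core of these methods is the map from substrings to the set of (left, right) contexts they occur in; substrings sharing contexts are candidates for the same category. A minimal sketch (toy corpus, names invented):

from collections import defaultdict

def context_table(corpus, max_len=3):
    table = defaultdict(set)
    for sentence in corpus:
        words = tuple(sentence.split())
        for i in range(len(words)):
            for j in range(i + 1, min(i + 1 + max_len, len(words) + 1)):
                # record the full left and right context of words[i:j]
                table[words[i:j]].add((words[:i], words[j:]))
    return table

corpus = ["the dog sleeps", "the cat sleeps", "a dog runs"]
table = context_table(corpus)
print(table[("dog",)] & table[("cat",)])  # shared contexts -> same category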

9 citations


Journal ArticleDOI
TL;DR: An algorithm to optimally compress a finite set of terms using a vectorial totally rigid acyclic tree grammar, based on a polynomial-time reduction to the MaxSAT optimization problem.
Abstract: We present an algorithm to optimally compress a finite set of terms using a vectorial totally rigid acyclic tree grammar. This class of grammars has a tight connection to proof theory, and the grammar compression problem considered in this article has applications in automated deduction. The algorithm is based on a polynomial-time reduction to the MaxSAT optimization problem. The crucial step necessary to justify this reduction consists of applying a term rewriting relation to vectorial totally rigid acyclic tree grammars. Our implementation of this algorithm performs well on a large real-world dataset.

Book ChapterDOI
Sara Garcia
01 Jan 2017
TL;DR: The goal of this paper is to provide a framework that gathers together the main existing types of grammars in the form of a checklist, which is filled according to an existing shape grammar as an example.
Abstract: Since shape grammars were first described about forty-five years ago, several types of grammars have emerged. The goal of this paper is to provide a framework that gathers together the main existing types of grammars. The categorization is preceded by a glossary of 19 terms related to shape grammars. Then, 44 types are placed into 13 chronologically-ordered categories. Each type is characterized with its name, a short description, the reference to the original paper, three examples of existing grammars of the type, and simple illustrative grammars. The types are organized using a classification guide in the form of a checklist, which is filled according to an existing shape grammar as an example.

Proceedings ArticleDOI
09 Oct 2017
TL;DR: In this paper, the authors present attribute grammars as a unifying framework for modeling planning domains and problems, exploiting techniques from formal languages in domain model verification, plan and goal recognition, domain model acquisition, as well as in planning.
Abstract: The paper presents attribute grammars as a unifying framework for modeling planning domains and problems. The motivation is to exploit techniques from formal languages in domain model verification, plan and goal recognition, domain model acquisition, as well as in planning. Grammar rules are used for action selection while specific set attributes are used to collect events (preconditions and effects of actions) that are ordered using a global timeline constraint. We show how classical STRIPS, hierarchical task networks, and procedural domain models are transformed to attribute grammars.

Proceedings ArticleDOI
23 Oct 2017
TL;DR: This work presents lock-free algorithms for concurrent attribute evaluation, enabling low latency in interactive tools; the algorithms are implemented in Java for the JastAdd metacompiler.
Abstract: Reference Attribute Grammars (RAGs) is a declarative executable formalism used for constructing compilers and related tools. Existing implementations support concurrent evaluation only with global evaluation locks. This may lead to long latencies in interactive tools, where interactive and background threads query attributes concurrently. We present lock-free algorithms for concurrent attribute evaluation, enabling low latency in interactive tools. Our algorithms support important extensions to RAGs like circular (fixed-point) attributes and higher-order attributes. We have implemented our algorithms in Java, for the JastAdd metacompiler. We evaluate the implementation on a JastAdd-specified compiler for the Java language, demonstrating very low latencies for interactive attribute queries, on the order of milliseconds. Furthermore, initial experiments show a speedup of about a factor of 2 when using four parallel compilation threads.

Journal ArticleDOI
TL;DR: This work considers d-dimensional contextual array grammars and investigates their computational power when using various control mechanisms – matrices, regular control languages, and tissue P systems, which work like regular control languages, but may end up with a final check for the non-applicability of some rules.

Posted Content
TL;DR: In the evaluation on inputs like URLs, spreadsheets, or configuration files, the AUTOGRAM prototype obtains input grammars that are both accurate and very readable - and that can be directly fed into test generators for comprehensive automated testing.
Abstract: Knowing the precise format of a program's input is a necessary prerequisite for systematic testing. Given a program and a small set of sample inputs, we (1) track the data flow of inputs to aggregate input fragments that share the same data flow through program execution into lexical and syntactic entities; (2) assign these entities names that are based on the associated variable and function identifiers; and (3) systematically generalize production rules by means of membership queries. As a result, we need only a minimal set of sample inputs to obtain human-readable context-free grammars that reflect valid input structure. In our evaluation on inputs like URLs, spreadsheets, or configuration files, our AUTOGRAM prototype obtains input grammars that are both accurate and very readable - and that can be directly fed into test generators for comprehensive automated testing.
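
The naming idea, labeling input fragments with the functions that consumed them and reading productions off those traces, can be sketched in a few lines (the traces and names are fabricated; AUTOGRAM's actual data-flow tracking and membership-query generalization are far more involved):

def rules_from_traces(traces):
    # traces: (consuming-function, fragments) pairs; fragments are either
    # literals or <function> references to other nonterminals.
    rules = {}
    for func, parts in traces:
        alt = [p if p.startswith("<") else repr(p) for p in parts]
        rules.setdefault(f"<{func}>", []).append(alt)
    return rules

traces = [
    ("parse_url", ["<parse_scheme>", "://", "<parse_host>"]),
    ("parse_scheme", ["http"]),
    ("parse_scheme", ["https"]),
    ("parse_host", ["example.com"]),
]
for lhs, alts in rules_from_traces(traces).items():
    print(lhs, "::=", " | ".join(" ".join(a) for a in alts))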

Proceedings ArticleDOI
23 Oct 2017
TL;DR: It is shown that a useful class of language extensions, implemented as attribute grammars, preserve all coherent properties, and if extensions are restricted to only making use of coherent properties in establishing their correctness, then the correctness properties of each extension will hold when composed with other extensions.
Abstract: Extensible language frameworks aim to allow independently-developed language extensions to be easily added to a host programming language. It should not require being a compiler expert, and the resulting compiler should "just work" as expected. Previous work has shown how specifications for parsing (based on context free grammars) and for semantic analysis (based on attribute grammars) can be automatically and reliably composed, ensuring that the resulting compiler does not terminate abnormally. However, this work does not ensure that a property proven to hold for a language (or extended language) still holds when another extension is added, a problem we call interference. We present a solution to this problem using a logical notion of coherence. We show that a useful class of language extensions, implemented as attribute grammars, preserve all coherent properties. If we also restrict extensions to only making use of coherent properties in establishing their correctness, then the correctness properties of each extension will hold when composed with other extensions. As a result, there can be no interference: each extension behaves as specified.

Posted Content
TL;DR: In this paper, the authors investigated multiple context-free tree grammars, where "simple" means linear and nondeleting, and showed that a tree language can be generated by a multiple context free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer.
Abstract: Multiple (simple) context-free tree grammars are investigated, where "simple" means "linear and nondeleting". Every multiple context-free tree grammar that is finitely ambiguous can be lexicalized; i.e., it can be transformed into an equivalent one (generating the same tree language) in which each rule of the grammar contains a lexical symbol. Due to this transformation, the rank of the nonterminals increases at most by 1, and the multiplicity (or fan-out) of the grammar increases at most by the maximal rank of the lexical symbols; in particular, the multiplicity does not increase when all lexical symbols have rank 0. Multiple context-free tree grammars have the same tree generating power as multi-component tree adjoining grammars (provided the latter can use a root-marker). Moreover, every multi-component tree adjoining grammar that is finitely ambiguous can be lexicalized. Multiple context-free tree grammars have the same string generating power as multiple context-free (string) grammars, and they admit polynomial-time parsing algorithms. A tree language can be generated by a multiple context-free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer. Multiple context-free tree grammars can be used as a synchronous translation device.

Book ChapterDOI
11 Sep 2017
TL;DR: It is demonstrated that multiple simple context-free tree grammars are as expressive as multi-component tree adjoining grammars and that both allow strong lexicalization.
Abstract: Strong lexicalization is the process of turning a grammar generating trees into an equivalent one, in which all rules contain a terminal leaf. It is known that tree adjoining grammars cannot be strongly lexicalized, whereas the more powerful simple context-free tree grammars can. It is demonstrated that multiple simple context-free tree grammars are as expressive as multi-component tree adjoining grammars and that both allow strong lexicalization.

Journal ArticleDOI
Ryo Yoshinaka
TL;DR: This paper presents a distributional learning algorithm for conjunctive grammars with the k-finite context property (k-fcp) for each natural number k and shows that every exact cbfg has the k-fcp, while not all of them are learnable by their algorithm.

Proceedings ArticleDOI
23 Oct 2017
TL;DR: This paper will show that SPEGs have improved the expressive power in such ways that they recognize practical context-sensitive grammars, including back referencing, indentation-based code layout, and contextual keywords.
Abstract: Parsing expression grammars (PEGs) are a powerful and popular foundation for describing syntax. Despite PEGs' expressiveness, they cannot recognize many syntax patterns of popular programming languages. Typical examples include typedef-defined names in C/C++ and here documents appearing in many scripting languages. We use a single unified state representation, called a symbol table, to capture various context-sensitive patterns. Over the symbol table, we design a small set of restricted semantic predicates and actions. The extended PEGs are called SPEGs, and are designed to be safe in contexts of backtracking and the linear time guarantee of packrat parsing. This paper will show that SPEGs have improved the expressive power in such ways that they recognize practical context-sensitive grammars, including back referencing, indentation-based code layout, and contextual keywords.
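
The typedef problem gives the flavor: whether "NAME x;" parses as a declaration depends on whether NAME was defined earlier, which a symbol table resolves. A minimal sketch in that spirit (the token syntax and class names are invented; this is not the SPEG implementation):

class SymtabParser:
    def __init__(self, tokens):
        self.tokens, self.pos, self.symtab = tokens, 0, set()

    def match(self, word):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == word:
            self.pos += 1
            return True
        return False

    def statement(self):
        saved = self.pos  # PEG-style backtracking point
        if self.match("def") and self.pos < len(self.tokens):
            self.symtab.add(self.tokens[self.pos])  # record the new type name
            self.pos += 1
            return self.match(";")
        self.pos = saved
        # Declaration: a *known* type name, then a fresh identifier.
        if self.pos < len(self.tokens) and self.tokens[self.pos] in self.symtab:
            self.pos += 2
            return self.match(";")
        return False

p = SymtabParser("def mytype ; mytype x ;".split())
print(p.statement() and p.statement())  # True: second parse consults the table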

Journal ArticleDOI
TL;DR: It is proved in this paper that every language described by a grammar with contexts can be recognized in deterministic linear space.


DOI
01 Jan 2017
TL;DR: This work proposes a pipeline to automatically generate, cluster, and select a set of representative preview images for a grammar, and presents design transformations and grammar co-derivation to create new designs from existing ones, including fine-grained rule merging.
Abstract: Procedural shape grammars are powerful tools for the automatic generation of highly detailed 3D content from a set of descriptive rules. It is easy to encode variations in stochastic and parametric grammars, and an uncountable number of models can be generated quickly. While shape grammars offer these advantages over manual 3D modeling, they also suffer from certain drawbacks. We present three novel methods that address some of the limitations of shape grammars. First, it is often difficult to grasp the diversity of models defined by a given grammar. We propose a pipeline to automatically generate, cluster, and select a set of representative preview images for a grammar. The system is based on a new view attribute descriptor that measures how suitable an image is in representing a model and that enables the comparison of different models derived from the same grammar. Second, the default distribution of models in a stochastic grammar is often undesirable. We introduce a framework that allows users to design a new probability distribution for a grammar without editing the rules. Gaussian process regression interpolates user preferences from a set of scored models over an entire shape space. A symbol split operation enables the adaptation of the grammar to generate models according to the learned distribution. Third, it is hard to combine elements of two grammars to emerge new designs. We present design transformations and grammar co-derivation to create new designs from existing ones. Algorithms for fine-grained rule merging can generate a large space of design variations and can be used to create animated transformation sequences between different procedural designs. Our contributions to visualize, adapt, and transform grammars makes the procedural modeling methodology more accessible to non-programmers.
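
The stochastic-derivation core is small: each nonterminal carries weighted alternatives, and derivation samples one and recurses. A toy string-level sketch (real shape grammars rewrite geometry, and the mini-grammar here is invented):

import random

def derive(grammar, symbol, depth=0, max_depth=6):
    if symbol not in grammar or depth >= max_depth:
        return [symbol]  # terminal, or cut off the recursion
    r, acc = random.random(), 0.0
    for p, rhs in grammar[symbol]:
        acc += p
        if r <= acc:  # sample an alternative by its probability
            return [s for part in rhs
                    for s in derive(grammar, part, depth + 1, max_depth)]
    return [symbol]

facade = {
    "Facade": [(1.0, ["Floor", "Floor", "Roof"])],
    "Floor": [(0.7, ["Window", "Window"]), (0.3, ["Door", "Window"])],
}
random.seed(1)
print(derive(facade, "Facade"))  # one random facade layout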

DOI
01 Oct 2017
TL;DR: This research focuses particularly on urban network design and road intersection grammars to validate proposed grammar evaluation methods, and generally shows that network topology and intersection type choice both depend on transport mode characteristics and flow.
Abstract: Grammars, with their generic approach and broad application potential in many planning fields, are accepted as adaptable and efficient tools for design and planning applications, bridging design rules and technical planning requirements. This paper provides a formal introduction of grammars for effective consolidation and application, including a rule-based notation and required specification information. Two proposed grammar evaluation methods – based on technical planning knowledge and using recent computational development – foster understanding of a grammar’s effects, often missing in other definitions. Knowledge gained enables efficient grammar rule application, e.g. in burgeoning planning software. This research focuses particularly on urban network design and road intersection grammars to validate proposed grammar evaluation methods. Results are specified in the proposed grammar notation with corresponding application specifications. Results generally show that network topology and intersection type choice both depend on transport mode characteristics and flow. Specifically, medium-dense gridiron networks are car-efficient in terms of travel costs and reliability at urban densities, when combined with high road and intersection capacities. Pedestrian networks ideally have higher intersection and road densities with lower capacities than car networks. Highly meshed networks improve overall travel cost efficiencies for all transport modes at various flow levels.

01 Jan 2017
TL;DR: This paper demonstrates this with a minimal example, discusses various suggestions from literature, and proposes a novel approach that can be used to address this shortcoming in the future.
Abstract: At least two actively developed model synchronization frameworks employ a conceptually similar algorithm based on Triple Graph Grammars as an underlying formalism. Although this algorithm exhibits acceptable behavior for many use cases, there are still scenarios in which it is sub-optimal, especially regarding the “least change” criterion, i.e., the extent to which models are changed to restore consistency. In this paper, we demonstrate this with a minimal example, discuss various suggestions from literature, and propose a novel approach that can be used to address this shortcoming in the future.

Posted Content
TL;DR: The problem of context-free grammar comparison can be reduced to the numerical solution of systems of nonlinear matrix equations, and the approach forms a basis for probabilistic comparison algorithms oriented to the automatic assessment of students' answers in computer science.
Abstract: In this paper we consider the problem of context-free grammar comparison from the analysis point of view. We show that the problem can be reduced to the numerical solution of systems of nonlinear matrix equations. The approach presented here forms a basis for probabilistic comparison algorithms oriented to the automatic assessment of students' answers in computer science.
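
A flavor of how a grammar turns into a nonlinear system: in a stochastic grammar, the probability x_A that nonterminal A derives a finite string satisfies x_A = sum over A's rules of p_r times the product of x_B for each right-hand-side symbol, solvable by fixed-point iteration. (Illustrative only; the paper's construction for grammar comparison differs in detail.)

from math import prod

def termination_probs(grammar, iterations=200):
    x = {nt: 0.0 for nt in grammar}  # iterate up to the least fixed point
    for _ in range(iterations):
        for nt, rules in grammar.items():
            # x_A = sum over rules of p * product of x for each rhs symbol
            x[nt] = sum(p * prod(x.get(s, 1.0) for s in rhs)
                        for p, rhs in rules)
    return x

g = {"S": [(0.4, ["a"]), (0.6, ["S", "S"])]}  # x = 0.4 + 0.6 * x**2
print(round(termination_probs(g)["S"], 4))    # 0.6667, the least solution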

Posted Content
TL;DR: A shape analysis for reasoning about relational properties of data structures, such as balancedness of trees or lengths of lists, based on user-defined indexed graph grammars to guide concretization and abstraction.
Abstract: The aim of shape analysis is to discover precise abstractions of the reachable data structures in a program's heap. This paper develops a shape analysis for reasoning about relational properties of data structures, such as balancedness of trees or lengths of lists. Both the concrete and the abstract domain are represented by hypergraphs. The analysis is based on user-defined indexed graph grammars to guide concretization and abstraction. This novel extension of context-free graph grammars is powerful enough to model complex data structures, such as balanced binary trees with parent pointers, while preserving most desirable properties of context-free graph grammars. One strength of our analysis is that no artifacts apart from grammars are required from the user; it thus offers a high degree of automation. In particular, indexed graph grammars naturally describe how a data structure evolves and require no deep knowledge about relational properties. We have implemented a prototype of our analysis and report on first experimental results.

Posted Content
TL;DR: The authors use macro grammars to cache the abstract patterns of useful logical forms found thus far, and holistic triggering to efficiently retrieve the most relevant patterns based on sentence similarity, achieving state-of-the-art accuracy on WikiTableQuestions.
Abstract: To learn a semantic parser from denotations, a learning algorithm must search over a combinatorially large space of logical forms for ones consistent with the annotated denotations. We propose a new online learning algorithm that searches faster as training progresses. The two key ideas are using macro grammars to cache the abstract patterns of useful logical forms found thus far, and holistic triggering to efficiently retrieve the most relevant patterns based on sentence similarity. On the WikiTableQuestions dataset, we first expand the search space of an existing model to improve the state-of-the-art accuracy from 38.7% to 42.7%, and then use macro grammars and holistic triggering to achieve an 11x speedup and an accuracy of 43.7%.
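
Macro abstraction itself is mechanical: replace table-specific constants in a found logical form with numbered slots and cache the result for reuse. A sketch (the token-level representation and names are invented, not the paper's system):

def macro_of(logical_form, constants):
    macro, slots = [], []
    for tok in logical_form:
        if tok in constants:
            slots.append(tok)                # remember the concrete filler
            macro.append(f"${len(slots)}")   # leave a numbered slot behind
        else:
            macro.append(tok)
    return tuple(macro), slots

lf = ["argmax", "rows", "column:year"]
print(macro_of(lf, {"column:year"}))
# (('argmax', 'rows', '$1'), ['column:year']) -- reusable on other columns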

Journal ArticleDOI
TL;DR: This paper describes work on the automatic generation of incremental attribute grammar evaluators, with the purpose of (semi-)automatically generating an incremental compiler from a regular attribute grammar definition.
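
The basic trick behind incremental evaluation can be shown in miniature: memoize each synthesized attribute per node and invalidate only the ancestors of an edited node. A much-simplified sketch (real generated evaluators handle inherited attributes and dependency tracking far more carefully):

class Node:
    def __init__(self, children=None, leaf=0):
        self.children = children or []
        self.leaf = leaf
        self.parent = None
        self._cache = None
        for c in self.children:
            c.parent = self

    def value(self):  # a synthesized attribute, memoized per node
        if self._cache is None:
            self._cache = self.leaf + sum(c.value() for c in self.children)
        return self._cache

    def set_leaf(self, v):  # edit, then invalidate ancestors only
        self.leaf = v
        node = self
        while node is not None:
            node._cache = None
            node = node.parent

a, b = Node(leaf=1), Node(leaf=2)
root = Node(children=[a, b])
print(root.value())  # 3
a.set_leaf(10)
print(root.value())  # 12, recomputed only along the changed path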