Showing papers on "Context-free grammar published in 2017"

PDF

Open Access

Journal Article•DOI•

Context-free grammars for permutations and increasing trees

[...]

William Y. C. Chen¹, Amy M. Fu²•Institutions (2)

01 Jan 2017-Advances in Applied Mathematics

TL;DR: The notion of a grammatical labeling to describe a recursive process of generating combinatorial objects based on a context-free grammar is introduced and it is demonstrated that Gessel's formula for the generating function of $T(n,k)$ can be deduced from this grammar.

...read moreread less

50 citations

Journal Article•DOI•

Computational learning of construction grammars

[...]

Jonathan Dunn

01 Jun 2017-Language and Cognition

TL;DR: This grammar induction algorithm has two goals: first, to show that construction grammars are learnable without highly specified innate structure; second, to develop a model of which units do or do not constitute constructions in a given dataset.

...read moreread less

Abstract: This paper presents an algorithm for learning the construction grammar of a language from a large corpus. This grammar induction algorithm has two goals: first, to show that construction grammars are learnable without highly specified innate structure; second, to develop a model of which units do or do not constitute constructions in a given dataset. The basic task of construction grammar induction is to identify the minimum set of constructions that represents the language in question with maximum descriptive adequacy. These constructions must (1) generalize across an unspecified number of units while (2) containing mixed levels of representation internally (e.g., both item-specific and schematized representations), and (3) allowing for unfilled and partially filled slots. Additionally, these constructions may (4) contain recursive structure within a given slot that needs to be reduced in order to produce a sufficiently schematic representation. In other words, these constructions are multi-length, multi-level, possibly discontinuous co-occurrences which generalize across internal recursive structures. These co-occurrences are modeled using frequency and the ΔP measure of association, expanded in novel ways to cover multi-unit sequences. This work provides important new evidence for the learnability of construction grammars as well as a tool for the automated corpus analysis of constructions.

...read moreread less

39 citations

Proceedings Article•DOI•

Application of sentence parsing for determining keywords in Ukrainian texts

[...]

Lytvyn Vasyl¹, Vysotska Victoria¹, Dosyn Dmytro¹, Holoschuk Roman¹, Rybchak Zoriana¹ - Show less +1 more•Institutions (1)

Lviv Polytechnic¹

01 Sep 2017

TL;DR: The article features the process of introducing new restrictions on these grammar classes through the introduction of new rules, and reveals the features of synthesizing sentences of different languages with the use of generative grammars.

...read moreread less

Abstract: The article presents the use of generative grammars in linguistic modeling. A description of sentence syntax modeling is used to automate the process of analysis and synthesis of natural language texts. The article reveals the features of synthesizing sentences of different languages with the use of generative grammars. The article examines influence of norms and rules of a language on the process of constructing grammars. The use of generative grammars has great potential in the development and creation of automated systems for text content processing, linguistic support for linguistic computer systems etc. In natural languages there are situations where notions, which are dependent on the context, are described as independent of context, i.e. in terms of context-free grammars. This description is complicated by the formation of new categories and rules. The article features the process of introducing new restrictions on these grammar classes through the introduction of new rules. Uncut grammars were received if the number of characters in the right part of the rules were not less than the number of characters in the left one. Then by replacing the only character a context-sensitive grammar was received. A grammar with only one character in the left part of the rule is called a context-free grammar. No further natural restrictions may be applied to the left part of a rule. Based on the importance of automatic processing of text content in modern information media (e.g., information retrieval systems, machine translation, semantic, statistical, optical and acoustic analysis and speech synthesis, automated editing, extracting knowledge from text content, abstracting and annotating text content, indexing text content, teaching and didactic, management of linguistic corpora, various tools for lexicography, etc.), specialists are actively looking for new models, ways of their description and methods of automatic processing of text content. One of such methods lies in developing general principles of syntactic lexicographical systems formation and developing mentioned systems of processing text content for specific languages based in these principles. Any parsing tools consist of two parts: a knowledge base of concrete natural language and parsing algorithm, i.e. a set of standard operators of text content processing based on this knowledge. The source of grammatical knowledge is data of morphological analysis and various tables filled with concepts and linguistic units. They are the result of an empirical study of the text content in natural language by experts aiming at highlighting the basic laws for parsing.

...read moreread less

36 citations

Proceedings Article•DOI•

Mining input grammars with AUTOGRAM

[...]

Matthias Höschele¹, Andreas Zeller¹•Institutions (1)

Saarland University¹

20 May 2017

TL;DR: The AUTOGRAM tool uses dynamic tainting to trace the data flow of each input character for a set of sample inputs and identifies syntactical entities by grouping input fragments that are handled by the same functions.

...read moreread less

Abstract: Knowledge about how a program processes its inputs can help to understand the structure of the input as well as the structure of the program In a JSON value like [1, true, "Alice"], for instance the integer value 1, the boolean value true and the string value "Alice" would be handled by different functions or stored in different variables Our AUTOGRAM tool uses dynamic tainting to trace the data flow of each input character for a set of sample inputs and identifies syntactical entities by grouping input fragments that are handled by the same functions The resulting context-free grammar reflects the structure of valid inputs and can be used for reverse engineering of formats and can serve as direct input for test generators A video demonstrating AUTOGRAM is available at https://youtube/Iqym60iWBBk

...read moreread less

31 citations

Book Chapter•DOI•

The Grail Theorem Prover: Type Theory for Syntax and Semantics

[...]

Richard Moot

01 Jan 2017-arXiv: Computation and Language

TL;DR: This chapter gives a high-level description of a family of theorem provers designed for grammar development in a variety of modern type-logical grammars, including a graph-theoretic way to represent (partial) proofs during proof search.

...read moreread less

Abstract: Type-logical grammars use a foundation of logic and type theory to model natural language. These grammars have been particularly successful giving an account of several well-known phenomena on the syntax-semantics interface, such as quantifier scope and its interaction with other phenomena. This chapter gives a high-level description of a family of theorem provers designed for grammar development in a variety of modern type-logical grammars. We discuss automated theorem proving for type-logical grammars from the perspective of proof nets, a graph-theoretic way to represent (partial) proofs during proof search.

...read moreread less

21 citations

Journal Article•DOI•

Learning daily activity patterns with probabilistic grammars

[...]

Siyu Li¹, Der-Horng Lee¹•Institutions (1)

National University of Singapore¹

01 Jan 2017-Transportation

TL;DR: Comparisons between daily activity pattern—which is defined as activity sequence—and language are explored and the proposed methodology sheds light on the issue of generating stochastic and accessibility-dependent choice sets for daily activity patterns models in certain activity-based modeling frameworks.

...read moreread less

Abstract: Daily activity pattern is the reflection and abstraction of actual individual activity participation on daily basis. It carries information on activity type, frequency and sequence. Preference of daily activity patterns varies among population, and thus can be interpreted as personal life styles. This paper advances studies on human daily activity patterns by providing new perspective and methodology in the modeling and learning of daily activity patterns using probabilistic context-free grammars. In this paper, similarities between daily activity pattern—which is defined as activity sequence—and language are explored. We developed context-free grammars to parse and generate daily activity patterns. To replicate people’s heterogeneity in selecting daily activity patterns, we introduced probabilistic context-free grammars and proposed several formulations to estimate the probability of a context-free grammar with daily activity patterns observed in household travel survey. We conducted experiments on the proposed formulations, finding that under proper context-free grammar and problem formulation, the estimated probabilistic context-free grammar is able to reproduce the observed pattern distribution in household travel survey with satisfactory precision. Practically, the proposed methodology sheds light on the issue of generating stochastic and accessibility-dependent choice sets for daily activity pattern models in certain activity-based modeling frameworks.

...read moreread less

19 citations

Journal Article•DOI•

Generalized LR Parsing Algorithm for Grammars with One-Sided Contexts

[...]

Mikhail Barash¹, Mikhail Barash², Alexander Okhotin¹•Institutions (2)

University of Turku¹, Turku Centre for Computer Science²

01 Aug 2017-Theory of Computing Systems \/ Mathematical Systems Theory

TL;DR: The Generalized LR parsing algorithm is extended to the case of “grammars with left contexts” and has the same worst-case cubic-time performance as in the cases of context-free grammars.

...read moreread less

Abstract: The Generalized LR parsing algorithm for context-free grammars is notable for having a decent worst-case running time (cubic in the length of the input string, if implemented efficiently), as well as much better performance on “good” grammars. This paper extends the Generalized LR algorithm to the case of “grammars with left contexts” (M. Barash, A. Okhotin, “An extension of context-free grammars with one-sided context specifications”, Inform. Comput., 2014), which augment the context-free grammars with special operators for referring to the left context of the current substring, along with a conjunction operator (as in conjunctive grammars) for combining syntactical conditions. All usual components of the LR algorithm, such as the parsing table, shift and reduce actions, etc., are extended to handle the context operators. The resulting algorithm is applicable to any grammar with left contexts and has the same worst-case cubic-time performance as in the case of context-free grammars.

...read moreread less

14 citations

Proceedings Article•DOI•

Synthesis of social media profiles using a probabilistic context-free grammar

[...]

Abejide Ade-Ibijola¹•Institutions (1)

University of Johannesburg¹

01 Nov 2017

TL;DR: This paper presents a new application of a type of formal grammar — probabilistic/stochastic context-free grammar — in the automatic generation of social media profiles using Facebook as a test case, and describes the implementation and results.

...read moreread less

Abstract: One helpful resource to have when presenting delicate/sensitive information is hypothetical data, or placeholders that conceal the identity of concerned parties. This is crucial in environments such as medicine and criminology as volunteers in medical research, patients of dreaded diseases, and convicts of certain crimes often prefer to remain anonymous, even when they agree to their records being shared. Recently, research based on social media has raised similar ethical concerns about privacy and the use of real users' profiles. In this paper, we present a new application of a type of formal grammar — probabilistic/stochastic context-free grammar — in the automatic generation of social media profiles using Facebook as a test case. First, we present a grammar-based formalism for describing the rules governing the formulation of reasonable user attributes (e.g. full names, date of birth, addresses, phone numbers, etc). These grammar rules are specified with associated probabilistic weights that decides when (if at all) a rule is used or chosen. Secondly, we describe the implementation of these grammar rules. Our implementation results produced one million iterations of unique Facebook profiles within three hours of execution time — with an almost-impossible probability that a profile will reoccur. 100,000 of these synthesised profiles can be viewed at: tinyurl.com/synthesisedprofiles2017. These profiles may find applications in role-playing games in health, and social media research; and the described technique may find a wider application in generation of hypothetical profiles for data anonymisation in different domains.

...read moreread less

14 citations

Journal Article•DOI•

Accurate Maximum-Margin Training for Parsing With Context-Free Grammars

[...]

Alexander Bauer¹, Mikio L. Braun¹, Klaus-Robert Müller¹•Institutions (1)

Technical University of Berlin¹

01 Jan 2017-IEEE Transactions on Neural Networks

TL;DR: This paper derives an extension of the well-known Cocke-Kasami-Younger algorithm used for parsing with probabilistic context-free grammars for the case of loss-augmented inference enabling an effective training in the cutting-plane approach.

...read moreread less

Abstract: The task of natural language parsing can naturally be embedded in the maximum-margin framework for structured output prediction using an appropriate joint feature map and a suitable structured loss function. While there are efficient learning algorithms based on the cutting-plane method for optimizing the resulting quadratic objective with potentially exponential number of linear constraints, their efficiency crucially depends on the inference algorithms used to infer the most violated constraint in a current iteration. In this paper, we derive an extension of the well-known Cocke–Kasami–Younger (CKY) algorithm used for parsing with probabilistic context-free grammars for the case of loss-augmented inference enabling an effective training in the cutting-plane approach. The resulting algorithm is guaranteed to find an optimal solution in polynomial time exceeding the running time of the CKY algorithm by a term, which only depends on the number of possible loss values. In order to demonstrate the feasibility of the presented algorithm, we perform a set of experiments for parsing English sentences.

...read moreread less

12 citations

Book Chapter•DOI•

Using Personal Information in Targeted Grammar-Based Probabilistic Password Attacks

[...]

Shiva Houshmand¹, Sudhir Aggarwal¹•Institutions (1)

Southern Illinois University Carbondale¹

30 Jan 2017

TL;DR: A dictionary-based probabilistic context-free grammar approach is proposed that effectively incorporates personal information about a targeted user into component grammars and dictionaries used for password cracking that significantly improves password cracking performance.

...read moreread less

Abstract: Passwords are the primary means of authentication and security for online accounts and are commonly used to encrypt files and disks. This research demonstrates how personal information about users can be added systematically to enhance password cracking. Specifically, a dictionary-based probabilistic context-free grammar approach is proposed that effectively incorporates personal information about a targeted user into component grammars and dictionaries used for password cracking. The component grammars model various types of personal information such as family names and dates, previous password information and possible information about sequential passwords. A mathematical model for merging multiple grammars that combines the characteristics of the component grammars is presented. The resulting merged target grammar, which is also merged with a standard grammar, is used along with various dictionaries to generate guesses that quickly match target passwords. The experimental results demonstrate that the approach significantly improves password cracking performance.

...read moreread less

10 citations

Probabilistic Context-Free Grammars.

[...]

Yasubumi Sakakibara

01 Jan 2017

Book Chapter•DOI•

Fusion Grammars: A Novel Approach to the Generation of Graph Languages

[...]

Hans-Jörg Kreowski¹, Sabine Kuske¹, Aaron Lye¹•Institutions (1)

University of Bremen¹

18 Jul 2017

TL;DR: The notion of fusion grammars as a novel device for the generation of (hyper)graph languages is introduced and it is shown that fusion Grammars can simulate hyperedge replacement grammARS that generate connected hypergraphs, that the membership problem is decidable, and that fusiongrammars are more powerful than hyperedGE replacement gramMars.

...read moreread less

Abstract: In this paper, we introduce the notion of fusion grammars as a novel device for the generation of (hyper)graph languages. Fusion grammars are motivated by the observation that many large and complex structures can be seen as compositions of a large number of small basic pieces. A fusion grammar is a hypergraph grammar that provides the small pieces as connected components of the start hypergraph. To get arbitrary large numbers of them, they can be copied multiple times. To get large connected hypergraphs, they can be fused by the application of fusion rules. As the first main results, we show that fusion grammars can simulate hyperedge replacement grammars that generate connected hypergraphs, that the membership problem is decidable, and that fusion grammars are more powerful than hyperedge replacement grammars.

...read moreread less

Journal Article•DOI•

Hybrid grammars for parsing of discontinuous phrase structures and non-projective dependency structures

[...]

Kilian Gebhardt¹, Mark-Jan Nederhof², Heiko Vogler¹•Institutions (2)

Dresden University of Technology¹, University of St Andrews²

15 Sep 2017-Computational Linguistics

TL;DR: The main advantage over existing frameworks is the ability of hybrid grammars to separate discontinuity of the desired structures from time complexity of parsing, which permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties.

...read moreread less

Abstract: We explore the concept of hybrid grammars, which formalize and generalize a range of existing frameworks for dealing with discontinuous syntactic structures. Covered are both discontinuous phrase structures and non-projective dependency structures. Technically, hybrid grammars are related to synchronous grammars, where one grammar component generates linear structures and another generates hierarchical structures. By coupling lexical elements of both components together, discontinuous structures result. Several types of hybrid grammars are characterized. We also discuss grammar induction from treebanks. The main advantage over existing frameworks is the ability of hybrid grammars to separate discontinuity of the desired structures from time complexity of parsing. This permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties. This is confirmed by the reported experimental results, which show a wide variety of running time, accuracy, and frequency ...

...read moreread less

Proceedings Article•DOI•

Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!

[...]

Rajesh Jayaram¹, Barna Saha²•Institutions (2)

Carnegie Mellon University¹, University of Massachusetts Amherst²

01 Jan 2017

TL;DR: Additive approximation algorithms for language edit distance are studied, providing two explicit combinatorial algorithms to obtain a string with minimum edit distance with performance dependencies on either the number of non-linear productions, k^*, or theNumber of nested non- linear production, k, used in the optimal derivation.

...read moreread less

Abstract: In 1975, a breakthrough result of L. Valiant showed that parsing context free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^omega) for parsing where omega <= 2.373 is the exponent of fast matrix multiplication, and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem, which computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and has received significant attention both in theory and practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^omega) time that can return a nontrivial multiplicative approximation of the language edit distance problem. Furthermore, combinatorial algorithms with cubic running time or algorithms that use fast matrix multiplication are often not desirable in practice. To break this n^omega hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance with performance dependencies on either the number of non-linear productions, k^*, or the number of nested non-linear production, k, used in the optimal derivation. Explicitly, we give an additive O(k^*gamma) approximation in time O(|G|(n^2 + (n/gamma)^3)) and an additive O(k gamma) approximation in time O(|G|(n^2 + (n^3/gamma^2))), where |G| is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context free grammars known as ultralinear grammars, for which k and k^* are naturally bounded. Interestingly, we show that the same conditional lower bound for parsing context free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard!

...read moreread less

Journal Article•DOI•

Computational Learning of Syntax

[...]

Alexander Clark¹•Institutions (1)

King's College London¹

18 Jan 2017-Social Science Research Network

TL;DR: The computational issues involved in learning hierarchically structured grammars from strings of symbols alone are discussed and methods based on an abstract notion of the derivational context of a syntactic category lead to learning algorithms based on a form of traditional distributional analysis.

...read moreread less

Abstract: Learnability has traditionally been considered to be a crucial constraint on theoretical syntax; however, the issues involved have been poorly understood, partly as a result of the lack of simple learning algorithms for various types of formal grammars. Here I discuss the computational issues involved in learning hierarchically structured grammars from strings of symbols alone. The methods involved are based on an abstract notion of the derivational context of a syntactic category, which in the most elementary case of context-free grammars leads to learning algorithms based on a form of traditional distributional analysis. Crucially, these techniques can be extended to work with mildly context-sensitive grammars (and beyond), thus leading to learning methods that can in principle learn classes of grammars that are powerful enough to represent all natural languages. These learning methods require that the syntactic categories of the grammars be visible in a certain technical sense: They must be well charac...

...read moreread less

Journal Article•DOI•

Algorithmic Compression of Finite Tree Languages by Rigid Acyclic Grammars

[...]

Sebastian Eberhard¹, Gabriel Ebner¹, Stefan Hetzl¹•Institutions (1)

Vienna University of Technology¹

21 Sep 2017-ACM Transactions on Computational Logic

TL;DR: An algorithm to optimally compress a finite set of terms using a vectorial totally rigid acyclic tree grammar, based on a polynomial-time reduction to the MaxSAT optimization problem.

...read moreread less

Abstract: We present an algorithm to optimally compress a finite set of terms using a vectorial totally rigid acyclic tree grammar. This class of grammars has a tight connection to proof theory, and the grammar compression problem considered in this article has applications in automated deduction. The algorithm is based on a polynomial-time reduction to the MaxSAT optimization problem. The crucial step necessary to justify this reduction consists of applying a term rewriting relation to vectorial totally rigid acyclic tree grammars. Our implementation of this algorithm performs well on a large real-world dataset.

...read moreread less

Book Chapter•DOI•

Lightweight Spoken Utterance Classification with CFG, tf-idf and Dynamic Programming

[...]

Manny Rayner¹, Nikos Tsourakis¹, Johanna Gerlach¹•Institutions (1)

University of Geneva¹

23 Oct 2017

TL;DR: This work describes a simple spoken utterance classification method suitable for data-sparse domains which can be approximately described by CFG grammars, and presents results of experiments carried out on a substantial CFG-based medical speech translator and the publicly available Spoken CALL Shared Task.

...read moreread less

Abstract: We describe a simple spoken utterance classification method suitable for data-sparse domains which can be approximately described by CFG grammars. The central idea is to perform robust matching of CFG rules against output from a large-vocabulary recogniser, using a dynamic programming method which optimises the tf-idf score of the matched grammar string. We present results of experiments carried out on a substantial CFG-based medical speech translator and the publicly available Spoken CALL Shared Task. Robust utterance classification using the tf-idf method strongly outperforms plain CFG-based recognition for both domains. When comparing with Naive Bayes classifiers trained on data sampled from the CFG grammars, the tf-idf/dynamic programming method is much better on the complex speech translation domain, but worse on the simple Spoken CALL Shared Task domain.

...read moreread less

Proceedings Article•DOI•

Generating context-free grammars using classical planning

[...]

Javier Segovia-Aguas¹, Sergio Jiménez², Anders Jonsson¹•Institutions (2)

Pompeu Fabra University¹, Polytechnic University of Valencia²

01 Aug 2017

TL;DR: Comunicacio presentada a: IJCAI International Joint Conference on Artificial Intelligence, celebrada a Melbourne, Australia, del 19 al 25 d'agost de 2017.

...read moreread less

Abstract: Comunicacio presentada a: IJCAI International Joint Conference on Artificial Intelligence, celebrada a Melbourne, Australia, del 19 al 25 d'agost de 2017.

...read moreread less

Journal Article•DOI•

Joint perceptual learning and natural language acquisition for autonomous robots

[...]

Muhannad Al-Omari¹•Institutions (1)

University of Leeds¹

01 Aug 2017-Artificial Intelligence

TL;DR: The problem of bootstrapping knowledge in language and vision for autonomous robots is addressed through novel techniques in grammar induction and word grounding to the perceptual world through a cognitively plausible loosely-supervised manner from raw linguistic and visual data.

...read moreread less

Journal Article•DOI•

Contextual array grammars with matrix control, regular control languages, and tissue P systems control

[...]

Artiom Alhazov¹, Henning Fernau², Rudolf Freund³, Sergiu Ivanov⁴, Rani Siromoney⁵, K. G. Subramanian⁶ - Show less +2 more•Institutions (6)

Academy of Sciences of Moldova¹, University of Trier², Vienna University of Technology³, University of Paris⁴, Chennai Mathematical Institute⁵, Liverpool Hope University⁶

19 Jun 2017-Theoretical Computer Science

TL;DR: This work considers d-dimensional contextual array grammars and investigates their computational power when using various control mechanisms – matrices, regular control languages, and tissue P systems, which work like regular control Languages, but may end up with a final check for the non-applicability of some rules.

...read moreread less

Book Chapter•DOI•

Parsing and Printing of and with Triples

[...]

Sebastiaan J. C. Joosten

15 May 2017

TL;DR: The tool Amperspiegel is introduced, which uses triple graphs for parsing, printing and manipulating data, and how to conveniently encode parsers, graph manipulation-rules, and printers using several relations is shown.

...read moreread less

Abstract: We introduce the tool Amperspiegel, which uses triple graphs for parsing, printing and manipulating data. We show how to conveniently encode parsers, graph manipulation-rules, and printers using several relations. As such, parsers, rules and printers are all encoded as graphs themselves. This allows us to parse, manipulate and print these parsers, rules and printers within the system. A parser for a context free grammar is graph-encoded with only four relations. The graph manipulation-rules turn out to be especially helpful when parsing. The printers strongly correspond to the parsers, being described using only five relations. The combination of parsers, rules and printers allows us to extract Ampersand source code from ArchiMate XML documents. Amperspiegel was originally developed to aid in the development of Ampersand.

...read moreread less

Posted Content•

Active Learning of Input Grammars

[...]

Matthias Höschele, Alexander Kampmann, Andreas Zeller

29 Aug 2017-arXiv: Programming Languages

TL;DR: In the evaluation on inputs like URLs, spreadsheets, or configuration files, the AUTOGRAM prototype obtains input grammars that are both accurate and very readable - and that can be directly fed into test generators for comprehensive automated testing.

...read moreread less

Abstract: Knowing the precise format of a program's input is a necessary prerequisite for systematic testing. Given a program and a small set of sample inputs, we (1) track the data flow of inputs to aggregate input fragments that share the same data flow through program execution into lexical and syntactic entities; (2) assign these entities names that are based on the associated variable and function identifiers; and (3) systematically generalize production rules by means of membership queries. As a result, we need only a minimal set of sample inputs to obtain human-readable context-free grammars that reflect valid input structure. In our evaluation on inputs like URLs, spreadsheets, or configuration files, our AUTOGRAM prototype obtains input grammars that are both accurate and very readable - and that can be directly fed into test generators for comprehensive automated testing.

...read moreread less

Proceedings Article•DOI•

A Probabilistic Context-Free Grammar Based Random Test Program Generation

[...]

Ondrej Cekan¹, Zdenek Kotasek¹•Institutions (1)

Brno University of Technology¹

01 Aug 2017

TL;DR: The aim of this paper is to show the use of a probabilistic context-free grammar in the domain of stimulus generation, especially random test program generation for processors, and demonstrate that this approach is competitive with a conventional approach.

...read moreread less

Abstract: The aim of this paper is to show the use of a probabilistic context-free grammar in the domain of stimulus generation, especially random test program generation for processors. Nowadays, the randomly constructed test stimuli are largely applied in functional verification to verify the proper design and final implementation of systems. Context-free grammar cannot be used by itself in this case, because conditions for instructions of the program are changing during the generation. Therefore, there is a need to introduce additional logic in the form of constraints. Constraints guarantee the continuous changes of probabilities in the grammar and their application in order to preserve the validity of the program. The use of the grammar system provides a formal description of the stimuli, while the connection with constraints allows for the wide use in various systems. Experiments demonstrate that this approach is competitive with a conventional approach.

...read moreread less

Proceedings Article•DOI•

Ensuring non-interference of composable language extensions

[...]

Ted Kaminski¹, Eric Van Wyk¹•Institutions (1)

University of Minnesota¹

23 Oct 2017

TL;DR: It is shown that a useful class of language extensions, implemented as attribute grammars, preserve all coherent properties, and if extensions are restricted to only making use of coherent properties in establishing their correctness, then the correctness properties of each extension will hold when composed with other extensions.

...read moreread less

Abstract: Extensible language frameworks aim to allow independently-developed language extensions to be easily added to a host programming language. It should not require being a compiler expert, and the resulting compiler should "just work" as expected. Previous work has shown how specifications for parsing (based on context free grammars) and for semantic analysis (based on attribute grammars) can be automatically and reliably composed, ensuring that the resulting compiler does not terminate abnormally. However, this work does not ensure that a property proven to hold for a language (or extended language) still holds when another extension is added, a problem we call interference. We present a solution to this problem using of a logical notion of coherence. We show that a useful class of language extensions, implemented as attribute grammars, preserve all coherent properties. If we also restrict extensions to only making use of coherent properties in establishing their correctness, then the correctness properties of each extension will hold when composed with other extensions. As a result, there can be no interference: each extension behaves as specified.

...read moreread less

Parsing Minimalist Languages with Interpreted Regular Tree Grammars

[...]

Meaghan Fowlie, Alexander Koller

01 Sep 2017

TL;DR: By approaching minimalist grammars from the perspective of Interpreted Regular Tree Grammars, it is shown that standard chart-based parsing is substantially computationally cheaper than previously thought at O(n2k+3 · 2k).

...read moreread less

Abstract: Minimalist Grammars (MGs) (Stabler, 1997) are a formalisation of Chomsky’s minimalist program (Chomsky, 1995), which currently dominates much of mainstream syntax. MGs are simple and intuitive to work with, and are mildly context sensitive (Michaelis, 1998), putting them in the right general class for human language (Joshi, 1985).1 Minimalist Grammars are known to be more succinct than their Multiple ContextFree equivalents (Stabler, 2013), to have regular derivation tree languages (Kobele et al., 2007), and to be recognisable in polynomial time (Harkema, 2001) with a bottom-up CKY-like parser. However, the polynomial is large, O(n4k+4) where k is a grammar constant. By approaching minimalist grammars from the perspective of Interpreted Regular Tree Grammars, we show that standard chart-based parsing is substantially computationally cheaper than previously thought at O(n2k+3 · 2k).

...read moreread less

Posted Content•

Multiple Context-Free Tree Grammars: Lexicalization and Characterization

[...]

Joost Engelfriet¹, Andreas Maletti², Sebastian Maneth³•Institutions (3)

Leiden University¹, Leipzig University², University of Bremen³

11 Jul 2017-arXiv: Formal Languages and Automata Theory

TL;DR: In this paper, the authors investigated multiple context-free tree grammars, where "simple" means linear and nondeleting, and showed that a tree language can be generated by a multiple context free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer.

...read moreread less

Abstract: Multiple (simple) context-free tree grammars are investigated, where "simple" means "linear and nondeleting". Every multiple context-free tree grammar that is finitely ambiguous can be lexicalized; i.e., it can be transformed into an equivalent one (generating the same tree language) in which each rule of the grammar contains a lexical symbol. Due to this transformation, the rank of the nonterminals increases at most by 1, and the multiplicity (or fan-out) of the grammar increases at most by the maximal rank of the lexical symbols; in particular, the multiplicity does not increase when all lexical symbols have rank 0. Multiple context-free tree grammars have the same tree generating power as multi-component tree adjoining grammars (provided the latter can use a root-marker). Moreover, every multi-component tree adjoining grammar that is finitely ambiguous can be lexicalized. Multiple context-free tree grammars have the same string generating power as multiple context-free (string) grammars and polynomial time parsing algorithms. A tree language can be generated by a multiple context-free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer. Multiple context-free tree grammars can be used as a synchronous translation device.

...read moreread less

Journal Article•DOI•

Genetic Algorithm for Grammar Induction and Rules Verification through a PDA Simulator

[...]

Hari Mohan Pandey¹•Institutions (1)

Amity University¹

01 Sep 2017-IAES International Journal of Artificial Intelligence

TL;DR: Overall, a grammatical inference system has been developed that employs a PDA simulator for verification and runs in two phases: first, generation of grammar rules and verification and then applies the GA’s operation to optimize the rules.

...read moreread less

Abstract: The focus of this paper is towards developing a grammatical inference system uses a genetic algorithm (GA), has a powerful global exploration capability that can exploit the optimum offspring. The implemented system runs in two phases: first, generation of grammar rules and verification and then applies the GA’s operation to optimize the rules. A pushdown automata simulator has been developed, which parse the training data over the grammar’s rules. An inverted mutation with random mask and then ‘XOR’ operator has been applied introduces diversity in the population, helps the GA not to get trapped at local optimum. Taguchi method has been incorporated to tune the parameters makes the proposed approach more robust, statistically sound and quickly convergent. The performance of the proposed system has been compared with: classical GA, random offspring GA and crowding algorithms. Overall, a grammatical inference system has been developed that employs a PDA simulator for verification.

...read moreread less

Journal Article•DOI•

Distributional learning of conjunctive grammars and contextual binary feature grammars

[...]

Ryo Yoshinaka¹•Institutions (1)

Kyoto University¹

04 Sep 2017-Journal of Computer and System Sciences

TL;DR: This paper presents a distributional learning algorithm for conjunctive grammars with the k -finite context property ( k - fcp) for each natural number k and shows that every exact cbfg has the k- fcp, while not all of them are learnable by their algorithm.

...read moreread less

Book Chapter•DOI•

Chapter Five – A Streaming Dataflow Implementation of Parallel Cocke–Younger–Kasami Parser

[...]

Dragan Bojic¹, Miroslav Bojovic¹•Institutions (1)

University of Belgrade¹

01 Jan 2017-Advances in Computers

TL;DR: A novel, efficient streaming dataflow implementation of the CYK algorithm on reconfigurable hardware (Maxeler dataflow engines), which achieves 18–76 × speedup over an optimized sequential implementation for real-life grammars for natural language processing, depending on the length of the input string.

...read moreread less

Abstract: Parsing is the task of analyzing grammatical structures of an input sentence and deriving its parse tree. Efficient solutions for parsing are needed in many applications such as natural language processing, bioinformatics, and pattern recognition. The Cocke–Younger–Kasami (CYK) algorithm is a well-known parsing algorithm that operates on context-free grammars in Chomsky normal form and has been extensively studied for execution on parallel machines. In this chapter, we analyze the parallelizing opportunities for the CYK algorithm and give an overview of existing implementations on different hardware architectures. We propose a novel, efficient streaming dataflow implementation of the CYK algorithm on reconfigurable hardware (Maxeler dataflow engines), which achieves 18–76 × speedup over an optimized sequential implementation for real-life grammars for natural language processing, depending on the length of the input string.

...read moreread less

Book Chapter•DOI•

A Language-Theoretic View on Network Protocols

[...]

Pierre Ganty¹, Boris Köpf¹, Pedro Valero², Pedro Valero¹•Institutions (2)

IMDEA¹, Technical University of Madrid²

03 Oct 2017

TL;DR: The first line of defense against malformed or malicious inputs as mentioned in this paper is to ensure that the validator (which is often part of the parser) is free of bugs in the parser.

...read moreread less

Abstract: Input validation is the first line of defense against malformed or malicious inputs It is therefore critical that the validator (which is often part of the parser) is free of bugs

...read moreread less