
Showing papers on "Context-free grammar" published in 2020


Proceedings ArticleDOI
08 Nov 2020
TL;DR: A general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program; it works on all stack-based recursive descent parsers and requires no program-specific heuristics.
Abstract: One of the key properties of a program is its input specification. Having a formal input specification can be critical in fields such as vulnerability analysis, reverse engineering, software testing, clone detection, or refactoring. Unfortunately, accurate input specifications for typical programs are often unavailable or out of date. In this paper, we present a general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program. We infer the syntactic input structure only by observing access of input characters at different locations of the input parser. This works on all stack-based recursive descent input parsers, including parser combinators, and works entirely without program-specific heuristics. Our Mimid prototype produced accurate and readable grammars for a variety of evaluation subjects, including complex languages such as JSON, TinyC, and JavaScript.

33 citations
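The paper's central mechanism is observing which input characters each parse function of a recursive-descent parser consumes, and turning those spans into grammar rules. A minimal sketch of that idea (invented toy parser, not the Mimid implementation):

```python
# Sketch: infer grammar rules by recording which input span each
# parse function of a recursive-descent parser consumes.
# Invented toy parser for nested lists; not the Mimid implementation.

class TracingParser:
    """Recursive-descent parser for inputs like "(a,(b,c))" that logs,
    for every rule invocation, the input span it consumed."""

    def __init__(self, text):
        self.text, self.pos, self.trace = text, 0, []

    def parse_expr(self):
        start = self.pos
        if self.text[self.pos] == '(':
            self.pos += 1                       # consume '('
            self.parse_expr()
            while self.pos < len(self.text) and self.text[self.pos] == ',':
                self.pos += 1                   # consume ','
                self.parse_expr()
            assert self.text[self.pos] == ')'
            self.pos += 1                       # consume ')'
        else:
            assert self.text[self.pos].isalpha()
            self.pos += 1                       # consume a letter
        self.trace.append(('<expr>', start, self.pos))

def mine_rules(text):
    p = TracingParser(text)
    p.parse_expr()
    # Each recorded span is one observed expansion of the rule; a full
    # miner would replace nested spans by their nonterminal.
    return {(rule, text[s:e]) for rule, s, e in p.trace}

print(mine_rules('(a,(b,c))'))
# {('<expr>', 'a'), ('<expr>', '(b,c)'), ..., ('<expr>', '(a,(b,c))')}
```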


Journal ArticleDOI
TL;DR: Novel neural models of lexicalized PCFGs are presented that overcome sparsity problems and effectively induce both constituents and dependencies within a single model, yielding stronger results on both representations than modeling either formalism alone.
Abstract: In this paper we demonstrate that context free grammar (CFG) based methods for grammar induction benefit from modeling lexical dependencies . This contrasts to the most popular current methods for grammar induction, which focus on discovering either constituents or dependencies. Previous approaches to marry these two disparate syntactic formalisms (e.g. lexicalized PCFGs) have been plagued by sparsity, making them unsuitable for unsupervised grammar induction. However, in this work, we present novel neural models of lexicalized PCFGs which allow us to overcome sparsity problems and effectively induce both constituents and dependencies within a single model. Experiments demonstrate that this unified framework results in stronger results on both representations than achieved when modeling either formalism alone. Code is available at https://github.com/neulab/neural-lpcfg .

29 citations


Proceedings ArticleDOI
16 Nov 2020
TL;DR: Kogi is a tool for deriving block-based environments from context-free grammars; it defines an abstract structure for describing block-based environments and generates environments based on Google Blockly.
Abstract: Block-based programming systems employ a jigsaw metaphor to write programs. They are popular in the domain of programming education (e.g., Scratch), but also used as a programming interface for end-users in other disciplines, such as arts, robotics, and configuration management. In particular, block-based environments promise a convenient interface for Domain-Specific Languages (DSLs) for domain experts who might lack a traditional programming education. However, building a block-based environment for a DSL from scratch requires significant effort. This paper presents an approach to engineer block-based language interfaces by reusing existing language artifacts. We present Kogi, a tool for deriving block-based environments from context-free grammars. We identify and define the abstract structure for describing block-based environments. Kogi transforms a context-free grammar into this structure, which then generates a block-based environment based on Google Blockly. The approach is illustrated with four case studies, a DSL for state machines, Sonification Blocks (a DSL for sound synthesis), Pico (a simple programming language), and QL (a DSL for questionnaires). The results show that usable block-based environments can be derived from context-free grammars, and with an order of magnitude reduction in effort.

13 citations
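The essential transformation is mechanical: each context-free production becomes one block definition, with nonterminal occurrences turned into input sockets and terminals into labels. A hypothetical simplification of that step, emitting Blockly-style JSON block definitions (Kogi's actual intermediate structure is richer):

```python
# Sketch: derive Blockly-style block definitions from CFG productions.
# Each production becomes one block; nonterminals on the right-hand
# side become input sockets, terminals become labels. Hypothetical
# simplification of the Kogi pipeline.
import json

productions = [                    # (lhs, rhs); UPPERCASE = nonterminal
    ('STATE', ['state', 'ID', '{', 'TRANSITIONS', '}']),
    ('TRANSITION', ['on', 'ID', 'goto', 'ID']),
]

def to_block(lhs, rhs, idx):
    args, message = [], []
    for sym in rhs:
        if sym.isupper():          # nonterminal -> an input socket
            args.append({'type': 'input_value', 'name': f'{sym}_{len(args)}'})
            message.append(f'%{len(args)}')
        else:                      # terminal -> fixed label text
            message.append(sym)
    return {'type': f'{lhs.lower()}_{idx}',
            'message0': ' '.join(message),
            'args0': args,
            'output': lhs}         # the block yields a value of sort lhs

blocks = [to_block(lhs, rhs, i) for i, (lhs, rhs) in enumerate(productions)]
print(json.dumps(blocks, indent=2))
```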


Posted Content
TL;DR: A recurrent neural network structured like a PDA is built, where weights correspond directly to the PDA rules; the model and method of proof generalize to other state machines, such as a Turing machine.
Abstract: Given a collection of strings belonging to a context free grammar (CFG) and another collection of strings not belonging to the CFG, how might one infer the grammar? This is the problem of grammatical inference. Since CFGs are the languages recognized by pushdown automata (PDA), it suffices to determine the state transition rules and stack action rules of the corresponding PDA. An approach would be to train a recurrent neural network (RNN) to classify the sample data and attempt to extract these PDA rules. But neural networks are not a priori aware of the structure of a PDA and would likely require many samples to infer this structure. Furthermore, extracting the PDA rules from the RNN is nontrivial. We build a RNN specifically structured like a PDA, where weights correspond directly to the PDA rules. This requires a stack architecture that is somehow differentiable (to enable gradient-based learning) and stable (an unstable stack will show deteriorating performance with longer strings). We propose a stack architecture that is differentiable and that provably exhibits orbital stability. Using this stack, we construct a neural network that provably approximates a PDA for strings of arbitrary length. Moreover, our model and method of proof can easily be generalized to other state machines, such as a Turing Machine.

10 citations
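A standard way to make a stack differentiable is to blend the push, pop, and no-op outcomes with continuous action weights. The numpy sketch below illustrates this generic idea; it is not the paper's construction, whose stack additionally comes with a proof of orbital stability:

```python
# Sketch: a "soft" stack where push/pop are blended by continuous
# action weights, making the whole structure differentiable.
# Generic illustration, not the paper's provably stable design.
import numpy as np

def soft_stack_step(S, v, a_push, a_pop, a_noop):
    """S: (depth, dim) stack contents; v: (dim,) value to push.
    a_*: nonnegative action weights summing to 1."""
    pushed = np.vstack([v, S[:-1]])                    # shift down, v on top
    popped = np.vstack([S[1:], np.zeros_like(S[:1])])  # shift up
    return a_push * pushed + a_pop * popped + a_noop * S

S = np.zeros((4, 3))
S = soft_stack_step(S, np.array([1., 0., 0.]), 1.0, 0.0, 0.0)  # hard push
S = soft_stack_step(S, np.array([0., 1., 0.]), 0.7, 0.2, 0.1)  # soft update
print(S[0])   # top of stack: mostly the second vector, blended
```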


Journal ArticleDOI
TL;DR: Formal languages and automata (FLA) theory is perceived by many as one of the hardest topics to teach or learn at the undergraduate level, due to the abstract nature of its contents.
Abstract: Formal languages and automata (FLA) theory is perceived by many as one of the hardest topics to teach or learn at the undergraduate level, due to the abstract nature of its contents; often containi...

9 citations


Journal ArticleDOI
Sitender, Seema Bawa
TL;DR: This work extends the SANSUNL system by enhancing POS tagging, Sanskrit language processing, and parsing; a proposed Sanskrit context-free grammar is used with a CYK parser to parse the input sentence.
Abstract: Machine translation is a mechanism for transforming text from one language to another with the help of computer technology. In 2018, the authors developed a machine translation system, named SANSUNL, that translates Sanskrit text to Universal Networking Language expressions. The work presented in this paper extends the SANSUNL system by enhancing POS tagging, Sanskrit language processing, and parsing. A Sanskrit stemmer with 23 prefixes, 774 suffixes, and grammar rules is used for stemming the Sanskrit sentence in the proposed system. Bidirectional long short-term memory (Bi-LSTM) and stacked LSTM deep neural network models have been used for part-of-speech tagging of the input Sanskrit text. A tagged dataset of around 400k Sanskrit entries has been used for training and testing the neural network models. A proposed Sanskrit context-free grammar is used with a CYK parser to parse the input sentence. The size of the Sanskrit-Universal Word dictionary has been increased from 15,000 to 25,000 entries. Approximately 1,500 UNL generation rules are used to resolve the 46 UNL relations. Four datasets, UC-A1, UC-A2, the Spanish server gold standard dataset, and 500 Sanskrit sentences from the general domain, have been used for validating the system. The proposed system is evaluated on BLEU and fluency score metrics and reports an efficiency of 95.375%.

8 citations


Journal ArticleDOI
TL;DR: A novel context-free grammar for Grammar-Guided Genetic Programming (GGGP) algorithms guides the creation of particle swarm optimizers; experiments show that the algorithms generated from the grammar reach better results than their counterparts.
Abstract: Particle swarm optimization algorithm (PSO) has been widely studied over the years due to its competitive results in different applications. However, its performance is dependent on some design components (e.g., inertia factor, velocity equation, topology). Thus, to define which is the best algorithm design to solve a given optimization problem is difficult due to the large number of variations and parameters that can be considered. This work proposes a novel context-free grammar for Grammar-Guided Genetic Programming (GGGP) algorithms to guide the creation of Particle Swarm Optimizers. The proposed grammar considers four aspects of the PSO algorithm that may strongly impact on its performance: swarm initialization, neighborhood topology, velocity update equation and mutation operator. To assess the proposal, a GGGP algorithm was set with the proposed grammar and employed to optimize the PSO algorithm in 32 unconstrained continuous optimization problems. In the experiments, we compared the algorithms generated from the proposed grammar with those algorithms produced by two other grammars presented in the literature to automate PSO designs. The results achieved by the proposed grammar were better than the counterparts. Besides, we also compared the generated algorithms to 6 competition algorithms with different strategies. The experiments have shown that the algorithms generated from the grammar reached better results.

8 citations
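The flavor of such a design grammar, with one nonterminal per PSO aspect and alternatives for each, can be illustrated with a toy fragment and a random sampler, as GGGP conceptually does when creating candidate optimizers (the grammar below is invented; the paper's covers many more alternatives):

```python
# Sketch: a tiny design grammar for PSO variants plus a random
# sampler, as used conceptually in grammar-guided genetic programming.
# Invented fragment; the paper's grammar is far more detailed.
import random

grammar = {
    '<pso>':      [['<init>', '<topology>', '<velocity>', '<mutation>']],
    '<init>':     [['uniform'], ['opposition-based']],
    '<topology>': [['global'], ['ring'], ['von-neumann']],
    '<velocity>': [['inertia'], ['constriction'], ['fully-informed']],
    '<mutation>': [['none'], ['gaussian'], ['cauchy']],
}

def sample(symbol='<pso>'):
    """Randomly derive one PSO design from the grammar."""
    if symbol not in grammar:
        return [symbol]                       # terminal
    production = random.choice(grammar[symbol])
    return [t for sym in production for t in sample(sym)]

random.seed(1)
print(sample())   # e.g. ['uniform', 'ring', 'constriction', 'gaussian']
```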


Book ChapterDOI
28 Dec 2020
TL;DR: It is proved that non-self-embedding grammars and constant-height pushdown automata are polynomially related in size; a polynomial-size simulation by 1-limited automata is also presented, while the converse transformation is proved to cost exponential.
Abstract: Non-self-embedding grammars are a restriction of context-free grammars which does not allow to describe recursive structures and, hence, which characterizes only the class of regular languages. A double exponential gap in size from non-self-embedding grammars to deterministic finite automata is known. The same size gap is also known from constant-height pushdown automata and 1-limited automata to deterministic finite automata. Constant-height pushdown automata and 1-limited automata are compared with non-self-embedding grammars. It is proved that non-self-embedding grammars and constant-height pushdown automata are polynomially related in size. Furthermore, a polynomial size simulation by 1-limited automata is presented. However, the converse transformation is proved to cost exponential.

7 citations



Posted Content
TL;DR: This work presents a rigorous mathematical framework for the representation of phrase structure trees and parse trees of context-free grammars (CFG) in Fock space, i.e. the infinite-dimensional Hilbert space used in quantum field theory.
Abstract: Background / introduction. Vector symbolic architectures (VSA) are a viable approach for the hyperdimensional representation of symbolic data, such as documents, syntactic structures, or semantic frames. Methods. We present a rigorous mathematical framework for the representation of phrase structure trees and parse trees of context-free grammars (CFG) in Fock space, i.e. the infinite-dimensional Hilbert space used in quantum field theory. We define a novel normal form for CFG by means of term algebras. Using a recently developed software toolbox, called FockBox, we construct Fock space representations for the trees built up by a CFG left-corner (LC) parser. Results. We prove a universal representation theorem for CFG term algebras in Fock space and illustrate our findings through a low-dimensional principal component projection of the LC parser states. Conclusions. Our approach could leverage the development of VSA for explainable artificial intelligence (XAI) by means of hyperdimensional deep neural computation. It could be of significance for the improvement of cognitive user interfaces and other applications of VSA in machine learning.

6 citations



Journal ArticleDOI
TL;DR: The quality of a grammar for a problem is defined in terms of the average fitness of the candidate solutions generated using that grammar; three grammars of equal quality for a grammar-based version of the ONEMAX problem are shown to vary greatly in how they can be specialized with a (1 + 1)-EA.
Abstract: Context-free grammars are useful tools for modeling the solution space of problems that can be solved by optimization algorithms. For a given solution space, there exists an infinite number of grammars defining that space, and there are clues that changing the grammar may impact the effectiveness of the optimization. In this article, we investigate theoretically and experimentally the possibility of specializing a grammar in a problem, that is, of systematically improving the quality of the grammar for the given problem. To this end, we define the quality of a grammar for a problem in terms of the average fitness of the candidate solutions generated using that grammar. Theoretically, we demonstrate the following findings: 1) that a simple mutation operator employed in a (1 + 1)-EA setting can be used to specialize a grammar in a problem without changing the solution space defined by the grammar and 2) that three grammars of equal quality for a grammar-based version of the ONEMAX problem greatly vary in how they can be specialized with that (1 + 1)-EA, as the expected time required to obtain the same improvement in quality can vary exponentially among grammars. Then, experimentally, we validate the theoretical findings and extend them to other problems, grammars, and a more general version of the mutation operator.
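The (1 + 1)-EA loop over grammars is easy to sketch: mutate the grammar in a way that preserves its language, estimate quality as the average fitness of sampled solutions, and keep the mutant if it is at least as good. A simplified ONEMAX-style illustration, with a probability parameter standing in for the paper's exact mutation operator:

```python
# Sketch: specializing a grammar for ONEMAX with a (1+1)-EA.
# The "grammar" generates bit strings; mutating the probability of
# emitting '1' changes grammar quality while the set of generatable
# strings (the language) stays the same. Simplified stand-in for the
# paper's mutation operator.
import random

N = 20                                     # bit-string length

def sample_bits(p_one):
    return [1 if random.random() < p_one else 0 for _ in range(N)]

def quality(p_one, samples=200):
    """Grammar quality = average ONEMAX fitness of sampled solutions."""
    return sum(sum(sample_bits(p_one)) for _ in range(samples)) / samples

random.seed(0)
p = 0.5                                    # initial grammar: fair coin
for _ in range(100):
    mutant = min(0.99, max(0.01, p + random.gauss(0, 0.05)))
    if quality(mutant) >= quality(p):      # (1+1)-EA acceptance
        p = mutant
print(round(p, 2))                         # expected to drift toward 1
```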

Book ChapterDOI
25 Aug 2020
TL;DR: This paper provides a new CFPQ algorithm based on linear algebra operations, namely Kronecker product and transitive closure; it handles grammars presented as recursive state machines and avoids grammar growth, which opens the possibility of query optimization.
Abstract: Context-free path queries (CFPQ) extend the regular path queries (RPQ) by allowing context-free grammars to be used as constraints for paths. Algorithms for CFPQ are actively developed, but J. Kuijpers et al. have recently concluded that existing algorithms are not performant enough to be used in real-world applications. Thus the development of new algorithms for CFPQ is justified. In this paper, we provide a new CFPQ algorithm which is based on such linear algebra operations as Kronecker product and transitive closure and handles grammars presented as recursive state machines. Thus, the proposed algorithm can be implemented by using high-performance libraries and modern parallel hardware. Moreover, it avoids grammar growth, which opens the possibility of query optimization.
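The algorithmic skeleton can be shown on a toy instance: take Boolean adjacency matrices for the graph and for the grammar's recursive state machine, combine them per label with a Kronecker product, compute a transitive closure, and read off new nonterminal edges until a fixpoint. A schematic numpy sketch for the grammar S -> a S b | a b (the real algorithm uses high-performance kernels and handles arbitrary RSMs):

```python
# Sketch: CFPQ via Kronecker product and transitive closure for the
# grammar S -> a S b | a b, given as a recursive state machine (RSM).
# Schematic single-nonterminal version of the algorithm.
import numpy as np

def closure(M):
    """Boolean transitive closure by repeated squaring."""
    C = M.copy()
    while True:
        nxt = C | ((C.astype(int) @ C.astype(int)) > 0)
        if (nxt == C).all():
            return C
        C = nxt

# RSM states: 0 -a-> 1 -S-> 2 -b-> 3, plus 1 -b-> 3 (start 0, final 3).
R = 4
rsm = {lbl: np.zeros((R, R), bool) for lbl in ('a', 'b', 'S')}
rsm['a'][0, 1] = rsm['S'][1, 2] = rsm['b'][2, 3] = rsm['b'][1, 3] = True

# Graph: a-cycle 0 -> 1 -> 2 -> 0 and b-cycle 0 -> 3 -> 0.
V = 4
g = {lbl: np.zeros((V, V), bool) for lbl in ('a', 'b', 'S')}
g['a'][0, 1] = g['a'][1, 2] = g['a'][2, 0] = True
g['b'][0, 3] = g['b'][3, 0] = True

changed = True
while changed:
    K = np.zeros((R * V, R * V), bool)
    for lbl in rsm:                        # pair RSM and graph transitions
        K |= np.kron(rsm[lbl], g[lbl])
    C = closure(K)
    # A path from (start state 0, i) to (final state 3, j) derives S: i -> j.
    new = C[0 * V:1 * V, 3 * V:4 * V] & ~g['S']
    changed = bool(new.any())
    g['S'] |= new                          # add derived edges; iterate to fixpoint

print(np.argwhere(g['S']))                 # vertex pairs linked by an a^n b^n path
```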

Journal ArticleDOI
TL;DR: Grammar rules for hyphenated words are created which include sequences of a hyphen between vowel-consonant, consonant-consonant, vowel-vowel, and consonant-vowel, enhancing the understanding and comprehension of Cebuano-Visayan discourse.
Abstract: Syllabication is essential in the preprocessing stage of speech systems. In the context of the Philippines' Cebuano-Visayan language's syllabication rules, the existing rules do not include hyphenated words, although the hyphen defines the syllable boundary in a word. Hence, this study created grammar rules for hyphenated words which include sequences of a hyphen between vowel-consonant, consonant-consonant, vowel-vowel, and consonant-vowel. The enhanced grammar rules for Cebuano-Visayan syllabication were tested with 1,465 representative hyphenated and non-hyphenated words of varying lengths. The analysis further implies that hyphens improve the naturalness and intelligibility of the uttered words, thereby enhancing the understanding and comprehension of Cebuano-Visayan discourse.
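The rules lend themselves to a direct implementation: split at the hyphen, which itself defines a syllable boundary, then syllabify each part with CV patterns. A rough sketch with invented rule details (the paper's rule set is more complete):

```python
# Sketch: hyphen-aware syllabication with simple CV splitting rules.
# Illustrative approximation; the paper's Cebuano-Visayan rule set
# is more complete.
VOWELS = set('aeiou')

def syllabify_part(word):
    """Simple CV syllabifier: V.CV splits before the consonant,
    VC.CV splits between consonants, trailing consonants attach left."""
    syllables, i, start = [], 0, 0
    while i < len(word):
        if word[i] in VOWELS:
            j = i + 1
            while j < len(word) and word[j] not in VOWELS:
                j += 1                   # run of consonants after the vowel
            # keep one consonant as onset of the next syllable, if any follows
            i = j if j == len(word) else max(i + 1, j - 1)
            syllables.append(word[start:i])
            start = i
        else:
            i += 1
    if start < len(word):
        syllables.append(word[start:])
    return syllables

def syllabify(word):
    # The hyphen itself defines a syllable boundary.
    return [s for part in word.split('-') for s in syllabify_part(part)]

print(syllabify('tag-iya'))   # ['tag', 'i', 'ya']
```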

Proceedings ArticleDOI
01 Jul 2020
TL;DR: In this paper, an intermediate representation based on the logical query plan in a database, called Operation Trees (OT), is introduced to invert the annotation process without losing flexibility in the types of queries generated.
Abstract: In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. Thus, we randomly generate OTs from a context free grammar and annotators just have to write the appropriate question and assign the tokens. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is a challenging dataset and that the token alignment can be leveraged to significantly increase the performance.
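Generating operation trees from a context-free grammar amounts to recursively sampling productions until only terminals remain. A toy sketch of that generation step, with an invented grammar of database operations standing in for the OTTA grammar:

```python
# Sketch: randomly generating operation trees (OTs) from a CFG, as in
# the corpus construction described above. Invented toy grammar of
# database operations, not the OTTA grammar.
import random

grammar = {
    'QUERY':  [('project', 'TABLE', 'COLUMN')],
    'TABLE':  [('table',),
               ('filter', 'TABLE', 'COLUMN'),
               ('join', 'TABLE', 'TABLE')],
    'COLUMN': [('name',), ('grade',)],
}

def generate(symbol='QUERY', depth=0):
    """Expand a nonterminal into an operation-tree node (a tuple)."""
    if symbol not in grammar:
        return symbol                        # terminal leaf
    options = grammar[symbol]
    # beyond a depth bound, take the first (non-recursive) option
    choice = options[0] if depth > 3 else random.choice(options)
    op, *children = choice
    return (op, *[generate(c, depth + 1) for c in children])

random.seed(3)
print(generate())
# e.g. ('project', ('filter', ('table',), ('grade',)), ('name',))
```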

Proceedings ArticleDOI
10 Dec 2020
TL;DR: This paper uses the Probabilistic Context-Free Grammar (PCFG) and Viterbi-Cocke-Younger-Kasami (Viterbi-CYK) methods to solve the ambiguity problem in parsing Indonesian sentence patterns.
Abstract: Parsing is a tool for understanding natural grammar patterns. The problem of structural ambiguity in identifying sentence patterns often occurs in parsing. Syntactic parsing is one approach to solving structural ambiguity problems using the Probabilistic Context-Free Grammar (PCFG) and Viterbi-Cocke-Younger-Kasami (Viterbi-CYK) methods. Meanwhile, a large number of Indonesian language resources are needed as machine knowledge for parsing. This research builds a parser of Indonesian sentence patterns from an Indonesian tagged corpus and then solves the ambiguity problem of Indonesian sentence pattern parsing using the PCFG and Viterbi-CYK algorithms. The corpus data is processed to obtain grammar rules using the PCFG algorithm. Then, the sentences in the corpus are processed with the generated PCFG rules, and the Viterbi-CYK algorithm selects the parse tree with the highest probability value. The research produced an average similarity of production rules whose highest value is 92.95%. This shows that the parser successfully parses Indonesian sentences and can solve the problem of structural ambiguity in the parsing of Indonesian sentence patterns.
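Viterbi-CYK resolves structural ambiguity by keeping, for each span and nonterminal, only the highest-probability derivation. A compact sketch over a toy English CNF grammar (illustrative only, standing in for the Indonesian grammar induced from the corpus):

```python
# Sketch: Viterbi-CYK -- CYK parsing that keeps, per span and
# nonterminal, only the most probable derivation, resolving
# structural ambiguity. Toy CNF grammar for illustration.
binary = {('S', ('NP', 'VP')): 0.9, ('VP', ('V', 'NP')): 0.8,
          ('NP', ('NP', 'PP')): 0.3, ('VP', ('VP', 'PP')): 0.1,
          ('PP', ('P', 'NP')): 1.0}
lexical = {('NP', 'she'): 0.3, ('V', 'saw'): 1.0, ('NP', 'stars'): 0.3,
           ('P', 'with'): 1.0, ('NP', 'telescopes'): 0.2}

def viterbi_cyk(words):
    n = len(words)
    best = {}                     # (i, j, A) -> (prob, backpointer)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                best[i, i + 1, A] = (p, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # split point
                for (A, (B, C)), p in binary.items():
                    if (i, k, B) in best and (k, j, C) in best:
                        prob = p * best[i, k, B][0] * best[k, j, C][0]
                        if prob > best.get((i, j, A), (0.0,))[0]:
                            best[i, j, A] = (prob, (B, i, k, C, k, j))
    return best.get((0, n, 'S'))

print(viterbi_cyk('she saw stars with telescopes'.split()))
# (0.003888..., ('NP', 0, 1, 'VP', 1, 5)): NP attachment wins here.
```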

Proceedings ArticleDOI
06 Jul 2020
TL;DR: This paper examines how generated narrative text is perceived by players in terms of meaningfulness, immersion, and flow, and shows that the presented novel approach can be a valid method to implement computer-generated dialogues.
Abstract: The demand for high-quality video games is ever increasing and the ambition to tell truly interactive and dynamic stories is a significant factor contributing to this trend. This paper examines how generated narrative text is perceived by players in terms of meaningfulness, immersion, and flow and furthermore that the presented novel approach can be a valid method to implement computer-generated dialogues. For this approach, generative grammars are used to create written dialogue within a conversation for a learning application. With the support of sentiment analysis, the generated text is analysed with a focus on its semantics. Suitable text lines based on the current game state are provided by a dialogue system. Principles of gamification are used to create a learning application that renders such a generated dialogue scenario playable. To test this hypothesis, a user study examines the capabilities of the process by having players assess factors, which are imperative for a narrative game. The learning application shows strong potential in terms of text variation and dialogue that is easy to follow.
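The pipeline described here can be sketched in a few lines: expand a generative grammar into candidate dialogue lines, score them with sentiment analysis, and let the dialogue system pick a line matching the game state. The grammar and the word-list scorer below are invented stand-ins for the paper's components:

```python
# Sketch: grammar-generated dialogue filtered by sentiment. The
# grammar and word-list scorer are invented stand-ins for the paper's
# generative grammar and sentiment analysis component.
import random

grammar = {
    '<line>':     [['<greeting>', ' ', '<remark>']],
    '<greeting>': [['Hello, traveler.'], ['Oh. You again.']],
    '<remark>':   [['Lovely weather today!'], ['This place is dreadful.']],
}

def expand(symbol):
    if symbol not in grammar:
        return symbol                  # terminal: literal text
    return ''.join(expand(s) for s in random.choice(grammar[symbol]))

def sentiment(text):
    """Stub scorer: +1 per positive word, -1 per negative word."""
    pos, neg = {'lovely', 'hello'}, {'dreadful', 'again'}
    words = {w.strip('.,!').lower() for w in text.split()}
    return len(words & pos) - len(words & neg)

random.seed(7)
candidates = [expand('<line>') for _ in range(10)]
# The dialogue system would pick by game state; here: friendliest line.
print(max(candidates, key=sentiment))
```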


Posted Content
TL;DR: It is shown that the learned CMTA can be converted into a probabilistic grammar, thus providing a complete algorithm for learning a structurally unambiguous probabilistic context-free grammar using structured membership queries and structured equivalence queries.
Abstract: The problem of identifying a probabilistic context free grammar has two aspects: the first is determining the grammar's topology (the rules of the grammar) and the second is estimating probabilistic weights for each rule. Given the hardness results for learning context-free grammars in general, and probabilistic grammars in particular, most of the literature has concentrated on the second problem. In this work we address the first problem. We restrict attention to structurally unambiguous weighted context-free grammars (SUWCFG) and provide a query learning algorithm for structurally unambiguous probabilistic context-free grammars (SUPCFG). We show that SUWCFG can be represented using co-linear multiplicity tree automata (CMTA), and provide a polynomial learning algorithm that learns CMTAs. We show that the learned CMTA can be converted into a probabilistic grammar, thus providing a complete algorithm for learning a structurally unambiguous probabilistic context free grammar (both the grammar topology and the probabilistic weights) using structured membership queries and structured equivalence queries. We demonstrate the usefulness of our algorithm in learning PCFGs over genomic data.

Proceedings ArticleDOI
16 Nov 2020
TL;DR: A new algorithm is developed that generates positive test cases by covering all edges between pairs of directly connected states in a two-phase breadth-first path search; it is then extended to generate negative test cases by applying different edge mutation operations during the extraction of test cases from paths.
Abstract: Test case generation from context-free grammars typically uses the grammar's production rules to directly construct words that cover specific sets of derivations. Here, we investigate test case generation by traversing graphs derived from the LR-automata corresponding to the grammars. We develop a new algorithm that generates positive test cases by covering all edges between pairs of directly connected states in a two-phase breadth-first path search. The algorithm iterates over all edges stemming from shift/reduce and reduce/reduce conflicts, using a technique similar to the stack duplication used in GLR parsing. We then extend our algorithm to generate negative (i.e., syntactically invalid) test cases, by applying different edge mutation operations during the extraction of test cases from paths.
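The edge-coverage idea can be illustrated independently of a full LR implementation: treat the automaton as a labeled graph, search from the start state to each uncovered edge, then search onward to an accepting state to complete a positive test case. A simplified sketch on a hand-written toy automaton (real LR-automata, conflicts, and the stack-duplication handling are considerably more involved):

```python
# Sketch: generate tests covering every automaton edge with a
# two-phase breadth-first search: reach the edge, then reach an
# accepting state. Hand-written toy automaton; real LR-automata and
# conflict handling (stack duplication) are considerably more involved.
from collections import deque

edges = {                       # state -> [(symbol, next_state)]
    0: [('a', 1), ('b', 2)],
    1: [('a', 1), ('c', 3)],
    2: [('c', 3)],
    3: [],
}
START, FINAL = 0, {3}

def bfs_path(src, accept):
    """Shortest label sequence from src to any state in accept."""
    queue, seen = deque([(src, [])]), {src}
    while queue:
        state, path = queue.popleft()
        if state in accept:
            return path
        for sym, nxt in edges[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [sym]))
    return None

tests = []
for state, outs in edges.items():
    for sym, nxt in outs:
        prefix = bfs_path(START, {state})        # phase 1: reach the edge
        suffix = bfs_path(nxt, FINAL)            # phase 2: reach acceptance
        if prefix is not None and suffix is not None:
            tests.append(''.join(prefix + [sym] + suffix))
print(sorted(set(tests)))   # ['aac', 'ac', 'bc']
```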

Journal ArticleDOI
TL;DR: It is shown that computing the coin-flip measure of an unambiguous context-free language, a quantitative generalisation of universality, is hard for the long-standing open problem SQRTSUM.
Abstract: We study the computational complexity of universality and inclusion problems for unambiguous finite automata and context-free grammars. We observe that several such problems can be reduced to the universality problem for unambiguous context-free grammars. The latter problem has long been known to be decidable and we propose a PSPACE algorithm that works by reduction to the zeroness problem of recurrence equations with convolution. We are not aware of any non-trivial complexity lower bounds. However, we show that computing the coin-flip measure of an unambiguous context-free language, a quantitative generalisation of universality, is hard for the long-standing open problem SQRTSUM.

Posted Content
TL;DR: A probabilistic grammar G that generates T, but not necessarily as the unique element of L(G), is considered; this is more efficient than SLPs for certain texts, from both theoretical and practical points of view.
Abstract: We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{T\}$. Such a grammar is often called a \emph{straight-line program} (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar $G$ and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.
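The key difference from SLPs can be made concrete: with a probabilistic grammar that has several productions per nonterminal, the compressed representation is the grammar plus the sequence of production choices made in T's derivation. A minimal toy sketch (a real compressor would entropy-code the choices according to the rule probabilities):

```python
# Sketch: text = grammar + recorded derivation choices. Unlike an
# SLP, the grammar may generate many strings; the choice sequence
# pins down T. Toy example; a real compressor would entropy-code
# the choices according to the rule probabilities.

grammar = {                       # nonterminal -> list of productions
    'S': [['A', 'S'], ['A']],     # S -> A S | A
    'A': [['a'], ['b']],          # A -> a | b
}

def encode(text):
    """Derive `text` left-to-right, recording each production index."""
    choices = []
    def derive_S(s):
        choices.append(0 if len(s) > 1 else 1)   # S -> A S, or S -> A
        derive_A(s[0])
        if len(s) > 1:
            derive_S(s[1:])
    def derive_A(ch):
        choices.append(grammar['A'].index([ch]))
    derive_S(text)
    return choices

def decode(choices):
    it = iter(choices)
    def expand(symbol):
        if symbol not in grammar:
            return symbol
        return ''.join(expand(s) for s in grammar[symbol][next(it)])
    return expand('S')

c = encode('abba')
print(c, decode(c) == 'abba')   # [0, 0, 0, 1, 0, 1, 1, 0] True
```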

Proceedings ArticleDOI
01 Mar 2020
TL;DR: A simple parsing for Odia Language using Context-Free Grammar (CFG) is shown and all the things are being represented as simple tree primarily formalism in context free grammar.
Abstract: In this paper a simple parsing for Odia Language using Context-Free Grammar (CFG) is shown. Parsing is a technique used for building a sentence automatically in the phrases of grammar as well as in lexicon using syntactic analysis. It also includes semantic analysis and syntactic analysis that basically attention on parsing for Odia Language based on Panini method, Context Free Grammar along with top down approach. All the things are being represented as simple tree primarily formalism in context free grammar. Noun Phrase Chunking, Tokenization, Morphological Analysis and Part of Speech Tagging are being described briefly for Odia Language.

Journal ArticleDOI
TL;DR: Combining the proposed method with asymmetric encryption of the context-free grammar definition using a public key infrastructure makes it probable that its resistance to attacks will be quite high, because the statistical methods used in the analysis of natural languages are not applicable to it.
Abstract: In the following article, a proprietary method of anonymisation of identifiable statistical data using context-free probabilistic grammar is proposed. The advantage of this method is that it is sim...

Proceedings ArticleDOI
19 Oct 2020
TL;DR: In this article, the authors present an algorithm for context-free path query processing, which works by looking for localized paths, allowing them to process subgraphs, in contrast to other approaches that have to process the whole graph.
Abstract: Path queries are used to specify paths inside a data graph to match a given pattern. Query languages such as SPARQL usually include support for regular path patterns defined by means of regular expressions. Context-free path queries define a path whose language can be defined by a context-free grammar. This kind of query is interesting in practice in domains such as genetics, data science, or source code analysis. In this paper, we present a novel algorithm for context-free path query processing. Our algorithm works by looking for localized paths, allowing us to process subgraphs, in contrast to other approaches that have to process the whole graph. It also takes any context-free grammar as input, avoiding the use of normal forms that are more problematic in practice. The output of our algorithm provides enough information to reconstruct the paths matching the query. We prove the correctness of our approach and show its runtime and memory complexity. We show the viability of our approach by means of a prototype implemented in Go. We run experiments proposed in recent works, which include both synthetic and real RDF databases. Our algorithm shows some performance gains when compared with other algorithms implemented using single-thread programs.

Journal ArticleDOI
TL;DR: In this paper, generalized register context-free grammars (GRCFG) are defined by permitting an arbitrary relation on data values in the guard expression of a production rule.
Abstract: Register context-free grammars (RCFG) are an extension of context-free grammars that handle data values in a restricted way. This paper first introduces the register type as a finite representation of the register contents and shows some properties of RCFG. Next, generalized RCFG (GRCFG) is defined by permitting an arbitrary relation on data values in the guard expression of a production rule. We extend register types to GRCFG and introduce two properties of GRCFG, the simulation property and the type oracle. We then show that ε-rule removal is possible and that the emptiness and membership problems are EXPTIME-solvable for GRCFG that satisfy these two properties.

01 Jan 2020
TL;DR: This thesis presents a novel approach to the Procedural Content Generation (PCG) of both maze and dungeon environments by decomposing the problem of level generation into a series of stages which begins with the production of macro-level functional structures and ends with micro-level aesthetic details.
Abstract: This thesis presents a novel approach to the Procedural Content Generation (PCG) of both maze and dungeon environments. The solution we propose in this thesis borrows techniques from both Procedural Content Generation via Machine Learning as well as Constructive PCG methods. The approach we take involves decomposing the problem of level generation into a series of stages which begins with the production of macro-level functional structures and ends with micro-level aesthetic details; specifically, we train a Deep Convolutional Neural Network to produce high-quality mazes, which, in turn, are transformed into the rooms of larger dungeon levels using a constructive algorithm. For our dungeon's micro-level details, we use a context-free grammar for the instantiation of interactable puzzle elements, and an n-gram model for decorating our dungeon's entrance rooms. This unique combination of methods successfully generates a large number of visually impressive game levels without compromising on any desirable PCG metrics such as speed, reliability, controllability, expressivity, or believability.

Proceedings ArticleDOI
Isamu Furuya
25 Mar 2020
TL;DR: A compact encoding method for MR-RePair is proposed and its effectiveness shown through comparative experiments; MR-RePair is further extended to run-length context-free grammars via a novel variant called RL-MR-RePair.
Abstract: The goal of grammar compression is to construct a small context-free grammar that uniquely generates the input text. Among grammar compression methods, RePair is known for its good practical compression performance. MR-RePair was recently proposed as an improvement to RePair for constructing small context-free grammars for repetitive text data. However, a compact encoding scheme has not been discussed for MR-RePair. We propose a practical encoding method for MR-RePair and show its effectiveness through comparative experiments. Moreover, we extend MR-RePair to run-length context-free grammars and design a novel variant called RL-MR-RePair. We experimentally demonstrate that a compression scheme consisting of RL-MR-RePair and the proposed encoding method shows good performance on real repetitive datasets.
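Plain RePair, which both MR-RePair and RL-MR-RePair refine, is easy to state: repeatedly replace the most frequent adjacent symbol pair with a fresh nonterminal until no pair repeats. A compact reference sketch of that baseline loop (MR-RePair replaces maximal repeats per step instead, and RL-MR-RePair adds run-length rules):

```python
# Sketch: plain RePair grammar compression -- repeatedly replace the
# most frequent adjacent symbol pair with a new nonterminal.
# MR-RePair replaces maximal repeats instead, and RL-MR-RePair adds
# run-length rules; both refine this basic loop.
from collections import Counter

def repair(text):
    seq, rules, next_id = list(text), {}, 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        nt = f'R{next_id}'
        next_id += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):                 # left-to-right replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

seq, rules = repair('abababab')
print(seq, rules)   # ['R1', 'R1'] {'R0': ('a', 'b'), 'R1': ('R0', 'R0')}
```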


Proceedings Article
01 May 2020
TL;DR: A tool that segments sentences into tree structures to detect this type of recursive structure is introduced and it is shown that for certain sentence categories, which can be determined automatically, improvements in German dependency parsing can be achieved using the segmenter for preprocessing.
Abstract: Current sentence boundary detectors split documents into sequentially ordered sentences by detecting their beginnings and ends. Sentences, however, are more deeply structured even on this side of constituent and dependency structure: they can consist of a main sentence and several subordinate clauses as well as further segments (e.g. inserts in parentheses); they can even recursively embed whole sentences and then contain multiple sentence beginnings and ends. In this paper, we introduce a tool that segments sentences into tree structures to detect this type of recursive structure. To this end, we retrain different constituency parsers with the help of modified training data to transform them into sentence segmenters. With these segmenters, documents are mapped to sequences of sentence-related “logical document structures”. The resulting segmenters aim to improve downstream tasks by providing additional structural information. In this context, we experiment with German dependency parsing. We show that for certain sentence categories, which can be determined automatically, improvements in German dependency parsing can be achieved using our segmenter for preprocessing. The assumption suggests that improvements in other languages and tasks can be achieved.