
Showing papers on "Context-free grammar" published in 2020


Proceedings ArticleDOI
08 Nov 2020
TL;DR: A general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program; it works on all stack-based recursive descent parsers and requires no program-specific heuristics.
Abstract: One of the key properties of a program is its input specification. Having a formal input specification can be critical in fields such as vulnerability analysis, reverse engineering, software testing, clone detection, or refactoring. Unfortunately, accurate input specifications for typical programs are often unavailable or out of date. In this paper, we present a general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program. We infer the syntactic input structure only by observing access of input characters at different locations of the input parser. This works on all stack-based recursive descent input parsers, including parser combinators, and works entirely without program-specific heuristics. Our Mimid prototype produced accurate and readable grammars for a variety of evaluation subjects, including complex languages such as JSON, TinyC, and JavaScript.

33 citations
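The paper's central mechanism is observing which input characters each parse function of a recursive-descent parser consumes, and turning those spans into grammar rules. A minimal sketch of that idea (invented toy parser, not the Mimid implementation):

```python
# Sketch: infer grammar rules by recording which input span each
# parse function of a recursive-descent parser consumes.
# Invented toy parser for nested lists; not the Mimid implementation.

class TracingParser:
    """Recursive-descent parser for inputs like "(a,(b,c))" that logs,
    for every rule invocation, the input span it consumed."""

    def __init__(self, text):
        self.text, self.pos, self.trace = text, 0, []

    def parse_expr(self):
        start = self.pos
        if self.text[self.pos] == '(':
            self.pos += 1                       # consume '('
            self.parse_expr()
            while self.pos < len(self.text) and self.text[self.pos] == ',':
                self.pos += 1                   # consume ','
                self.parse_expr()
            assert self.text[self.pos] == ')'
            self.pos += 1                       # consume ')'
        else:
            assert self.text[self.pos].isalpha()
            self.pos += 1                       # consume a letter
        self.trace.append(('<expr>', start, self.pos))

def mine_rules(text):
    p = TracingParser(text)
    p.parse_expr()
    # Each recorded span is one observed expansion of the rule; a full
    # miner would replace nested spans by their nonterminal.
    return {(rule, text[s:e]) for rule, s, e in p.trace}

print(mine_rules('(a,(b,c))'))
# {('<expr>', 'a'), ('<expr>', '(b,c)'), ..., ('<expr>', '(a,(b,c))')}
```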


Journal ArticleDOI
TL;DR: Novel neural models of lexicalized PCFGs are presented that overcome sparsity problems and effectively induce both constituents and dependencies within a single model, yielding stronger results on both representations than modeling either formalism alone.
Abstract: In this paper we demonstrate that context free grammar (CFG) based methods for grammar induction benefit from modeling lexical dependencies . This contrasts to the most popular current methods for grammar induction, which focus on discovering either constituents or dependencies. Previous approaches to marry these two disparate syntactic formalisms (e.g. lexicalized PCFGs) have been plagued by sparsity, making them unsuitable for unsupervised grammar induction. However, in this work, we present novel neural models of lexicalized PCFGs which allow us to overcome sparsity problems and effectively induce both constituents and dependencies within a single model. Experiments demonstrate that this unified framework results in stronger results on both representations than achieved when modeling either formalism alone. Code is available at https://github.com/neulab/neural-lpcfg .

29 citations


Proceedings ArticleDOI
16 Nov 2020
TL;DR: Kogi is a tool for deriving block-based environments from context-free grammars; it defines an abstract structure for describing block-based environments and generates environments based on Google Blockly.
Abstract: Block-based programming systems employ a jigsaw metaphor to write programs. They are popular in the domain of programming education (e.g., Scratch), but also used as a programming interface for end-users in other disciplines, such as arts, robotics, and configuration management. In particular, block-based environments promise a convenient interface for Domain-Specific Languages (DSLs) for domain experts who might lack a traditional programming education. However, building a block-based environment for a DSL from scratch requires significant effort. This paper presents an approach to engineer block-based language interfaces by reusing existing language artifacts. We present Kogi, a tool for deriving block-based environments from context-free grammars. We identify and define the abstract structure for describing block-based environments. Kogi transforms a context-free grammar into this structure, which then generates a block-based environment based on Google Blockly. The approach is illustrated with four case studies, a DSL for state machines, Sonification Blocks (a DSL for sound synthesis), Pico (a simple programming language), and QL (a DSL for questionnaires). The results show that usable block-based environments can be derived from context-free grammars, and with an order of magnitude reduction in effort.

13 citations
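The essential transformation is mechanical: each context-free production becomes one block definition, with nonterminal occurrences turned into input sockets and terminals into labels. A hypothetical simplification of that step, emitting Blockly-style JSON block definitions (Kogi's actual intermediate structure is richer):

```python
# Sketch: derive Blockly-style block definitions from CFG productions.
# Each production becomes one block; nonterminals on the right-hand
# side become input sockets, terminals become labels. Hypothetical
# simplification of the Kogi pipeline.
import json

productions = [                    # (lhs, rhs); UPPERCASE = nonterminal
    ('STATE', ['state', 'ID', '{', 'TRANSITIONS', '}']),
    ('TRANSITION', ['on', 'ID', 'goto', 'ID']),
]

def to_block(lhs, rhs, idx):
    args, message = [], []
    for sym in rhs:
        if sym.isupper():          # nonterminal -> an input socket
            args.append({'type': 'input_value', 'name': f'{sym}_{len(args)}'})
            message.append(f'%{len(args)}')
        else:                      # terminal -> fixed label text
            message.append(sym)
    return {'type': f'{lhs.lower()}_{idx}',
            'message0': ' '.join(message),
            'args0': args,
            'output': lhs}         # the block yields a value of sort lhs

blocks = [to_block(lhs, rhs, i) for i, (lhs, rhs) in enumerate(productions)]
print(json.dumps(blocks, indent=2))
```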


Posted Content
TL;DR: A recurrent neural network structured like a PDA is built, where weights correspond directly to the PDA rules; the model and method of proof generalize to other state machines, such as a Turing machine.
Abstract: Given a collection of strings belonging to a context free grammar (CFG) and another collection of strings not belonging to the CFG, how might one infer the grammar? This is the problem of grammatical inference. Since CFGs are the languages recognized by pushdown automata (PDA), it suffices to determine the state transition rules and stack action rules of the corresponding PDA. An approach would be to train a recurrent neural network (RNN) to classify the sample data and attempt to extract these PDA rules. But neural networks are not a priori aware of the structure of a PDA and would likely require many samples to infer this structure. Furthermore, extracting the PDA rules from the RNN is nontrivial. We build a RNN specifically structured like a PDA, where weights correspond directly to the PDA rules. This requires a stack architecture that is somehow differentiable (to enable gradient-based learning) and stable (an unstable stack will show deteriorating performance with longer strings). We propose a stack architecture that is differentiable and that provably exhibits orbital stability. Using this stack, we construct a neural network that provably approximates a PDA for strings of arbitrary length. Moreover, our model and method of proof can easily be generalized to other state machines, such as a Turing Machine.

10 citations
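A standard way to make a stack differentiable is to blend the push, pop, and no-op outcomes with continuous action weights. The numpy sketch below illustrates this generic idea; it is not the paper's construction, whose stack additionally comes with a proof of orbital stability:

```python
# Sketch: a "soft" stack where push/pop are blended by continuous
# action weights, making the whole structure differentiable.
# Generic illustration, not the paper's provably stable design.
import numpy as np

def soft_stack_step(S, v, a_push, a_pop, a_noop):
    """S: (depth, dim) stack contents; v: (dim,) value to push.
    a_*: nonnegative action weights summing to 1."""
    pushed = np.vstack([v, S[:-1]])                    # shift down, v on top
    popped = np.vstack([S[1:], np.zeros_like(S[:1])])  # shift up
    return a_push * pushed + a_pop * popped + a_noop * S

S = np.zeros((4, 3))
S = soft_stack_step(S, np.array([1., 0., 0.]), 1.0, 0.0, 0.0)  # hard push
S = soft_stack_step(S, np.array([0., 1., 0.]), 0.7, 0.2, 0.1)  # soft update
print(S[0])   # top of stack: mostly the second vector, blended
```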


Journal ArticleDOI
TL;DR: Formal languages and automata (FLA) theory is perceived by many as one of the hardest topics to teach or learn at the undergraduate level, due to the abstract nature of its contents.
Abstract: Formal languages and automata (FLA) theory is perceived by many as one of the hardest topics to teach or learn at the undergraduate level, due to the abstract nature of its contents; often containi...

9 citations


Journal ArticleDOI
Sitender, Seema Bawa
TL;DR: This work extends the SANSUNL system by enhancing POS tagging, Sanskrit language processing, and parsing; a proposed Sanskrit context-free grammar is used with a CYK parser to parse the input sentence.
Abstract: Machine translation is a mechanism for transforming text from one language to another with the help of computer technology. In 2018, the authors developed a machine translation system, named SANSUNL, that translates Sanskrit text to Universal Networking Language expressions. The work presented in this paper extends the SANSUNL system by enhancing POS tagging, Sanskrit language processing, and parsing. A Sanskrit stemmer with 23 prefixes, 774 suffixes, and grammar rules is used for stemming the Sanskrit sentence in the proposed system. Bidirectional long short-term memory (Bi-LSTM) and stacked LSTM deep neural network models have been used for part-of-speech tagging of the input Sanskrit text. A tagged dataset of around 400k Sanskrit entries has been used for training and testing the neural network models. A proposed Sanskrit context-free grammar is used with a CYK parser to parse the input sentence. The size of the Sanskrit-Universal Word dictionary has been increased from 15,000 to 25,000 entries. Approximately 1,500 UNL generation rules are used to resolve the 46 UNL relations. Four datasets, UC-A1, UC-A2, the Spanish server gold standard dataset, and 500 Sanskrit sentences from the general domain, have been used for validating the system. The proposed system is evaluated on BLEU and fluency score metrics and reports an efficiency of 95.375%.

8 citations


Journal ArticleDOI
TL;DR: A novel context-free grammar for Grammar-Guided Genetic Programming (GGGP) algorithms guides the creation of particle swarm optimizers; experiments show that the algorithms generated from the grammar reach better results than their counterparts.
Abstract: Particle swarm optimization algorithm (PSO) has been widely studied over the years due to its competitive results in different applications. However, its performance is dependent on some design components (e.g., inertia factor, velocity equation, topology). Thus, to define which is the best algorithm design to solve a given optimization problem is difficult due to the large number of variations and parameters that can be considered. This work proposes a novel context-free grammar for Grammar-Guided Genetic Programming (GGGP) algorithms to guide the creation of Particle Swarm Optimizers. The proposed grammar considers four aspects of the PSO algorithm that may strongly impact on its performance: swarm initialization, neighborhood topology, velocity update equation and mutation operator. To assess the proposal, a GGGP algorithm was set with the proposed grammar and employed to optimize the PSO algorithm in 32 unconstrained continuous optimization problems. In the experiments, we compared the algorithms generated from the proposed grammar with those algorithms produced by two other grammars presented in the literature to automate PSO designs. The results achieved by the proposed grammar were better than the counterparts. Besides, we also compared the generated algorithms to 6 competition algorithms with different strategies. The experiments have shown that the algorithms generated from the grammar reached better results.

8 citations
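The flavor of such a design grammar, with one nonterminal per PSO aspect and alternatives for each, can be illustrated with a toy fragment and a random sampler, as GGGP conceptually does when creating candidate optimizers (the grammar below is invented; the paper's covers many more alternatives):

```python
# Sketch: a tiny design grammar for PSO variants plus a random
# sampler, as used conceptually in grammar-guided genetic programming.
# Invented fragment; the paper's grammar is far more detailed.
import random

grammar = {
    '<pso>':      [['<init>', '<topology>', '<velocity>', '<mutation>']],
    '<init>':     [['uniform'], ['opposition-based']],
    '<topology>': [['global'], ['ring'], ['von-neumann']],
    '<velocity>': [['inertia'], ['constriction'], ['fully-informed']],
    '<mutation>': [['none'], ['gaussian'], ['cauchy']],
}

def sample(symbol='<pso>'):
    """Randomly derive one PSO design from the grammar."""
    if symbol not in grammar:
        return [symbol]                       # terminal
    production = random.choice(grammar[symbol])
    return [t for sym in production for t in sample(sym)]

random.seed(1)
print(sample())   # e.g. ['uniform', 'ring', 'constriction', 'gaussian']
```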


Book ChapterDOI
28 Dec 2020
TL;DR: It is proved that non-self-embedding grammars and constant-height pushdown automata are polynomially related in size; a polynomial-size simulation by 1-limited automata is also presented, while the converse transformation is proved to cost exponential.
Abstract: Non-self-embedding grammars are a restriction of context-free grammars which does not allow to describe recursive structures and, hence, which characterizes only the class of regular languages. A double exponential gap in size from non-self-embedding grammars to deterministic finite automata is known. The same size gap is also known from constant-height pushdown automata and 1-limited automata to deterministic finite automata. Constant-height pushdown automata and 1-limited automata are compared with non-self-embedding grammars. It is proved that non-self-embedding grammars and constant-height pushdown automata are polynomially related in size. Furthermore, a polynomial size simulation by 1-limited automata is presented. However, the converse transformation is proved to cost exponential.

7 citations



Posted Content
TL;DR: This work presents a rigorous mathematical framework for the representation of phrase structure trees and parse trees of context-free grammars (CFG) in Fock space, i.e. the infinite-dimensional Hilbert space used in quantum field theory.
Abstract: Background / introduction. Vector symbolic architectures (VSA) are a viable approach for the hyperdimensional representation of symbolic data, such as documents, syntactic structures, or semantic frames. Methods. We present a rigorous mathematical framework for the representation of phrase structure trees and parse trees of context-free grammars (CFG) in Fock space, i.e. the infinite-dimensional Hilbert space used in quantum field theory. We define a novel normal form for CFG by means of term algebras. Using a recently developed software toolbox, called FockBox, we construct Fock space representations for the trees built up by a CFG left-corner (LC) parser. Results. We prove a universal representation theorem for CFG term algebras in Fock space and illustrate our findings through a low-dimensional principal component projection of the LC parser states. Conclusions. Our approach could leverage the development of VSA for explainable artificial intelligence (XAI) by means of hyperdimensional deep neural computation. It could be of significance for the improvement of cognitive user interfaces and other applications of VSA in machine learning.

6 citations



Journal ArticleDOI
TL;DR: The quality of a grammar for a problem is defined in terms of the average fitness of the candidate solutions generated using that grammar; three grammars of equal quality for a grammar-based version of the ONEMAX problem are shown to vary greatly in how they can be specialized with a (1 + 1)-EA.
Abstract: Context-free grammars are useful tools for modeling the solution space of problems that can be solved by optimization algorithms. For a given solution space, there exists an infinite number of grammars defining that space, and there are clues that changing the grammar may impact the effectiveness of the optimization. In this article, we investigate theoretically and experimentally the possibility of specializing a grammar in a problem, that is, of systematically improving the quality of the grammar for the given problem. To this end, we define the quality of a grammar for a problem in terms of the average fitness of the candidate solutions generated using that grammar. Theoretically, we demonstrate the following findings: 1) that a simple mutation operator employed in a (1 + 1)-EA setting can be used to specialize a grammar in a problem without changing the solution space defined by the grammar and 2) that three grammars of equal quality for a grammar-based version of the ONEMAX problem greatly vary in how they can be specialized with that (1 + 1)-EA, as the expected time required to obtain the same improvement in quality can vary exponentially among grammars. Then, experimentally, we validate the theoretical findings and extend them to other problems, grammars, and a more general version of the mutation operator.
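The (1 + 1)-EA loop over grammars is easy to sketch: mutate the grammar in a way that preserves its language, estimate quality as the average fitness of sampled solutions, and keep the mutant if it is at least as good. A simplified ONEMAX-style illustration, with a probability parameter standing in for the paper's exact mutation operator:

```python
# Sketch: specializing a grammar for ONEMAX with a (1+1)-EA.
# The "grammar" generates bit strings; mutating the probability of
# emitting '1' changes grammar quality while the set of generatable
# strings (the language) stays the same. Simplified stand-in for the
# paper's mutation operator.
import random

N = 20                                     # bit-string length

def sample_bits(p_one):
    return [1 if random.random() < p_one else 0 for _ in range(N)]

def quality(p_one, samples=200):
    """Grammar quality = average ONEMAX fitness of sampled solutions."""
    return sum(sum(sample_bits(p_one)) for _ in range(samples)) / samples

random.seed(0)
p = 0.5                                    # initial grammar: fair coin
for _ in range(100):
    mutant = min(0.99, max(0.01, p + random.gauss(0, 0.05)))
    if quality(mutant) >= quality(p):      # (1+1)-EA acceptance
        p = mutant
print(round(p, 2))                         # expected to drift toward 1
```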

Book ChapterDOI
25 Aug 2020
TL;DR: This paper provides a new CFPQ algorithm based on linear algebra operations, namely Kronecker product and transitive closure; it handles grammars presented as recursive state machines and avoids grammar growth, which opens the possibility of query optimization.
Abstract: Context-free path queries (CFPQ) extend the regular path queries (RPQ) by allowing context-free grammars to be used as constraints for paths. Algorithms for CFPQ are actively developed, but J. Kuijpers et al. have recently concluded that existing algorithms are not performant enough to be used in real-world applications. Thus the development of new algorithms for CFPQ is justified. In this paper, we provide a new CFPQ algorithm which is based on such linear algebra operations as Kronecker product and transitive closure and handles grammars presented as recursive state machines. Thus, the proposed algorithm can be implemented by using high-performance libraries and modern parallel hardware. Moreover, it avoids grammar growth, which opens the possibility of query optimization.
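The algorithmic skeleton can be shown on a toy instance: take Boolean adjacency matrices for the graph and for the grammar's recursive state machine, combine them per label with a Kronecker product, compute a transitive closure, and read off new nonterminal edges until a fixpoint. A schematic numpy sketch for the grammar S -> a S b | a b (the real algorithm uses high-performance kernels and handles arbitrary RSMs):

```python
# Sketch: CFPQ via Kronecker product and transitive closure for the
# grammar S -> a S b | a b, given as a recursive state machine (RSM).
# Schematic single-nonterminal version of the algorithm.
import numpy as np

def closure(M):
    """Boolean transitive closure by repeated squaring."""
    C = M.copy()
    while True:
        nxt = C | ((C.astype(int) @ C.astype(int)) > 0)
        if (nxt == C).all():
            return C
        C = nxt

# RSM states: 0 -a-> 1 -S-> 2 -b-> 3, plus 1 -b-> 3 (start 0, final 3).
R = 4
rsm = {lbl: np.zeros((R, R), bool) for lbl in ('a', 'b', 'S')}
rsm['a'][0, 1] = rsm['S'][1, 2] = rsm['b'][2, 3] = rsm['b'][1, 3] = True

# Graph: a-cycle 0 -> 1 -> 2 -> 0 and b-cycle 0 -> 3 -> 0.
V = 4
g = {lbl: np.zeros((V, V), bool) for lbl in ('a', 'b', 'S')}
g['a'][0, 1] = g['a'][1, 2] = g['a'][2, 0] = True
g['b'][0, 3] = g['b'][3, 0] = True

changed = True
while changed:
    K = np.zeros((R * V, R * V), bool)
    for lbl in rsm:                        # pair RSM and graph transitions
        K |= np.kron(rsm[lbl], g[lbl])
    C = closure(K)
    # A path from (start state 0, i) to (final state 3, j) derives S: i -> j.
    new = C[0 * V:1 * V, 3 * V:4 * V] & ~g['S']
    changed = bool(new.any())
    g['S'] |= new                          # add derived edges; iterate to fixpoint

print(np.argwhere(g['S']))                 # vertex pairs linked by an a^n b^n path
```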

Journal ArticleDOI
TL;DR: Grammar rules for hyphenated words are created which include sequences of a hyphen between vowel-consonant, consonant-consonant, vowel-vowel, and consonant-vowel, enhancing the understanding and comprehension of Cebuano-Visayan discourse.
Abstract: Syllabication is essential in the preprocessing stage of speech systems. In the context of the Philippines' Cebuano-Visayan language's syllabication rules, the existing rules do not include hyphenated words, although the hyphen defines the syllable boundary in a word. Hence, this study created grammar rules for hyphenated words which include sequences of a hyphen between vowel-consonant, consonant-consonant, vowel-vowel, and consonant-vowel. The enhanced grammar rules for Cebuano-Visayan syllabication were tested with 1,465 representative hyphenated and non-hyphenated words of varying lengths. The analysis further implies that hyphens improve the naturalness and intelligibility of the uttered words, thereby enhancing the understanding and comprehension of Cebuano-Visayan discourse.
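The rules lend themselves to a direct implementation: split at the hyphen, which itself defines a syllable boundary, then syllabify each part with CV patterns. A rough sketch with invented rule details (the paper's rule set is more complete):

```python
# Sketch: hyphen-aware syllabication with simple CV splitting rules.
# Illustrative approximation; the paper's Cebuano-Visayan rule set
# is more complete.
VOWELS = set('aeiou')

def syllabify_part(word):
    """Simple CV syllabifier: V.CV splits before the consonant,
    VC.CV splits between consonants, trailing consonants attach left."""
    syllables, i, start = [], 0, 0
    while i < len(word):
        if word[i] in VOWELS:
            j = i + 1
            while j < len(word) and word[j] not in VOWELS:
                j += 1                   # run of consonants after the vowel
            # keep one consonant as onset of the next syllable, if any follows
            i = j if j == len(word) else max(i + 1, j - 1)
            syllables.append(word[start:i])
            start = i
        else:
            i += 1
    if start < len(word):
        syllables.append(word[start:])
    return syllables

def syllabify(word):
    # The hyphen itself defines a syllable boundary.
    return [s for part in word.split('-') for s in syllabify_part(part)]

print(syllabify('tag-iya'))   # ['tag', 'i', 'ya']
```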

Proceedings ArticleDOI
01 Jul 2020
TL;DR: In this paper, an intermediate representation based on the logical query plan in a database, called Operation Trees (OT), is introduced to invert the annotation process without losing flexibility in the types of queries generated.
Abstract: In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. Thus, we randomly generate OTs from a context free grammar and annotators just have to write the appropriate question and assign the tokens. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is a challenging dataset and that the token alignment can be leveraged to significantly increase the performance.
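Generating operation trees from a context-free grammar amounts to recursively sampling productions until only terminals remain. A toy sketch of that generation step, with an invented grammar of database operations standing in for the OTTA grammar:

```python
# Sketch: randomly generating operation trees (OTs) from a CFG, as in
# the corpus construction described above. Invented toy grammar of
# database operations, not the OTTA grammar.
import random

grammar = {
    'QUERY':  [('project', 'TABLE', 'COLUMN')],
    'TABLE':  [('table',),
               ('filter', 'TABLE', 'COLUMN'),
               ('join', 'TABLE', 'TABLE')],
    'COLUMN': [('name',), ('grade',)],
}

def generate(symbol='QUERY', depth=0):
    """Expand a nonterminal into an operation-tree node (a tuple)."""
    if symbol not in grammar:
        return symbol                        # terminal leaf
    options = grammar[symbol]
    # beyond a depth bound, take the first (non-recursive) option
    choice = options[0] if depth > 3 else random.choice(options)
    op, *children = choice
    return (op, *[generate(c, depth + 1) for c in children])

random.seed(3)
print(generate())
# e.g. ('project', ('filter', ('table',), ('grade',)), ('name',))
```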

Proceedings ArticleDOI
10 Dec 2020
TL;DR: This paper uses the Probabilistic Context-Free Grammar (PCFG) and Viterbi-Cocke-Younger-Kasami (Viterbi-CYK) methods to solve the ambiguity problem in parsing Indonesian sentence patterns.
Abstract: Parsing is a tool for understanding natural grammar patterns. The problem of structural ambiguity in identifying sentence patterns often occurs in parsing. Syntactic parsing is one approach to solving structural ambiguity problems using the Probabilistic Context-Free Grammar (PCFG) and Viterbi-Cocke-Younger-Kasami (Viterbi-CYK) methods. Meanwhile, a large number of Indonesian language resources are needed as machine knowledge for parsing. This research builds a parser of Indonesian sentence patterns from an Indonesian tagged corpus and then solves the ambiguity problem of Indonesian sentence pattern parsing using the PCFG and Viterbi-CYK algorithms. The corpus data is processed to obtain grammar rules using the PCFG algorithm. Then, the sentences in the corpus are processed with the generated PCFG rules, and the Viterbi-CYK algorithm selects the parse tree with the highest probability value. The research produced an average similarity of production rules whose highest value is 92.95%. This shows that the parser successfully parses Indonesian sentences and can solve the problem of structural ambiguity in the parsing of Indonesian sentence patterns.
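Viterbi-CYK resolves structural ambiguity by keeping, for each span and nonterminal, only the highest-probability derivation. A compact sketch over a toy English CNF grammar (illustrative only, standing in for the Indonesian grammar induced from the corpus):

```python
# Sketch: Viterbi-CYK -- CYK parsing that keeps, per span and
# nonterminal, only the most probable derivation, resolving
# structural ambiguity. Toy CNF grammar for illustration.
binary = {('S', ('NP', 'VP')): 0.9, ('VP', ('V', 'NP')): 0.8,
          ('NP', ('NP', 'PP')): 0.3, ('VP', ('VP', 'PP')): 0.1,
          ('PP', ('P', 'NP')): 1.0}
lexical = {('NP', 'she'): 0.3, ('V', 'saw'): 1.0, ('NP', 'stars'): 0.3,
           ('P', 'with'): 1.0, ('NP', 'telescopes'): 0.2}

def viterbi_cyk(words):
    n = len(words)
    best = {}                     # (i, j, A) -> (prob, backpointer)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                best[i, i + 1, A] = (p, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # split point
                for (A, (B, C)), p in binary.items():
                    if (i, k, B) in best and (k, j, C) in best:
                        prob = p * best[i, k, B][0] * best[k, j, C][0]
                        if prob > best.get((i, j, A), (0.0,))[0]:
                            best[i, j, A] = (prob, (B, i, k, C, k, j))
    return best.get((0, n, 'S'))

print(viterbi_cyk('she saw stars with telescopes'.split()))
# (0.003888..., ('NP', 0, 1, 'VP', 1, 5)): NP attachment wins here.
```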

Proceedings ArticleDOI
06 Jul 2020
TL;DR: This paper examines how generated narrative text is perceived by players in terms of meaningfulness, immersion, and flow, and shows that the presented novel approach can be a valid method to implement computer-generated dialogues.
Abstract: The demand for high-quality video games is ever increasing and the ambition to tell truly interactive and dynamic stories is a significant factor contributing to this trend. This paper examines how generated narrative text is perceived by players in terms of meaningfulness, immersion, and flow and furthermore that the presented novel approach can be a valid method to implement computer-generated dialogues. For this approach, generative grammars are used to create written dialogue within a conversation for a learning application. With the support of sentiment analysis, the generated text is analysed with a focus on its semantics. Suitable text lines based on the current game state are provided by a dialogue system. Principles of gamification are used to create a learning application that renders such a generated dialogue scenario playable. To test this hypothesis, a user study examines the capabilities of the process by having players assess factors, which are imperative for a narrative game. The learning application shows strong potential in terms of text variation and dialogue that is easy to follow.
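The pipeline described here can be sketched in a few lines: expand a generative grammar into candidate dialogue lines, score them with sentiment analysis, and let the dialogue system pick a line matching the game state. The grammar and the word-list scorer below are invented stand-ins for the paper's components:

```python
# Sketch: grammar-generated dialogue filtered by sentiment. The
# grammar and word-list scorer are invented stand-ins for the paper's
# generative grammar and sentiment analysis component.
import random

grammar = {
    '<line>':     [['<greeting>', ' ', '<remark>']],
    '<greeting>': [['Hello, traveler.'], ['Oh. You again.']],
    '<remark>':   [['Lovely weather today!'], ['This place is dreadful.']],
}

def expand(symbol):
    if symbol not in grammar:
        return symbol                  # terminal: literal text
    return ''.join(expand(s) for s in random.choice(grammar[symbol]))

def sentiment(text):
    """Stub scorer: +1 per positive word, -1 per negative word."""
    pos, neg = {'lovely', 'hello'}, {'dreadful', 'again'}
    words = {w.strip('.,!').lower() for w in text.split()}
    return len(words & pos) - len(words & neg)

random.seed(7)
candidates = [expand('<line>') for _ in range(10)]
# The dialogue system would pick by game state; here: friendliest line.
print(max(candidates, key=sentiment))
```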


Posted Content
TL;DR: It is shown that the learned CMTA can be converted into a probabilistic grammar, thus providing a complete algorithm for learning a structurally unambiguous probabilistic context-free grammar using structured membership queries and structured equivalence queries.
Abstract: The problem of identifying a probabilistic context free grammar has two aspects: the first is determining the grammar's topology (the rules of the grammar) and the second is estimating probabilistic weights for each rule. Given the hardness results for learning context-free grammars in general, and probabilistic grammars in particular, most of the literature has concentrated on the second problem. In this work we address the first problem. We restrict attention to structurally unambiguous weighted context-free grammars (SUWCFG) and provide a query learning algorithm for structurally unambiguous probabilistic context-free grammars (SUPCFG). We show that SUWCFG can be represented using co-linear multiplicity tree automata (CMTA), and provide a polynomial learning algorithm that learns CMTAs. We show that the learned CMTA can be converted into a probabilistic grammar, thus providing a complete algorithm for learning a structurally unambiguous probabilistic context free grammar (both the grammar topology and the probabilistic weights) using structured membership queries and structured equivalence queries. We demonstrate the usefulness of our algorithm in learning PCFGs over genomic data.

Proceedings ArticleDOI
16 Nov 2020
TL;DR: A new algorithm is developed that generates positive test cases by covering all edges between pairs of directly connected states in a two-phase breadth-first path search; it is then extended to generate negative test cases by applying different edge mutation operations during the extraction of test cases from paths.
Abstract: Test case generation from context-free grammars typically uses the grammar's production rules to directly construct words that cover specific sets of derivations. Here, we investigate test case generation by traversing graphs derived from the LR-automata corresponding to the grammars. We develop a new algorithm that generates positive test cases by covering all edges between pairs of directly connected states in a two-phase breadth-first path search. The algorithm iterates over all edges stemming from shift/reduce and reduce/reduce conflicts, using a technique similar to the stack duplication used in GLR parsing. We then extend our algorithm to generate negative (i.e., syntactically invalid) test cases, by applying different edge mutation operations during the extraction of test cases from paths.
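The edge-coverage idea can be illustrated independently of a full LR implementation: treat the automaton as a labeled graph, search from the start state to each uncovered edge, then search onward to an accepting state to complete a positive test case. A simplified sketch on a hand-written toy automaton (real LR-automata, conflicts, and the stack-duplication handling are considerably more involved):

```python
# Sketch: generate tests covering every automaton edge with a
# two-phase breadth-first search: reach the edge, then reach an
# accepting state. Hand-written toy automaton; real LR-automata and
# conflict handling (stack duplication) are considerably more involved.
from collections import deque

edges = {                       # state -> [(symbol, next_state)]
    0: [('a', 1), ('b', 2)],
    1: [('a', 1), ('c', 3)],
    2: [('c', 3)],
    3: [],
}
START, FINAL = 0, {3}

def bfs_path(src, accept):
    """Shortest label sequence from src to any state in accept."""
    queue, seen = deque([(src, [])]), {src}
    while queue:
        state, path = queue.popleft()
        if state in accept:
            return path
        for sym, nxt in edges[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [sym]))
    return None

tests = []
for state, outs in edges.items():
    for sym, nxt in outs:
        prefix = bfs_path(START, {state})        # phase 1: reach the edge
        suffix = bfs_path(nxt, FINAL)            # phase 2: reach acceptance
        if prefix is not None and suffix is not None:
            tests.append(''.join(prefix + [sym] + suffix))
print(sorted(set(tests)))   # ['aac', 'ac', 'bc']
```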

Journal ArticleDOI
TL;DR: It is shown that computing the coin-flip measure of an unambiguous context-free language, a quantitative generalisation of universality, is hard for the long-standing open problem SQRTSUM.
Abstract: We study the computational complexity of universality and inclusion problems for unambiguous finite automata and context-free grammars. We observe that several such problems can be reduced to the universality problem for unambiguous context-free grammars. The latter problem has long been known to be decidable and we propose a PSPACE algorithm that works by reduction to the zeroness problem of recurrence equations with convolution. We are not aware of any non-trivial complexity lower bounds. However, we show that computing the coin-flip measure of an unambiguous context-free language, a quantitative generalisation of universality, is hard for the long-standing open problem SQRTSUM.

Posted Content
TL;DR: A probabilistic grammar G that generates T, but not necessarily as the unique element of L(G), is considered; this is more efficient than SLPs for certain texts, from both theoretical and practical points of view.
Abstract: We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{T\}$. Such a grammar is often called a \emph{straight-line program} (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar $G$ and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.
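The key difference from SLPs can be made concrete: with a probabilistic grammar that has several productions per nonterminal, the compressed representation is the grammar plus the sequence of production choices made in T's derivation. A minimal toy sketch (a real compressor would entropy-code the choices according to the rule probabilities):

```python
# Sketch: text = grammar + recorded derivation choices. Unlike an
# SLP, the grammar may generate many strings; the choice sequence
# pins down T. Toy example; a real compressor would entropy-code
# the choices according to the rule probabilities.

grammar = {                       # nonterminal -> list of productions
    'S': [['A', 'S'], ['A']],     # S -> A S | A
    'A': [['a'], ['b']],          # A -> a | b
}

def encode(text):
    """Derive `text` left-to-right, recording each production index."""
    choices = []
    def derive_S(s):
        choices.append(0 if len(s) > 1 else 1)   # S -> A S, or S -> A
        derive_A(s[0])
        if len(s) > 1:
            derive_S(s[1:])
    def derive_A(ch):
        choices.append(grammar['A'].index([ch]))
    derive_S(text)
    return choices

def decode(choices):
    it = iter(choices)
    def expand(symbol):
        if symbol not in grammar:
            return symbol
        return ''.join(expand(s) for s in grammar[symbol][next(it)])
    return expand('S')

c = encode('abba')
print(c, decode(c) == 'abba')   # [0, 0, 0, 1, 0, 1, 1, 0] True
```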

Proceedings ArticleDOI
01 Mar 2020
TL;DR: A simple parsing for Odia Language using Context-Free Grammar (CFG) is shown and all the things are being represented as simple tree primarily formalism in context free grammar.
Abstract: In this paper a simple parsing for Odia Language using Context-Free Grammar (CFG) is shown. Parsing is a technique used for building a sentence automatically in the phrases of grammar as well as in lexicon using syntactic analysis. It also includes semantic analysis and syntactic analysis that basically attention on parsing for Odia Language based on Panini method, Context Free Grammar along with top down approach. All the things are being represented as simple tree primarily formalism in context free grammar. Noun Phrase Chunking, Tokenization, Morphological Analysis and Part of Speech Tagging are being described briefly for Odia Language.

Journal ArticleDOI
TL;DR: Combining the proposed method with asymmetric encryption of the context-free grammar definition using a public key infrastructure makes it probable that its resistance to attacks will be quite high, because the statistical methods used in the analysis of natural languages are not applicable to it.
Abstract: In the following article, a proprietary method of anonymisation of identifiable statistical data using context-free probabilistic grammar is proposed. The advantage of this method is that it is sim...

Proceedings ArticleDOI
19 Oct 2020
TL;DR: In this article, the authors present an algorithm for context-free path query processing, which works by looking for localized paths, allowing them to process subgraphs, in contrast to other approaches that have to process the whole graph.
Abstract: Path queries are used to specify paths inside a data graph to match a given pattern. Query languages such as SPARQL usually include support for regular path patterns defined by means of regular expressions. Context-free path queries define a path whose language can be defined by a context-free grammar. This kind of query is interesting in practice in domains such as genetics, data science, or source code analysis. In this paper, we present a novel algorithm for context-free path query processing. Our algorithm works by looking for localized paths, allowing us to process subgraphs, in contrast to other approaches that have to process the whole graph. It also takes any context-free grammar as input, avoiding the use of normal forms that are more problematic in practice. The output of our algorithm provides enough information to reconstruct the paths matching the query. We prove the correctness of our approach and show its runtime and memory complexity. We show the viability of our approach by means of a prototype implemented in Go. We run experiments proposed in recent works, which include both synthetic and real RDF databases. Our algorithm shows some performance gains when compared with other algorithms implemented using single-thread programs.

Journal ArticleDOI
TL;DR: In this paper, generalized register context-free grammars (GRCFG) are defined by permitting an arbitrary relation on data values in the guard expression of a production rule.
Abstract: Register context-free grammars (RCFG) are an extension of context-free grammars that handle data values in a restricted way. This paper first introduces the register type as a finite representation of the register contents and shows some properties of RCFG. Next, generalized RCFG (GRCFG) is defined by permitting an arbitrary relation on data values in the guard expression of a production rule. We extend register types to GRCFG and introduce two properties of GRCFG, the simulation property and the type oracle. We then show that ε-rule removal is possible and that the emptiness and membership problems are EXPTIME-solvable for GRCFG that satisfy these two properties.

01 Jan 2020
TL;DR: This thesis presents a novel approach to the Procedural Content Generation (PCG) of both maze and dungeon environments by decomposing the problem of level generation into a series of stages which begins with the production of macro-level functional structures and ends with micro-level aesthetic details.
Abstract: This thesis presents a novel approach to the Procedural Content Generation (PCG) of both maze and dungeon environments. The solution we propose in this thesis borrows techniques from both Procedural Content Generation via Machine Learning as well as Constructive PCG methods. The approach we take involves decomposing the problem of level generation into a series of stages which begins with the production of macro-level functional structures and ends with micro-level aesthetic details; specifically, we train a Deep Convolutional Neural Network to produce high-quality mazes, which, in turn, are transformed into the rooms of larger dungeon levels using a constructive algorithm. For our dungeon's micro-level details, we use a context-free grammar for the instantiation of interactable puzzle elements, and an n-gram model for decorating our dungeon's entrance rooms. This unique combination of methods successfully generates a large number of visually impressive game levels without compromising on any desirable PCG metrics such as speed, reliability, controllability, expressivity, or believability.

Proceedings ArticleDOI
Isamu Furuya
25 Mar 2020
TL;DR: A compact encoding method for MR-RePair is proposed and its effectiveness shown through comparative experiments; MR-RePair is further extended to run-length context-free grammars via a novel variant called RL-MR-RePair.
Abstract: The goal of grammar compression is to construct a small context-free grammar that uniquely generates the input text. Among grammar compression methods, RePair is known for its good practical compression performance. MR-RePair was recently proposed as an improvement to RePair for constructing small context-free grammars for repetitive text data. However, a compact encoding scheme has not been discussed for MR-RePair. We propose a practical encoding method for MR-RePair and show its effectiveness through comparative experiments. Moreover, we extend MR-RePair to run-length context-free grammars and design a novel variant called RL-MR-RePair. We experimentally demonstrate that a compression scheme consisting of RL-MR-RePair and the proposed encoding method shows good performance on real repetitive datasets.
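Plain RePair, which both MR-RePair and RL-MR-RePair refine, is easy to state: repeatedly replace the most frequent adjacent symbol pair with a fresh nonterminal until no pair repeats. A compact reference sketch of that baseline loop (MR-RePair replaces maximal repeats per step instead, and RL-MR-RePair adds run-length rules):

```python
# Sketch: plain RePair grammar compression -- repeatedly replace the
# most frequent adjacent symbol pair with a new nonterminal.
# MR-RePair replaces maximal repeats instead, and RL-MR-RePair adds
# run-length rules; both refine this basic loop.
from collections import Counter

def repair(text):
    seq, rules, next_id = list(text), {}, 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        nt = f'R{next_id}'
        next_id += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):                 # left-to-right replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

seq, rules = repair('abababab')
print(seq, rules)   # ['R1', 'R1'] {'R0': ('a', 'b'), 'R1': ('R0', 'R0')}
```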


Proceedings Article
01 May 2020
TL;DR: A tool that segments sentences into tree structures to detect this type of recursive structure is introduced and it is shown that for certain sentence categories, which can be determined automatically, improvements in German dependency parsing can be achieved using the segmenter for preprocessing.
Abstract: Current sentence boundary detectors split documents into sequentially ordered sentences by detecting their beginnings and ends. Sentences, however, are more deeply structured even on this side of constituent and dependency structure: they can consist of a main sentence and several subordinate clauses as well as further segments (e.g. inserts in parentheses); they can even recursively embed whole sentences and then contain multiple sentence beginnings and ends. In this paper, we introduce a tool that segments sentences into tree structures to detect this type of recursive structure. To this end, we retrain different constituency parsers with the help of modified training data to transform them into sentence segmenters. With these segmenters, documents are mapped to sequences of sentence-related “logical document structures”. The resulting segmenters aim to improve downstream tasks by providing additional structural information. In this context, we experiment with German dependency parsing. We show that for certain sentence categories, which can be determined automatically, improvements in German dependency parsing can be achieved using our segmenter for preprocessing. The assumption suggests that improvements in other languages and tasks can be achieved.