
Showing papers on "Context-free grammar published in 2007"


Proceedings ArticleDOI
07 Jul 2007
TL;DR: This tutorial gives a brief introduction to Backus Naur Form Grammars and a background into the use of grammars with Genetic Programming, before describing the inner workings of Grammatical Evolution and some of the more commonly used extensions.
Abstract: Grammatical Evolution is an automatic programming system that is a form of Genetic Programming that uses grammars to evolve structures. These structures can be in any form that can be specified using a grammar, including computer languages, graphs and neural networks. When evolving computer languages, multiple types can be handled in a completely transparent manner. This tutorial gives a brief introduction to Backus Naur Form grammars and a background into the use of grammars with Genetic Programming, before describing the inner workings of Grammatical Evolution and some of the more commonly used extensions.
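As a rough illustration of the genotype-to-phenotype mapping the abstract describes, here is a minimal sketch in Python. The grammar, codon values, and function name are our own toy choices, not the paper's; real Grammatical Evolution systems add wrapping limits, invalid-individual handling, and richer grammars.

```python
# Minimal sketch of Grammatical Evolution's mapping: each integer codon
# selects a production for the leftmost nonterminal, via codon modulo the
# number of available productions for that nonterminal.

GRAMMAR = {
    "<expr>": [["<expr>", "+", "<expr>"], ["x"], ["1"]],
}

def ge_map(codons, start="<expr>", max_steps=100):
    """Map a list of integer codons to a string of terminals."""
    form = [start]
    i = 0
    for _ in range(max_steps):
        # find the leftmost nonterminal, if any remain
        nts = [k for k, s in enumerate(form) if s in GRAMMAR]
        if not nts:
            return "".join(form)
        k = nts[0]
        rules = GRAMMAR[form[k]]
        choice = codons[i % len(codons)] % len(rules)  # wrap codons if exhausted
        i += 1
        form = form[:k] + rules[choice] + form[k + 1:]
    return None  # mapping did not terminate within max_steps
```

For example, the codon list `[0, 1, 2]` derives `<expr>` → `<expr>+<expr>` → `x+<expr>` → `x+1`.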

344 citations


Proceedings Article
01 Apr 2007
TL;DR: Two Markov chain Monte Carlo algorithms for Bayesian inference of probabilistic context free grammars (PCFGs) from terminal strings are presented, providing an alternative to maximum-likelihood estimation using the Inside-Outside algorithm.
Abstract: This paper presents two Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference of probabilistic context free grammars (PCFGs) from terminal strings, providing an alternative to maximum-likelihood estimation using the Inside-Outside algorithm. We illustrate these methods by estimating a sparse grammar describing the morphology of the Bantu language Sesotho, demonstrating that with suitable priors Bayesian techniques can infer linguistic structure in situations where maximum likelihood methods such as the Inside-Outside algorithm only produce a trivial grammar.
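Both the Bayesian methods of this paper and the maximum-likelihood Inside-Outside algorithm they replace are built on inside probabilities. As a self-contained illustration (with a made-up two-rule grammar, not the Sesotho grammar of the paper), the inside recursion for a PCFG in Chomsky normal form can be sketched as:

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form: rule -> probability (made-up grammar).
BINARY = {("S", "A", "B"): 1.0}                        # S -> A B
LEXICAL = {("A", "a"): 1.0, ("B", "b"): 0.5, ("B", "a"): 0.5}

def inside_probability(tokens, start="S"):
    """P(string | grammar) via the CKY-style inside recursion."""
    n = len(tokens)
    inside = defaultdict(float)   # (nt, i, j): prob that nt derives tokens[i:j]
    for i, tok in enumerate(tokens):
        for (nt, term), p in LEXICAL.items():
            if term == tok:
                inside[(nt, i, i + 1)] += p
    for width in range(2, n + 1):            # build up over wider spans
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):        # split point
                for (nt, left, right), p in BINARY.items():
                    inside[(nt, i, j)] += p * inside[(left, i, k)] * inside[(right, k, j)]
    return inside[(start, 0, n)]
```

Here `inside_probability(["a", "b"])` is 1.0 × 1.0 × 0.5 = 0.5, while `["b", "a"]` gets probability 0 since `A` cannot derive `b`.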

214 citations


Proceedings ArticleDOI
05 Nov 2007
TL;DR: CESE, a tool that combines exhaustive enumeration of test inputs from a structured domain with symbolic execution driven test generation, targets programs whose valid inputs are determined by some context-free grammar; the concrete input syntax is abstracted with symbolic grammars, in which some original tokens are replaced with symbolic constants.
Abstract: We present CESE, a tool that combines exhaustive enumeration of test inputs from a structured domain with symbolic execution driven test generation. We target programs whose valid inputs are determined by some context free grammar. We abstract the concrete input syntax with symbolic grammars, where some original tokens are replaced with symbolic constants. This reduces the set of input strings that must be enumerated exhaustively. For each enumerated input string, which may contain symbolic constants, symbolic execution based test generation instantiates the constants based on program execution paths. The "template" generated by enumerating valid strings reduces the burden on the symbolic execution to generate syntactically valid inputs and helps exercise interesting code paths. Together, symbolic grammars provide a link between exhaustive enumeration of valid inputs and execution-directed symbolic test generation. Preliminary experiments with CESE show that the combination achieves better coverage than both pure enumerative test generation and pure directed symbolic test generation, with orders of magnitude less time and fewer generated inputs. In addition, CESE is able to automatically generate inputs that achieve coverage within 10% of manually constructed tests.
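The enumeration phase described above can be sketched as a bounded breadth-first derivation over a grammar in which one token is symbolic. The mini-grammar below is our own illustration, not CESE's input language: `NUM` stands in for any numeric literal, to be instantiated later by symbolic execution (not shown).

```python
from collections import deque

# Toy input grammar with a symbolic token: 'NUM' abstracts the class of
# numeric literals, so far fewer concrete strings need to be enumerated.
RULES = {"E": [["(", "E", ")"], ["NUM"]]}

def enumerate_templates(rules, start, max_len):
    """Exhaustively enumerate terminal strings (templates) up to max_len tokens."""
    results = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if sum(1 for s in form if s not in rules) > max_len:
            continue                                  # too many terminals already
        nts = [i for i, s in enumerate(form) if s in rules]
        if not nts:
            results.add(form)                         # fully terminal: a template
            continue
        i = nts[0]                                    # expand leftmost nonterminal
        for alt in rules[form[i]]:
            queue.append(form[:i] + tuple(alt) + form[i + 1:])
    return results
```

With `max_len=5` this yields the three templates `NUM`, `( NUM )`, and `(( NUM ))`, each of which would then be handed to the symbolic-execution stage.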

84 citations


Book ChapterDOI
16 Jul 2007
TL;DR: This work observes that there is a simple linguistic characterization of the grammar ambiguity problem, and shows how to exploit this to conservatively approximate the problem based on local regular approximations and grammar unfoldings.
Abstract: It has been known since 1962 that the ambiguity problem for context-free grammars is undecidable. Ambiguity in context-free grammars is a recurring problem in language design and parser generation, as well as in applications where grammars are used as models of real-world physical structures. We observe that there is a simple linguistic characterization of the grammar ambiguity problem, and we show how to exploit this to conservatively approximate the problem based on local regular approximations and grammar unfoldings. As an application, we consider grammars that occur in RNA analysis in bioinformatics, and we demonstrate that our static analysis of context-free grammars is sufficiently precise and efficient to be practically useful.
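The paper's contribution is a conservative static analysis; by contrast, the naive approach is a semi-decision procedure that searches for a short string with more than one parse tree. A small sketch of that baseline (toy CNF grammar and names are ours) counts parse trees per span:

```python
# Classic ambiguous grammar, in Chomsky normal form:
#   E -> E R | 'a',   R -> P E,   P -> '+'
BINARY = {"E": [("E", "R")], "R": [("P", "E")]}
LEXICAL = {"E": ["a"], "P": ["+"]}

def count_parse_trees(tokens, start="E"):
    """Count distinct parse trees; a count > 1 for any string witnesses ambiguity."""
    n = len(tokens)
    memo = {}
    def count(nt, i, j):
        key = (nt, i, j)
        if key in memo:
            return memo[key]
        total = 0
        if j - i == 1 and tokens[i] in LEXICAL.get(nt, []):
            total += 1                                   # lexical derivation
        for left, right in BINARY.get(nt, []):
            for k in range(i + 1, j):                    # every split point
                total += count(left, i, k) * count(right, k, j)
        memo[key] = total
        return total
    return count(start, 0, n)
```

`a+a+a` has exactly two trees, `(a+a)+a` and `a+(a+a)`, so the grammar is ambiguous; the catch, and the motivation for conservative approximation, is that no bound on string length suffices to prove unambiguity.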

75 citations


Proceedings Article
22 Jul 2007
TL;DR: This work describes a method in which a minimal grammar is hierarchically refined using EM to give accurate, compact grammars, yet the resulting parser gives the best published accuracies on several languages, as well as the best generative parsing numbers in English.
Abstract: Treebank parsing can be seen as the search for an optimally refined grammar consistent with a coarse training treebank. We describe a method in which a minimal grammar is hierarchically refined using EM to give accurate, compact grammars. The resulting grammars are extremely compact compared to other high-performance parsers, yet the parser gives the best published accuracies on several languages, as well as the best generative parsing numbers in English. In addition, we give an associated coarse-to-fine inference scheme which vastly improves inference time with no loss in test set accuracy.

67 citations


Book ChapterDOI
03 Jul 2007
TL;DR: A negative answer is given, contrary to the conjectured positive one, by constructing a conjunctive grammar for the language \(\{ a^{4^{n}} : n \in \mathbb{N} \}\).
Abstract: Conjunctive grammars were introduced by A. Okhotin in [1] as a natural extension of context-free grammars with an additional operation of intersection in the body of any production of the grammar. Several theorems and algorithms for context-free grammars generalize to the conjunctive case. Still some questions remained open. A. Okhotin posed nine problems concerning those grammars. One of them was the question whether a conjunctive grammar over a unary alphabet can generate only regular languages. We give a negative answer, contrary to the conjectured positive one, by constructing a conjunctive grammar for the language \(\{ a^{4^{n}} : n \in \mathbb{N} \}\). We then generalise this result: for every set of numbers L whose representation in some k-ary system is a regular set, we show that \(\{ a^{k^{n}} : n \in L \}\) is generated by some conjunctive grammar over a unary alphabet.

61 citations


Journal ArticleDOI
TL;DR: This paper presents a general demand-driven evaluation algorithm for CRAGs, exemplified by the specification and computation of the nullable, first, and follow sets used in parser construction, a problem which is highly recursive and normally programmed by hand using an iterative algorithm.
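The hand-written iterative algorithm that the TL;DR alludes to is the classic joint fixpoint over the nullable, FIRST, and FOLLOW sets. A compact sketch on a two-rule toy grammar of our own (the paper's CRAG formulation replaces exactly this kind of code with declarative attribute equations):

```python
GRAMMAR = {
    "S": [["A", "b"]],
    "A": [["a", "A"], []],          # [] is the empty (epsilon) production
}

def nullable_first_follow(grammar, start="S"):
    nts = set(grammar)
    nullable = set()
    first = {nt: set() for nt in nts}
    follow = {nt: set() for nt in nts}
    follow[start].add("$")          # conventional end-of-input marker
    changed = True
    while changed:                  # iterate all three analyses to a joint fixpoint
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                # nullable: every symbol of some alternative is nullable
                if nt not in nullable and all(s in nullable for s in alt):
                    nullable.add(nt)
                    changed = True
                for s in alt:       # FIRST: scan until a non-nullable symbol
                    add = first[s] if s in nts else {s}
                    if add - first[nt]:
                        first[nt] |= add
                        changed = True
                    if s not in nullable:
                        break
                for i, s in enumerate(alt):   # FOLLOW: look past each nonterminal
                    if s not in nts:
                        continue
                    rest_nullable = True
                    for t in alt[i + 1:]:
                        add = first[t] if t in nts else {t}
                        if add - follow[s]:
                            follow[s] |= add
                            changed = True
                        if t not in nullable:
                            rest_nullable = False
                            break
                    if rest_nullable and follow[nt] - follow[s]:
                        follow[s] |= follow[nt]
                        changed = True
    return nullable, first, follow
```

For this grammar the fixpoint gives nullable = {A}, FIRST(S) = {a, b}, and FOLLOW(A) = {b}.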

60 citations


Journal ArticleDOI
TL;DR: This paper describes Christiansen grammar evolution (CGE), a new evolutionary automatic programming algorithm that extends standard grammar evolution by replacing context-free grammars by Christiansen Grammars.
Abstract: This paper describes Christiansen grammar evolution (CGE), a new evolutionary automatic programming algorithm that extends standard grammar evolution (GE) by replacing context-free grammars with Christiansen grammars. GE only takes into account syntactic restrictions to generate valid individuals. CGE adds semantics to ensure that both semantically and syntactically valid individuals are generated. It is empirically shown that our approach improves GE performance and even allows the solution of some problems that are difficult to tackle with GE.

59 citations


Patent
28 Feb 2007
TL;DR: In this article, a system is disclosed for checking grammar and usage using a flexible portfolio of different mechanisms, and automatically providing a variety of different examples of standard usage, selected from analogous Web content.
Abstract: A system is disclosed for checking grammar and usage using a flexible portfolio of different mechanisms, and automatically providing a variety of different examples of standard usage, selected from analogous Web content. The system can be used for checking the grammar and usage in any application that involves natural language text, such as word processing, email, and presentation applications. The grammar and usage can be evaluated using several complementary evaluation modules, which may include one based on a trained classifier, one based on regular expressions, and one based on comparative searches of the Web or a local corpus. The evaluation modules can provide a set of suggested alternative segments with corrected grammar and usage. A followup, screened Web search based on the alternative segments, in context, may provide several different in-context examples of proper grammar and usage that the user can consider and select from.

59 citations


Journal ArticleDOI
TL;DR: Any parsing or labeling accuracy improvement from conditional estimation of WCFGs or conditional random fields (CRFs) over joint estimation of PCFGs or hidden Markov models (HMMs) is due to the estimation procedure rather than the change in model class, becausePCFGs and HMMs are exactly as expressive as W CFGs and chain-structured CRFs, respectively.
Abstract: This article studies the relationship between weighted context-free grammars (WCFGs), where each production is associated with a positive real-valued weight, and probabilistic context-free grammars (PCFGs), where the weights of the productions associated with a nonterminal are constrained to sum to one. Because the class of WCFGs properly includes the PCFGs, one might expect that WCFGs can describe distributions that PCFGs cannot. However, Z. Chi (1999, Computational Linguistics, 25(1):131--160) and S. P. Abney, D. A. McAllester, and F. Pereira (1999, In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 542--549, College Park, MD) proved that every WCFG distribution is equivalent to some PCFG distribution. We extend their results to conditional distributions, and show that every WCFG conditional distribution of parses given strings is also the conditional distribution defined by some PCFG, even when the WCFG's partition function diverges. This shows that any parsing or labeling accuracy improvement from conditional estimation of WCFGs or conditional random fields (CRFs) over joint estimation of PCFGs or hidden Markov models (HMMs) is due to the estimation procedure rather than the change in model class, because PCFGs and HMMs are exactly as expressive as WCFGs and chain-structured CRFs, respectively.
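The equivalence proof of Chi rests on renormalizing each rule weight by the "inner" partition values Z_A (the total weight of all trees rooted at A): p(A → α) = w(A → α) · ∏_{B∈α} Z_B / Z_A. A sketch of that construction for a convergent one-nonterminal WCFG of our own invention (the divergent case handled in this paper needs more care):

```python
# Each rule: (lhs, rhs_tuple) -> weight; symbols never on a LHS are terminals.
WEIGHTS = {("S", ("S", "S")): 0.1, ("S", ("a",)): 1.0}

def renormalize(weights, iters=200):
    """Chi (1999)-style conversion of a convergent WCFG into an equivalent PCFG:
    p(A -> alpha) = w(A -> alpha) * prod(Z_B for B in alpha) / Z_A,
    where Z_A is the total weight of all derivation trees rooted at A."""
    nts = {lhs for lhs, _ in weights}
    Z = {A: 0.0 for A in nts}
    for _ in range(iters):                       # fixpoint iteration for Z
        new = {A: 0.0 for A in nts}
        for (A, rhs), w in weights.items():
            prod = w
            for s in rhs:
                prod *= Z[s] if s in nts else 1.0
            new[A] += prod
        Z = new
    probs = {}
    for (A, rhs), w in weights.items():          # renormalize each rule
        prod = w
        for s in rhs:
            prod *= Z[s] if s in nts else 1.0
        probs[(A, rhs)] = prod / Z[A]
    return probs, Z
```

Here Z_S solves Z = 0.1·Z² + 1, i.e. Z ≈ 1.12702, and the renormalized rule probabilities sum to one by construction, while every parse tree keeps its relative weight.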

54 citations


Book
01 Jan 2007
TL;DR: This book provides an introduction to automata theory, covering finite state machines and regular languages, context-free grammars and pushdown automata, and Turing machines, computability, and undecidability, together with applications across computer science and linguistics.
Abstract: PART I: INTRODUCTION 1 Why Study Automata Theory? 2 Review of Mathematical Concepts 2.1 Logic 2.2 Sets 2.3 Relations 2.4 Functions 2.5 Closures 2.6 Proof Techniques 2.7 Reasoning about Programs 2.8 References 3 Languages and Strings 3.1 Strings 3.2 Languages 4 The Big Picture: A Language Hierarchy 4.1 Defining the Task: Language Recognition 4.2 The Power of Encoding 4.3 A Hierarchy of Language Classes 5 Computation 5.1 Decision Procedures 5.2 Determinism and Nondeterminism 5.3 Functions on Languages and Programs PART II: FINITE STATE MACHINES AND REGULAR LANGUAGES 6 Finite State Machines 6.2 Deterministic Finite State Machines 6.3 The Regular Languages 6.4 Programming Deterministic Finite State Machines 6.5 Nondeterministic FSMs 6.6 Interpreters for FSMs 6.7 Minimizing FSMs 6.8 Finite State Transducers 6.9 Bidirectional Transducers 6.10 Stochastic Finite Automata 6.11 Finite Automata, Infinite Strings: Buchi Automata 6.12 Exercises 7 Regular Expressions 7.1 What is a Regular Expression? 7.2 Kleene's Theorem 7.3 Applications of Regular Expressions 7.4 Manipulating and Simplifying Regular Expressions 8 Regular Grammars 8.1 Definition of a Regular Grammar 8.2 Regular Grammars and Regular Languages 8.3 Exercises 9 Regular and Nonregular Languages 9.1 How Many Regular Languages Are There? 
9.2 Showing That a Language Is Regular 9.3 Some Important Closure Properties of Regular Languages 9.4 Showing That a Language is Not Regular 9.5 Exploiting Problem-Specific Knowledge 9.6 Functions on Regular Languages 9.7 Exercises 10 Algorithms and Decision Procedures for Regular Languages 10.1 Fundamental Decision Procedures 10.2 Summary of Algorithms and Decision Procedures for Regular Languages 10.3 Exercises 11 Summary and References PART III: CONTEXT-FREE LANGUAGES AND PUSHDOWN AUTOMATA 12 Context-Free Grammars 12.1 Introduction to Grammars 12.2 Context-Free Grammars and Languages 12.3 Designing Context-Free Grammars 12.4 Simplifying Context-Free Grammars 12.5 Proving That a Grammar is Correct 12.6 Derivations and Parse Trees 12.7 Ambiguity 12.8 Normal Forms 12.9 Stochastic Context-Free Grammars 12.10 Exercises 13 Pushdown Automata 13.1 Definition of a (Nondeterministic) PDA 13.2 Deterministic and Nondeterministic PDAs 13.3 Equivalence of Context-Free Grammars and PDAs 13.4 Nondeterminism and Halting 13.5 Alternative Definitions of a PDA 13.6 Exercises 14 Context-Free and Noncontext-Free Languages 14.1 Where Do the Context-Free Languages Fit in the Big Picture? 
14.2 Showing That a Language is Context-Free 14.3 The Pumping Theorem for Context-Free Languages 14.4 Some Important Closure Properties of Context-Free Languages 14.5 Deterministic Context-Free Languages 14.6 Other Techniques for Proving That a Language is Not Context-Free 14.7 Exercises 15 Algorithms and Decision Procedures for Context-Free Languages 15.1 Fundamental Decision Procedures 15.2 Summary of Algorithms and Decision Procedures for Context-Free Languages 16 Context-Free Parsing 16.1 Lexical Analysis 16.2 Top-Down Parsing 16.3 Bottom-Up Parsing 16.4 Parsing Natural Languages 16.5 Stochastic Parsing 16.6 Exercises 17 Summary and References PART IV: TURING MACHINES AND UNDECIDABILITY 18 Turing Machines 18.1 Definition, Notation and Examples 18.2 Computing With Turing Machines 18.3 Turing Machines: Extensions and Alternative Definitions 18.4 Encoding Turing Machines as Strings 18.5 The Universal Turing Machine 18.6 Exercises 19 The Church-Turing Thesis 19.1 The Thesis 19.2 Examples of Equivalent Formalisms 20 The Unsolvability of the Halting Problem 20.1 The Language H is Semidecidable but Not Decidable 20.2 Some Implications of the Undecidability of H 20.3 Back to Turing, Church, and the Entscheidungsproblem 21 Decidable and Semidecidable Languages 21.2 Subset Relationships between D and SD 21.3 The Classes D and SD Under Complement 21.4 Enumerating a Language 21.5 Summary 21.6 Exercises 22 Decidability and Undecidability Proofs 22.1 Reduction 22.2 Using Reduction to Show that a Language is Not Decidable 22.3 Rice's Theorem 22.4 Undecidable Questions About Real Programs 22.5 Showing That a Language is Not Semidecidable 22.6 Summary of D, SD/D and (R)SD Languages that Include Turing Machine Descriptions 22.7 Exercises 23 Undecidable Languages That Do Not Ask Questions about Turing Machines 23.1 Hilbert's 10th Problem 23.2 Post Correspondence Problem 23.3 Tiling Problems 23.4 Logical Theories 23.5 Undecidable Problems about Context-Free Languages APPENDIX C: HISTORY, 
PUZZLES, AND POEMS 43 Part I: Introduction 43.1 The 15-Puzzle Part II: Finite State Machines and Regular Languages 44.1 Finite State Machines Predate Computers 44.2 The Pumping Theorem Inspires Poets REFERENCES INDEX Appendices for Automata, Computability and Complexity: Theory and Applications: * Math Background* Working with Logical Formulas* Finite State Machines and Regular Languages* Context-Free Languages and PDAs* Turing Machines and Undecidability* Complexity* Programming Languages and Compilers* Tools for Programming, Databases and Software Engineering* Networks* Security* Computational Biology* Natural Language Processing* Artificial Intelligence and Computational Reasoning* Art & Entertainment: Music & Games* Using Regular Expressions* Using Finite State Machines and Transducers* Using Grammars

Journal ArticleDOI
TL;DR: This work introduces event-driven Grammars, a kind of graph grammars that are especially suited for visual modelling environments generated by meta-modelling and their combination with triple graph transformation systems.
Abstract: In this work we introduce event-driven grammars, a kind of graph grammars that are especially suited for visual modelling environments generated by meta-modelling. Rules in these grammars may be triggered by user actions (such as creating, editing or connecting elements) and in their turn may trigger other user-interface events. Their combination with triple graph transformation systems allows constructing and checking the consistency of the abstract syntax graph while the user is building the concrete syntax model, as well as managing the layout of the concrete syntax representation. As an example of these concepts, we show the definition of a modelling environment for UML sequence diagrams. A discussion is also presented of methodological aspects for the generation of environments for visual languages with multiple views, its connection with triple graph grammars, the formalization of the latter in the double pushout approach and its extension with an inheritance concept.

Proceedings ArticleDOI
03 Sep 2007
TL;DR: CESI, an algorithm that combines exhaustive enumeration of test inputs from a structured domain with symbolic execution driven test generation, using symbolic grammars, in which the original tokens are replaced with symbolic constants, to link enumerative grammar-based input generation with symbolic directed testing.
Abstract: We present CESI, an algorithm that combines exhaustive enumeration of test inputs from a structured domain with symbolic execution driven test generation. CESI is a hybrid of two predominant techniques: specification-based enumerative test generation (which exhaustively generates all possible inputs satisfying some constraint) and symbolic directed test generation (which explores program paths based on symbolic path constraint solving). We target programs whose valid inputs are determined by some context free grammar. We introduce symbolic grammars, in which the original tokens are replaced with symbolic constants, linking enumerative grammar-based input generation with symbolic directed testing. Symbolic grammars abstract the concrete input syntax, thus reducing the set of input strings that must be enumerated exhaustively. For each enumerated input string, which may contain symbolic constants, symbolic execution based test generation instantiates the constants based on program execution paths. The "template" generated by enumerating valid strings reduces the burden on the symbolic execution to generate syntactically valid inputs and hence exercise interesting code paths. Together, symbolic grammars provide a link between exhaustive enumeration of valid inputs and execution-directed symbolic test generation. In preliminary experiments, CESI performs better than either the enumerative or the symbolic technique used alone.

Book ChapterDOI
09 Jul 2007
TL;DR: A safe, conservative approach is presented, where the approximations cannot result in overlooked ambiguous cases and the complexity of the verification is analyzed, and formal comparisons are provided with several other ambiguity detection methods.
Abstract: The ability to detect ambiguities in context-free grammars is vital for their use in several fields, but the problem is undecidable in the general case. We present a safe, conservative approach, where the approximations cannot result in overlooked ambiguous cases. We analyze the complexity of the verification, and provide formal comparisons with several other ambiguity detection methods.

01 Jan 2007
TL;DR: It is argued that the tools and techniques of grammar engineering provide a means to take the development and evaluation of syntactic hypothesis testing to a new level, with theoretical ideas validated through the development of explicit grammars which can relate strings from some fragment.
Abstract: In this paper, I argue that the tools and techniques of grammar engineering provide a means to take the development and evaluation of syntactic hypothesis testing to a new level. Grammar engineering is the process of creating machine-readable implementations of formal grammars. Traditionally, linguistic hypotheses are encoded as statements within a grammatical theory and tested by collecting relevant examples and manually verifying that the grammars correctly predict the grammaticality and linguistic structure of those examples. Computerized implementations of their grammars allow linguists to more efficiently and effectively test hypotheses, for two reasons: First, languages are made up of many subsystems with complex interactions. Linguists generally focus on just one subsystem at a time, yet the predictions of any particular analysis cannot be calculated independently of the interacting subsystems. With implemented grammars, the computer can track the effects of all aspects of the implementation while the linguist focuses on developing just one. Second, automated application of grammars to test suites and naturally occurring data allows for much more thorough testing of linguistic analyses against thousands as opposed to tens of examples, including examples not anticipated by the linguist. This work is situated within the Montagovian tradition of the "method of fragments" (Montague, 1974; Partee, 1979; Gazdar et al., 1985). In this methodology, theoretical ideas are validated (and extended) through the development of explicit grammars which can relate strings from some fragment.

Journal ArticleDOI
TL;DR: The goal is to make it possible for linguistically untrained programmers to write linguistically correct application grammars encoding the semantics of special domains, and the type system of GF guarantees that grammaticality is preserved.
Abstract: The Grammatical Framework GF is a grammar formalism designed for multilingual grammars. A multilingual grammar has a shared representation, called abstract syntax, and a set of concrete syntaxes that map the abstract syntax to different languages. A GF grammar consists of modules, which can share code through inheritance, but which can also hide information to achieve division of labour between grammarians working on different modules. The goal is to make it possible for linguistically untrained programmers to write linguistically correct application grammars encoding the semantics of special domains. Such programmers can rely on resource grammars, written by linguists, which play the role of standard libraries. Application grammarians use resource grammars through abstract interfaces, and the type system of GF guarantees that grammaticality is preserved. The ongoing GF resource grammar project provides resource grammars for ten languages. In addition to their use as libraries, resource grammars serve as an experiment showing how much grammar code can be shared between different languages.
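The core architectural idea, one shared abstract syntax with several concrete linearizations, can be illustrated with a deliberately tiny Python sketch. This is our own toy illustration of the design, not GF notation (GF is its own typed grammar language), and the second "language" here is just a formal notation rather than a natural language:

```python
from dataclasses import dataclass

# Abstract syntax: one constructor for a tiny predication fragment.
@dataclass
class Pred:
    subj: str       # e.g. "cat"
    verb: str       # e.g. "sleep"

# Two concrete syntaxes linearize the same abstract tree differently.
def lin_english(t: Pred) -> str:
    return f"the {t.subj} {t.verb}s"       # naive 3rd-person-singular morphology

def lin_logic(t: Pred) -> str:
    return f"{t.verb}({t.subj})"           # a formal notation as a second "language"

tree = Pred("cat", "sleep")                # the shared abstract representation
```

In GF proper, the abstract module plays the role of `Pred`, each concrete module plays the role of a `lin_*` function, and the type system checks that every linearization covers every abstract constructor.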

Proceedings ArticleDOI
09 Jul 2007
TL;DR: This work has built an interface compiler that takes the interface grammar for a component as input and generates a stub for that component, which can be used to replace that component during state space exploration, or to provide an executable environment for the component under verification.
Abstract: We propose an interface specification language based on grammars for modular software model checking. In our interface specification language, component interfaces are specified as context free grammars. An interface grammar for a component specifies the sequences of method invocations that are allowed by that component. Using interface grammars one can specify nested call sequences that cannot be specified using interface specification formalisms that rely on finite state machines. Moreover, our interface grammars allow specification of semantic predicates and actions, which are Java code segments that can be used to express additional interface constraints. We have built an interface compiler that takes the interface grammar for a component as input and generates a stub for that component. The resulting stub is a table-driven parser generated from the input interface grammar. Invocation of a method within the component becomes the lookahead symbol for the stub/parser. The stub/parser uses a parser stack, the lookahead, and a parse table to guide the parsing. The semantic predicates and semantic actions that appear in the right hand sides of the production rules are executed when they appear at the top of the stack. We conducted a case study by writing an interface grammar for the Enterprise JavaBeans (EJB) persistence interface. Using our interface compiler we automatically generated an EJB stub using the EJB interface grammar. We used the JPF model checker to check EJB clients using this automatically generated EJB stub. Our results show that EJB clients can be verified efficiently using our approach.
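To see why grammars are strictly more expressive here than finite state machines, consider a hypothetical interface grammar for a transactional component with arbitrarily nested transactions. The grammar and method names below are our own illustration, not the paper's EJB case study, and the generated stub is reduced to a plain recursive-descent check over a recorded call trace:

```python
# Hypothetical interface grammar for a transactional component:
#   Session -> 'open' Txns 'close'
#   Txns    -> 'begin' Txns 'commit' Txns | epsilon
# Arbitrarily deep begin/commit nesting cannot be captured by a
# finite-state interface specification, but a CFG handles it directly.

def accepts(calls):
    """Recursive-descent check that a method-call trace matches the grammar.
    Returns True iff the whole trace is a valid Session."""
    def txns(i):
        # Txns -> 'begin' Txns 'commit' Txns | epsilon; the production is
        # chosen deterministically from the lookahead call.
        if i < len(calls) and calls[i] == "begin":
            j = txns(i + 1)
            if j is not None and j < len(calls) and calls[j] == "commit":
                return txns(j + 1)
            return None                       # unmatched 'begin'
        return i                              # epsilon
    if not calls or calls[0] != "open":
        return False
    j = txns(1)
    return j is not None and j + 1 == len(calls) and calls[j] == "close"
```

The paper's stubs go further: they parse incrementally as the client under test makes calls, and execute semantic predicates and actions attached to the productions.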

Journal Article
TL;DR: It is proved that every recursively enumerable language can be generated by a graph-controlled grammar with only two nonterminal symbols when both symbols are used in the appearance checking mode.
Abstract: We refine the classical notion of the nonterminal complexity of graph-controlled grammars, programmed grammars, and matrix grammars by also counting, in addition, the number of nonterminal symbols that are actually used in the appearance checking mode. We prove that every recursively enumerable language can be generated by a graph-controlled grammar with only two nonterminal symbols when both symbols are used in the appearance checking mode. This result immediately implies that programmed grammars with three nonterminal symbols where two of them are used in the appearance checking mode as well as matrix grammars with three nonterminal symbols all of them used in the appearance checking mode are computationally complete. Moreover, we prove that matrix grammars with four nonterminal symbols with only two of them being used in the appearance checking mode are computationally complete, too. On the other hand, every language is recursive if it is generated by a graph-controlled grammar with an arbitrary number of nonterminal symbols but only one of the nonterminal symbols being allowed to be used in the appearance checking mode. This implies, in particular, that the result proving the computational completeness of graph-controlled grammars with two nonterminal symbols and both of them being used in the appearance checking mode is already optimal with respect to the overall number of nonterminal symbols as well as with respect to the number of nonterminal symbols used in the appearance checking mode, too. Finally, we also investigate in more detail the computational power of several language families which are generated by graph-controlled, programmed grammars or matrix grammars, respectively, with a very small number of nonterminal symbols and therefore are proper subfamilies of the family of recursively enumerable languages.

Proceedings ArticleDOI
23 Jun 2007
TL;DR: This paper combines aspects of previous approaches and presents a method by which parsers can be built as modular and efficient executable specifications of ambiguous grammars containing unconstrained left recursion.
Abstract: In functional and logic programming, parsers can be built as modular executable specifications of grammars, using parser combinators and definite clause grammars respectively. These techniques are based on top-down backtracking search. Commonly used implementations are inefficient for ambiguous languages, cannot accommodate left-recursive grammars, and require exponential space to represent parse trees for highly ambiguous input. Memoization is known to improve efficiency, and work by other researchers has had some success in accommodating left recursion. This paper combines aspects of previous approaches and presents a method by which parsers can be built as modular and efficient executable specifications of ambiguous grammars containing unconstrained left recursion.
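The combination of memoization with support for left recursion can be illustrated in miniature. The sketch below is not the paper's combinator library: instead of its curtailment technique it iterates the memo table to a fixpoint, which is simpler to present (though less efficient) and likewise accepts directly left-recursive grammars:

```python
RULES = {"E": [["E", "+", "a"], ["a"]]}       # direct left recursion

def recognize(rules, tokens, start):
    """Memoized top-down recognition that tolerates left recursion by
    iterating the memo table of end-position sets to a fixpoint."""
    n = len(tokens)
    memo = {}   # (nonterminal, start position) -> set of end positions found so far

    def expand(sym, j):
        # One pass over sym's alternatives using current memo approximations.
        if sym not in rules:                                   # terminal symbol
            return {j + 1} if j < n and tokens[j] == sym else set()
        ends = set()
        for alt in rules[sym]:
            positions = {j}
            for s in alt:
                nxt = set()
                for p in positions:
                    if s in rules:
                        nxt |= memo.setdefault((s, p), set())  # approximation
                    else:
                        nxt |= expand(s, p)
                positions = nxt
            ends |= positions
        return ends

    memo[(start, 0)] = set()
    changed = True
    while changed:                                             # monotone fixpoint
        changed = False
        snapshot = list(memo)
        for key in snapshot:
            new = expand(*key)
            if new - memo[key]:
                memo[key] |= new
                changed = True
        if len(memo) > len(snapshot):      # new (sym, pos) pairs were discovered
            changed = True
    return memo[(start, 0)]
```

On `a+a+a` the recognizer returns end positions {1, 3, 5}: a naive top-down parser would instead loop forever on `E -> E + a`.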

Journal ArticleDOI
TL;DR: A Java library for substructure matching that features easy-to-read syntax and extensibility and converts the found match from the internal format of MQL to the format of the external toolkit.
Abstract: We have developed a Java library for substructure matching that features easy-to-read syntax and extensibility. This molecular query language (MQL) is grounded on a context-free grammar, which allows for straightforward modification and extension. The formal description of MQL is provided in this paper. Molecule primitives are atoms, bonds, properties, branching, and rings. User-defined features can be added via a Java interface. In MQL, molecules are represented as graphs. Substructure matching was implemented using the Ullmann algorithm because of favorable run-time performance. The Ullmann algorithm carries out a fast subgraph isomorphism search by combining backtracking with effective forward checking. MQL software design was driven by the aim to facilitate the use of various cheminformatics toolkits. Two Java interfaces provide a bridge from our MQL package to an external toolkit: the first one provides the matching rules for every feature of a particular toolkit; the second one converts the found match from the internal format of MQL to the format of the external toolkit.

Journal ArticleDOI
TL;DR: The design and implementation of a parser combinator library in Newspeak, a new language in the Smalltalk family, which allows the grammar to be specified as a separate class or mixin, independent of tools that rely upon it such as parsers, syntax colorizers etc.

Journal ArticleDOI
TL;DR: A context-free grammatical inference algorithm operating on positive data only is described, which integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency.
Abstract: This paper describes the winning entry to the Omphalos context free grammar learning competition. We describe a context-free grammatical inference algorithm operating on positive data only, which integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency. The competition is discussed from the perspective of a competitor. We discuss a class of deterministic grammars, the Non-terminally Separated (NTS) grammars, that have a property relied on by our algorithm, and consider the possibilities of extending the algorithm to larger classes of languages.

Patent
Mehryar Mohri1
18 Sep 2007
TL;DR: In this paper, the output rules are output in a specific format that specifies, for each rule, the left-hand non-terminal symbol, a single right-hand non-terminal symbol, and zero, one, or more terminal symbols.
Abstract: Context-free grammars generally comprise a large number of rules, where each rule defines how a string of symbols is generated from a different series of symbols. While techniques for creating finite-state automata from the rules of context-free grammars exist, these techniques require an input grammar to be strongly regular. Systems and methods that convert the rules of a context-free grammar into a strongly regular grammar include transforming each input rule into a set of output rules that approximate the input rule. The output rules are all right- or left-linear and are strongly regular. In various exemplary embodiments, the output rules are output in a specific format that specifies, for each rule, the left-hand non-terminal symbol, a single right-hand non-terminal symbol, and zero, one or more terminal symbols. If the input context-free grammar rule is weighted, the weight of that rule is distributed and assigned to the output rules.

Proceedings ArticleDOI
26 Apr 2007
TL;DR: By modifying the algorithm of Uno and Yagiura (2000) for the closely related problem of finding all common intervals of two permutations, this paper achieves a linear-time algorithm for the permutation factorization problem.
Abstract: Factoring a Synchronous Context-Free Grammar into an equivalent grammar with a smaller number of nonterminals in each rule enables synchronous parsing algorithms of lower complexity. The problem can be formalized as searching for the tree-decomposition of a given permutation with the minimal branching factor. In this paper, by modifying the algorithm of Uno and Yagiura (2000) for the closely related problem of finding all common intervals of two permutations, we achieve a linear-time algorithm for the permutation factorization problem. We also use the algorithm to analyze the maximum SCFG rule length needed to cover hand-aligned data from various language pairs.
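The common intervals driving the factorization can be sketched in a few lines: a span of a permutation is an interval exactly when the values it covers form a contiguous range (max minus min equals the span length). The quadratic enumeration below only illustrates the structure that the Uno-Yagiura-style algorithm finds in linear time; it is not the paper's algorithm.

```python
def common_intervals(perm):
    """Return all spans (i, j), inclusive, of perm whose values form a
    contiguous range -- the candidate constituents when factoring an
    SCFG permutation into a tree decomposition."""
    out = []
    for i in range(len(perm)):
        lo = hi = perm[i]
        for j in range(i, len(perm)):
            lo, hi = min(lo, perm[j]), max(hi, perm[j])
            if hi - lo == j - i:   # contiguous value range => interval
                out.append((i, j))
    return out

# The permutation (2, 0, 1, 3) factors as ((2, (0, 1)), 3):
iv = common_intervals([2, 0, 1, 3])
```

Nesting the intervals found this way yields the tree decomposition; a rule is binarizable precisely when every node of that tree has branching factor two.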

Journal ArticleDOI
TL;DR: The recursive descent parsing method for the context-free grammars is extended for their generalization, Boolean Grammars, which include explicit set-theoretic operations in the formalism of rules and which are formally defined by language equations.
Abstract: The recursive descent parsing method for the context-free grammars is extended for their generalization, Boolean grammars, which include explicit set-theoretic operations in the formalism of rules and which are formally defined by language equations. The algorithm is applicable to a subset of Boolean grammars. The complexity of a direct implementation varies between linear and exponential, while memoization keeps it down to linear.
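Boolean grammars extend context-free rules with set-theoretic operations such as conjunction and negation. As a loose illustration (a memoized span recognizer, not the paper's recursive descent algorithm), the conjunctive grammar S → AB & DC with A → aA | ε, B → bBc | ε, D → aDb | ε, C → cC | ε recognizes { aⁿbⁿcⁿ }, a language no context-free grammar can define:

```python
from functools import lru_cache

def recognize(s):
    """Recognize { a^n b^n c^n } via the conjunctive grammar
    S -> AB & DC;  A -> aA|e;  B -> bBc|e;  D -> aDb|e;  C -> cC|e.
    AB yields a^i b^n c^n and DC yields a^n b^n c^j, so the
    conjunction is exactly a^n b^n c^n.  Memoizing the span
    predicates keeps the recognizer polynomial."""
    n = len(s)

    @lru_cache(maxsize=None)
    def A(i, j):                      # a*
        return all(c == "a" for c in s[i:j])

    @lru_cache(maxsize=None)
    def C(i, j):                      # c*
        return all(c == "c" for c in s[i:j])

    @lru_cache(maxsize=None)
    def B(i, j):                      # b^k c^k
        return i == j or (j - i >= 2 and s[i] == "b"
                          and s[j - 1] == "c" and B(i + 1, j - 1))

    @lru_cache(maxsize=None)
    def D(i, j):                      # a^k b^k
        return i == j or (j - i >= 2 and s[i] == "a"
                          and s[j - 1] == "b" and D(i + 1, j - 1))

    AB = any(A(0, k) and B(k, n) for k in range(n + 1))
    DC = any(D(0, k) and C(k, n) for k in range(n + 1))
    return AB and DC                  # the conjunction S -> AB & DC
```

Conjunction costs only one extra recognition pass over the same span, which is why the paper's memoized variant can stay linear on its grammar subset.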

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This work introduces a novel method for representing and classifying events in video sequences using reversible context-free grammars and demonstrates the efficacy of the learning algorithm and the event detection method applied to traffic video sequences.
Abstract: Automatic detection of dynamic events in video sequences has a variety of applications including visual surveillance and monitoring, video highlight extraction, intelligent transportation systems, video summarization, and many more. Learning an accurate description of the various events in real-world scenes is challenging owing to the limited user-labeled data as well as the large variations in the pattern of the events. Pattern differences arise either from the nature of the events themselves, such as spatio-temporal events, or from missing or ambiguous data interpretation by computer vision methods. In this work, we introduce a novel method for representing and classifying events in video sequences using reversible context-free grammars. The grammars are learned using a semi-supervised learning method. More concretely, using the classification entropy as a heuristic cost function, the grammars are iteratively learned by a search method. Experimental results demonstrating the efficacy of the learning algorithm and the event detection method applied to traffic video sequences are presented.

Book ChapterDOI
28 Jul 2007
TL;DR: This paper offers a retrospective analysis and evaluation of Noam Chomsky's Syntactic Structures, arguing that several widely believed claims about its contributions are not true.
Abstract: Syntactic Structures (Chomsky [6]) is widely believed to have laid the foundations of a cognitive revolution in linguistic science, and to have presented (i) the first use in linguistics of powerful new ideas regarding grammars as generative systems, (ii) a proof that English was not a regular language, (iii) decisive syntactic arguments against context-free phrase structure grammar description, and (iv) a demonstration of how transformational rules could provide a formal solution to those problems. None of these things are true. This paper offers a retrospective analysis and evaluation.

01 Jan 2007
TL;DR: It is argued that context-free grammars are not sufficiently expressive to handle important use cases in spatial intention recognition, and it is shown that Tree Adjoining Grammars can be used to handle rule-rule constraints.
Abstract: In its most general form, the problem of inferring the intentions of a mobile user from his or her spatial behavior is equivalent to the plan recognition problem, which is known to be intractable. Tractable special cases of the problem are therefore of great practical interest. Using formal grammars, intention recognition problems can be stated as parsing problems in a way that makes the connection between expressiveness and complexity explicit. We argue that context-free grammars are not sufficiently expressive to handle important use cases. Furthermore, we identify three types of constraints on the grammar's productions that may arise in spatial intention recognition: rule-at-location constraints, rule-rule constraints, and complex rule-location constraints. Finally, we show that Tree Adjoining Grammars can be used to handle rule-rule constraints.

Journal IssueDOI
TL;DR: The technique maps a program's hierarchical structure to a context-free grammar, normalizes the grammar, and uses a fast check for homomorphism between the normalized grammars.
Abstract: Computer viruses continue to proliferate despite the use of virus detection systems (VDS). This is due to the inability of VDS to detect variants not represented in signature databases. Detection systems look for contiguous byte sequences, use regular expressions for noncontiguous sequences, or detect initial behavior within a sandbox. Recent research has focused on using control-flow graph isomorphism in detection. These techniques are ineffective at detecting some polymorphs, which change their byte sequences and initial behavior and produce nonisomorphic control-flow graphs. Our approach compares program hierarchical structure. We observed that polymorphic instances are variants of the same program, that these variants use the same algorithm, and that a program's algorithm determines its hierarchical structure. Our technique maps a program's hierarchical structure to a context-free grammar, normalizes the grammar, and uses a fast check for homomorphism between the normalized grammars. © 2007 Alcatel-Lucent.

Proceedings ArticleDOI
29 Jun 2007
TL;DR: This paper describes how grammar-based language models for speech recognition systems can be generated from Grammatical Framework (GF) grammars, which enables rapid development of portable, multilingual and easily modifiable speech recognition applications.
Abstract: This paper describes how grammar-based language models for speech recognition systems can be generated from Grammatical Framework (GF) grammars. Context-free grammars and finite-state models can be generated in several formats: GSL, SRGS, JSGF, and HTK SLF. In addition, semantic interpretation code can be embedded in the generated context-free grammars. This enables rapid development of portable, multilingual and easily modifiable speech recognition applications.