Journal ArticleDOI

Grammatical Inference of PCFGs Applied to Language Modelling and Unsupervised Parsing

01 Jan 2016 - Fundamenta Informaticae (IOS Press) - Vol. 146, Iss. 4, pp. 379-402
TL;DR: The analysis shows that the grammars induced by the algorithm are, in theory, capable of modelling context-free features of natural language syntax, and that the algorithm can potentially outperform the state of the art in unsupervised parsing on the WSJ10 corpus.
Abstract: Recently, different theoretical learning results have been found for a variety of context-free grammar subclasses through the use of distributional learning [1]. However, these results have not yet been extended to probabilistic grammars. In this work, we give a practical algorithm, with some proven properties, that learns a subclass of probabilistic grammars from positive data. A minimum satisfiability solver is used to direct the search towards small grammars. Experiments on well-known context-free languages and artificial natural language grammars give positive results. Moreover, our analysis shows that the type of grammars induced by our algorithm is, in theory, capable of modelling context-free features of natural language syntax. One of our experiments shows that our algorithm can potentially outperform the state-of-the-art in unsupervised parsing on the WSJ10 corpus.
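The core mechanism of the distributional learning referenced in the abstract is grouping substrings by the contexts in which they occur in a positive sample. The following minimal Python sketch shows this empirical congruence-class approximation; the function names and the toy sample are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

def context_map(sample):
    """Map each substring to the set of contexts (l, r) in which it
    occurs within the positive sample, i.e. l + sub + r is a sample string."""
    contexts = defaultdict(set)
    for w in sample:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                contexts[w[i:j]].add((w[:i], w[j:]))
    return contexts

def congruence_classes(sample):
    """Group substrings whose observed context sets are identical -- an
    empirical approximation of the syntactic congruence used in
    distributional learning."""
    contexts = context_map(sample)
    classes = defaultdict(list)
    for sub, ctxs in contexts.items():
        classes[frozenset(ctxs)].append(sub)
    return list(classes.values())

# Toy positive sample from the language a^n b^n.
sample = ["ab", "aabb", "aaabbb"]
for cls in congruence_classes(sample):
    print(sorted(cls))
```

With only a finite sample the observed context sets under-approximate the true ones, so substrings that are truly congruent (such as ab and aabb in a^n b^n) may fall into separate classes until more data is seen.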
Citations
BookDOI
01 Jan 2008
TL;DR: Proceedings on grammatical inference whose contributions include a polynomial algorithm for the inference of context-free languages, learning context-sensitive languages from linear structural information, and unsupervised learning of probabilistic context-free grammars using iterative biclustering.
Abstract: Regular Papers:
• Learning Meaning Before Syntax
• Schema-Guided Induction of Monadic Queries
• A Polynomial Algorithm for the Inference of Context Free Languages
• Learning Languages from Bounded Resources: The Case of the DFA and the Balls of Strings
• Relevant Representations for the Inference of Rational Stochastic Tree Languages
• Learning Commutative Regular Languages
• Learning Left-to-Right and Right-to-Left Iterative Languages
• Learning Bounded Unions of Noetherian Closed Set Systems Via Characteristic Sets
• A Learning Algorithm for Multi-dimensional Trees, or: Learning Beyond Context-Freeness
• On Learning Regular Expressions and Patterns Via Membership and Correction Queries
• State-Merging DFA Induction Algorithms with Mandatory Merge Constraints
• Using Multiplicity Automata to Identify Transducer Relations from Membership and Equivalence Queries
• Towards Feasible PAC-Learning of Probabilistic Deterministic Finite Automata
• Learning Context-Sensitive Languages from Linear Structural Information
• Polynomial Time Probabilistic Learning of a Subclass of Linear Languages with Queries
• How to Split Recursive Automata
• A Note on the Relationship between Different Types of Correction Queries
• Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering
• Polynomial Distinguishability of Timed Automata
• Evaluation and Comparison of Inferred Regular Grammars
• Identification in the Limit of k,l-Substitutable Context-Free Languages
Poster Papers:
• Learning Subclasses of Pure Pattern Languages
• Which Came First, the Grammar or the Lexicon?
• Learning Node Label Controlled Graph Grammars (Extended Abstract)
• Inference of Uniquely Terminating EML
• Estimating Graph Parameters Using Graph Grammars
• Learning of Regular ω-Tree Languages
• Inducing Regular Languages Using Grammar-Based Classifier System
• Problems with Evaluation of Unsupervised Empirical Grammatical Inference Systems

8 citations

01 Jan 2011
TL;DR: This book constitutes the refereed proceedings of the 22nd International Conference on Algorithmic Learning Theory, ALT 2011, held in Espoo, Finland, in October 2011, co-located with the 14th International Conference on Discovery Science, DS 2011.
Abstract: This book constitutes the refereed proceedings of the 22nd International Conference on Algorithmic Learning Theory, ALT 2011, held in Espoo, Finland, in October 2011, co-located with the 14th International Conference on Discovery Science, DS 2011. The 28 revised full papers presented together with the abstracts of 5 invited talks were carefully reviewed and selected from numerous submissions. The papers are divided into topical sections of papers on inductive inference, regression, bandit problems, online learning, kernel and margin-based methods, intelligent agents and other learning models.

5 citations

Book
23 Nov 2004
TL;DR: Proceedings on grammatical inference whose contributions range from the Omphalos context-free grammar learning competition to a divide-and-conquer approach to acquiring syntactic categories and PAC learning of simple deterministic languages with membership queries.
Abstract: Invited Papers:
• Learning and Mathematics
• Learning Finite-State Models for Machine Translation
• The Omphalos Context-Free Grammar Learning Competition
Regular Papers:
• Mutually Compatible and Incompatible Merges for the Search of the Smallest Consistent DFA
• Faster Gradient Descent Training of Hidden Markov Models, Using Individual Learning Rate Adaptation
• Learning Mild Context-Sensitiveness: Toward Understanding Children's Language Learning
• Learnability of Pregroup Grammars
• A Markovian Approach to the Induction of Regular String Distributions
• Learning Node Selecting Tree Transducer from Completely Annotated Examples
• Identifying Clusters from Positive Data
• Introducing Domain and Typing Bias in Automata Inference
• Analogical Equations in Sequences: Definition and Resolution
• Representing Languages by Learnable Rewriting Systems
• A Divide-and-Conquer Approach to Acquire Syntactic Categories
• Grammatical Inference Using Suffix Trees
• Learning Stochastic Finite Automata
• Navigation Pattern Discovery Using Grammatical Inference
• A Corpus-Driven Context-Free Approximation of Head-Driven Phrase Structure Grammar
• Partial Learning Using Link Grammars Data
• eg-GRIDS: Context-Free Grammatical Inference from Positive Examples Using Genetic Search
• The Boisdale Algorithm - An Induction Method for a Subclass of Unification Grammar from Positive Data
• Learning Stochastic Deterministic Regular Languages
• Polynomial Time Identification of Strict Deterministic Restricted One-Counter Automata in Some Class from Positive Data
Poster Papers:
• Learning Syntax from Function Words
• Running FCRPNI in Efficient Time for Piecewise and Right Piecewise Testable Languages
• Extracting Minimum Length Document Type Definitions Is NP-Hard
• Learning Distinguishable Linear Grammars from Positive Data
• Extending Incremental Learning of Context Free Grammars in Synapse
• Identifying Left-Right Deterministic Linear Languages
• Efficient Learning of k-Reversible Context-Free Grammars from Positive Structural Examples
• An Analysis of Examples and a Search Space for PAC Learning of Simple Deterministic Languages with Membership Queries

4 citations

References
Journal ArticleDOI
TL;DR: It was found that the class of context-sensitive languages is learnable from an informant, but that not even the class of regular languages is learnable from a text.
Abstract: Language learnability has been investigated. This refers to the following situation: A class of possible languages is specified, together with a method of presenting information to the learner about an unknown language, which is to be chosen from the class. The question is now asked, “Is the information sufficient to determine which of the possible languages is the unknown language?” Many definitions of learnability are possible, but only the following is considered here: Time is quantized and has a finite starting time. At each time the learner receives a unit of information and is to make a guess as to the identity of the unknown language on the basis of the information received so far. This process continues forever. The class of languages will be considered learnable with respect to the specified method of information presentation if there is an algorithm that the learner can use to make his guesses, the algorithm having the following property: Given any language of the class, there is some finite time after which the guesses will all be the same and they will be correct. In this preliminary investigation, a language is taken to be a set of strings on some finite alphabet. The alphabet is the same for all languages of the class. Several variations of each of the following two basic methods of information presentation are investigated: A text for a language generates the strings of the language in any order such that every string of the language occurs at least once. An informant for a language tells whether a string is in the language, and chooses the strings in some order such that every string occurs at least once. It was found that the class of context-sensitive languages is learnable from an informant, but that not even the class of regular languages is learnable from a text.
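For intuition, here is a minimal, hypothetical Python sketch of identification in the limit from text for the class of all finite languages: the learner conjectures exactly the set of strings seen so far, and once every string of the (finite) target has appeared, its guesses stay the same and correct. The setup is illustrative, not from the paper.

```python
import itertools

def finite_language_learner(text):
    """Guess, after each datum, exactly the set of strings seen so far.
    For any finite target, every string eventually appears in a text,
    after which the conjecture is forever the same and correct."""
    seen = set()
    for w in text:
        seen.add(w)
        yield frozenset(seen)  # the learner's current conjecture

# A text for the finite language {"a", "ab", "abb"}: every string of the
# language occurs at least once (here, repeated forever).
text = itertools.cycle(["a", "ab", "abb"])
for guess in itertools.islice(finite_language_learner(text), 5):
    print(sorted(guess))
# From the third guess onward the output stabilizes at ['a', 'ab', 'abb'].
```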

3,460 citations

Journal ArticleDOI
Dana Angluin1
TL;DR: In this article, the problem of identifying an unknown regular set from examples of its members and nonmembers is addressed. The regular set is presented by a minimally adequate teacher, which can answer membership queries about the set and can also test a conjecture, indicating whether it is equal to the unknown set and providing a counterexample if not.
Abstract: The problem of identifying an unknown regular set from examples of its members and nonmembers is addressed. It is assumed that the regular set is presented by a minimally adequate Teacher, which can answer membership queries about the set and can also test a conjecture and indicate whether it is equal to the unknown set and provide a counterexample if not. (A counterexample is a string in the symmetric difference of the correct set and the conjectured set.) A learning algorithm L* is described that correctly learns any regular set from any minimally adequate Teacher in time polynomial in the number of states of the minimum DFA for the set and the maximum length of any counterexample provided by the Teacher. It is shown that in a stochastic setting the ability of the Teacher to test conjectures may be replaced by a random sampling oracle, EX(). A polynomial-time learning algorithm is shown for a particular problem of context-free language identification.
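A compact, illustrative Python sketch of an L*-style learner follows. It uses the variant that adds all counterexample suffixes to the test set, which keeps state rows distinct and removes the need for an explicit consistency check; the Teacher class with its bounded-depth equivalence search is an assumption for the demo, not Angluin's oracle.

```python
from itertools import product

class Teacher:
    """Illustrative minimally adequate Teacher for a known regular target:
    membership is exact; equivalence is approximated by bounded search."""
    def __init__(self, accepts, alphabet, max_len=8):
        self.accepts, self.alphabet, self.max_len = accepts, alphabet, max_len

    def member(self, w):
        return self.accepts(w)

    def counterexample(self, hypothesis):
        for n in range(self.max_len + 1):
            for tup in product(self.alphabet, repeat=n):
                w = "".join(tup)
                if self.accepts(w) != hypothesis(w):
                    return w
        return None

def lstar(teacher, alphabet):
    S, E = {""}, {""}          # access prefixes (states) and test suffixes
    T = {}                     # observation table: T[w] = member(w)

    def row(s):
        return tuple(T[s + e] for e in E)

    def fill():
        for s in S | {s + a for s in S for a in alphabet}:
            for e in E:
                if s + e not in T:
                    T[s + e] = teacher.member(s + e)

    def hypothesis():
        rep = {row(s): s for s in S}      # representative prefix per state
        def accepts(w):
            s = ""
            for a in w:
                s = rep[row(s + a)]       # closedness guarantees the lookup
            return T[s]
        return accepts

    while True:
        fill()
        rows = {row(s) for s in S}
        unclosed = next((s + a for s in S for a in alphabet
                         if row(s + a) not in rows), None)
        if unclosed is not None:          # fix closedness: promote extension
            S.add(unclosed)
            continue
        hyp = hypothesis()
        cex = teacher.counterexample(hyp)
        if cex is None:
            return hyp
        E |= {cex[i:] for i in range(len(cex) + 1)}   # add all suffixes

# Target: strings over {a, b} with an even number of a's.
teacher = Teacher(lambda w: w.count("a") % 2 == 0, ("a", "b"))
dfa = lstar(teacher, ("a", "b"))
print(dfa("abab"), dfa("ab"))  # True False
```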

2,157 citations

Book
01 Jan 1972
TL;DR: The authors hope that the algorithms and concepts presented in this book will survive the next generation of computers and programming languages, and that at least some of them will be applicable to fields other than compiler writing.
Abstract: From volume 1 Preface (See Front Matter for full Preface): This book is intended for a one or two semester course in compiling theory at the senior or graduate level. It is a theoretically oriented treatment of a practical subject. Our motivation for making it so is threefold. (1) In an area as rapidly changing as Computer Science, sound pedagogy demands that courses emphasize ideas, rather than implementation details. It is our hope that the algorithms and concepts presented in this book will survive the next generation of computers and programming languages, and that at least some of them will be applicable to fields other than compiler writing. (2) Compiler writing has progressed to the point where many portions of a compiler can be isolated and subjected to design optimization. It is important that appropriate mathematical tools be available to the person attempting this optimization. (3) Some of the most useful and most efficient compiler algorithms, e.g. LR(k) parsing, require a good deal of mathematical background for full understanding. We expect, therefore, that a good theoretical background will become essential for the compiler designer. While we have not omitted difficult theorems that are relevant to compiling, we have tried to make the book as readable as possible. Numerous examples are given, each based on a small grammar, rather than on the large grammars encountered in practice. It is hoped that these examples are sufficient to illustrate the basic ideas, even in cases where the theoretical developments are difficult to follow in isolation.

From volume 2 Preface (See Front Matter for full Preface): Compiler design is one of the first major areas of systems programming for which a strong theoretical foundation is becoming available. Volume I of The Theory of Parsing, Translation, and Compiling developed the relevant parts of mathematics and language theory for this foundation and developed the principal methods of fast syntactic analysis. Volume II is a continuation of Volume I, but except for Chapters 7 and 8 it is oriented towards the nonsyntactic aspects of compiler design. The treatment of the material in Volume II is much the same as in Volume I, although proofs have become a little more sketchy. We have tried to make the discussion as readable as possible by providing numerous examples, each illustrating one or two concepts. Since the text emphasizes concepts rather than language or machine details, a programming laboratory should accompany a course based on this book, so that a student can develop some facility in applying the concepts discussed to practical problems. The programming exercises appearing at the ends of sections can be used as recommended projects in such a laboratory. Part of the laboratory course should discuss the code to be generated for such programming language constructs as recursion, parameter passing, subroutine linkages, array references, loops, and so forth.

1,727 citations

Journal ArticleDOI
TL;DR: Describes two applications in speech recognition of stochastic context-free grammars trained automatically via the Inside-Outside algorithm: the grammars are used to model VQ-encoded speech for isolated word recognition and are compared directly to HMMs used for the same task.
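The inside pass at the heart of the Inside-Outside algorithm computes, by dynamic programming, the probability that each nonterminal derives each span of the input. Below is a minimal sketch for a PCFG in Chomsky normal form; the dictionary-based grammar encoding and the toy grammar are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def inside_probabilities(words, lexical, binary, start="S"):
    """Inside pass for a CNF PCFG.
    lexical: dict (A, word) -> P(A -> word)
    binary:  dict (A, B, C) -> P(A -> B C)
    Returns P(start =>* words) and alpha[(i, j, A)] = P(A =>* words[i:j])."""
    n = len(words)
    alpha = defaultdict(float)
    for i, w in enumerate(words):                 # width-1 spans
        for (A, word), p in lexical.items():
            if word == w:
                alpha[(i, i + 1, A)] += p
    for width in range(2, n + 1):                 # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):             # split point
                for (A, B, C), p in binary.items():
                    alpha[(i, j, A)] += p * alpha[(i, k, B)] * alpha[(k, j, C)]
    return alpha[(0, n, start)], alpha

# Toy PCFG for a^n b^n: S -> A T (0.7) | A B (0.3), T -> S B (1.0),
# A -> a (1.0), B -> b (1.0).
binary = {("S", "A", "T"): 0.7, ("S", "A", "B"): 0.3, ("T", "S", "B"): 1.0}
lexical = {("A", "a"): 1.0, ("B", "b"): 1.0}
prob, _ = inside_probabilities(list("aabb"), lexical, binary)
print(prob)  # P(S =>* "aabb") = 0.7 * 0.3 = 0.21
```

The full Inside-Outside EM step additionally runs an outside pass and re-estimates rule probabilities from expected rule counts; the inside table above is the quantity both passes share.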

742 citations


"Grammatical Inference of PCFGs Appl..." refers background or methods in this paper

  • ...Assigning probabilities to such large grammars using expectation maximization does not work in practice due to the large number of local maxima [8, 9]....

    [...]

  • ...The probabilistic equivalent of this is the class of Strongly Congruential PCFGs (SC-PCFGs), defined as all the PCFGs G which, for any non-terminal A, if u ∈ L(A) then L(A) = [u]_{≅_{(L(G),φ)}}....

    [...]

  • ...This means that we have to find a CFG for which a probability assignment using the standard EM algorithm for PCFGs [8] yields a stochastic language close to the target one....

    [...]

  • ...Probabilities are assigned to the learned grammar using the standard EM algorithm for PCFGs i.e. the Inside-Outside algorithm [8]....

    [...]

  • ...However, the real test remains that of finding SC-PCFGs that generate good bracketings and good language models....

    [...]
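For context, the relation ≅ in the SC-PCFG excerpt above is a syntactic congruence; its classical, non-probabilistic form is restated below, with the paper's ≅_{(L(G),φ)} presumably refining it by the probability assignment φ (that refinement is the paper's own and is not reproduced here).

```latex
% Classical syntactic congruence of a language L \subseteq \Sigma^*:
% u and v are congruent iff they are interchangeable in every context.
u \cong_L v \;\iff\; \forall\, l, r \in \Sigma^* : \; lur \in L \Leftrightarrow lvr \in L
% [u]_{\cong_L} denotes the congruence class of u under this relation.
```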

Journal ArticleDOI
TL;DR: In this paper, the authors deal with directed hypergraphs as a tool to model and solve some classes of problems arising in operations research and in computer science, such as connectivity, paths and cuts.
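Directed hypergraphs generalize arcs to (tail set, head) pairs, and the connectivity notions studied in this literature let a hyperarc fire only once its whole tail is reached. A minimal, hypothetical Python sketch of this forward-chaining B-connectivity check follows; the encoding is assumed for illustration, not taken from the paper.

```python
def b_reachable(hyperarcs, sources):
    """Forward chaining over a directed hypergraph: a hyperarc (tail, head)
    fires once every node in its tail set has been reached (B-connectivity)."""
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperarcs:
            if head not in reached and set(tail) <= reached:
                reached.add(head)
                changed = True
    return reached

# Hyperarcs ({x, y} -> z): z becomes reachable only when both x and y are.
arcs = [(("a",), "b"), (("a", "b"), "c"), (("c", "d"), "e")]
print(b_reachable(arcs, {"a"}))  # {'a', 'b', 'c'}  ('e' also needs 'd')
```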

705 citations