About: Grammar induction is a research topic. Over the lifetime, 1178 publications have been published within this topic receiving 30835 citations.
Papers published on a yearly basis
TL;DR: It was found that theclass of context-sensitive languages is learnable from an informant, but that not even the class of regular languages is learningable from a text.
Abstract: Language learnability has been investigated. This refers to the following situation: A class of possible languages is specified, together with a method of presenting information to the learner about an unknown language, which is to be chosen from the class. The question is now asked, “Is the information sufficient to determine which of the possible languages is the unknown language?” Many definitions of learnability are possible, but only the following is considered here: Time is quantized and has a finite starting time. At each time the learner receives a unit of information and is to make a guess as to the identity of the unknown language on the basis of the information received so far. This process continues forever. The class of languages will be considered learnable with respect to the specified method of information presentation if there is an algorithm that the learner can use to make his guesses, the algorithm having the following property: Given any language of the class, there is some finite time after which the guesses will all be the same and they will be correct. In this preliminary investigation, a language is taken to be a set of strings on some finite alphabet. The alphabet is the same for all languages of the class. Several variations of each of the following two basic methods of information presentation are investigated: A text for a language generates the strings of the language in any order such that every string of the language occurs at least once. An informant for a language tells whether a string is in the language, and chooses the strings in some order such that every string occurs at least once. It was found that the class of context-sensitive languages is learnable from an informant, but that not even the class of regular languages is learnable from a text.
14 Apr 1983
TL;DR: An algorithm that can fix a bug that has been identified, and integrate it with the diagnosis algorithms to form an interactive debugging system that can debug programs that are too complex for the Model Inference System to synthesize.
Abstract: The thesis lays a theoretical framework for program debugging, with the goal of partly mechanizing this activity. In particular, we formalize and develop algorithmic solutions to the following two questions: (1) How do we identify a bug in a program that behaves incorrectly? (2) How do we fix a bug, once one is identified? We develop interactive diagnosis algorithms that identify a bug in a program that behaves incorrectly, and implement them in Prolog for the diagnosis of Prolog programs. Their performance suggests that they can be the backbone of debugging aids that go far beyond what is offered by current programming environments. We develop an inductive inference algorithm that synthesizes logic programs from examples of their behavior. The algorithm incorporates the diagnosis algorithms as a component. It is incremental, and progresses by debugging a program with respect to the examples. The Model Inference System is a Prolog implementation of the algorithm. Its range of applications and efficiency is comparable to existing systems for program synthesis from examples and grammatical inference. We develop an algorithm that can fix a bug that has been identified, and integrate it with the diagnosis algorithms to form an interactive debugging system. By restricting the class of bugs we attempt to correct, the system can debug programs that are too complex for the Model Inference System to synthesize.
01 Jan 1994
TL;DR: In this article, Charniak presents statistical language processing from an artificial intelligence point of view in a text for researchers and scientists with a traditional computer science background, which is grounded in real text and therefore promises to produce usable results.
Abstract: From the Publisher: Eugene Charniak breaks new ground in artificial intelligence research by presenting statistical language processing from an artificial intelligence point of view in a text for researchers and scientists with a traditional computer science background. New, exacting empirical methods are needed to break the deadlock in such areas of artificial intelligence as robotics, knowledge representation, machine learning, machine translation, and natural language processing (NLP). It is time, Charniak observes, to switch paradigms. This text introduces statistical language processing techniques -- word tagging, parsing with probabilistic context free grammars, grammar induction, syntactic disambiguation, semantic word classes, word-sense disambiguation -- along with the underlying mathematics and chapter exercises. Charniak points out that as a method of attacking NLP problems, the statistical approach has several advantages. It is grounded in real text and therefore promises to produce usable results, and it offers an obvious way to approach learning: "one simply gathers statistics." Language, Speech, and Communication
01 Jan 1991
TL;DR: This chapter discusses supervised learning using Parametric and Nonparametric Approaches and unsupervised Learning in NeurPR, and discusses feedforward Networks and Training by Backpropagation.
Abstract: STATISTICAL PATTERN RECOGNITION (StatPR). Supervised Learning (Training) Using Parametric and Nonparametric Approaches. Linear Discriminant Functions and the Discrete and Binary Feature Cases. Unsupervised Learning and Clustering. SYNTACTIC PATTERN RECOGNITION (SyntPR). Overview. Syntactic Recognition via Parsing and Other Grammars. Graphical Approaches to SyntPR. Learning via Grammatical Inference. NEURAL PATTERN RECOGNITION (NeurPR). Introduction to Neural Networks. Introduction to Neural Pattern Associators and Matrix Approaches. Feedforward Networks and Training by Backpropagation. Content Addressable Memory Approaches and Unsupervised Learning in NeurPR. Appendices. References. Permission Source Notes. Index.
TL;DR: This paper proposed a tagset that consists of twelve universal part-of-speech categories and developed a mapping from 25 different treebank tagsets to this universal set, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts of speech for 22 different languages.
Abstract: To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via two experiments, including one that reports competitive accuracies for unsupervised grammar induction without gold standard part-of-speech tags.
Related Topics (5)