# Grammatical Inference: Algorithms and Applications: 8th International Colloquium, ICGI 2006, Tokyo, Japan, September 20-22, 2006, Proceedings

Abstract: Invited Papers.- Parsing Without Grammar Rules.- Classification of Biological Sequences with Kernel Methods.- Regular Papers.- Identification in the Limit of Systematic-Noisy Languages.- Ten Open Problems in Grammatical Inference.- Polynomial-Time Identification of an Extension of Very Simple Grammars from Positive Data.- PAC-Learning Unambiguous NTS Languages.- Incremental Learning of Context Free Grammars by Bridging Rule Generation and Search for Semi-optimum Rule Sets.- Variational Bayesian Grammar Induction for Natural Language.- Stochastic Analysis of Lexical and Semantic Enhanced Structural Language Model.- Using Pseudo-stochastic Rational Languages in Probabilistic Grammatical Inference.- Learning Analysis by Reduction from Positive Data.- Inferring Grammars for Mildly Context Sensitive Languages in Polynomial-Time.- Planar Languages and Learnability.- A Unified Algorithm for Extending Classes of Languages Identifiable in the Limit from Positive Data.- Protein Motif Prediction by Grammatical Inference.- Grammatical Inference in Practice: A Case Study in the Biomedical Domain.- Inferring Grammar Rules of Programming Language Dialects.- The Tenjinno Machine Translation Competition.- Large Scale Inference of Deterministic Transductions: Tenjinno Problem 1.- A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer.- Learning n-Ary Node Selecting Tree Transducers from Completely Annotated Examples.- Learning Multiplicity Tree Automata.- Learning DFA from Correction and Equivalence Queries.- Using MDL for Grammar Induction.- Characteristic Sets for Inferring the Unions of the Tree Pattern Languages by the Most Fitting Hypotheses.- Learning Deterministic DEC Grammars Is Learning Rational Numbers.- Iso-array Acceptors and Learning.- Poster Papers.- A Merging States Algorithm for Inference of RFSAs.- Query-Based Learning of XPath Expressions.- Learning Finite-State Machines from Inexperienced Teachers.- Suprasymbolic 
Grammar Induction by Recurrent Self-Organizing Maps.- Graph-Based Structural Data Mining in Cognitive Pattern Interpretation.- Constructing Song Syntax by Automata Induction.- Learning Reversible Languages with Terminal Distinguishability.- Grammatical Inference for Syntax-Based Statistical Machine Translation.

TL;DR: The analysis shows that the grammars induced by the algorithm are, in theory, capable of modelling context-free features of natural language syntax, and one experiment shows that the algorithm can potentially outperform the state of the art in unsupervised parsing on the WSJ10 corpus.

Abstract: Recently, different theoretical learning results have been found for a variety of context-free grammar subclasses through the use of distributional learning [1]. However, these results have not yet been extended to probabilistic grammars. In this work, we give a practical algorithm, with some proven properties, that learns a subclass of probabilistic grammars from positive data. A minimum satisfiability solver is used to direct the search towards small grammars. Experiments on well-known context-free languages and artificial natural language grammars give positive results. Moreover, our analysis shows that the type of grammars induced by our algorithm is, in theory, capable of modelling context-free features of natural language syntax. One of our experiments shows that our algorithm can potentially outperform the state of the art in unsupervised parsing on the WSJ10 corpus.

21 Jul 2009

TL;DR: The terms grammatical inference and grammar induction both seem to indicate that techniques for building grammatical formalisms from information about a language are not concerned with automata or other finite state machines. This is far from true: many of the more important results in grammatical inference rely heavily on automata formalisms, and particularly on the specific use of determinism that is made.

Abstract: The terms grammatical inference and grammar induction both seem to indicate that techniques aiming at building grammatical formalisms when given some information about a language are not concerned with automata or other finite state machines. This is far from true, and many of the more important results in grammatical inference rely heavily on automata formalisms, and particularly on the specific use of determinism that is made. We survey here some of the main ideas and results in the field.

TL;DR: This paper proves the existence of a canonical form for semi-deterministic transducers with sets of pairwise incomparable output strings, develops an algorithm that learns such transducers given access to translation queries, and proves that no learning algorithm for them can rely on domain knowledge only.

Abstract: We prove the existence of a canonical form for semi-deterministic transducers with sets of pairwise incomparable output strings. Based on this, we develop an algorithm which learns semi-deterministic transducers given access to translation queries. We also prove that there is no learning algorithm for semi-deterministic transducers that uses only domain knowledge.

01 Jan 2016

TL;DR: Active learning, or query learning, formalises learning by interacting with the environment. By controlling better the information to which one has access, this setting provides a better understanding of the hardness of learning tasks, and allows us to solve practical learning situations, for which new algorithms are needed.

Abstract: When learning languages or grammars, an attractive alternative to using a large corpus is to learn by interacting with the environment. This can allow us to deal with situations where data is scarce or expensive, but testing or experimenting is possible. The situation, which arises in a number of fields, is formalised in a setting called active learning or query learning. By controlling better the information to which one has access, this setting provides us with a better understanding of the hardness of learning tasks. But the setting also allows us to solve practical learning situations, for which new algorithms are needed.

TL;DR: From a theoretical perspective, the main aim of work in grammatical inference is to study the learnability of different types of grammars from different types of data and to propose efficient algorithms for learning.

Abstract: Grammars are a powerful representation of sequential patterns of various types, ranging from speech and other audio signals, to biological sequences and user navigation on the Web. The long study of grammars, especially in natural languages, has resulted in a rich repertoire of grammar variants and flavors that provide different representation power and consequently different complexity of analysis. Beyond the issues of representation complexity, most of the work on grammars has focused on their use by parsing programs that analyze strings of a given “language”, in order to prove their grammaticalness and/or identify interesting elements of the language. The use of grammars by parsers, especially the more expressive and complex grammars, remains a challenging and interesting research issue on its own. However, grammatical inference goes a step further to study methods that learn grammars from data. Grammatical inference is an established research field in Artificial Intelligence, dating back to the 60s and has been extensively addressed by researchers in automata theory, language acquisition, computational linguistics, machine learning, pattern recognition, computational learning theory and neural networks. From a theoretical perspective, the main aim of this work is to study the learnability of different types of grammars from different types of data and propose efficient algorithms for learning. In parallel, a significant amount of work focuses on innovative applications of grammatical inference algorithms to various knowledge discovery tasks. The main forum for presenting this type of work in the past 15 years has been the International Colloquium on Grammatical Inference (ICGI) which takes place in different countries and different continents every two years. The seventh ICGI was held in the National Centre for Scientific Research “Demokritos”, Greece on October 11–13th, 2004. The topics of the

