Open Access

Learning Stochastic Context-Free Grammars from Corpora Using a Genetic Algorithm.

Rudi Lutz
TLDR
A genetic algorithm for learning stochastic context-free grammars from finite language samples is described, together with a number of experiments in learning grammars for a range of formal languages.
Abstract
A genetic algorithm for learning stochastic context-free grammars from finite language samples is described. Solutions to the inference problem are evolved by optimising the parameters of a covering grammar for a given language sample. We describe a number of experiments in learning grammars for a range of formal languages. The results of these experiments are encouraging and compare favourably with other approaches to stochastic grammatical inference.
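The evolutionary scheme the abstract outlines can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes a tiny covering grammar S -> S S | 'a' | 'b', represents an individual as its rule-probability vector, scores fitness by sample log-likelihood computed with the inside (CYK-style) algorithm, and evolves the population with truncation selection and Gaussian mutation. All names and parameter values are invented for the example.

```python
import math
import random

# Hypothetical covering grammar: S -> S S | 'a' | 'b'.
# A genome is the rule-probability vector (p_SS, p_a, p_b), summing to 1.

def inside_prob(string, probs):
    """Probability that the grammar derives `string`, via the inside algorithm."""
    p_ss, p_a, p_b = probs
    n = len(string)
    # chart[i][j] = inside probability of the substring string[i..j]
    chart = [[0.0] * n for _ in range(n)]
    for i, ch in enumerate(string):
        chart[i][i] = p_a if ch == "a" else p_b if ch == "b" else 0.0
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            chart[i][j] = sum(p_ss * chart[i][k] * chart[k + 1][j]
                              for k in range(i, j))
    return chart[0][n - 1]

def fitness(probs, sample):
    """Log-likelihood of the sample; -inf if any string is underivable."""
    total = 0.0
    for s in sample:
        p = inside_prob(s, probs)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total

def normalise(v):
    t = sum(v)
    return [x / t for x in v]

def evolve(sample, pop_size=30, generations=40, sigma=0.05, seed=0):
    """Truncation selection plus Gaussian mutation over probability vectors."""
    rng = random.Random(seed)
    pop = [normalise([rng.random() + 1e-3 for _ in range(3)])
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, sample), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(elite)):
            parent = rng.choice(elite)
            child = [max(1e-6, p + rng.gauss(0.0, sigma)) for p in parent]
            children.append(normalise(child))
        pop = elite + children
    return max(pop, key=lambda g: fitness(g, sample))

best = evolve(["ab", "aab", "abb", "aabb"])
```

In a full system the covering grammar would be far larger and the probabilities might also be re-estimated locally (e.g. by inside-outside training) rather than by mutation alone; the sketch only shows the shape of the search.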


Citations

Natural language parsing with graded constraints

TL;DR: A large set of experimental results confirms that WCDGs are well-suited for handling gradation in natural language: the inherent robustness and the availability of characterizations of constraint conflicts make the WCDG parser a suitable candidate for a diagnosis component in applications for computer-assisted language learning.

Evolving stochastic context-free grammars from examples using a minimum description length principle

TL;DR: An evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples is described, using a genetic algorithm, with a fitness function derived from a minimum description length principle.
Book Chapter

Text Mining and Information Extraction

TL;DR: This chapter defines text mining and describes the three main approaches for performing information extraction, and describes how to visually display and analyze the outcome of the information extraction process.
Book Chapter

Playing a toy-grammar with GCS

TL;DR: The discovering component of the GCS and its fitness function were modified and applied to infer a toy grammar, a tiny natural-language grammar expressed in Chomsky Normal Form; the results showed that the proposed rule-fertility measure improves the performance of the GCS considerably.

One in the bush: Low-density language technology

Lars Borin
TL;DR: Draws an analogy between the proverb "A bird in the hand is worth two in the bush" and the difference between knowledge-rich methods (based on hand-crafted grammars) and knowledge-poor methods (based on machine learning) for building language resources.
References
Journal Article

Modeling by shortest data description

Jorma Rissanen
01 Sep 1978
TL;DR: The number of digits it takes to write down an observed sequence x1,...,xN of a time series depends on the model with its parameters that one assumes to have generated the observed data.
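Rissanen's principle underlies the MDL fitness function used in the grammar-inference work above: a model is judged by the total number of bits needed to describe both the model and the data under it, the data costing -log2 P(data | model) bits. A minimal sketch of the idea, for a biased-coin "model" (the two-part coding scheme here, including the fixed `param_bits` charge for the parameter, is an illustrative assumption, not Rissanen's exact formulation):

```python
import math

def description_length(sample, p, param_bits=32):
    """Two-part code length in bits: an assumed fixed `param_bits` charge for
    encoding the parameter p, plus -log2 P(sample | p) for the data itself."""
    heads = sample.count("H")
    tails = len(sample) - heads
    data_bits = -(heads * math.log2(p) + tails * math.log2(1.0 - p))
    return param_bits + data_bits

# A skewed sample is cheaper to encode under a matching biased model:
# description_length("H"*9 + "T", 0.9) < description_length("H"*9 + "T", 0.5)
```

In the SCFG setting the same trade-off applies: a grammar that fits the sample tightly shortens the data code, while extra rules and parameters lengthen the model code.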
Journal Article

Inductive Inference: Theory and Methods

TL;DR: This survey highlights and explains the main ideas that have been developed in the study of inductive inference, with special emphasis on the relations between the general theory and the specific algorithms and implementations.
Journal Article

Trainable grammars for speech recognition

TL;DR: This paper presents a generalization of these algorithms to certain denumerable-state, hidden Markov processes that permits automatic training of the stochastic analog of an arbitrary context-free grammar.

Context free grammar induction using genetic algorithms

TL;DR: A genetic algorithm was developed for inferring context-free grammars; various forms of the grammar generating the language of correctly balanced and nested brackets were successfully inferred.
Proceedings Article

Regular language induction with genetic programming

TL;DR: The contribution of this research is the effective translation of DFAs to S-expressions, and the application of renumbering and editing to the language-induction problem to ensure that all states in a DFA are reachable from the start state.