Journal ArticleDOI

A survey of grammatical inference methods for natural language learning

Arianna D'Ulizia, +2 more
- 01 Jun 2011 - 
- Vol. 36, Iss: 1, pp 1-27
TLDR
A survey of the methodologies for inferring context-free grammars from examples, developed by researchers in the last decade, to provide the reader with an introduction to the major concepts and current approaches in Natural Language Learning research.
Abstract
The high complexity of natural language and the large amount of human effort and time required to produce grammars have led several researchers in the area of Natural Language Processing to investigate solutions for automating grammar generation and updating. Many algorithms for Context-Free Grammar inference have been developed in the literature. This paper provides a survey of the methodologies for inferring context-free grammars from examples that researchers have developed in the last decade. After introducing some preliminary definitions and notation concerning learning and inductive inference, some of the most relevant existing grammatical inference methods for Natural Language are described and classified according to the kind of presentation (text or informant) and the type of information (supervised, unsupervised, or semi-supervised). Moreover, the state of the art of strategies for evaluating and comparing different grammar inference methods is presented. The goal of the paper is to provide the reader with an introduction to the major concepts and current approaches in Natural Language Learning research.
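To make the learning setting concrete, the following minimal Python sketch illustrates the "text" presentation, in which only positive examples are available: a naive inference step simply memorises one flat S -> tag-sequence rule per observed sentence pattern. It is purely illustrative, not one of the surveyed algorithms, and the function names and toy POS-tag examples are assumptions introduced here.

# Illustrative sketch only: "text" presentation (positive examples, no
# negative data) with a naive inference step that memorises one flat
# S -> tag-sequence production per example. Names are hypothetical.
from collections import defaultdict

def infer_flat_cfg(tagged_sentences):
    """tagged_sentences: POS-tag sequences, e.g. [["DT", "NN", "VBZ"], ...]."""
    rules = defaultdict(set)
    for tags in tagged_sentences:
        rules["S"].add(tuple(tags))      # one production per observed pattern
    return rules

def generates(rules, tags):
    """Membership test for this trivially flat grammar."""
    return tuple(tags) in rules["S"]

examples = [["DT", "NN", "VBZ"], ["DT", "JJ", "NN", "VBZ"]]
grammar = infer_flat_cfg(examples)
print(generates(grammar, ["DT", "NN", "VBZ"]))       # True: pattern was seen
print(generates(grammar, ["DT", "NN", "NN", "VBZ"])) # False: no generalisation

Actual grammatical inference methods go beyond this memorisation step by merging and generalising nonterminals so that unseen sentences can also be derived; the supervised, unsupervised, and semi-supervised settings differ in how much labelled structure guides that generalisation.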


Citations
Journal ArticleDOI

Mixture of experts: a literature survey

TL;DR: A categorisation of the mixture-of-experts (ME) literature is presented, based on whether the problem space is partitioned implicitly through a tacit competitive process between the experts; the first group is called the mixture of implicitly localised experts (MILE), and the second the mixture of explicitly localised experts (MELE), as it uses pre-specified clusters.
Journal ArticleDOI

Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner.

TL;DR: It is argued that, on the computational side, it is important to move from toy problems to the full complexity of the learning situation, and to take as input reconstructions of the sensory signals available to infants that are as faithful as possible.
Dissertation

Unsupervised learning for text-to-speech synthesis

Oliver Watts
TL;DR: The distributional analysis proposed here places the textual objects analysed in a continuous-valued space, rather than specifying a hard categorisation of those objects, so that the models generalise over objects’ surface forms in a way that is acoustically relevant.
Journal ArticleDOI

Learning Grammars for Architecture-Specific Facade Parsing

TL;DR: Experimental validation and comparison with state-of-the-art grammar-based methods on four different datasets show that the learned grammar helps achieve much faster convergence while producing parsing results that are as accurate as, or more accurate than, those of handcrafted grammars as well as grammars learned by other methods.
Journal ArticleDOI

Automatic Learning of Linguistic Resources for Stopword Removal and Stemming from Text

TL;DR: This paper proposes a methodology to automatically learn linguistic resources for a natural language starting from texts written in that language, and experimental results show that its application may effectively provide useful linguistic resources in a fully automatic manner.
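For illustration only (this is not necessarily the paper's own procedure): a common corpus-driven baseline treats the highest-frequency tokens of a raw text collection as stopword candidates. A minimal Python sketch, in which the tokenisation pattern and the top_k cutoff are assumptions:

# Illustrative baseline only: derive stopword candidates from raw text by
# token frequency. Tokenisation and the top_k cutoff are assumptions.
import re
from collections import Counter

def learn_stopwords(texts, top_k=50):
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return [word for word, _ in counts.most_common(top_k)]

corpus = ["the cat sat on the mat", "the dog and the cat"]
print(learn_stopwords(corpus, top_k=3))   # most frequent tokens, e.g. ['the', ...]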
References
Book

Introduction to Automata Theory, Languages, and Computation

TL;DR: This book is a rigorous exposition of formal languages and models of computation, with an introduction to computational complexity, appropriate for upper-level computer science undergraduates who are comfortable with mathematical arguments.
ReportDOI

Building a large annotated corpus of English: the penn treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Proceedings ArticleDOI

Combining labeled and unlabeled data with co-training

TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, allowing inexpensive unlabeled data to augment a much smaller set of labeled examples.
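A minimal sketch of the two-view co-training loop summarised above, assuming two feature views of each example (for instance, page text and anchor text). The Gaussian naive Bayes classifiers, the one-example-per-view selection, and all names are assumptions made here, not the paper's exact procedure.

# Illustrative co-training sketch: two views of the same examples, a small
# labeled pool, and a larger unlabeled pool that is labeled incrementally.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(Xa, Xb, y, Ua, Ub, rounds=10):
    """Xa, Xb: labeled features under views A and B; y: their labels.
    Ua, Ub: the same unlabeled examples under the two views (numpy arrays)."""
    for _ in range(rounds):
        if len(Ua) == 0:
            break
        clf_a = GaussianNB().fit(Xa, y)
        clf_b = GaussianNB().fit(Xb, y)
        # Each view picks the unlabeled example it is most confident about,
        # and its predicted label is added to the shared labeled pool.
        i = clf_a.predict_proba(Ua).max(axis=1).argmax()
        j = clf_b.predict_proba(Ub).max(axis=1).argmax()
        picked = {i: clf_a.predict(Ua[[i]])[0], j: clf_b.predict(Ub[[j]])[0]}
        idx = sorted(picked)
        Xa = np.vstack([Xa, Ua[idx]])
        Xb = np.vstack([Xb, Ub[idx]])
        y = np.concatenate([y, [picked[k] for k in idx]])
        keep = np.setdiff1d(np.arange(len(Ua)), idx)
        Ua, Ub = Ua[keep], Ub[keep]
    return GaussianNB().fit(Xa, y), GaussianNB().fit(Xb, y)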
Journal ArticleDOI

The CHILDES Project: Tools for Analyzing Talk

Clifton Pye, +1 more
- 01 Mar 1994 - 
TL;DR: This book describes three basic tools for language analysis of transcript data by computer that have been developed in the context of the "Child Language Data Exchange System (CHILDES)" project, and focuses on their use in the child language field, believing that researchers from other areas can make the necessary analogies to their own topics.