
Showing papers by "Walter Daelemans published in 1998"


Posted Content
TL;DR: This paper showed that editing exceptional training instances (with low typicality or low class prediction strength) tends to harm generalization accuracy and that the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness).
Abstract: We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
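
As a concrete illustration of the editing technique described above, here is a hedged sketch of editing by class prediction strength, under a deliberately simplified leave-one-out 1-nearest-neighbour reading. The function names, the plain overlap distance, and the binary strength score are illustrative assumptions, not the paper's exact definitions (which build on IB1-IG's information-gain feature weighting):

    def overlap(a, b):
        # plain overlap distance over symbolic feature tuples
        return sum(x != y for x, y in zip(a, b))

    def prediction_strength(memory, i):
        # leave-one-out: does the rest of memory predict instance i's class?
        xi, yi = memory[i]
        rest = [m for j, m in enumerate(memory) if j != i]
        nearest = min(rest, key=lambda m: overlap(m[0], xi))
        return 1.0 if nearest[1] == yi else 0.0

    def edit_memory(memory, threshold=1.0):
        # discard instances the rest of memory fails to predict; the paper
        # reports that exactly this kind of editing tends to hurt accuracy
        return [m for i, m in enumerate(memory)
                if prediction_strength(memory, i) >= threshold]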

195 citations


Proceedings ArticleDOI
10 Aug 1998
TL;DR: This paper examines how the differences in modelling between data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system.
Abstract: In this paper we examine how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic word-class tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.
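
The simplest of those voting strategies can be made concrete with a hedged sketch: per-token majority voting over several taggers' outputs, with ties broken in favour of the strongest individual tagger. The function name and tie-breaking rule are illustrative assumptions; the paper's weighted-voting and second-stage-classifier variants are not covered here:

    from collections import Counter

    def majority_vote(tag_sequences, best=0):
        # tag_sequences: one tag list per tagger, all of equal length;
        # best: index of the best individual tagger, used to break ties
        combined = []
        for tags in zip(*tag_sequences):
            top = Counter(tags).most_common()
            if len(top) > 1 and top[0][1] == top[1][1]:
                combined.append(tags[best])   # tie: trust the best tagger
            else:
                combined.append(top[0][0])
        return combined

    # e.g. majority_vote([["DT","NN"], ["DT","VB"], ["DT","NN"]]) -> ["DT","NN"]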

134 citations



01 Jan 1998
TL;DR: This work demonstrates that the three modules trained with MBL display high generalization accuracy, and argues why MBL is similarly applicable to a large class of other NLP tasks.
Abstract: The need for software modules performing natural language processing (NLP) tasks is growing. These modules should perform efficiently and accurately, while at the same time rapid development is often mandatory. Recent work has indicated that machine learning techniques in general, and memory-based learning (MBL) in particular, offer the tools to meet both demands. We present examples of modules trained with MBL on three NLP tasks: (i) text-to-speech conversion, (ii) part-of-speech tagging, and (iii) phrase chunking. We demonstrate that the three modules display high generalization accuracy, and argue why MBL is similarly applicable to a large class of other NLP tasks.
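
Tasks like these are typically cast as classification over a fixed-width sliding window of the input, each position yielding one memory instance. A minimal sketch of that encoding, with the window width and padding symbol as illustrative assumptions rather than the paper's actual settings:

    def windows(word, width=3, pad="_"):
        # one instance per position: the symbol plus `width` symbols of
        # left and right context, padded at the word boundaries
        padded = pad * width + word + pad * width
        return [tuple(padded[i - width:i + width + 1])
                for i in range(width, width + len(word))]

    # e.g. windows("cat", width=1) -> [('_','c','a'), ('c','a','t'), ('a','t','_')]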

22 citations


Proceedings ArticleDOI
11 Jan 1998
TL;DR: It is concluded that keeping full memory of types rather than tokens, and excluding minority ambiguities appear to be the only performance-preserving optimisations of memory-based learning.
Abstract: Memory-based learning, keeping full memory of learning material, appears a viable approach to learning NLP tasks, and is often superior in generalisation accuracy to eager learning approaches that abstract from learning material. Here we investigate three partial memory-based learning approaches which remove from memory specific task instance types estimated to be exceptional. The three approaches each implement one heuristic function for estimating exceptionality of instance types: (i) typicality, (ii) class prediction strength, and (iii) friendly-neighbourhood size. Experiments are performed with the memory-based learning algorithm IB1-IG trained on English word pronunciation. We find that removing instance types with low prediction strength (ii) is the only tested method which does not seriously harm generalisation accuracy. We conclude that keeping full memory of types rather than tokens, and excluding minority ambiguities appear to be the only performance-preserving optimisations of memory-based learning.
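
The third heuristic, friendly-neighbourhood size, is easy to make concrete: count how many of an instance's k nearest neighbours share its class, and remove instance types whose count is low. A hedged sketch reusing overlap() from the first sketch above; the value of k and the unweighted distance are illustrative assumptions (IB1-IG itself weights features by information gain):

    def friendly_neighbourhood_size(memory, i, k=5):
        # memory: list of (features, klass) pairs; instance i is held out
        xi, yi = memory[i]
        rest = [m for j, m in enumerate(memory) if j != i]
        rest.sort(key=lambda m: overlap(m[0], xi))
        return sum(1 for _, y in rest[:k] if y == yi)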

22 citations


Posted Content
TL;DR: This paper examined how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system by means of an experiment involving the task of morpho-syntactic word-class tagging.
Abstract: In this paper we examine how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic word-class tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.

13 citations


Posted Content
TL;DR: This paper used IGTree, an inductive learning decision-tree algorithm, to train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one.
Abstract: In leading morpho-phonological theories and state-of-the-art text-to-speech systems it is assumed that word pronunciation cannot be learned or performed without in-between analyses at several abstraction levels (e.g., morphological, graphemic, phonemic, syllabic, and stress levels). We challenge this assumption for the case of English word pronunciation. Using IGTree, an inductive-learning decision-tree algorithm, we train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one. The latter system, classifying letter strings directly as mapping to phonemes with stress markers, yields significantly better generalisation accuracies than the two multi-module systems. Analyses of empirical results indicate that positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.
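
The contrast under test can be made concrete with a toy sketch: the cascaded design feeds each module the (possibly erroneous) output of the previous one, while the direct design classifies letter windows straight to phoneme-plus-stress labels. The module internals below are placeholders, not IGTree; windows() is the sliding-window helper sketched earlier:

    def cascade(word, modules):
        # e.g. modules = [morphological_analysis, graphemes, phonemes, stress]
        rep = word
        for module in modules:
            rep = module(rep)       # each stage inherits upstream errors
        return rep

    def direct(word, classifier):
        # single module: letter window -> phoneme-with-stress-marker class
        return [classifier(w) for w in windows(word)]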

11 citations


Proceedings ArticleDOI
11 Jan 1998
TL;DR: This work trains and tests three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one; analyses of empirical results indicate that positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.
Abstract: In leading morpho-phonological theories and state-of-the-art text-to-speech systems it is assumed that word pronunciation cannot be learned or performed without in-between analyses at several abstraction levels (e.g., morphological, graphemic, phonemic, syllabic, and stress levels). We challenge this assumption for the case of English word pronunciation. Using IGTree, an inductive-learning decision-tree algorithm, we train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one. The latter system, classifying letter strings directly as mapping to phonemes with stress markers, yields significantly better generalisation accuracies than the two multi-module systems. Analyses of empirical results indicate that positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.

8 citations



Proceedings ArticleDOI
11 Jan 1998
TL;DR: It is concluded that forgetting, either by abstracting from the training data or by editing exceptional training items in lazy learning, is harmful to generalization accuracy; the author attempts to provide an explanation for these unexpected results.
Abstract: The usual approach to learning language processing tasks such as tagging, parsing, grapheme-to-phoneme conversion, pp-attachment, etc., is to extract regularities from training data in the form of decision trees, rules, probabilities or other abstractions. These representations of regularities are then used to solve new cases of the task. The individual training examples on which the abstractions were based are discarded (forgotten). While this approach seems to work well for other application areas of Machine Learning, I will show that there is evidence that it is not the best way to learn language processing tasks. I will briefly review empirical work in our groups in Antwerp and Tilburg on lazy language learning. In this approach (also called instance-based, case-based, memory-based, and example-based learning), generalization happens at processing time by means of extrapolation from the most similar items in memory to the new item being processed. Lazy learning with a simple similarity metric based on information entropy (IB1-IG, Daelemans & van den Bosch, 1992, 1997) consistently outperforms abstracting (greedy) learning techniques such as C5.0 or backprop learning on a broad selection of natural language processing tasks ranging from phonology to semantics. Our intuitive explanation for this result is that lazy learning techniques keep all training items, whereas greedy approaches lose useful information by forgetting low-frequency or exceptional instances of the task, not covered by the extracted rules or models (Daelemans, 1996). Apart from the empirical work in Tilburg and Antwerp, a number of recent studies on statistical natural language processing (e.g. Dagan & Lee, 1997; Collins & Brooks, 1995) also suggest that, contrary to common wisdom, forgetting specific training items, even when they represent extremely low-frequency events, is harmful to generalization accuracy. After reviewing this empirical work briefly, I will report on new results (work in progress in collaboration with van den Bosch and Zavrel), systematically comparing greedy and lazy learning techniques on a number of benchmark natural language processing tasks: tagging, grapheme-to-phoneme conversion, and pp-attachment. The results show that forgetting individual training items, however 'improbable' they may be, is indeed harmful. Furthermore, they show that combining lazy learning with training set editing techniques (based on typicality and other regularity criteria) also leads to worse generalization results. I will conclude that forgetting, either by abstracting from the training data or by editing exceptional training items in lazy learning, is harmful to generalization accuracy, and will attempt to provide an explanation for these unexpected results.
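
For reference, the IB1-IG similarity the abstract alludes to is conventionally written as an information-gain-weighted overlap distance. The notation below is reconstructed from the cited Daelemans & van den Bosch work, not quoted from this abstract:

    \Delta(X, Y) \;=\; \sum_{i=1}^{n} w_i \, \delta(x_i, y_i),
    \qquad
    \delta(x_i, y_i) \;=\;
      \begin{cases}
        0 & \text{if } x_i = y_i \\
        1 & \text{otherwise,}
      \end{cases}

    w_i \;=\; H(C) \;-\; \sum_{v \in V_i} P(v) \, H(C \mid v),

where H(C) is the entropy of the class distribution, V_i is the value set of feature i, and the nearest neighbour(s) under \Delta supply the predicted class.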

4 citations


01 Jan 1998
TL;DR: This work introduces the general methodology for the construction of inductive lexicons and discusses empirical results on a case study using the approach: prediction of the gender of nouns in Dutch.
Abstract: Machine learning techniques can be used to make lexicons adaptive. The main problems in adaptation are the addition of lexical material to an existing lexical database, and the recomputation of sublanguage-dependent lexical information when porting the lexicon to a new domain or application. Inductive lexicons combine available lexical information and corpus data to alleviate these tasks. In this paper, we introduce the general methodology for the construction of inductive lexicons, and discuss empirical results on a case study using the approach: prediction of the gender of nouns in Dutch.
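
A hedged sketch of the case study's flavour: predict the gender class of an unseen Dutch noun by analogy with existing lexicon entries that end in similar letters. The word-final-trigram features and the reuse of overlap() from the first sketch above are illustrative assumptions, not the paper's actual feature set:

    def suffix_features(noun, n=3):
        # the noun's last n letters, left-padded for short words
        return tuple(("_" * n + noun)[-n:])

    def predict_gender(lexicon, noun):
        # lexicon: list of (noun, gender) pairs from the existing database
        feats = suffix_features(noun)
        nearest = min(lexicon,
                      key=lambda entry: overlap(suffix_features(entry[0]), feats))
        return nearest[1]

    # e.g. predict_gender([("huis", "het"), ("tafel", "de")], "tuinhuis") -> "het"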

Posted Content
TL;DR: This article investigated three partial memory-based learning approaches which remove from memory specific task instance types estimated to be exceptional, using one of three heuristics: typicality, class prediction strength, and friendly-neighbourhood size.
Abstract: Memory-based learning, keeping full memory of learning material, appears a viable approach to learning NLP tasks, and is often superior in generalisation accuracy to eager learning approaches that abstract from learning material. Here we investigate three partial memory-based learning approaches which remove from memory specific task instance types estimated to be exceptional. The three approaches each implement one heuristic function for estimating exceptionality of instance types: (i) typicality, (ii) class prediction strength, and (iii) friendly-neighbourhood size. Experiments are performed with the memory-based learning algorithm IB1-IG trained on English word pronunciation. We find that removing instance types with low prediction strength (ii) is the only tested method which does not seriously harm generalisation accuracy. We conclude that keeping full memory of types rather than tokens, and excluding minority ambiguities appear to be the only performance-preserving optimisations of memory-based learning.