
Showing papers by "Walter Daelemans published in 1998"


Posted Content
TL;DR: This paper showed that editing exceptional training instances (with low typicality or low class prediction strength) tends to harm generalization accuracy and that the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness).
Abstract: We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
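
As a concrete illustration of the editing technique described above, here is a hedged sketch of editing by class prediction strength, under a deliberately simplified leave-one-out 1-nearest-neighbour reading. The function names, the plain overlap distance, and the binary strength score are illustrative assumptions, not the paper's exact definitions (which build on IB1-IG's information-gain feature weighting):

    def overlap(a, b):
        # plain overlap distance over symbolic feature tuples
        return sum(x != y for x, y in zip(a, b))

    def prediction_strength(memory, i):
        # leave-one-out: does the rest of memory predict instance i's class?
        xi, yi = memory[i]
        rest = [m for j, m in enumerate(memory) if j != i]
        nearest = min(rest, key=lambda m: overlap(m[0], xi))
        return 1.0 if nearest[1] == yi else 0.0

    def edit_memory(memory, threshold=1.0):
        # discard instances the rest of memory fails to predict; the paper
        # reports that exactly this kind of editing tends to hurt accuracy
        return [m for i, m in enumerate(memory)
                if prediction_strength(memory, i) >= threshold]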

195 citations


Proceedings ArticleDOI
10 Aug 1998
TL;DR: This paper examines how the differences in modelling between data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system.
Abstract: In this paper we examine how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic word-class tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.
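
The simplest of those voting strategies can be made concrete with a hedged sketch: per-token majority voting over several taggers' outputs, with ties broken in favour of the strongest individual tagger. The function name and tie-breaking rule are illustrative assumptions; the paper's weighted-voting and second-stage-classifier variants are not covered here:

    from collections import Counter

    def majority_vote(tag_sequences, best=0):
        # tag_sequences: one tag list per tagger, all of equal length;
        # best: index of the best individual tagger, used to break ties
        combined = []
        for tags in zip(*tag_sequences):
            top = Counter(tags).most_common()
            if len(top) > 1 and top[0][1] == top[1][1]:
                combined.append(tags[best])   # tie: trust the best tagger
            else:
                combined.append(top[0][0])
        return combined

    # e.g. majority_vote([["DT","NN"], ["DT","VB"], ["DT","NN"]]) -> ["DT","NN"]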

134 citations



01 Jan 1998
TL;DR: This work demonstrates that the three modules trained with MBL display high generalization accuracy, and argues why MBL is similarly applicable to a large class of other NLP tasks.
Abstract: The need for software modules performing natural language processing (NLP) tasks is growing. These modules should perform efficiently and accurately, while at the same time rapid development is often mandatory. Recent work has indicated that machine learning techniques in general, and memory-based learning (MBL) in particular, offer the tools to meet both demands. We present examples of modules trained with MBL on three NLP tasks: (i) text-to-speech conversion, (ii) part-of-speech tagging, and (iii) phrase chunking. We demonstrate that the three modules display high generalization accuracy, and argue why MBL is similarly applicable to a large class of other NLP tasks.
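
Tasks like these are typically cast as classification over a fixed-width sliding window of the input, each position yielding one memory instance. A minimal sketch of that encoding, with the window width and padding symbol as illustrative assumptions rather than the paper's actual settings:

    def windows(word, width=3, pad="_"):
        # one instance per position: the symbol plus `width` symbols of
        # left and right context, padded at the word boundaries
        padded = pad * width + word + pad * width
        return [tuple(padded[i - width:i + width + 1])
                for i in range(width, width + len(word))]

    # e.g. windows("cat", width=1) -> [('_','c','a'), ('c','a','t'), ('a','t','_')]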

22 citations


Proceedings ArticleDOI
11 Jan 1998
TL;DR: It is concluded that keeping full memory of types rather than tokens, and excluding minority ambiguities appear to be the only performance-preserving optimisations of memory-based learning.
Abstract: Memory-based learning, keeping full memory of learning material, appears a viable approach to learning NLP tasks, and is often superior in generalisation accuracy to eager learning approaches that abstract from learning material. Here we investigate three partial memory-based learning approaches which remove from memory specific task instance types estimated to be exceptional. The three approaches each implement one heuristic function for estimating exceptionality of instance types: (i) typicality, (ii) class prediction strength, and (iii) friendly-neighbourhood size. Experiments are performed with the memory-based learning algorithm IB1-IG trained on English word pronunciation. We find that removing instance types with low prediction strength (ii) is the only tested method which does not seriously harm generalisation accuracy. We conclude that keeping full memory of types rather than tokens, and excluding minority ambiguities appear to be the only performance-preserving optimisations of memory-based learning.
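
The third heuristic, friendly-neighbourhood size, is easy to make concrete: count how many of an instance's k nearest neighbours share its class, and remove instance types whose count is low. A hedged sketch reusing overlap() from the first sketch above; the value of k and the unweighted distance are illustrative assumptions (IB1-IG itself weights features by information gain):

    def friendly_neighbourhood_size(memory, i, k=5):
        # memory: list of (features, klass) pairs; instance i is held out
        xi, yi = memory[i]
        rest = [m for j, m in enumerate(memory) if j != i]
        rest.sort(key=lambda m: overlap(m[0], xi))
        return sum(1 for _, y in rest[:k] if y == yi)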

22 citations


Posted Content
TL;DR: This paper examined how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system by means of an experiment involving the task of morpho-syntactic word-class tagging.
Abstract: In this paper we examine how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic word-class tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.

13 citations


Posted Content
TL;DR: This paper used IGTree, an inductive learning decision-tree algorithm, to train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one.
Abstract: In leading morpho-phonological theories and state-of-the-art text-to-speech systems it is assumed that word pronunciation cannot be learned or performed without in-between analyses at several abstraction levels (e.g., morphological, graphemic, phonemic, syllabic, and stress levels). We challenge this assumption for the case of English word pronunciation. Using IGTree, an inductive-learning decision-tree algorithm, we train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one. The latter system, classifying letter strings directly as mapping to phonemes with stress markers, yields significantly better generalisation accuracies than the two multi-module systems. Analyses of empirical results indicate that positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.
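
The contrast under test can be made concrete with a toy sketch: the cascaded design feeds each module the (possibly erroneous) output of the previous one, while the direct design classifies letter windows straight to phoneme-plus-stress labels. The module internals below are placeholders, not IGTree; windows() is the sliding-window helper sketched earlier:

    def cascade(word, modules):
        # e.g. modules = [morphological_analysis, graphemes, phonemes, stress]
        rep = word
        for module in modules:
            rep = module(rep)       # each stage inherits upstream errors
        return rep

    def direct(word, classifier):
        # single module: letter window -> phoneme-with-stress-marker class
        return [classifier(w) for w in windows(word)]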

11 citations


Proceedings ArticleDOI
11 Jan 1998
TL;DR: This work trains and tests three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one; analyses of empirical results indicate that positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.
Abstract: In leading morpho-phonological theories and state-of-the-art text-to-speech systems it is assumed that word pronunciation cannot be learned or performed without in-between analyses at several abstraction levels (e.g., morphological, graphemic, phonemic, syllabic, and stress levels). We challenge this assumption for the case of English word pronunciation. Using IGTree, an inductive-learning decision-tree algorithm, we train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one. The latter system, classifying letter strings directly as mapping to phonemes with stress markers, yields significantly better generalisation accuracies than the two multi-module systems. Analyses of empirical results indicate that positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.

8 citations



Proceedings ArticleDOI
11 Jan 1998
TL;DR: It is concluded that forgetting, either by abstracting from the training data or by editing exceptional training items in lazy learning, is harmful to generalization accuracy; the author attempts to provide an explanation for these unexpected results.
Abstract: The usual approach to learning language processing tasks such as tagging, parsing, grapheme-to-phoneme conversion, pp-attachment, etc., is to extract regularities from training data in the form of decision trees, rules, probabilities or other abstractions. These representations of regularities are then used to solve new cases of the task. The individual training examples on which the abstractions were based are discarded (forgotten). While this approach seems to work well for other application areas of Machine Learning, I will show that there is evidence that it is not the best way to learn language processing tasks. I will briefly review empirical work in our groups in Antwerp and Tilburg on lazy language learning. In this approach (also called instance-based, case-based, memory-based, and example-based learning), generalization happens at processing time by means of extrapolation from the most similar items in memory to the new item being processed. Lazy learning with a simple similarity metric based on information entropy (IB1-IG, Daelemans & van den Bosch, 1992, 1997) consistently outperforms abstracting (greedy) learning techniques such as C5.0 or backprop learning on a broad selection of natural language processing tasks ranging from phonology to semantics. Our intuitive explanation for this result is that lazy learning techniques keep all training items, whereas greedy approaches lose useful information by forgetting low-frequency or exceptional instances of the task, not covered by the extracted rules or models (Daelemans, 1996). Apart from the empirical work in Tilburg and Antwerp, a number of recent studies on statistical natural language processing (e.g. Dagan & Lee, 1997; Collins & Brooks, 1995) also suggest that, contrary to common wisdom, forgetting specific training items, even when they represent extremely low-frequency events, is harmful to generalization accuracy. After reviewing this empirical work briefly, I will report on new results (work in progress in collaboration with van den Bosch and Zavrel), systematically comparing greedy and lazy learning techniques on a number of benchmark natural language processing tasks: tagging, grapheme-to-phoneme conversion, and pp-attachment. The results show that forgetting individual training items, however 'improbable' they may be, is indeed harmful. Furthermore, they show that combining lazy learning with training set editing techniques (based on typicality and other regularity criteria) also leads to worse generalization results. I will conclude that forgetting, either by abstracting from the training data or by editing exceptional training items in lazy learning, is harmful to generalization accuracy, and will attempt to provide an explanation for these unexpected results.
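
For reference, the IB1-IG similarity the abstract alludes to is conventionally written as an information-gain-weighted overlap distance. The notation below is reconstructed from the cited Daelemans & van den Bosch work, not quoted from this abstract:

    \Delta(X, Y) \;=\; \sum_{i=1}^{n} w_i \, \delta(x_i, y_i),
    \qquad
    \delta(x_i, y_i) \;=\;
      \begin{cases}
        0 & \text{if } x_i = y_i \\
        1 & \text{otherwise,}
      \end{cases}

    w_i \;=\; H(C) \;-\; \sum_{v \in V_i} P(v) \, H(C \mid v),

where H(C) is the entropy of the class distribution, V_i is the value set of feature i, and the nearest neighbour(s) under \Delta supply the predicted class.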

4 citations


01 Jan 1998
TL;DR: This work introduces the general methodology for the construction of inductive lexicons and discusses empirical results on a case study using the approach: prediction of the gender of nouns in Dutch.
Abstract: Machine learning techniques can be used to make lexicons adaptive. The main problems in adaptation are the addition of lexical material to an existing lexical database, and the recomputation of sublanguage-dependent lexical information when porting the lexicon to a new domain or application. Inductive lexicons combine available lexical information and corpus data to alleviate these tasks. In this paper, we introduce the general methodology for the construction of inductive lexicons, and discuss empirical results on a case study using the approach: prediction of the gender of nouns in Dutch.
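
A hedged sketch of the case study's flavour: predict the gender class of an unseen Dutch noun by analogy with existing lexicon entries that end in similar letters. The word-final-trigram features and the reuse of overlap() from the first sketch above are illustrative assumptions, not the paper's actual feature set:

    def suffix_features(noun, n=3):
        # the noun's last n letters, left-padded for short words
        return tuple(("_" * n + noun)[-n:])

    def predict_gender(lexicon, noun):
        # lexicon: list of (noun, gender) pairs from the existing database
        feats = suffix_features(noun)
        nearest = min(lexicon,
                      key=lambda entry: overlap(suffix_features(entry[0]), feats))
        return nearest[1]

    # e.g. predict_gender([("huis", "het"), ("tafel", "de")], "tuinhuis") -> "het"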

Posted Content
TL;DR: This article investigated three partial memory-based learning approaches which remove from memory specific task instance types estimated to be exceptional, using one of three heuristics: typicality, class prediction strength, and friendly-neighbourhood size.
Abstract: Memory-based learning, keeping full memory of learning material, appears a viable approach to learning NLP tasks, and is often superior in generalisation accuracy to eager learning approaches that abstract from learning material. Here we investigate three partial memory-based learning approaches which remove from memory specific task instance types estimated to be exceptional. The three approaches each implement one heuristic function for estimating exceptionality of instance types: (i) typicality, (ii) class prediction strength, and (iii) friendly-neighbourhood size. Experiments are performed with the memory-based learning algorithm IB1-IG trained on English word pronunciation. We find that removing instance types with low prediction strength (ii) is the only tested method which does not seriously harm generalisation accuracy. We conclude that keeping full memory of types rather than tokens, and excluding minority ambiguities appear to be the only performance-preserving optimisations of memory-based learning.