
Papers by Walter Daelemans published in 2000


Proceedings ArticleDOI
31 Jul 2000
TL;DR: This work applies seven machine learning algorithms to a single task, identifying base noun phrases, and shows that the best combinator, a majority vote of the top five systems, improves the best published result on a standard data set.
Abstract: We use seven machine learning algorithms for one task: identifying base noun phrases. The results have been processed by different system combination methods and all of these outperformed the best individual result. We have applied the seven learners with the best combinator, a majority vote of the top five systems, to a standard data set and managed to improve the best published result for this data set.

50 citations
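The combination step described above can be made concrete with a small sketch: per-token chunk tags from several systems are merged by majority vote, with ties broken in favour of the best individual system. The IOB tags and system outputs below are invented for illustration; this is not the paper's actual evaluation setup.

```python
from collections import Counter

def majority_vote(predictions_per_system):
    """Combine per-token IOB chunk tags from several systems by majority vote.

    predictions_per_system: list of tag sequences, one per system, all of
    equal length (one tag per token). Ties are broken in favour of the tag
    proposed by the first system, standing in for the best individual system.
    """
    combined = []
    n_tokens = len(predictions_per_system[0])
    for i in range(n_tokens):
        votes = Counter(system[i] for system in predictions_per_system)
        top_count = max(votes.values())
        best_system_tag = predictions_per_system[0][i]
        # Among tags sharing the top count, prefer the first system's choice.
        if votes[best_system_tag] == top_count:
            combined.append(best_system_tag)
        else:
            combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical outputs of five systems for a six-token sentence.
systems = [
    ["B", "I", "O", "B", "I", "O"],
    ["B", "I", "O", "B", "O", "O"],
    ["B", "O", "O", "B", "I", "O"],
    ["B", "I", "O", "O", "I", "O"],
    ["B", "I", "O", "B", "I", "O"],
]
print(majority_vote(systems))   # ['B', 'I', 'O', 'B', 'I', 'O']
```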


Journal ArticleDOI
TL;DR: A memory-based classification architecture for word sense disambiguation and its application to the SENSEVAL evaluation task are described; for each ambiguous word, the correct sense in a new context is selected by finding the closest match to stored examples of the task.
Abstract: We describe a memory-based classification architecture for word sense disambiguation and its application to the SENSEVAL evaluation task. For each ambiguous word, a semantic word expert is automatically trained using a memory-based approach. In each expert, selecting the correct sense of a word in a new context is achieved by finding the closest match to stored examples of this task. Advantages of the approach include (i) fast development time for word experts, (ii) easy and elegant automatic integration of information sources, (iii) use of all available data for training the experts, and (iv) relatively high accuracy with minimal linguistic engineering.

46 citations
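The "closest match to stored examples" step of a memory-based word expert can be sketched as a tiny overlap-based nearest-neighbour classifier. The feature encoding (a window of surrounding words) and the training instances below are simplified assumptions, not the SENSEVAL feature set.

```python
from collections import Counter

class MemoryBasedWordExpert:
    """Minimal memory-based classifier: store all training instances and
    label a new context with the majority sense of its closest stored
    examples, using feature overlap as the similarity measure."""

    def __init__(self):
        self.memory = []  # list of (feature_tuple, sense) pairs

    def train(self, instances):
        # "Training" is just storing the examples (lazy learning).
        self.memory.extend(instances)

    def classify(self, features, k=3):
        # Rank stored examples by the number of matching feature values.
        def overlap(stored):
            return sum(a == b for a, b in zip(features, stored))
        neighbours = sorted(self.memory,
                            key=lambda ex: overlap(ex[0]),
                            reverse=True)[:k]
        senses = Counter(sense for _, sense in neighbours)
        return senses.most_common(1)[0][0]

# Hypothetical instances for the ambiguous word "bank":
# features = (word two left, word one left, word one right, word two right)
expert = MemoryBasedWordExpert()
expert.train([
    (("the", "river", "was", "flooded"), "bank_shore"),
    (("the", "central", "raised", "rates"), "bank_institution"),
    (("my", "local", "approved", "loan"), "bank_institution"),
    (("a", "muddy", "near", "town"), "bank_shore"),
])
print(expert.classify(("the", "central", "cut", "rates")))  # bank_institution
```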


Proceedings Article
01 Jan 2000
TL;DR: The authors describe the lemmatisation and tagging guidelines developed for the "Spoken Dutch Corpus" and lay out the philosophy behind the high-granularity tagset designed for the project.
Abstract: This paper describes the lemmatisation and tagging guidelines developed for the “Spoken Dutch Corpus”, and lays out the philosophy behind the high granularity tagset that was designed for the project. To bootstrap the annotation of large quantities of material (10 million words) with this new tagset we tested several existing taggers and tagger generators on initial samples of the corpus. The results show that the most effective method, when trained on the small samples, is a high quality implementation of a Hidden Markov Model tagger generator.

45 citations
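The kind of system that performed best here, a Hidden Markov Model tagger, can be sketched as a bigram HMM with add-k smoothing and Viterbi decoding. The toy training sentences and the smoothing constant below are placeholders, not Spoken Dutch Corpus material.

```python
from collections import defaultdict

def train_hmm(tagged_sentences, smoothing=0.1):
    """Estimate bigram transition and emission counts from tagged sentences.
    Each sentence is a list of (word, tag) pairs."""
    transitions = defaultdict(lambda: defaultdict(float))
    emissions = defaultdict(lambda: defaultdict(float))
    tags = set()
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            tags.add(tag)
            prev = tag
    return transitions, emissions, sorted(tags), smoothing

def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the model,
    using add-k smoothed transition and emission probabilities."""
    transitions, emissions, tags, k = model

    def trans_p(prev, tag):
        row = transitions.get(prev, {})
        return (row.get(tag, 0.0) + k) / (sum(row.values()) + k * len(tags))

    def emit_p(tag, word):
        row = emissions.get(tag, {})
        return (row.get(word, 0.0) + k) / (sum(row.values()) + k * (len(row) + 1))

    # best[i][tag] = (probability, previous tag) of the best path ending in `tag`.
    best = [{t: (trans_p("<s>", t) * emit_p(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for tag in tags:
            prob, prev = max(
                (best[-1][p][0] * trans_p(p, tag) * emit_p(tag, word), p)
                for p in tags)
            column[tag] = (prob, prev)
        best.append(column)

    # Trace back from the highest-scoring final state.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))

# Toy training data (hypothetical; not taken from the corpus).
train = [
    [("de", "DET"), ("man", "N"), ("loopt", "V")],
    [("de", "DET"), ("vrouw", "N"), ("zingt", "V")],
    [("een", "DET"), ("kind", "N"), ("loopt", "V")],
]
model = train_hmm(train)
print(viterbi(["de", "kind", "zingt"], model))   # ['DET', 'N', 'V']
```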



Posted Content
TL;DR: Experiments show that COMBI-BOOTSTRAP can integrate a wide variety of existing resources, and achieves much higher accuracy than both the best single tagger and an ensemble tagger constructed out of the same small training sample.
Abstract: This paper describes a new method, Combi-bootstrap, to exploit existing taggers and lexical resources for the annotation of corpora with new tagsets. Combi-bootstrap uses existing resources as features for a second-level machine learning module that is trained to make the mapping to the new tagset on a very small sample of annotated corpus material. Experiments show that Combi-bootstrap: i) can integrate a wide variety of existing resources, and ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed out of the same small training sample.

24 citations
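The central idea of Combi-bootstrap, treating the outputs of existing taggers and lexicons as features for a second-level learner that maps them to the new tagset, might be sketched roughly as follows. The second-level "learner" here is just a majority mapping per feature combination with a global back-off, and the tag labels are invented; the paper's actual learning module and tagsets are not reproduced.

```python
from collections import Counter, defaultdict

class CombiBootstrapSketch:
    """Second-level learner in the spirit of Combi-bootstrap: the outputs of
    existing taggers / lexicons are the features, the label is the tag from
    the new tagset. The learned model is simply the majority mapping per
    feature combination, with a global majority back-off for unseen ones."""

    def train(self, sample):
        # sample: list of (feature_tuple, new_tag) pairs taken from the
        # small hand-annotated corpus sample.
        by_features = defaultdict(Counter)
        all_tags = Counter()
        for features, new_tag in sample:
            by_features[features][new_tag] += 1
            all_tags[new_tag] += 1
        self.mapping = {f: c.most_common(1)[0][0] for f, c in by_features.items()}
        self.default = all_tags.most_common(1)[0][0]

    def predict(self, features):
        return self.mapping.get(features, self.default)

# Hypothetical features: (tag from existing tagger A, tag from tagger B,
# lexicon category); the label is a tag in the new, finer-grained tagset.
sample = [
    (("N", "NOUN", "noun"), "N(soort,ev)"),
    (("N", "NOUN", "noun"), "N(soort,ev)"),
    (("V", "VERB", "verb"), "WW(pv,tgw)"),
    (("Adj", "ADJ", "adj"), "ADJ(prenom)"),
]
model = CombiBootstrapSketch()
model.train(sample)
print(model.predict(("V", "VERB", "verb")))     # WW(pv,tgw)
print(model.predict(("Pron", "PRON", "pron")))  # unseen combination: back-off
```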


Proceedings Article
01 May 2000
TL;DR: This paper exploits existing taggers and lexical resources for the annotation of corpora with new tagsets, using a second-level machine learning module that is trained on a very small sample of annotated corpus material to make the mapping to the new tagset.
Abstract: This paper describes a new method, COMBI-BOOTSTRAP, to exploit existing taggers and lexical resources for the annotation of corpora with new tagsets. COMBI-BOOTSTRAP uses existing resources as features for a second-level machine learning module that is trained to make the mapping to the new tagset on a very small sample of annotated corpus material. Experiments show that COMBI-BOOTSTRAP: i) can integrate a wide variety of existing resources, and ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed out of the same small training sample.

23 citations



Proceedings Article
29 Jun 2000
TL;DR: It is shown that the application of machine learning methods indeed leads to increased insight into the linguistic regularities determining the variation between the two pronunciation variants studied.
Abstract: We apply rule induction, classifier combination and meta-learning (stacked classifiers) to the problem of bootstrapping high accuracy automatic annotation of corpora with pronunciation information. The task we address in this paper consists of generating phonemic representations reflecting the Flemish and Dutch pronunciations of a word on the basis of its orthographic representation (which in turn is based on the actual speech recordings). We compare several possible approaches to achieve the text-to-pronunciation mapping task: memory-based learning, transformation-based learning, rule induction, maximum entropy modeling, combination of classifiers in stacked learning, and stacking of meta-learners. We are interested both in optimal accuracy and in obtaining insight into the linguistic regularities involved. As far as accuracy is concerned, an already high accuracy level (93% for Celex and 86% for Fonilex at word level) for single classifiers is boosted significantly with additional error reductions of 31% and 38% respectively using combination of classifiers, and a further 5% using combination of meta-learners, bringing overall word level accuracy to 96% for the Dutch variant and 92% for the Flemish variant. We also show that the application of machine learning methods indeed leads to increased insight into the linguistic regularities determining the variation between the two pronunciation variants studied.

12 citations
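Text-to-pronunciation tasks of this kind are commonly cast as per-letter classification over a fixed-width window of surrounding letters. The sketch below shows that windowing step plus a trivial exact-lookup "classifier"; the example word and its aligned phoneme string are made up, not Celex or Fonilex entries.

```python
def letter_windows(word, width=3):
    """Turn a word into one instance per letter: the letter itself plus
    `width` letters of left and right context, padded with '_'."""
    padded = "_" * width + word + "_" * width
    instances = []
    for i in range(len(word)):
        center = i + width
        instances.append(tuple(padded[center - width:center + width + 1]))
    return instances

# One hypothetical training pair: a spelling and an aligned phoneme string
# (one phoneme or '-' per letter), roughly in the style of lexicon alignments.
word, phonemes = "boek", ["b", "u", "-", "k"]

# Pair each letter window with its phoneme label and store the pairs.
memory = dict(zip(letter_windows(word), phonemes))

# A trivial memory-based "classifier": exact window lookup.
print(memory[letter_windows("boek")[1]])   # 'u'
```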



Proceedings Article
01 Jan 2000
TL;DR: This paper compares two rule induction techniques for the automatic extraction of phonemic knowledge and rules from pairs of pronunciation lexicons, and concludes that, whereas classification-based rule induction with C5.0 is more accurate, the transformation rules learned with TBEDL can be more easily interpreted.
Abstract: This paper describes the use of rule induction techniques for the automatic extraction of phonemic knowledge and rules from pairs of pronunciation lexicons. This extracted knowledge allows the adaptation of speech processing systems to regional variants of a language. As a case study, we apply the approach to Northern Dutch and Flemish (the variant of Dutch spoken in Flanders, a part of Belgium), based on Celex and Fonilex, pronunciation lexicons for Northern Dutch and Flemish, respectively. In our study, we compare two rule induction techniques, Transformation-Based Error-Driven Learning (TBEDL) (Brill, 1995) and C5.0 (Quinlan, 1993), and evaluate the extracted knowledge quantitatively (accuracy) and qualitatively (linguistic relevance of the rules). We conclude that, whereas classification-based rule induction with C5.0 is more accurate, the transformation rules learned with TBEDL can be more easily interpreted.

8 citations
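The transformation-based side of the comparison can be illustrated with a stripped-down error-driven learning loop: starting from the Northern Dutch transcription, the rule (here a context-free phoneme substitution) that most reduces the remaining mismatches with the Flemish transcription is repeatedly selected. The aligned phoneme strings below are invented examples, not Celex/Fonilex data, and real TBEDL uses context-sensitive rule templates.

```python
def errors(guess, target):
    """Count position-wise mismatches between two equal-length transcriptions."""
    return sum(g != t for g, t in zip(guess, target))

def apply_rule(transcription, rule):
    src, dst = rule
    return [dst if p == src else p for p in transcription]

def tbedl(pairs, max_rules=10):
    """Greedy transformation-based learning: start from the source
    transcription and repeatedly add the phoneme substitution rule that
    most reduces the remaining errors against the target transcription."""
    guesses = [list(src) for src, _ in pairs]
    targets = [list(tgt) for _, tgt in pairs]
    learned = []
    for _ in range(max_rules):
        # Candidate rules: rewrite a currently wrong phoneme into the target one.
        candidates = {(g, t) for guess, target in zip(guesses, targets)
                      for g, t in zip(guess, target) if g != t}
        best_rule, best_gain = None, 0
        for rule in candidates:
            gain = sum(errors(guess, target) - errors(apply_rule(guess, rule), target)
                       for guess, target in zip(guesses, targets))
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:
            break
        learned.append(best_rule)
        guesses = [apply_rule(guess, best_rule) for guess in guesses]
    return learned

# Toy aligned (Northern Dutch-like, Flemish-like) phoneme strings; invented.
pairs = [
    ("trAm", "trAm"),        # identical in both variants
    ("pOzitif", "pozitif"),  # O -> o
    ("lOxis", "loxis"),      # O -> o
    ("zyn", "zin"),          # y -> i (invented contrast)
]
print(tbedl(pairs))   # [('O', 'o'), ('y', 'i')]
```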



Proceedings ArticleDOI
13 Sep 2000
TL;DR: This work uses a simple genetic algorithm (GA) for feature relevance assignment in memory-based language processing on two typical natural language processing tasks, morphological synthesis and unknown word tagging, and finds that GA feature selection always significantly outperforms the MBLP variant without selection, and that GA feature ordering and weighting significantly outperform the unweighted baseline.
Abstract: We investigate the usefulness of evolutionary algorithms in three incarnations of the problem of feature relevance assignment in memory-based language processing (MBLP): feature weighting, feature ordering and feature selection. We use a simple genetic algorithm (GA) for this problem on two typical tasks in natural language processing: morphological synthesis and unknown word tagging. We find that GA feature selection always significantly outperforms the MBLP variant without selection and that feature ordering and weighting with GA significantly outperforms a situation where no weighting is used. However, GA selection does not significantly do better than simple iterative feature selection methods, and GA weighting and ordering reach only similar performance as current information-theoretic feature weighting methods.
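A minimal version of GA-based feature selection for a memory-based learner might look like the sketch below: bit strings encode which features a 1-nearest-neighbour classifier may use, and fitness is leave-one-out accuracy. The GA settings and the synthetic dataset are invented; they are not the paper's tasks or parameters.

```python
import random

def knn_loo_accuracy(data, mask):
    """Leave-one-out accuracy of a 1-NN classifier that only compares the
    features switched on in `mask` (feature overlap as similarity)."""
    correct = 0
    for i, (features, label) in enumerate(data):
        def overlap(other):
            return sum(m and a == b
                       for m, a, b in zip(mask, features, other[0]))
        neighbour = max((ex for j, ex in enumerate(data) if j != i), key=overlap)
        correct += neighbour[1] == label
    return correct / len(data)

def ga_feature_selection(data, n_features, pop_size=20, generations=30,
                         mutation_rate=0.1, seed=1):
    """Simple generational GA: individuals are feature-selection bit masks,
    with elitism, one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: knn_loo_accuracy(data, m), reverse=True)
        next_pop = scored[:2]                      # elitism: keep the two best
        while len(next_pop) < pop_size:
            a, b = rng.sample(scored[:10], 2)      # parents from the top half
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]              # one-point crossover
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]             # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda m: knn_loo_accuracy(data, m))

# Toy task: the label depends only on features 0 and 2; features 1 and 3 are noise.
rng = random.Random(0)
data = []
for _ in range(40):
    f = [rng.randint(0, 1) for _ in range(4)]
    data.append((f, f[0] ^ f[2]))
print(ga_feature_selection(data, n_features=4))  # fittest mask found; features 0 and 2 carry the signal
```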

01 Jan 2000
TL;DR: Preliminary results from an ongoing study that investigates the performance of machine learning classifiers on a diverse set of Natural Language Processing (NLP) tasks are reported.
Abstract: In this paper we report preliminary results from an ongoing study that investigates the performance of machine learning classifiers on a diverse set of Natural Language Processing (NLP) tasks. First, we compare a number of popular existing learning methods (Neural networks, Memory-based learning, Rule induction, Decision trees, Maximum Entropy, Winnow Perceptrons, Naive Bayes and Support Vector Machines), and discuss their properties vis-à-vis typical NLP data sets. Next, we turn to methods to optimize the parameters of single learning methods through cross-validation and evolutionary algorithms. Then we investigate how we can get the best of all single methods through combination of the tested systems in classifier ensembles. Finally we discuss new and more thorough methods of automatically constructing ensembles of classifiers based on the techniques used for parameter optimization.
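One ingredient mentioned above, optimizing a learner's parameters through cross-validation, can be sketched as a small grid search. The plugged-in k-NN learner, the parameter grid and the synthetic data are placeholders rather than any of the systems compared in the study.

```python
from itertools import product

def cross_validated_score(train_fn, data, params, folds=5):
    """Average accuracy of `train_fn(train_split, **params)` over k folds.
    train_fn must return a predict(features) -> label callable."""
    fold_size = len(data) // folds
    scores = []
    for f in range(folds):
        test = data[f * fold_size:(f + 1) * fold_size]
        train = data[:f * fold_size] + data[(f + 1) * fold_size:]
        predict = train_fn(train, **params)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / len(scores)

def grid_search(train_fn, data, grid, folds=5):
    """Exhaustively try every parameter combination and keep the best one."""
    best_params, best_score = None, -1.0
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = cross_validated_score(train_fn, data, params, folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Placeholder learner: k-NN with feature overlap on integer feature vectors.
def knn_factory(train, k=1):
    def predict(x):
        neighbours = sorted(train,
                            key=lambda ex: sum(a == b for a, b in zip(x, ex[0])),
                            reverse=True)[:k]
        labels = [y for _, y in neighbours]
        return max(set(labels), key=labels.count)
    return predict

# Tiny synthetic dataset: the label is simply the first feature.
data = [((i % 2, i % 3, i % 5), i % 2) for i in range(60)]
print(grid_search(knn_factory, data, {"k": [1, 3, 5]}))   # ({'k': 1}, 1.0)
```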

Proceedings ArticleDOI
13 Sep 2000
TL;DR: This paper systematically compares two inductive learning approaches to tagging, MX-POST (based on maximum entropy modeling) and MBT (based on memory-based learning), and results indicate that earlier observed differences in accuracy can be attributed largely to differences in the information sources used, rather than to algorithm bias.
Abstract: Morphosyntactic Disambiguation (Part of Speech tagging) is a useful benchmark problem for system comparison because it is typical for a large class of Natural Language Processing (NLP) problems that can be defined as disambiguation in local context. This paper adds to the literature on the systematic and objective evaluation of different methods to automatically learn this type of disambiguation problem. We systematically compare two inductive learning approaches to tagging: MX-POST (based on maximum entropy modeling) and MBT (based on memory-based learning). We investigate the effect of different sources of information on accuracy when comparing the two approaches under the same conditions. Results indicate that earlier observed differences in accuracy can be attributed largely to differences in information sources used, rather than to algorithm bias.
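The point about information sources can be made concrete with the usual case representation for disambiguation in local context: each token becomes an instance built from the focus word, neighbouring words, and the tags already assigned to its left. The window sizes, sentence, and tags below are hypothetical, not the actual feature sets of MBT or MX-POST.

```python
def tagging_instances(words, tags_so_far, left=2, right=1):
    """Build one instance per token from the same information sources a
    local-context tagger can use: the focus word, `left` words and already
    assigned tags to the left, and `right` words to the right."""
    pad = max(left, right)
    padded_words = ["_"] * pad + words + ["_"] * pad
    padded_tags = ["_"] * left + tags_so_far   # tags of preceding tokens only
    instances = []
    for i, word in enumerate(words):
        center = i + pad
        instances.append({
            "focus": word,
            "left_words": tuple(padded_words[center - left:center]),
            "right_words": tuple(padded_words[center + 1:center + 1 + right]),
            "left_tags": tuple(padded_tags[i:i + left]),
        })
    return instances

# Hypothetical sentence; the first three tokens have already been tagged
# in a left-to-right pass, the last one is the current focus.
words = ["the", "old", "man", "boats"]
tags_so_far = ["DET", "ADJ", "NOUN"]
for instance in tagging_instances(words, tags_so_far):
    print(instance)
```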



Book Chapter
01 Jan 2000
TL;DR: In this article, the authors introduce machine learning ("self-learning") systems as an operationalisation of pre-Chomskyan linguistic concepts such as analogy and induction, and show how they can be applied in language description and (computational) linguistics.
Abstract: We introduce machine learning ("self-learning") systems as an operationalisation of pre-Chomskyan linguistic concepts such as analogy and induction, and show how they can be applied in language description and (computational) linguistics. As a case study we discuss two applications: the automatic induction of knowledge about the rules governing allomorphy in Dutch diminutives, and the role of segmental phonological knowledge in the learnability of Dutch stress.

Posted Content
TL;DR: This paper applies rule induction, classifier combination and meta-learning (stacked classifiers) to the problem of bootstrapping high-accuracy automatic annotation of corpora with pronunciation information.
Abstract: We apply rule induction, classifier combination and meta-learning (stacked classifiers) to the problem of bootstrapping high accuracy automatic annotation of corpora with pronunciation information. The task we address in this paper consists of generating phonemic representations reflecting the Flemish and Dutch pronunciations of a word on the basis of its orthographic representation (which in turn is based on the actual speech recordings). We compare several possible approaches to achieve the text-to-pronunciation mapping task: memory-based learning, transformation-based learning, rule induction, maximum entropy modeling, combination of classifiers in stacked learning, and stacking of meta-learners. We are interested both in optimal accuracy and in obtaining insight into the linguistic regularities involved. As far as accuracy is concerned, an already high accuracy level (93% for Celex and 86% for Fonilex at word level) for single classifiers is boosted significantly with additional error reductions of 31% and 38% respectively using combination of classifiers, and a further 5% using combination of meta-learners, bringing overall word level accuracy to 96% for the Dutch variant and 92% for the Flemish variant. We also show that the application of machine learning methods indeed leads to increased insight into the linguistic regularities determining the variation between the two pronunciation variants studied.

Posted Content
TL;DR: This article uses seven machine learning algorithms for one task, identifying base noun phrases; the results were processed by different system combination methods, all of which outperformed the best individual result.
Abstract: We use seven machine learning algorithms for one task: identifying base noun phrases. The results have been processed by different system combination methods and all of these outperformed the best individual result. We have applied the seven learners with the best combinator, a majority vote of the top five systems, to a standard data set and managed to improve the best published result for this data set.