Papers by Walter Daelemans published in 2009


Proceedings ArticleDOI
04 Jun 2009
TL;DR: A machine learning system that finds the scope of negation in biomedical texts; it combines several classifiers, works in two phases, and achieves the best results to date for this task.
Abstract: Finding negation signals and their scope in text is an important subtask in information extraction. In this paper we present a machine learning system that finds the scope of negation in biomedical texts. The system combines several classifiers and works in two phases. To investigate the robustness of the approach, the system is tested on the three subcorpora of the BioScope corpus representing different text types. It achieves the best results to date for this task, with an error reduction of 32.07% compared to current state of the art results.
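
The two-phase architecture can be sketched compactly. The following is a minimal illustration, not the authors' system: it assumes scikit-learn, invents simple window features, and uses a single logistic-regression learner per phase where the paper combines several classifiers.

```python
# Hedged sketch of a two-phase cue/scope pipeline (illustrative, not the paper's system).
# Phase 1: token-level negation-cue detection; Phase 2: per-cue scope classification.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i):
    """Simple lexical window features for token i (assumed for illustration)."""
    return {
        "word": tokens[i].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Phase 1: is each token a negation cue? (labels: "CUE" / "O")
cue_clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))

def scope_features(tokens, i, cue_idx):
    """Phase-2 features: token window plus its relation to the detected cue."""
    feats = token_features(tokens, i)
    feats["dist_to_cue"] = i - cue_idx        # signed distance to the cue
    feats["cue_word"] = tokens[cue_idx].lower()
    return feats

# Phase 2: given a detected cue, is each token inside its scope? ("IN" / "OUT")
scope_clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
```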

145 citations


Proceedings ArticleDOI
04 Jun 2009
TL;DR: It is shown that the same scope-finding approach can be applied to both negation and hedging, and the system is tested on the three subcorpora of the BioScope corpus that represent different text types.
Abstract: Identifying hedged information in biomedical literature is an important subtask in information extraction because it would be misleading to extract speculative information as factual information. In this paper we present a machine learning system that finds the scope of hedge cues in biomedical texts. The system is based on a similar system that finds the scope of negation cues. We show that the same scope-finding approach can be applied to both negation and hedging. To investigate the robustness of the approach, the system is tested on the three subcorpora of the BioScope corpus that represent different text types.
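
Because the central claim is that the same scope-finding machinery carries over from negation to hedging, a sketch of that parameterization may be useful. The cue lexicons below are tiny hypothetical samples; the real system learns cues from BioScope annotations rather than looking them up.

```python
# Hedged sketch: one scope-finding pipeline, parameterized by cue type.
# Cue lexicons are illustrative stand-ins, not the BioScope inventories.
NEGATION_CUES = {"not", "no", "without", "absence"}
HEDGE_CUES = {"may", "suggest", "possibly", "likely", "appear"}

def find_cues(tokens, cue_lexicon):
    """Phase-1 stand-in: lexicon lookup instead of a trained cue classifier."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in cue_lexicon]

# The phase-2 scope classifier is unchanged; only the cue inventory differs:
#   find_cues(tokens, NEGATION_CUES)  vs.  find_cues(tokens, HEDGE_CUES)
```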

131 citations


Proceedings ArticleDOI
30 Mar 2009
TL;DR: It is argued that the model does not approximate thematic fit as argument plausibility or 'fit with verb selectional preferences', but directly as semantic role plausibility for a verb-argument pair, through similarity-based generalization from previously seen verb-argument pairs.
Abstract: This paper presents a new, exemplar-based model of thematic fit. In contrast to previous models, it does not approximate thematic fit as argument plausibility or 'fit with verb selectional preferences', but directly as semantic role plausibility for a verb-argument pair, through similarity-based generalization from previously seen verb-argument pairs. This makes the model very robust to data sparsity. We argue that the model is easily extensible to a model of semantic role ambiguity resolution during online sentence comprehension. The model is evaluated on human semantic role plausibility judgments. Its predictions correlate significantly with the human judgments. It rivals two state-of-the-art models of thematic fit and exceeds their performance on previously unseen or low-frequency items.
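
A minimal sketch of the exemplar-based idea, under loose assumptions: the fit of a (verb, role, argument) triple is the similarity-weighted support it receives from stored exemplars. The character-overlap similarity below is a crude stand-in for the distributional similarity a real model would use.

```python
# Hedged sketch of exemplar-based thematic fit (illustrative, not the paper's model).
from collections import defaultdict

# Exemplar memory: (verb, role) -> head nouns previously seen in that role
exemplars = defaultdict(list)
exemplars[("eat", "agent")] += ["man", "dog", "child"]
exemplars[("eat", "patient")] += ["apple", "bread", "meat"]

def similarity(a, b):
    """Stand-in for corpus-derived similarity (here: crude character overlap)."""
    shared = len(set(a) & set(b))
    return shared / max(len(set(a) | set(b)), 1)

def thematic_fit(verb, role, argument):
    """Generalize from stored exemplars to a possibly unseen argument."""
    seen = exemplars[(verb, role)]
    if not seen:
        return 0.0
    return sum(similarity(argument, ex) for ex in seen) / len(seen)

print(thematic_fit("eat", "patient", "bread"))  # seen exemplar: high fit
print(thematic_fit("eat", "patient", "meal"))   # unseen argument: fit via generalization
```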

15 citations


Proceedings ArticleDOI
06 Aug 2009
TL;DR: This work presents an automatic multi-document summarization system for Dutch based on the MEAD system, and introduces a semantic overlap detection tool, which goes beyond simple string matching.
Abstract: We present an automatic multi-document summarization system for Dutch based on the MEAD system. We focus on redundancy detection, an essential ingredient of multi-document summarization. We introduce a semantic overlap detection tool, which goes beyond simple string matching. Our results so far do not confirm our expectation that this tool would outperform the other tested methods.
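
To make "beyond simple string matching" concrete, here is a hedged sketch of synonym-normalized overlap for redundancy detection. The SYNONYMS table is a hypothetical stand-in for a lexical resource; the actual tool may well work differently.

```python
# Hedged sketch of redundancy detection beyond exact string matching.
# SYNONYMS maps tokens to canonical synonym classes (hypothetical stand-in
# for a real lexical-semantic resource).
SYNONYMS = {"car": "auto", "auto": "auto", "vehicle": "auto"}

def normalize(token):
    return SYNONYMS.get(token.lower(), token.lower())

def semantic_overlap(sent_a, sent_b):
    """Jaccard overlap after mapping tokens to canonical synonym classes."""
    a = {normalize(t) for t in sent_a.split()}
    b = {normalize(t) for t in sent_b.split()}
    return len(a & b) / max(len(a | b), 1)

def is_redundant(candidate, summary_sentences, threshold=0.5):
    """Reject a candidate that overlaps too much with the summary so far."""
    return any(semantic_overlap(candidate, s) >= threshold
               for s in summary_sentences)
```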

13 citations


Proceedings Article
01 Sep 2009
TL;DR: A method to evaluate the PP attachment task in a more natural situation is provided, making it possible to compare the approach to full statistical parsing approaches, and the domain adaptation properties of both approaches are investigated.
Abstract: In this paper we extend a shallow parser [6] with prepositional phrase attachment. Although PP attachment is a well-studied task in a discriminative learning context, it is mostly addressed in artificial settings like the quadruple classification task [18], in which there are only two candidate attachment sites, a noun and a verb. In this paper we provide a method to evaluate the task in a more natural situation, making it possible to compare the approach to full statistical parsing approaches. First, we show how to extract anchor-PP pairs from parse trees in the GENIA and WSJ treebanks. Next, we discuss the extension of the shallow parser with a PP-attacher. We compare the PP attachment module with a statistical full parsing approach [4] and analyze the results. More specifically, we investigate the domain adaptation properties of both approaches (in this case domain shifts between journalistic and medical language).
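
The anchor-PP pair extraction step can be illustrated on a toy parse tree. This sketch uses NLTK's Tree class and simplifying assumptions (rightmost-leaf heads, preceding-sister anchors); it is not the paper's extraction procedure.

```python
# Hedged sketch of anchor-PP pair extraction from a bracketed parse tree.
from nltk import Tree

def rightmost_head(tree):
    """Crude head choice: rightmost leaf of the constituent (assumption)."""
    return tree.leaves()[-1]

def anchor_pp_pairs(tree):
    """Pair each PP's preposition with the head of its preceding sister."""
    pairs = []
    for subtree in tree.subtrees():
        kids = list(subtree)
        for i, kid in enumerate(kids):
            if isinstance(kid, Tree) and kid.label() == "PP" and i > 0:
                left = kids[i - 1]
                anchor = rightmost_head(left) if isinstance(left, Tree) else left
                pairs.append((anchor, kid.leaves()[0]))
    return pairs

t = Tree.fromstring(
    "(S (NP (DT the) (NN man)) (VP (VBD saw) (NP (DT a) (NN dog))"
    " (PP (IN with) (NP (DT a) (NN telescope)))))"
)
print(anchor_pp_pairs(t))  # [('dog', 'with')] under this crude heuristic
```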

11 citations


Proceedings Article
01 Sep 2009
TL;DR: This is the author's version of the work; the definitive version was published in the Proceedings of RANLP 2009, the International Conference on Recent Advances in Natural Language Processing.
Abstract: This is the author's version of the work. The definitive version was published in the Proceedings of RANLP 2009, the International Conference on Recent Advances in Natural Language Processing, Borovets, Bulgaria, 14-16 September 2009, pp. 65-70.

10 citations


Proceedings ArticleDOI
05 Jun 2009
TL;DR: The memory-based machine learning system that was submitted to the BioNLP Shared Task on Event Extraction modeled the event extraction task using an approach that has been previously applied to other natural language processing tasks like semantic role labeling or negation scope finding.
Abstract: In this paper we describe the memory-based machine learning system that we submitted to the BioNLP Shared Task on Event Extraction. We modeled the event extraction task using an approach that has been previously applied to other natural language processing tasks like semantic role labeling or negation scope finding. The results obtained by our system (30.58 F-score in Task 1 and 29.27 in Task 2) suggest that the approach and the system need further adaptation to the complexity involved in extracting biomedical events.
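
For readers unfamiliar with the term, memory-based learning is essentially k-nearest-neighbour classification over stored training instances (as in TiMBL-style tools). The sketch below uses scikit-learn with toy features as stand-ins; the shared-task system's actual features and label set were far richer.

```python
# Hedged sketch of the memory-based (k-NN) classification idea, applied here
# to event-trigger detection; features and training data are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy training instances: token-level features -> event-type label
X = [
    {"word": "expression", "lemma": "expression", "pos": "NN"},
    {"word": "inhibits", "lemma": "inhibit", "pos": "VBZ"},
    {"word": "binding", "lemma": "binding", "pos": "NN"},
    {"word": "protein", "lemma": "protein", "pos": "NN"},
]
y = ["Gene_expression", "Negative_regulation", "Binding", "O"]

# Memory-based learning stores the training instances verbatim and labels
# new tokens by their nearest stored neighbour(s).
clf = make_pipeline(DictVectorizer(), KNeighborsClassifier(n_neighbors=1))
clf.fit(X, y)
print(clf.predict([{"word": "bindings", "lemma": "binding", "pos": "NNS"}]))
# -> ['Binding']: the nearest stored exemplar shares the lemma feature
```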

9 citations


Proceedings ArticleDOI
30 Mar 2009
TL;DR: This work shows that only 16% of the observed compressed sentences in the domain of subtitling can be accounted for by word deletion alone, and argues for more elaborate sentence compression models that build on natural language generation (NLG) work.
Abstract: Data-driven approaches to sentence compression define the task as dropping any subset of words from the input sentence while retaining important information and grammaticality. We show that only 16% of the observed compressed sentences in the domain of subtitling can be accounted for in this way. We argue that part of this is due to evaluation issues and estimate that a deletion model is in fact compatible with approximately 55% of the observed data. We analyse the remaining problems and conclude that in those cases word order changes and paraphrasing are crucial, and argue for more elaborate sentence compression models which build on NLG work.
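
The 16% figure rests on a concrete test: a compression is reachable by a pure deletion model only if its tokens form a subsequence of the source sentence's tokens. A minimal sketch of that check, assuming naive whitespace tokenization:

```python
# Hedged sketch of the deletion-model compatibility test implied above:
# the compression must be a token subsequence of the source sentence.
def is_deletion_compression(source, compressed):
    it = iter(source.split())
    # `tok in it` consumes the iterator up to the match, so order is enforced.
    return all(tok in it for tok in compressed.split())

print(is_deletion_compression(
    "the committee finally approved the proposal yesterday",
    "committee approved the proposal"))   # True: pure deletion
print(is_deletion_compression(
    "the committee finally approved the proposal yesterday",
    "the proposal was approved"))         # False: reordering/paraphrase needed
```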

3 citations