Papers by Walter Daelemans published in 2009


Proceedings ArticleDOI
04 Jun 2009
TL;DR: A machine learning system that finds the scope of negation in biomedical texts; it combines several classifiers, works in two phases, and achieves the best results to date for this task.
Abstract: Finding negation signals and their scope in text is an important subtask in information extraction. In this paper we present a machine learning system that finds the scope of negation in biomedical texts. The system combines several classifiers and works in two phases. To investigate the robustness of the approach, the system is tested on the three subcorpora of the BioScope corpus representing different text types. It achieves the best results to date for this task, with an error reduction of 32.07% compared to current state of the art results.
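
The two-phase architecture can be sketched compactly. The following is a minimal illustration, not the authors' system: it assumes scikit-learn, invents simple window features, and uses a single logistic-regression learner per phase where the paper combines several classifiers.

```python
# Hedged sketch of a two-phase cue/scope pipeline (illustrative, not the paper's system).
# Phase 1: token-level negation-cue detection; Phase 2: per-cue scope classification.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i):
    """Simple lexical window features for token i (assumed for illustration)."""
    return {
        "word": tokens[i].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Phase 1: is each token a negation cue? (labels: "CUE" / "O")
cue_clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))

def scope_features(tokens, i, cue_idx):
    """Phase-2 features: token window plus its relation to the detected cue."""
    feats = token_features(tokens, i)
    feats["dist_to_cue"] = i - cue_idx        # signed distance to the cue
    feats["cue_word"] = tokens[cue_idx].lower()
    return feats

# Phase 2: given a detected cue, is each token inside its scope? ("IN" / "OUT")
scope_clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
```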

145 citations


Proceedings ArticleDOI
04 Jun 2009
TL;DR: It is shown that the same scope-finding approach can be applied to both negation and hedging, and the system is tested on the three subcorpora of the BioScope corpus that represent different text types.
Abstract: Identifying hedged information in biomedical literature is an important subtask in information extraction because it would be misleading to extract speculative information as factual information. In this paper we present a machine learning system that finds the scope of hedge cues in biomedical texts. The system is based on a similar system that finds the scope of negation cues. We show that the same scope-finding approach can be applied to both negation and hedging. To investigate the robustness of the approach, the system is tested on the three subcorpora of the BioScope corpus that represent different text types.
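
Because the central claim is that the same scope-finding machinery carries over from negation to hedging, a sketch of that parameterization may be useful. The cue lexicons below are tiny hypothetical samples; the real system learns cues from BioScope annotations rather than looking them up.

```python
# Hedged sketch: one scope-finding pipeline, parameterized by cue type.
# Cue lexicons are illustrative stand-ins, not the BioScope inventories.
NEGATION_CUES = {"not", "no", "without", "absence"}
HEDGE_CUES = {"may", "suggest", "possibly", "likely", "appear"}

def find_cues(tokens, cue_lexicon):
    """Phase-1 stand-in: lexicon lookup instead of a trained cue classifier."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in cue_lexicon]

# The phase-2 scope classifier is unchanged; only the cue inventory differs:
#   find_cues(tokens, NEGATION_CUES)  vs.  find_cues(tokens, HEDGE_CUES)
```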

131 citations


Proceedings ArticleDOI
30 Mar 2009
TL;DR: It is argued that the model does not approximate thematic fit as argument plausibility or 'fit with verb selectional preferences', but directly as semantic role plausibility for a verb-argument pair, through similarity-based generalization from previously seen verb-argument pairs.
Abstract: This paper presents a new, exemplar-based model of thematic fit. In contrast to previous models, it does not approximate thematic fit as argument plausibility or 'fit with verb selectional preferences', but directly as semantic role plausibility for a verb-argument pair, through similarity-based generalization from previously seen verb-argument pairs. This makes the model very robust to data sparsity. We argue that the model is easily extensible to a model of semantic role ambiguity resolution during online sentence comprehension. The model is evaluated on human semantic role plausibility judgments. Its predictions correlate significantly with the human judgments. It rivals two state-of-the-art models of thematic fit and exceeds their performance on previously unseen or low-frequency items.
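
A minimal sketch of the exemplar-based idea, under loose assumptions: the fit of a (verb, role, argument) triple is the similarity-weighted support it receives from stored exemplars. The character-overlap similarity below is a crude stand-in for the distributional similarity a real model would use.

```python
# Hedged sketch of exemplar-based thematic fit (illustrative, not the paper's model).
from collections import defaultdict

# Exemplar memory: (verb, role) -> head nouns previously seen in that role
exemplars = defaultdict(list)
exemplars[("eat", "agent")] += ["man", "dog", "child"]
exemplars[("eat", "patient")] += ["apple", "bread", "meat"]

def similarity(a, b):
    """Stand-in for corpus-derived similarity (here: crude character overlap)."""
    shared = len(set(a) & set(b))
    return shared / max(len(set(a) | set(b)), 1)

def thematic_fit(verb, role, argument):
    """Generalize from stored exemplars to a possibly unseen argument."""
    seen = exemplars[(verb, role)]
    if not seen:
        return 0.0
    return sum(similarity(argument, ex) for ex in seen) / len(seen)

print(thematic_fit("eat", "patient", "bread"))  # seen exemplar: high fit
print(thematic_fit("eat", "patient", "meal"))   # unseen argument: fit via generalization
```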

15 citations


Proceedings ArticleDOI
06 Aug 2009
TL;DR: This work presents an automatic multi-document summarization system for Dutch based on the MEAD system, and introduces a semantic overlap detection tool, which goes beyond simple string matching.
Abstract: We present an automatic multi-document summarization system for Dutch based on the MEAD system. We focus on redundancy detection, an essential ingredient of multi-document summarization. We introduce a semantic overlap detection tool, which goes beyond simple string matching. Our results so far do not confirm our expectation that this tool would outperform the other tested methods.
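
To make "beyond simple string matching" concrete, here is a hedged sketch of synonym-normalized overlap for redundancy detection. The SYNONYMS table is a hypothetical stand-in for a lexical resource; the actual tool may well work differently.

```python
# Hedged sketch of redundancy detection beyond exact string matching.
# SYNONYMS maps tokens to canonical synonym classes (hypothetical stand-in
# for a real lexical-semantic resource).
SYNONYMS = {"car": "auto", "auto": "auto", "vehicle": "auto"}

def normalize(token):
    return SYNONYMS.get(token.lower(), token.lower())

def semantic_overlap(sent_a, sent_b):
    """Jaccard overlap after mapping tokens to canonical synonym classes."""
    a = {normalize(t) for t in sent_a.split()}
    b = {normalize(t) for t in sent_b.split()}
    return len(a & b) / max(len(a | b), 1)

def is_redundant(candidate, summary_sentences, threshold=0.5):
    """Reject a candidate that overlaps too much with the summary so far."""
    return any(semantic_overlap(candidate, s) >= threshold
               for s in summary_sentences)
```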

13 citations


Proceedings Article
01 Sep 2009
TL;DR: A method to evaluate the PP attachment task in a more natural situation is provided, making it possible to compare the approach to full statistical parsing approaches, and the domain adaptation properties of both approaches are investigated.
Abstract: In this paper we extend a shallow parser [6] with prepositional phrase attachment. Although PP attachment is a well-studied task in a discriminative learning context, it is mostly addressed in artificial settings like the quadruple classification task [18], in which there are only two candidate attachment sites, a noun and a verb. In this paper we provide a method to evaluate the task in a more natural situation, making it possible to compare the approach to full statistical parsing approaches. First, we show how to extract anchor-PP pairs from parse trees in the GENIA and WSJ treebanks. Next, we discuss the extension of the shallow parser with a PP-attacher. We compare the PP attachment module with a statistical full parsing approach [4] and analyze the results. More specifically, we investigate the domain adaptation properties of both approaches (in this case domain shifts between journalistic and medical language).
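
The anchor-PP pair extraction step can be illustrated on a toy parse tree. This sketch uses NLTK's Tree class and simplifying assumptions (rightmost-leaf heads, preceding-sister anchors); it is not the paper's extraction procedure.

```python
# Hedged sketch of anchor-PP pair extraction from a bracketed parse tree.
from nltk import Tree

def rightmost_head(tree):
    """Crude head choice: rightmost leaf of the constituent (assumption)."""
    return tree.leaves()[-1]

def anchor_pp_pairs(tree):
    """Pair each PP's preposition with the head of its preceding sister."""
    pairs = []
    for subtree in tree.subtrees():
        kids = list(subtree)
        for i, kid in enumerate(kids):
            if isinstance(kid, Tree) and kid.label() == "PP" and i > 0:
                left = kids[i - 1]
                anchor = rightmost_head(left) if isinstance(left, Tree) else left
                pairs.append((anchor, kid.leaves()[0]))
    return pairs

t = Tree.fromstring(
    "(S (NP (DT the) (NN man)) (VP (VBD saw) (NP (DT a) (NN dog))"
    " (PP (IN with) (NP (DT a) (NN telescope)))))"
)
print(anchor_pp_pairs(t))  # [('dog', 'with')] under this crude heuristic
```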

11 citations


Proceedings Article
01 Sep 2009
TL;DR: This is the author's version of the work; the definitive version was published in the Proceedings of RANLP 2009, the International Conference on Recent Advances in Natural Language Processing.
Abstract: This is the author's version of the work. The definitive version was published in the Proceedings of RANLP 2009, the International Conference on Recent Advances in Natural Language Processing, Borovets, Bulgaria, 14-16 September 2009, pp. 65-70.

10 citations


Proceedings ArticleDOI
05 Jun 2009
TL;DR: The memory-based machine learning system that was submitted to the BioNLP Shared Task on Event Extraction modeled the event extraction task using an approach that has been previously applied to other natural language processing tasks like semantic role labeling or negation scope finding.
Abstract: In this paper we describe the memory-based machine learning system that we submitted to the BioNLP Shared Task on Event Extraction. We modeled the event extraction task using an approach that has been previously applied to other natural language processing tasks like semantic role labeling or negation scope finding. The results obtained by our system (30.58 F-score in Task 1 and 29.27 in Task 2) suggest that the approach and the system need further adaptation to the complexity involved in extracting biomedical events.
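
For readers unfamiliar with the term, memory-based learning is essentially k-nearest-neighbour classification over stored training instances (as in TiMBL-style tools). The sketch below uses scikit-learn with toy features as stand-ins; the shared-task system's actual features and label set were far richer.

```python
# Hedged sketch of the memory-based (k-NN) classification idea, applied here
# to event-trigger detection; features and training data are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy training instances: token-level features -> event-type label
X = [
    {"word": "expression", "lemma": "expression", "pos": "NN"},
    {"word": "inhibits", "lemma": "inhibit", "pos": "VBZ"},
    {"word": "binding", "lemma": "binding", "pos": "NN"},
    {"word": "protein", "lemma": "protein", "pos": "NN"},
]
y = ["Gene_expression", "Negative_regulation", "Binding", "O"]

# Memory-based learning stores the training instances verbatim and labels
# new tokens by their nearest stored neighbour(s).
clf = make_pipeline(DictVectorizer(), KNeighborsClassifier(n_neighbors=1))
clf.fit(X, y)
print(clf.predict([{"word": "bindings", "lemma": "binding", "pos": "NNS"}]))
# -> ['Binding']: the nearest stored exemplar shares the lemma feature
```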

9 citations


Proceedings ArticleDOI
30 Mar 2009
TL;DR: This work shows that only 16% of the observed compressed sentences in the domain of subtitling can be accounted for by word deletion alone, and argues for more elaborate sentence compression models that build on natural language generation (NLG) work.
Abstract: Data-driven approaches to sentence compression define the task as dropping any subset of words from the input sentence while retaining important information and grammaticality. We show that only 16% of the observed compressed sentences in the domain of subtitling can be accounted for in this way. We argue that part of this is due to evaluation issues and estimate that a deletion model is in fact compatible with approximately 55% of the observed data. We analyse the remaining problems and conclude that in those cases word order changes and paraphrasing are crucial, and argue for more elaborate sentence compression models which build on NLG work.
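
The 16% figure rests on a concrete test: a compression is reachable by a pure deletion model only if its tokens form a subsequence of the source sentence's tokens. A minimal sketch of that check, assuming naive whitespace tokenization:

```python
# Hedged sketch of the deletion-model compatibility test implied above:
# the compression must be a token subsequence of the source sentence.
def is_deletion_compression(source, compressed):
    it = iter(source.split())
    # `tok in it` consumes the iterator up to the match, so order is enforced.
    return all(tok in it for tok in compressed.split())

print(is_deletion_compression(
    "the committee finally approved the proposal yesterday",
    "committee approved the proposal"))   # True: pure deletion
print(is_deletion_compression(
    "the committee finally approved the proposal yesterday",
    "the proposal was approved"))         # False: reordering/paraphrase needed
```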

3 citations