Showing papers by Marie-Francine Moens published in 2004

Open Access

Clustering Algorithms for Noun Phrase Coreference Resolution

Roxana Angheluta¹, Patrick Jeuniaux¹, Rudradeb Mitra, Marie-Francine Moens¹

Katholieke Universiteit Leuven¹

01 Jan 2004

TL;DR: Two novel algorithms for noun phrase coreference resolution are developed, a fuzzy algorithm and its hard variant, and their performance is evaluated on two different sets of texts in comparison with an existing fuzzy and an existing hard clustering algorithm.

Abstract: In this paper, we present four clustering algorithms for noun phrase coreference resolution. We developed two novel algorithms for this task, a fuzzy algorithm and its hard variant, and evaluated their performance on two different sets of texts in comparison with an existing fuzzy and an existing hard clustering algorithm that are described in the literature. Our algorithms perform slightly better and do not rely on a predefined threshold distance value for cluster membership. In addition, our fuzzy clustering algorithm seems to perform better than hard clustering on a pronoun resolution task.

12 citations

K.U.Leuven summarization system at DUC 2004

Marie-Francine Moens, Roxana Angheluta, Rudradeb Mitra, Xiuli Jing

01 Jan 2004

7 citations

Proceedings Article

XML Retrieval Models for Legislation

Marie-Francine Moens

01 Jan 2004

TL;DR: This article reports on different XML retrieval models that are designed explicitly for the retrieval of legislation and are based on the vector space model and the probabilistic language model.

Abstract: Legislation contains text-rich documents and is increasingly marked with XML tags. The XML markup can, among other uses, be exploited to answer free information queries more precisely. In this article we report on different XML retrieval models that we explicitly designed for the retrieval of legislation and that are based on the vector space model and the probabilistic language model. In addition, search data structures that support these retrieval models are designed for legislative databases. We show that the models provide more advanced access to the content of statutes.

Legislation typically involves structured information, including the division of a statute into, for instance, titles, chapters, sections and articles, and the typical metadata (e.g., the date of enactment, the area of applicability and references to other statutes) that are assigned to the statute or its parts. Additionally, legislation contains large parts of unstructured information found in the natural language texts. The structured information is increasingly tagged with markup languages such as XML (Extensible Markup Language). Such a markup language allows documents to be interchanged easily between institutions and systems, and keeps the markup interpretable across different software.

Legal information retrieval has long been an important information technology application (2), and its significance is increasing. Legislative texts are currently accessible through specifically designed portal sites owned by governments or private institutions. The search engines that operate on the legal documents usually offer a full-text search (i.e., every word of the text, including some metadata, is indexed and can be searched). A full-text search is popular because it provides flexible information access: the user can build any search query. When information is retrieved by using a full-text search, the resulting answers are ranked according to their relevance to the query. The current search engines that operate on legislation allow for an extra selection of the content by filling out specific fields that represent specific structured content of the document (e.g., statute title, number of an article, etc.).

There is a recent trend in information retrieval to take into account the structured information of documents (e.g., as marked by XML), and especially the hierarchical logical document structure, when generating the answer to a query and when computing the relevance ranking. This has several advantages. The use of the document structure allows generating a more precise answer to an information query. Instead of returning the complete document as the answer, a structural element or several elements are given. Such an approach meets the current need of users of legal information systems, who demand more precise answers to information queries (8). Moreover, research has only recently started to exploit the relationships between structured elements in ranking functions. For instance, depending on where a
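
The idea of returning a structural element rather than the complete document can be illustrated with a minimal sketch. It is not the retrieval model of the article: it assumes a plain TF-IDF weighting with cosine similarity as a stand-in for the vector space model, and the statute markup, the tag and attribute names (statute, chapter, article, number), and the functions tokenize and rank_articles are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical statute; the tag and attribute names are illustrative only.
STATUTE = """<statute title="Data Protection Act">
  <chapter title="General provisions">
    <article number="1">This act protects personal data of citizens.</article>
    <article number="2">The controller must register each processing of personal data.</article>
  </chapter>
  <chapter title="Sanctions">
    <article number="3">A fine applies when the controller fails to register.</article>
  </chapter>
</statute>"""


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def rank_articles(xml_text, query):
    """Rank <article> elements (not the whole statute) against a free-text query."""
    root = ET.fromstring(xml_text)
    articles = root.findall(".//article")
    term_counts = [Counter(tokenize(a.text or "")) for a in articles]
    n = len(term_counts)
    # Document frequency is counted per article: each element is its own unit.
    doc_freq = Counter(term for counts in term_counts for term in counts)

    def vector(counts):
        # Smoothed inverse document frequency keeps every weight positive.
        return {t: tf * (math.log((n + 1) / (doc_freq[t] + 1)) + 1.0)
                for t, tf in counts.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = (math.sqrt(sum(w * w for w in u.values()))
                * math.sqrt(sum(w * w for w in v.values())))
        return dot / norm if norm else 0.0

    query_vec = vector(Counter(tokenize(query)))
    scored = [(a.get("number"), cosine(query_vec, vector(c)))
              for a, c in zip(articles, term_counts)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for number, score in rank_articles(STATUTE, "fine controller register"):
        print(f"article {number}: {score:.3f}")
```

Indexing each article as its own unit is the design choice that makes the answer more precise: the query is matched against individual elements, so the ranking points to the articles that carry the query terms instead of to the statute as a whole.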

6 citations

Proceedings Article

Summarizing texts at various levels of detail

Marie-Francine Moens¹, Roxana Angheluta¹, Rik De Busser¹, Patrick Jeuniaux¹

Katholieke Universiteit Leuven¹

26 Apr 2004

TL;DR: This article discusses a technique for generating hierarchical topic trees of a text and using them in various ways to build summaries of flexible length, and compares the results of a deterministic and a probabilistic tree-building approach when the topic tree is used for automatic summarization.

Abstract: Summarizing document texts at various levels of detail is required for many information selection tasks. For instance, when loading and visualizing documents on small screens of handheld devices, it is important to be able to dynamically compress texts. In this article we discuss a technique of generating hierarchical topic trees of a text and to use them in various ways to build summaries of a flexible length. For the topic tree building process we have implemented both a deterministic and probabilistic approach. We compare the results when the topic tree is used for automatic summarization.
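
One way a topic tree could yield summaries of flexible length is sketched below. The abstract does not describe how the tree is built or traversed, so everything here is an assumption: the TopicNode class, the idea that each node carries one representative sentence, and the breadth-first traversal in summarize are hypothetical and only illustrate the general principle that a larger sentence budget reaches deeper, more detailed levels of the tree.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopicNode:
    """Hypothetical topic tree node: one representative sentence plus subtopics."""
    sentence: str
    children: List["TopicNode"] = field(default_factory=list)


def summarize(root: TopicNode, max_sentences: int) -> List[str]:
    """Collect sentences level by level until the length budget is used up."""
    summary: List[str] = []
    queue = deque([root])
    while queue and len(summary) < max_sentences:
        node = queue.popleft()
        summary.append(node.sentence)
        # Subtopics are visited only after all topics on the current level.
        queue.extend(node.children)
    return summary


if __name__ == "__main__":
    tree = TopicNode("Main topic.", [
        TopicNode("First subtopic.", [TopicNode("Detail of the first subtopic.")]),
        TopicNode("Second subtopic."),
    ])
    for budget in (1, 3, 4):
        print(budget, summarize(tree, budget))
```

A breadth-first order returns general topics before details but does not necessarily follow the original sentence order of the text; a real system would likely need to restore that order before presenting the summary.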

2 citations