Showing papers by "Georgios Paltoglou published in 2008"

PDF

Open Access

Proceedings Article•DOI•

Integral based source selection for uncooperative distributed information retrieval environments

[...]

Georgios Paltoglou¹, Michail Salampasis², Maria Satratzemi¹•Institutions (2)

University of Macedonia¹, Alexander Technological Educational Institute of Thessaloniki²

30 Oct 2008

TL;DR: The algorithm functions by modeling each information source as an integral, using the relevance score and the intra-collection position of its sampled documents in reference to a centralized sample index and selects the collections that cover the largest area in the rank-relevance space.

...read moreread less

Abstract: In this paper, a new source selection algorithm for uncooperative distributed information retrieval environments is presented. The algorithm functions by modeling each information source as an integral, using the relevance score and the intra-collection position of its sampled documents in reference to a centralized sample index and selects the collections that cover the largest area in the rank-relevance space. Based on the above novel metric, the algorithm explicitly focuses on addressing the two goals of source selection; high recall which is important for source recommendation applications and high precision aiming to produce a high precision final merged list. For the latter goal in particular, the new approach steps away from the usual practice of DIR systems of explicitly declaring the number of collections that must be queried and instead receives as input only the number of retrieved documents in the final merged list, dynamically calculating the number of collections that are selected and the number of documents requested from each. The algorithm is tested in a wide range of testbeds in both recall and precision oriented settings and its effectiveness is found to be equal or better than other state-of-the-art algorithms.

...read moreread less

15 citations

Journal Article•DOI•

A results merging algorithm for distributed information retrieval environments that combines regression methodologies with a selective download phase

[...]

Georgios Paltoglou¹, Michail Salampasis², Maria Satratzemi¹•Institutions (2)

University of Macedonia¹, Alexander Technological Educational Institute of Thessaloniki²

01 Jul 2008-Information Processing and Management

TL;DR: A new algorithm is introduced that is based on adaptively downloading a limited, selected number of documents from the remote collections and estimating the relevance of the rest through regression methodologies, achieving the limited time and bandwidth overhead of the estimation approaches and the increased effectiveness of the download.

...read moreread less

Abstract: The problem of results merging in distributed information retrieval environments has gained significant attention the last years. Two generic approaches have been introduced in research. The first approach aims at estimating the relevance of the documents returned from the remote collections through ad hoc methodologies (such as weighted score merging, regression etc.) while the other is based on downloading all the documents locally, completely or partially, in order to calculate their relevance. Both approaches have advantages and disadvantages. Download methodologies are more effective but they pose a significant overhead on the process in terms of time and bandwidth. Approaches that rely solely on estimation on the other hand, usually depend on document relevance scores being reported by the remote collections in order to achieve maximum performance. In addition to that, regression algorithms, which have proved to be more effective than weighted scores merging algorithms, need a significant number of overlap documents in order to function effectively, practically requiring multiple interactions with the remote collections. The new algorithm that is introduced is based on adaptively downloading a limited, selected number of documents from the remote collections and estimating the relevance of the rest through regression methodologies. Thus it reconciles the above two approaches, combining their strengths, while minimizing their drawbacks, achieving the limited time and bandwidth overhead of the estimation approaches and the increased effectiveness of the download. The proposed algorithm is tested in a variety of settings and its performance is found to be significantly better than the former, while approximating that of the latter.

...read moreread less

11 citations

Proceedings Article•DOI•

A Comparison of Centralized and Distributed Information Retrieval Approaches

[...]

Georgios Paltoglou, Michail Salampasis¹, Maya Satratzemi²•Institutions (2)

Alexander Technological Educational Institute of Thessaloniki¹, University of Macedonia²

28 Aug 2008

TL;DR: It is investigated if and under which conditions can DIR offer a new paradigm for both efficient and effective information retrieval.

...read moreread less

Abstract: Distributed Information Retrieval (DIR) has been suggested to offer a prospective solution to a number of issues concerning information retrieval in the WWW. On the other hand, previous studies have indicated that centralized approaches offer the best solution for optimal quality of result (i.e. effectiveness). In this paper, we revisit those claims and investigate if and under which conditions can DIR offer a new paradigm for both efficient and effective information retrieval.

...read moreread less

3 citations

Proceedings Article•DOI•

Indexing and retrieval of a Greek corpus

[...]

Georgios Paltoglou¹, Michail Salampasis², Fotis Lazarinis³•Institutions (3)

University of Macedonia¹, Alexander Technological Educational Institute of Thessaloniki², American Hotel & Lodging Educational Institute³

30 Oct 2008

TL;DR: A character mapping scheme is employed, in order to overcome the problem of the diversity of diacritics used in the language, such as accents and diaeresis, and word distance and fuzzy similarity metrics are utilized to make up for the different forms that nouns, verbs and articles appear because of conjugations and inflections.

...read moreread less

Abstract: Greek is one of the most difficult languages to handle in Web Information Retrieval (IR) related tasks. Its difficulty stems from the fact that it is grammatically, morphologically and orthographically more complex than the lingua franca of IR, English. In this paper, we address a significant number of issues that originate from the Greek language. We use a number of techniques to determine the correct encoding that is used by web pages written in Greek. We test the effect of using a Greek stopword list in a realistic and controlled Web environment. We employ a character mapping scheme, in order to overcome the problem of the diversity of diacritics used in the language, such as accents and diaeresis. We utilize word distance and fuzzy similarity metrics in order to make up for the different forms that nouns, verbs and articles appear because of conjugations and inflections and additionally handle greeklish queries, a transliterated form of Greek. The conducted experiments present some effective ways to increase the accuracy in Greek IR tasks.

...read moreread less

3 citations