
Showing papers by "Michalis Vazirgiannis published in 2007"


Proceedings ArticleDOI
15 Apr 2007
TL;DR: This paper addresses the efficient computation of subspace skyline queries in large-scale peer-to-peer (P2P) networks, where the dataset is horizontally distributed across the peers, and proposes a threshold-based algorithm, called SKYPEER, which forwards skyline query requests among peers in such a way that the amount of transferred data is significantly reduced.
Abstract: Skyline query processing has received considerable attention in the recent past. Mainly, the skyline query is used to find a set of non-dominated data points in a multidimensional dataset. While most previous work has assumed a centralized setting, in this paper we address the efficient computation of subspace skyline queries in large-scale peer-to-peer (P2P) networks, where the dataset is horizontally distributed across the peers. Relying on a super-peer architecture, we propose a threshold-based algorithm, called SKYPEER, which forwards skyline query requests among peers in such a way that the amount of transferred data is significantly reduced. For efficient subspace skyline processing, we extend the notion of domination by defining the extended skyline set, which contains all data elements that are necessary to answer a skyline query in any arbitrary subspace. We prove that our algorithm provides exact answers and we present optimization techniques to reduce communication cost and execution time. Finally, we provide an extensive experimental evaluation showing that SKYPEER performs efficiently and provides a viable solution when a large degree of distribution is required.
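
To make the two dominance notions concrete, here is a minimal Python sketch of the classical skyline and the extended skyline described above, assuming smaller values are better in every dimension; the paper's distributed, threshold-based SKYPEER protocol is not reproduced here.

```python
def dominates(p, q):
    """Classical dominance: p is no worse than q everywhere and better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def strictly_dominates(p, q):
    """Extended dominance: p is strictly better than q in every dimension."""
    return all(a < b for a, b in zip(p, q))

def skyline(points, dom=dominates):
    """Points not dominated by any other point under the given relation."""
    return [p for p in points if not any(dom(q, p) for q in points if q != p)]

points = [(1, 5), (2, 2), (2, 3), (5, 1), (3, 3)]
print(skyline(points))                          # classical skyline
print(skyline(points, dom=strictly_dominates))  # extended skyline (a superset)
```

Note that (2, 3) survives only in the extended skyline: it is dominated by (2, 2) but not strictly dominated, which is exactly why the extended set suffices for skylines in any subspace.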

136 citations


Proceedings Article
06 Jan 2007
TL;DR: A new unsupervised WSD algorithm is proposed, based on generating Spreading Activation Networks (SANs) from the senses of a thesaurus and the relations between them, along with a new method of assigning weights to the networks' links.
Abstract: Most word sense disambiguation (WSD) methods require large quantities of manually annotated training data and/or do not exploit fully the semantic relations of thesauri. We propose a new unsupervised WSD algorithm, which is based on generating Spreading Activation Networks (SANs) from the senses of a thesaurus and the relations between them. A new method of assigning weights to the networks' links is also proposed. Experiments show that the algorithm outperforms previous unsupervised approaches to WSD.
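
As a rough illustration of the spreading-activation idea (not the paper's actual network construction or weighting scheme), the toy sketch below activates context words and lets activation flow along weighted links to candidate senses; the sense accumulating the most activation wins. The graph and weights are assumptions made for the example.

```python
def spread_activation(graph, seeds, decay=0.5, iterations=3):
    """graph: {node: [(neighbor, weight), ...]}; seeds start with activation 1.0."""
    activation = {node: 0.0 for node in graph}
    for s in seeds:
        activation[s] = 1.0
    for _ in range(iterations):
        new = dict(activation)
        for node, links in graph.items():
            for neighbor, weight in links:
                new[neighbor] += decay * weight * activation[node]
        activation = new
    return activation

# Toy network: context words link to the senses they support (weights assumed).
graph = {
    "money": [("bank/finance", 0.8)],
    "river": [("bank/geo", 0.8)],
    "bank/finance": [], "bank/geo": [],
}
act = spread_activation(graph, seeds=["money"])
print(max(["bank/finance", "bank/geo"], key=act.get))  # -> bank/finance
```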

88 citations


Journal ArticleDOI
TL;DR: This paper describes an unsupervised approach for decentralized and distributed generation of SONs (DESENT), and through simulations and analytical cost models the claims regarding performance, scalability, and quality are verified.
Abstract: The current approach in web searching, i.e., using centralized search engines, raises issues that question their future applicability: 1) coverage and scalability, 2) freshness, and 3) information monopoly. Performing web search using a P2P architecture that consists of the actual web servers has the potential to tackle those issues. In order to achieve the desired performance and scalability, as well as to enhance search quality relative to centralized search engines, semantic overlay networks (SONs) connecting peers storing semantically related information can be employed. The lack of global content/topology knowledge in a P2P system is the key challenge in forming SONs, and this paper describes an unsupervised approach for decentralized and distributed generation of SONs (DESENT). Through simulations and analytical cost models we verify our claims regarding performance, scalability, and quality.
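
The sketch below, a loose illustration rather than DESENT's actual distributed protocol, shows the underlying SON idea: peers whose content profiles are similar are grouped, so queries can later be routed only to the relevant overlay. Peer profiles as term-frequency dictionaries and the greedy single-pass clustering are assumptions made for brevity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def form_sons(peers, threshold=0.3):
    """peers: {peer_id: term_freq_dict}; returns lists of peer ids (one per SON)."""
    clusters = []  # each cluster: (representative profile, [peer_ids])
    for pid, profile in peers.items():
        for centroid, members in clusters:
            if cosine(profile, centroid) >= threshold:
                members.append(pid)
                break
        else:
            clusters.append((dict(profile), [pid]))
    return [members for _, members in clusters]

peers = {"p1": {"music": 3, "jazz": 2}, "p2": {"jazz": 3, "music": 1},
         "p3": {"soccer": 4, "league": 1}}
print(form_sons(peers))  # -> [['p1', 'p2'], ['p3']]
```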

61 citations


Proceedings Article
23 Sep 2007
TL;DR: This paper presents SIMPEER, a novel framework that dynamically clusters peer data in order to build distributed routing information at the super-peer level, reducing communication cost, network latency, bandwidth consumption and computational overhead at each individual peer.
Abstract: This paper addresses the efficient processing of similarity queries in metric spaces, where data is horizontally distributed across a P2P network. The proposed approach does not rely on arbitrary data movement; hence each peer joining the network autonomously stores its own data. We present SIMPEER, a novel framework that dynamically clusters peer data in order to build distributed routing information at the super-peer level. SIMPEER allows the evaluation of range and nearest neighbor queries in a distributed manner that reduces communication cost, network latency, bandwidth consumption and computational overhead at each individual peer. SIMPEER utilizes a set of distributed statistics and guarantees that all objects similar to the query are retrieved, without necessarily flooding the network during query processing. The statistics are employed for estimating an adequate query radius for k-nearest neighbor queries, transforming them into range queries. Our experimental evaluation employs both real-world and synthetic data collections, and our results show that SIMPEER performs efficiently, even in the case of a high degree of distribution.
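
The heart of such cluster-based routing is a pruning test: a super-peer holding (center, radius) summaries of remote clusters can skip any cluster that provably cannot contain a match. The sketch below shows that triangle-inequality test, assuming Euclidean data; SIMPEER's full machinery (radius estimation, distributed statistics) is not reproduced.

```python
import math

def clusters_to_contact(query, radius, clusters, dist=math.dist):
    """clusters: [(center, cluster_radius), ...] summarizing remote peers.
    A cluster may hold a match only if dist(q, center) - cluster_radius <= radius,
    by the triangle inequality."""
    return [i for i, (center, c_radius) in enumerate(clusters)
            if dist(query, center) - c_radius <= radius]

clusters = [((0.0, 0.0), 1.0), ((10.0, 10.0), 2.0)]
print(clusters_to_contact(query=(1.0, 1.0), radius=0.5, clusters=clusters))  # [0]
```

Only cluster 0 is contacted: the second cluster's summary proves every object it holds is farther than the query range, so the query is never forwarded there.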

60 citations


Journal ArticleDOI
TL;DR: UPR, a PageRank-style algorithm which combines usage data and link analysis techniques for assigning probabilities to Web pages based on their importance in the Web site's navigational graph, is presented, and it is shown that this approach results in more objective and representative predictions than those produced by pure usage-based approaches.
Abstract: The continuous growth in the size and use of the World Wide Web imposes new methods of design and development of online information services. The need to predict users' needs in order to improve the usability and user retention of a Web site is evident and can be addressed by personalizing it. Recommendation algorithms aim at proposing “next” pages to users based on their current visit and past users' navigational patterns. In the vast majority of related algorithms, however, only the usage data is used to produce recommendations, disregarding the structural properties of the Web graph. Thus, pages that are important in terms of PageRank authority score may be underrated. In this work, we present UPR, a PageRank-style algorithm which combines usage data and link analysis techniques for assigning probabilities to Web pages based on their importance in the Web site's navigational graph. We propose the application of a localized version of UPR (l-UPR) to personalized navigational subgraphs for online Web page ranking and recommendation. Moreover, we propose a hybrid probabilistic predictive model based on Markov models, in which link analysis is used to assign prior probabilities to the model's states. We show, through experimentation, that this approach results in more objective and representative predictions than those produced by pure usage-based approaches.
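
As a hedged sketch of the general idea of blending usage data into link analysis (not UPR's exact formulation), the snippet below runs a PageRank-style power iteration in which transition probabilities come from observed click counts rather than a uniform split over out-links.

```python
def usage_pagerank(usage, d=0.85, iterations=50):
    """usage: {page: {next_page: click_count}} derived from the site's access logs.
    Dangling pages simply leak rank in this simplified sketch."""
    pages = set(usage) | {q for outs in usage.values() for q in outs}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / len(pages) for p in pages}
        for p, outs in usage.items():
            total = sum(outs.values())
            for q, clicks in outs.items():
                # transition probability weighted by observed usage, not 1/outdegree
                new[q] += d * rank[p] * clicks / total
        rank = new
    return rank

logs = {"home": {"products": 80, "about": 20}, "products": {"home": 50}}
print(sorted(usage_pagerank(logs).items(), key=lambda kv: -kv[1]))
```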

25 citations


Proceedings ArticleDOI
08 May 2007
TL;DR: This work presents an efficiently computable normalization for PageRank scores that makes them comparable across graphs, and shows that the normalized PageRank scores are robust to non-local changes in the graph, unlike the standard PageRank measure.
Abstract: PageRank is the best known technique for link-based importance ranking. The computed importance scores, however, are not directly comparable across different snapshots of an evolving graph. We present an efficiently computable normalization for PageRank scores that makes them comparable across graphs. Furthermore, we show that the normalized PageRank scores are robust to non-local changes in the graph, unlike the standard PageRank measure.
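
One simple way to make raw PageRank scores comparable across graphs of different sizes, offered here as an assumption rather than the paper's exact scheme, is to divide each score by the smallest attainable value (1 - d) / n, so a score reads as a multiple of the baseline score of a page with no in-links.

```python
def normalize_pagerank(scores, d=0.85):
    """Rescale raw PageRank scores by the minimum attainable score (1 - d) / n.
    Illustrative normalization; the paper's scheme may differ."""
    n = len(scores)
    baseline = (1 - d) / n
    return {page: s / baseline for page, s in scores.items()}

snapshot = {"a": 0.5, "b": 0.3, "c": 0.2}
print(normalize_pagerank(snapshot))  # values comparable across snapshots of different n
```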

23 citations


Journal ArticleDOI
TL;DR: In this paper, context-aware web service discovery is proposed to enable the provision of the most appropriate services at the right location and time in a mobile peer-to-peer environment.
Abstract: In modern heterogeneous environments, such as mobile, pervasive and ad-hoc networks, architectures based on web services offer an attractive solution for effective communication and interoperation. In such dynamic and rapidly evolving environments, efficient web service discovery is an important task. Usually this task is based on input/output parameters or other functional attributes; however, this does not guarantee the validity or successful utilization of retrieved web services. Instead, non-functional attributes, such as device power features, computational resources and connectivity status, that characterize the context of both service providers and consumers play an important role in the quality and usability of discovery results. In this paper we introduce context-awareness in web service discovery, enabling the provision of the most appropriate services at the right location and time. We focus on context-based caching and routing for improving web service discovery in a mobile peer-to-peer environment. We conducted a thorough experimental study using our prototype implementation based on the JXTA framework, while simulations were employed to test the scalability of the approach. We illustrate the advantages that this approach offers, both by evaluating the context-based cache performance and by comparing the efficiency of location-based routing to broadcast-based approaches.
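
A minimal sketch of context-aware matchmaking, with attribute names and weights that are purely illustrative assumptions (the paper's model, caching and routing layers are richer): candidates failing the functional match are dropped, and the rest are ranked by non-functional context.

```python
def context_score(service, query):
    """Combine functional matching with assumed context attributes."""
    if query["operation"] not in service["operations"]:
        return 0.0  # functional mismatch: never returned
    battery = service["battery"]                          # fraction remaining
    bandwidth = min(service["bandwidth_kbps"] / 1000.0, 1.0)
    proximity = 1.0 / (1.0 + service["distance_m"] / 100.0)
    return 0.4 * battery + 0.3 * bandwidth + 0.3 * proximity  # weights assumed

services = [
    {"operations": {"print"}, "battery": 0.9, "bandwidth_kbps": 500, "distance_m": 40},
    {"operations": {"print"}, "battery": 0.2, "bandwidth_kbps": 2000, "distance_m": 400},
]
query = {"operation": "print"}
print(max(services, key=lambda s: context_score(s, query)))  # nearby, well-powered peer
```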

19 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: The contribution of the proposed framework is a standardized workflow aiming at the integration of data produced by various hospitals into a consistent data warehouse, together with a mechanism that detects hidden and previously unknown patterns in large datasets, in terms of association rules, which can provide surveillance warnings.
Abstract: One of the most important functions in a hospital's infection control program is the surveillance of antibiotic resistance. Several traditional methods used to measure it do not provide adequate results for further analysis. Data mining techniques, such as association rules, have been used in the past and have successfully led to the discovery of interesting patterns in public health data. In this work, we present the architecture of a novel framework which integrates data from multiple hospitals, discovers association rules, stores them in a data warehouse for future analysis and provides anytime accessibility through an intuitive Web interface. We implemented the proposed architecture as a Web application and evaluated it using data from the WHONET software installed in many Greek hospitals that belong to the "Greek System for Surveillance of Antimicrobial Resistance" network. The contribution of the proposed framework is a standardized workflow aiming at the integration of data produced by various hospitals into a consistent data warehouse, together with a mechanism that detects hidden and previously unknown patterns in large datasets, in terms of association rules, which can provide surveillance warnings.
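
To show the support/confidence mechanics behind such surveillance rules, here is a minimal, self-contained sketch on toy susceptibility-test records; the fields and thresholds are illustrative and do not reflect WHONET's schema.

```python
from itertools import combinations

def rules(transactions, min_support=0.3, min_confidence=0.7):
    """Mine pairwise rules a => b (one direction checked, for brevity)."""
    n = len(transactions)
    def support(itemset):
        return sum(itemset <= t for t in transactions) / n
    items = {i for t in transactions for i in t}
    found = []
    for a, b in combinations(sorted(items), 2):
        s = support({a, b})
        if s >= min_support and s / support({a}) >= min_confidence:
            found.append((a, b, s, s / support({a})))
    return found

records = [frozenset(t) for t in [
    {"E.coli", "ampicillin-R"}, {"E.coli", "ampicillin-R"}, {"E.coli", "ampicillin-R"},
    {"E.coli", "ampicillin-S"}, {"K.pneumoniae", "ampicillin-R"}]]
for a, b, s, c in rules(records):
    print(f"{a} => {b}  support={s:.2f} confidence={c:.2f}")
```

A rule such as "E.coli => ampicillin-R" crossing its thresholds is exactly the kind of pattern the framework would surface as a surveillance warning.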

10 citations


Book ChapterDOI
05 Jul 2007
TL;DR: The core scientific and technological objectives of PIRES, a scalable decentralized and distributed infrastructure for building a search engine for image content capitalizing on P2P technology, are presented.
Abstract: The World Wide Web provides an enormous amount of images easily accessible to everybody. The main challenge is to provide efficient search mechanisms for image content that are truly scalable and can support full coverage of web content. In this paper, we present an architecture that adopts the peer-to-peer (P2P) paradigm for indexing, searching and ranking of image content. The ultimate goal of our architecture is to provide an adaptive search mechanism for image content, enhanced with learning, relying on image features, user-defined annotations and user feedback. Thus, we present PIRES, a scalable, decentralized and distributed infrastructure for building a search engine for image content, capitalizing on P2P technology. In the following, we first present the core scientific and technological objectives of PIRES and then present some preliminary experimental results from our prototype.
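
As a speculative illustration of one way image content could be placed in a P2P index (PIRES's actual indexing, annotation and feedback mechanisms are richer), the sketch below quantizes a global feature vector into a coarse code word and hashes it onto a peer, DHT-style, so indexing and querying the same features land on the same node.

```python
import hashlib

def code_word(features, bins=4):
    """Quantize each feature in [0, 1) into `bins` levels (assumed representation)."""
    return tuple(min(int(f * bins), bins - 1) for f in features)

def responsible_peer(features, peers):
    """Map the code word onto the peer ring via hashing."""
    digest = hashlib.sha1(str(code_word(features)).encode()).hexdigest()
    return peers[int(digest, 16) % len(peers)]

peers = ["peer-a", "peer-b", "peer-c"]
print(responsible_peer([0.12, 0.83, 0.40], peers))  # where to index / route the query
```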

9 citations


Book ChapterDOI
17 Sep 2007
TL;DR: This paper focuses on two well studied algorithms, LSI and PCA, and proposes a feature selection process that provably guarantees the stability of their outputs; it utilizes bootstrapping confidence intervals for assessing the statistical accuracy of the input sample matrices, and matrix perturbation theory in order to relate the statistical accuracy to the stability of eigenvectors.
Abstract: The stability of sample-based algorithms is a concept commonly used for parameter tuning and validity assessment. In this paper we focus on two well-studied algorithms, LSI and PCA, and propose a feature selection process that provably guarantees the stability of their outputs. The feature selection process is performed such that the level of (statistical) accuracy of the LSI/PCA input matrices is adequate for computing meaningful (stable) eigenvectors. The feature selection process "sparsifies" LSI/PCA, resulting in the projection of the instances on the eigenvectors of a principal submatrix of the original input matrix, thus producing sparse factor loadings that are linear combinations solely of the selected features. We utilize bootstrapping confidence intervals for assessing the statistical accuracy of the input sample matrices, and matrix perturbation theory in order to relate the statistical accuracy to the stability of eigenvectors. Experiments on several UCI datasets empirically verify our approach.
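
The phenomenon being controlled can be illustrated empirically: bootstrap the sample, recompute the leading eigenvector each time, and check how much it rotates (|cosine| near 1 means stable). This sketch only demonstrates the measurement; the paper's contribution is the feature selection process with formal guarantees via bootstrap confidence intervals and perturbation bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 0] += 3 * X[:, 1]  # correlated features give a well-separated top eigenvector

def top_eigenvector(data):
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    return eigvecs[:, -1]                     # eigenvector of the largest one

reference = top_eigenvector(X)
cosines = []
for _ in range(100):
    sample = X[rng.integers(0, len(X), size=len(X))]  # bootstrap resample
    cosines.append(abs(reference @ top_eigenvector(sample)))
print(f"mean |cos| across bootstraps: {np.mean(cosines):.3f}")  # near 1.0 = stable
```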

8 citations


Book ChapterDOI
01 Nov 2007
TL;DR: This paper addresses the issue of representing and quantifying web ranking trends as a measure of web pages, and proposes normalized measures of ranking trends that are comparable among web graph snapshots of different sizes.
Abstract: One of the grand research and industrial challenges in recent years is efficient web search, which inherently involves the issue of page ranking. In this paper we address the issue of representing and quantifying web ranking trends as a measure of web pages. We study the rank position of a web page among different snapshots of the web graph and propose normalized measures of ranking trends that are comparable among web graph snapshots of different sizes. We define the rank change rate (racer) as a measure quantifying the evolution of the web graph. Thereafter, we examine different ways to aggregate the rank change rates and quantify the trends over a group of web pages. We outline the problem of identifying highly dynamic web pages and discuss possible future work. In our experimental evaluation we study the dynamics of web pages, especially those highly ranked.
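
Since rank positions are not directly comparable between snapshots of different sizes, a normalized change measure is needed; the stand-in below, an illustrative assumption rather than the paper's exact racer formula, compares size-normalized rank positions across two snapshots.

```python
def rank_change_rate(rank_old, rank_new, n_old, n_new):
    """rank_*: 1-based rank positions; n_*: number of pages in each snapshot.
    Positive values indicate an upward trend after size normalization."""
    return rank_old / n_old - rank_new / n_new

# A page climbing from position 50 of 1000 to position 40 of 2000:
print(rank_change_rate(rank_old=50, rank_new=40, n_old=1000, n_new=2000))  # 0.03
```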