Showing papers presented at "Extending Database Technology in 2004"

PDF

Open Access

Book Chapter•DOI•

Query recommendation using query logs in search engines

[...]

Ricardo Baeza-Yates¹, Carlos Hurtado¹, Marcelo Mendoza²•Institutions (2)

University of Chile¹, Valparaiso University²

14 Mar 2004

TL;DR: A method is proposed that, given a query submitted to a search engine, suggests a list of related queries that are based in previously issued queries and can be issued by the user to the search engine to tune or redirect the search process.

...read moreread less

Abstract: In this paper we propose a method that, given a query submitted to a search engine, suggests a list of related queries The related queries are based in previously issued queries, and can be issued by the user to the search engine to tune or redirect the search process The method proposed is based on a query clustering process in which groups of semantically similar queries are identified The clustering process uses the content of historical preferences of users registered in the query log of the search engine The method not only discovers the related queries, but also ranks them according to a relevance criterion Finally, we show with experiments over the query log of a search engine the effectiveness of the method.

...read moreread less

656 citations

Book Chapter•DOI•

A Condensation Approach to Privacy Preserving Data Mining

[...]

Charu C. Aggarwal¹, Philip Shi-Lung Yu¹•Institutions (1)

IBM¹

14 Mar 2004

TL;DR: A new and flexible approach for privacy preserving data mining which does not require new problem-specific algorithms, since it maps the original data set into a new anonymized data set, including the correlations among the different dimensions is developed.

...read moreread less

Abstract: In recent years, privacy preserving data mining has become an important problem because of the large amount of personal data which is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. In this paper, we propose a new framework for privacy preserving data mining of multi-dimensional data. Previous work for privacy preserving data mining uses a perturbation approach which reconstructs data distributions in order to perform the mining. Such an approach treats each dimension independently and therefore ignores the correlations between the different dimensions. In addition, it requires the development of a new distribution based algorithm for each data mining problem, since it does not use the multi-dimensional records, but uses aggregate distributions of the data as input. This leads to a fundamental re-design of data mining algorithms. In this paper, we will develop a new and flexible approach for privacy preserving data mining which does not require new problem-specific algorithms, since it maps the original data set into a new anonymized data set. This anonymized data closely matches the characteristics of the original data including the correlations among the different dimensions. We present empirical results illustrating the effectiveness of the method.

...read moreread less

366 citations

Book Chapter•DOI•

Efficient Distributed Skylining for Web Information Systems

[...]

Wolf-Tilo Balke¹, Ulrich Güntzer², Jason Xin Zheng¹•Institutions (2)

University of California¹, University of Tübingen²

14 Mar 2004

TL;DR: This work shows how to efficiently perform distributed skyline queries and thus essentially extend the expressiveness of querying today’s Web information systems and presents useful heuristics to further speed up the retrieval in most practical cases.

...read moreread less

Abstract: Though skyline queries already have claimed their place in retrieval over central databases, their application in Web information systems up to now was impossible due to the distributed aspect of retrieval over Web sources. But due to the amount, variety and volatile nature of information accessible over the Internet extended query capabilities are crucial. We show how to efficiently perform distributed skyline queries and thus essentially extend the expressiveness of querying today’s Web information systems. Together with our innovative retrieval algorithm we also present useful heuristics to further speed up the retrieval in most practical cases paving the road towards meeting even the real-time challenges of on-line information services. We discuss performance evaluations and point to open problems in the concept and application of skylining in modern information systems. For the curse of dimensionality, an intrinsic problem in skyline queries, we propose a novel sampling scheme that allows to get an early impression of the skyline for subsequent query refinement.

...read moreread less

331 citations

Book Chapter•DOI•

Hippo: A System for Computing Consistent Answers to a Class of SQL Queries

[...]

Jan Chomicki¹, Jerzy Marcinkowski, Sławek Staworko¹•Institutions (1)

University at Buffalo¹

14 Mar 2004

TL;DR: In this paper, the authors propose a method to preserve data consistency in the case of integration of several data sources, even if the sources are separately consistent, the integrated data can violate the integrity constraints.

...read moreread less

Abstract: Integrity constraints express important properties of data, but the task of preserving data consistency is becoming increasingly problematic with new database applications. For example, in the case of integration of several data sources, even if the sources are separately consistent, the integrated data can violate the integrity constraints. The traditional approach, removing the conflicting data, is not a good option because the sources can be autonomous. Another scenario is a long-running activity where consistency can be violated only temporarily and future updates will restore it. Finally, data consistency may be neglected because of efficiency or other reasons.

...read moreread less

299 citations

Book Chapter•DOI•

Spatiotemporal Compression Techniques for Moving Point Objects

[...]

Nirvana Meratnia, R.A. de By

14 Mar 2004

TL;DR: In this paper, the authors present a technique for moving object data handling with time-stamped positions in the context of mobile positioning data, where positioning technology is rapidly making its way into the consumer market, not only through the already ubiquitous cell phone but also through small, on-board positioning devices in many means of transport and in other types of portable equipment.

...read moreread less

Abstract: Moving object data handling has received a fair share of attention over recent years in the spatial database community. This is understandable as positioning technology is rapidly making its way into the consumer market, not only through the already ubiquitous cell phone but soon also through small, on-board positioning devices in many means of transport and in other types of portable equipment. It is thus to be expected that all these devices will start to generate an unprecedented data stream of time-stamped positions. Sooner or later, such enormous volumes of data will lead to storage, transmission, computation, and display challenges. Hence, the need for compression techniques.

...read moreread less

272 citations

Book Chapter•DOI•

LIMBO: Scalable Clustering of Categorical Data

[...]

Periklis Andritsos¹, Panayiotis Tsaparas¹, Renée J. Miller¹, Kenneth C. Sevcik¹•Institutions (1)

University of Toronto¹

14 Mar 2004

TL;DR: This work introduces LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering, and shows how the LIMBO algorithm can be used to cluster both tuples and values.

...read moreread less

Abstract: Clustering is a problem of great practical importance in numerous applications. The problem of clustering becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. We introduce LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering. As a hierarchical algorithm, LIMBO has the advantage that it can produce clusterings of different sizes in a single execution. We use the IB framework to define a distance measure for categorical tuples and we also present a novel distance measure for categorical attribute values. We show how the LIMBO algorithm can be used to cluster both tuples and values. LIMBO handles large data sets by producing a memory bounded summary model for the data. We present an experimental evaluation of LIMBO, and we study how clustering quality compares to other categorical clustering algorithms. LIMBO supports a trade-off between efficiency (in terms of space and time) and quality. We quantify this trade-off and demonstrate that LIMBO allows for substantial improvements in efficiency with negligible decrease in quality.

...read moreread less

264 citations

Book Chapter•DOI•

Iterative incremental clustering of time series

[...]

Jessica Lin¹, Michail Vlachos¹, Eamonn Keogh¹, Dimitrios Gunopulos¹•Institutions (1)

University of California, Riverside¹

14 Mar 2004

TL;DR: In this article, an anytime version of partitional clustering algorithm, such as k-means and EM, for time series is presented, which works by leveraging off the multi-resolution property of wavelets.

...read moreread less

Abstract: We present a novel anytime version of partitional clustering algorithm, such as k-Means and EM, for time series. The algorithm works by leveraging off the multi-resolution property of wavelets. The dilemma of choosing the initial centers is mitigated by initializing the centers at each approximation level, using the final centers returned by the coarser representations. In addition to casting the clustering algorithms as anytime algorithms, this approach has two other very desirable properties. By working at lower dimensionalities we can efficiently avoid local minima. Therefore, the quality of the clustering is usually better than the batch algorithm. In addition, even if the algorithm is run to completion, our approach is much faster than its batch counterpart. We explain, and empirically demonstrate these surprising and desirable properties with comprehensive experiments on several publicly available real data sets. We further demonstrate that our approach can be generalized to a framework of much broader range of algorithms or data mining problems.

...read moreread less

210 citations

Book Chapter•DOI•

MobiEyes: Distributed Processing of Continuously Moving Queries on Moving Objects in a Mobile System

[...]

Bugra Gedik¹, Ling Liu¹•Institutions (1)

Georgia Institute of Technology¹

14 Mar 2004

TL;DR: In this paper, the authors present a distributed and scalable solution to processing continuously moving queries on moving objects and describe the design of MobiEyes, a distributed real-time location monitoring system in a mobile environment.

...read moreread less

Abstract: Location monitoring is an important issue for real time management of mobile object positions. Significant research efforts have been dedicated to techniques for efficient processing of spatial continuous queries on moving objects in a centralized location monitoring system. Surprisingly, very few have promoted a distributed approach to real-time location monitoring. In this paper we present a distributed and scalable solution to processing continuously moving queries on moving objects and describe the design of MobiEyes, a distributed real-time location monitoring system in a mobile environment. Mobieyes utilizes the computational power at mobile objects, leading to significant savings in terms of server load and messaging cost when compared to solutions relying on central processing of location information at the server. We introduce a set of optimization techniques, such as Lazy Query Propagation, Query Grouping, and Safe Periods, to constrict the amount of computations handled by the moving objects and to enhance the performance and system utilization of Mobieyes. We also provide a simulation model in a mobile setup to study the scalability of the MobiEyes distributed location monitoring approach with regard to server load, messaging cost, and amount of computation required on the mobile objects.

...read moreread less

173 citations

Book Chapter•DOI•

Hierarchical In-Network Data Aggregation with Quality Guarantees

[...]

Antonios Deligiannakis¹, Yannis Kotidis², Nick Roussopoulos¹•Institutions (2)

University of Maryland, College Park¹, AT&T Labs²

14 Mar 2004

TL;DR: A new algorithm is introduced, based on potential gains, which adaptively redistributes the error thresholds to those nodes that benefit the most and tries to minimize the total number of transmitted messages in the network.

...read moreread less

Abstract: Earlier work has demonstrated the effectiveness of in-network data aggregation in order to minimize the amount of messages exchanged during continuous queries in large sensor networks. The key idea is to build an aggregation tree, in which parent nodes aggregate the values received from their children. Nevertheless, for large sensor networks with severe energy constraints the reduction obtained through the aggregation tree might not be sufficient. In this paper we extend prior work on in-network data aggregation to support approximate evaluation of queries to further reduce the number of exchanged messages among the nodes and extend the longevity of the network. A key ingredient to our framework is the notion of the residual mode of operation that is used to eliminate messages from sibling nodes when their cumulative change is small. We introduce a new algorithm, based on potential gains, which adaptively redistributes the error thresholds to those nodes that benefit the most and tries to minimize the total number of transmitted messages in the network. Our experiments demonstrate that our techniques significantly outperform previous approaches and reduce the network traffic by exploiting the super-imposed tree hierarchy.

...read moreread less

162 citations

Book Chapter•DOI•

DBDC: Density Based Distributed Clustering

[...]

Eshref Januzaj¹, Hans-Peter Kriegel¹, Martin Pfeifle¹•Institutions (1)

Ludwig Maximilian University of Munich¹

14 Mar 2004

TL;DR: The complex problem of finding a suitable quality measure for evaluating distributed clusterings is discussed and two quality criteria which are compared to each other and which allow us to evaluate the quality of the DBDC algorithm are introduced.

...read moreread less

Abstract: Clustering has become an increasingly important task in modern application domains such as marketing and purchasing assistance, multimedia, molecular biology as well as many others. In most of these areas, the data are originally collected at different sites. In order to extract information from these data, they are merged at a central site and then clustered. In this paper, we propose a different approach. We cluster the data locally and extract suitable representatives from these clusters. These representatives are sent to a global server site where we restore the complete clustering based on the local representatives. This approach is very efficient, because the local clustering can be carried out quickly and independently from each other. Furthermore, we have low transmission cost, as the number of transmitted representatives is much smaller than the cardinality of the complete data set. Based on this small number of representatives, the global clustering can be done very efficiently. For both the local and the global clustering, we use a density based clustering algorithm. The combination of both the local and the global clustering forms our new DBDC (Density Based Distributed Clustering) algorithm. Furthermore, we discuss the complex problem of finding a suitable quality measure for evaluating distributed clusterings. We introduce two quality criteria which are compared to each other and which allow us to evaluate the quality of our DBDC algorithm. In our experimental evaluation, we will show that we do not have to sacrifice clustering quality in order to gain an efficiency advantage when using our distributed clustering approach.

...read moreread less

162 citations

Book Chapter•DOI•

Content-Based Routing of Path Queries in Peer-to-Peer Systems

[...]

Georgia Koloniari¹, Evaggelia Pitoura¹•Institutions (1)

University of Ioannina¹

14 Mar 2004

TL;DR: This paper proposes a fully decentralized approach to the problem of routing path queries among the nodes of a P2P system based on maintaining specialized data structures, called filters that efficiently summarize the content of one or more node, and advocates building a hierarchical organization of nodes by clustering together nodes with similar content.

...read moreread less

Abstract: Peer-to-peer (P2P) systems are gaining increasing popularity as a scalable means to share data among a large number of autonomous nodes. In this paper, we consider the case in which the nodes in a P2P system store XML documents. We propose a fully decentralized approach to the problem of routing path queries among the nodes of a P2P system based on maintaining specialized data structures, called filters that efficiently summarize the content, i.e., the documents, of one or more node. Our proposed filters, called multi-level Bloom filters, are based on extending Bloom filters so that they maintain information about the structure of the documents. In addition, we advocate building a hierarchical organization of nodes by clustering together nodes with similar content. Similarity between nodes is related to the similarity between the corresponding filters. We also present an efficient method for update propagation. Our experimental results show that multi-level Bloom filters outperform the classical Bloom filters in routing path queries. Furthermore, the content-based hierarchical grouping of nodes increases recall, that is, the number of documents that are retrieved.

...read moreread less

Book Chapter•DOI•

HOPI: An Efficient Connection Index for Complex XML Document Collections

[...]

Ralf Schenkel¹, Anja Theobald¹, Gerhard Weikum¹•Institutions (1)

Max Planck Society¹

14 Mar 2004

TL;DR: HOPI as discussed by the authors is a new connection index for XML documents based on the concept of the 2-hop cover of a directed graph introduced by Cohen et al. In contrast to most of the prior work on XML indexing, HOPI considers not only paths with child or parent relationships between the nodes, but also provides space-and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine.

...read moreread less

Abstract: In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2–hop cover of a directed graph introduced by Cohen et al. In contrast to most of the prior work on XML indexing we consider not only paths with child or parent relationships between the nodes, but also provide space– and time–efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine. We improve the theoretical concept of a 2–hop cover by developing scalable methods for index creation on very large XML data collections with long paths and extensive cross–linkage. Our experiments show substantial savings in the query performance of the HOPI index over previously proposed index structures in combination with low space requirements.

...read moreread less

Book Chapter•DOI•

SPROUT: P2P routing with social networks

[...]

Sergio Marti¹, Prasanna Ganesan¹, Hector Garcia-Molina¹•Institutions (1)

Stanford University¹

14 Mar 2004

TL;DR: A trust model that lets us compare routing algorithms for P2P networks overlaying social networks, and proposes SPROUT, a DHT routing algorithm that, by using social links, significantly increases the number of query results and reduces query delays.

...read moreread less

Abstract: In this paper, we investigate how existing social networks can benefit P2P data networks by leveraging the inherent trust associated with social links We present a trust model that lets us compare routing algorithms for P2P networks overlaying social networks.We propose SPROUT, a DHT routing algorithm that, by using social links, significantly increases the number of query results and reduces query delays.We discuss further optimization and design choices for both the model and the routing algorithm Finally, we evaluate our model versus regular DHT routing and Gnutella-like flooding.

...read moreread less

Book Chapter•DOI•

XPath with Conditional Axis Relations

[...]

Maarten Marx¹•Institutions (1)

University of Amsterdam¹

14 Mar 2004

TL;DR: An XPath dialect \(\mathcal{X}\)CPath is defined which is expressively complete, has a linear time query evaluation algorithm and for which query equivalence given a DTD can be decided in exponential time.

...read moreread less

Abstract: This paper is about the W3C standard node-addressing language for XML documents, called XPath. XPath is still under development. Version 2.0 appeared in 2001 while the theoretical foundations of Version 1.0 (dating from 1998) are still being widely studied. The paper aims at bringing XPath to a “stable fixed point” in its development: a version which is expressively complete, still manageable computationally, with a user-friendly syntax and a natural semantics. We focus on an important axis relation which is not expressible in XPath 1.0 and is very useful in practice: the conditional axis. With it we can express paths specified by for instance “do a child step, while test is true at the resulting node”. We study the effect of adding conditional axis relations to XPath on its expressive power and the complexity of the query evaluation and query equivalence problems. We define an XPath dialect \(\mathcal{X}\)CPath which is expressively complete, has a linear time query evaluation algorithm and for which query equivalence given a DTD can be decided in exponential time.

...read moreread less

Book Chapter•DOI•

P2PR-Tree: an R-tree-based spatial index for peer-to-peer environments

[...]

Anirban Mondal¹, Yi Lifu¹, Masaru Kitsuregawa¹•Institutions (1)

University of Tokyo¹

14 Mar 2004

TL;DR: The results of the performance evaluation demonstrate that it is indeed practically feasible to share spatial data in a P2P system and that P2PR-tree is able to outperform MC-Rtree significantly.

...read moreread less

Abstract: The unprecedented growth and increased importance of geographically distributed spatial data has created a strong need for efficient sharing of such data Interestingly, the ever-increasing popularity of peer-to-peer (P2P) systems has opened exciting possibilities for such sharing This motivates our investigation into spatial indexing in P2P systems While much work has been done towards expediting search in file-sharing P2P systems, issues concerning spatial indexing in P2P systems are significantly more complicated due to overlaps between spatial objects and the complexity of spatial queries Incidentally, existing R-tree-based structures for distributed environments (e.g., the MC-Rtree) are not adequate for addressing the sheer scale, dynamism and heterogeneity of P2P environments Hence, we propose the P2PR-tree (Peer-to-Peer R-tree), which is a new spatial index specifically designed for P2P systems The main features of P2PR-tree are two-fold First, it is hierarchical and performs efficient pruning of the search space by maintaining minimal amount of information concerning peers that are far away and storing more information concerning nearby peers, thereby optimizing disk space usage Second, it is completely decentralized, scalable and robust to peers joining/leaving the system The results of our performance evaluation demonstrate that it is indeed practically feasible to share spatial data in a P2P system and that P2PR-tree is able to outperform MC-Rtree significantly.

...read moreread less

Book Chapter•DOI•

XQzip: Querying Compressed XML Using Structural Indexing

[...]

James Cheng¹, Wilfred Ng¹•Institutions (1)

Hong Kong University of Science and Technology¹

14 Mar 2004

TL;DR: This paper focuses on the problem of running queries on compressed XML data and some compressors proposed to address this problem have been proposed but are usually worse than that of XMill and that of the generic compressor gzip.

...read moreread less

Abstract: XML makes data flexible in representation and easily portable on the Web but it also substantially inflates data size as a consequence of using tags to describe data. Although many effective XML compressors, such as XMill, have been recently proposed to solve this data inflation problem, they do not address the problem of running queries on compressed XML data. More recently, some compressors have been proposed to query compressed XML data. However, the compression ratio of these compressors is usually worse than that of XMill and that of the generic compressor gzip, while their query performance and the expressive power of the query language they support are inadequate.

...read moreread less

Book Chapter•DOI•

Clustering XML documents using structural summaries

[...]

Theodore Dalamagas¹, Tao Cheng², Klaas-Jan Winkel³, Timos Sellis¹•Institutions (3)

National Technical University of Athens¹, University of California, Santa Barbara², University of Twente³

14 Mar 2004

TL;DR: The usage of tree structural summaries is suggested to improve the performance of the distance calculation and at the same time to maintain or even improve its quality.

...read moreread less

Abstract: This work presents a methodology for grouping structurally similar XML documents using clustering algorithms Modeling XML documents with tree-like structures, we face the ‘clustering XML documents by structure' problem as a ‘tree clustering' problem, exploiting distances that estimate the similarity between those trees in terms of the hierarchical relationships of their nodes We suggest the usage of tree structural summaries to improve the performance of the distance calculation and at the same time to maintain or even improve its quality Experimental results are provided using a prototype testbed.

...read moreread less

Book Chapter•DOI•

[...]

Karin Kailing¹, Hans-Peter Kriegel¹, Stefan Schönauer¹, Thomas Seidl²•Institutions (2)

Ludwig Maximilian University of Munich¹, RWTH Aachen University²

14 Mar 2004

TL;DR: In this article, a set of new filter methods for structural and for content-based information in tree-structured data as well as ways to flexibly combine different filter criteria are presented.

...read moreread less

Abstract: Structured and semi-structured object representations are getting more and more important for modern database applications. Examples for such data are hierarchical structures including chemical compounds, XML data or image data. As a key feature, database systems have to support the search for similar objects where it is important to take into account both the structure and the content features of the objects. A successful approach is to use the edit distance for tree structured data. As the computation of this measure is NP-complete, constrained edit distances have been successfully applied to trees. While yielding good results, they are still computationally complex and, therefore, of limited benefit for searching in large databases. In this paper, we propose a filter and refinement architecture to overcome this problem. We present a set of new filter methods for structural and for content-based information in tree-structured data as well as ways to flexibly combine different filter criteria. The efficiency of our methods, resulting from the good selectivity of the filters is demonstrated in extensive experiments with real-world applications.

...read moreread less

Book Chapter•DOI•

Context- and situation-awareness in information logistics

[...]

Ulrich Meissen¹, Stefan Pfennigschmidt¹, Agnès Voisard¹, Tjark Wahnfried¹•Institutions (1)

Fraunhofer Society¹

14 Mar 2004

TL;DR: A model to handle various contexts and situations in information logistics by introducing semantical aspects defined in an ontology is presented and the situation awareness proposal has been tested.

...read moreread less

Abstract: In order to deliver relevant information at the right time to its mobile users, systems such as event notification systems need to be aware of the users' context, which includes the current time, their location, or the devices they use Many context frameworks have been introduced in the past few years However, they usually do not consider the notion of characteristic features of contexts that are invariant during certain time intervals Knowing the current situation of a user allows the system to better target the information to be delivered This paper presents a model to handle various contexts and situations in information logistics A context is defined as a collection of values usually observed by sensors, eg., location or temperature A situation builds on this concept by introducing semantical aspects defined in an ontology Our situation awareness proposal has been tested in two projects.

...read moreread less

Book Chapter•DOI•

XPeer: a self-organizing XML P2P database system

[...]

Carlo Sartiani¹, Paolo Manghi¹, Giorgio Ghelli¹, Giovanni Conforti¹•Institutions (1)

University of Pisa¹

14 Mar 2004

TL;DR: XPeer is a zero-administration system for sharing and querying XML data that can be used in any application field, being a general purpose XML p2p DBMS, even though its main application is the management of resource descriptions in GRID environments.

...read moreread less

Abstract: This paper describes XPeer , a zero-administration system for sharing and querying XML data The system allows users to share XML data without significant human intervention, and to pose XQuery FLWR queries against them The proposed system can be used in any application field, being a general purpose XML p2p DBMS, even though its main application is the management of resource descriptions in GRID environments.

...read moreread less

Book Chapter•DOI•

An overview of web data clustering practices

[...]

Athena Vakali¹, Jaroslav Pokorný², Theodore Dalamagas³•Institutions (3)

Aristotle University of Thessaloniki¹, Charles University in Prague², National Technical University of Athens³

14 Mar 2004

TL;DR: An overview of the most popular methodologies and implementations in terms of clustering either Web users or Web sources are presented and a survey about current status and future trends in clustering employed over the Web is presented.

...read moreread less

Abstract: Clustering is a challenging topic in the area of Web data management Various forms of clustering are required in a wide range of applications, including finding mirrored Web pages, detecting copyright violations, and reporting search results in a structured way Clustering can either be performed once offline, (independently to search queries), or online (on the results of search queries) Important efforts have focused on mining Web access logs and to cluster search engine results on the fly Online methods based on link structure and text have been applied successfully to finding pages on related topics This paper presents an overview of the most popular methodologies and implementations in terms of clustering either Web users or Web sources and presents a survey about current status and future trends in clustering employed over the Web.

...read moreread less

Book Chapter•DOI•

A Framework for Efficient Storage Security in RDBMS

[...]

Balakrishna R. Iyer¹, Sharad Mehrotra², Einar Mykletun², Gene Tsudik², Yonghua Wu² - Show less +1 more•Institutions (2)

IBM¹, University of California, Irvine²

14 Mar 2004

TL;DR: In this article, the authors proposed a new secure storage model and a key management architecture which enable efficient cryptographic operations while maintaining a very high level of security, and evaluated the performance of the proposed model by experimenting with a prototype implementation based on the TPC-H data set.

...read moreread less

Abstract: With the widespread use of e-business coupled with the public’s awareness of data privacy issues and recent database security related legislations, incorporating security features into modern database products has become an increasingly important topic. Several database vendors already offer integrated solutions that provide data privacy within existing products. However, treating security and privacy issues as an afterthought often results in inefficient implementations. Some notable RDBMS storage models (such as the N-ary Storage Model) suffer from this problem. In this work, we analyze issues in storage security and discuss a number of trade-offs between security and efficiency. We then propose a new secure storage model and a key management architecture which enable efficient cryptographic operations while maintaining a very high level of security. We also assess the performance of our proposed model by experimenting with a prototype implementation based on the well-known TPC-H data set.

...read moreread less

Book•DOI•

Content-based Video Retrieval

[...]

Milan Petkovic

01 Jan 2004

TL;DR: The need for tools that can manipulate the video content in the same way as traditional databases manage numeric and textual data is significant.

...read moreread less

Abstract: Recent advances in multimedia technologies allow the capture and storage of video data with relatively inexpensive computers. Furthermore, the new possibilities offered by the information highways have made a large amount of video data publicly available. However, without appropriate search techniques all these data are hardly usable. Users are not satisfied with the video retrieval systems that provide analogue VCR functionality. They want to query the content instead of the raw video data. For example, a user analysing a soccer video will ask for specific events such as goals. Content-based search and retrieval of video data becomes a challenging and important problem. Therefore, the need for tools that can manipulate the video content in the same way as traditional databases manage numeric and textual data is significant.

...read moreread less

Book Chapter•DOI•

Semantic web recommender systems

[...]

Cai-Nicolas Ziegler¹•Institutions (1)

University of Freiburg¹

14 Mar 2004

TL;DR: The primary objective targets the successful deployment and integration of recommender system facilities for Semantic Web applications, making use of novel technologies and concepts and incorporating them into one coherent framework.

...read moreread less

Abstract: Research on recommender systems has primarily addressed centralized scenarios and largely ignored open, decentralized systems where remote information distribution prevails The absence of superordinate authorities having full access and control introduces some serious issues requiring novel approaches and methods Hence, our primary objective targets the successful deployment and integration of recommender system facilities for Semantic Web applications, making use of novel technologies and concepts and incorporating them into one coherent framework.

...read moreread less

Book Chapter•DOI•

OGSA-DQP: A Service for Distributed Querying on the Grid

[...]

M. Nedim Alpdemir¹, Arijit Mukherjee², Anastasios Gounaris¹, Norman W. Paton¹, Paul Watson², Alvaro A. A. Fernandes¹, Desmond J. Fitzgerald¹ - Show less +3 more•Institutions (2)

University of Manchester¹, Newcastle University²

14 Mar 2004

TL;DR: This demon- stration aims to illustrate the capabilities of OGSA-DQP prototype via a GUI Client over a collection of bioinformatics databases and analysis tools.

...read moreread less

Abstract: OGSA-DQP is a distributed query processor exposed to users as an Open Grid Services Architecture (OGSA)-compliant Grid service. This service supports the compilation and evaluation of queries that combine data obtained from multiple services on the Grid, including Grid Database Services (GDSs) and computational web services. Not only does OGSA-DQP support integrated access to multiple Grid services, it is itself implemented as a collection of interacting Grid services. OGSA-DQP illustrates how Grid service orchestrations can be used to perform complex, data-intensive parallel computations. The OGSA-DQP prototype is downloadable from www.ogsadai.org.uk/dqp/. This demonstration aims to illustrate the capabilities of OGSA-DQP prototype via a GUI Client over a collection of bioinformatics databases and analysis tools.

...read moreread less

Patent•DOI•

Sketch-based multi-query processing over data streams

[...]

Alin Dobra¹, Johannes Gehrke¹, Rajeev Rastogi¹, Minos Garofalakis¹•Institutions (1)

Alcatel-Lucent¹

29 Dec 2004

TL;DR: In this article, the authors proposed a method of efficiently providing estimated answers to workloads of aggregate, multi-join SQL-like queries over a number of input data-streams.

...read moreread less

Abstract: A method of efficiently providing estimated answers to workloads of aggregate, multi-join SQL-like queries over a number of input data-streams. The method only examines each data elements once and uses a limited amount of computer memory. The method uses join graphs and atomic sketches that are essentially pseudo-random summaries formed using random binary variables. The estimated answer is the product of all the atomic sketches for all the vertices in the query join graph. A query workload is processed efficiently by identifying and sharing atomic sketches common to distinct queries, while ensuring that the join graphs remain well formed. The method may automatically minimize either the average query error or the maximum query error over the workload.

...read moreread less

Book Chapter•DOI•

Joining Punctuated Streams

[...]

Luping Ding¹, Nishant Mehta¹, Elke A. Rundensteiner¹, George T. Heineman¹•Institutions (1)

Worcester Polytechnic Institute¹

14 Mar 2004

TL;DR: In this paper, the authors focus on stream join optimization by exploiting the constraints that are dynamically embedded into data streams to signal the end of transmitting certain attribute values These constraints are called punctuations.

...read moreread less

Abstract: We focus on stream join optimization by exploiting the constraints that are dynamically embedded into data streams to signal the end of transmitting certain attribute values These constraints are called punctuations Our stream join operator, PJoin, is able to remove no-longer-useful data from the state in a timely manner based on punctuations, thus reducing memory overhead and improving the efficiency of probing We equip PJoin with several alternate strategies for purging the state and for propagating punctuations to benefit down-stream operators We also present an extensive experimental study to explore the performance gains achieved by purging state as well as the trade-off between different purge strategies Our experimental results of comparing the performance of PJoin with XJoin, a stream join operator without a constraint-exploiting mechanism, show that PJoin significantly outperforms XJoin with regard to both memory overhead and throughput

...read moreread less

Book Chapter•DOI•

Spatial Queries in the Presence of Obstacles

[...]

Jun Zhang¹, Dimitris Papadias¹, Kyriakos Mouratidis¹, Manli Zhu¹•Institutions (1)

Hong Kong University of Science and Technology¹

14 Mar 2004

TL;DR: This paper proposes efficient algorithms for the most important query types, namely, range search, nearest neighbors, e-distance joins and closest pairs, considering that both data objects and obstacles are indexed by R-trees.

...read moreread less

Abstract: Despite the existence of obstacles in many database applications, traditional spatial query processing utilizes the Euclidean distance metric assuming that points in space are directly reachable. In this paper, we study spatial queries in the presence of obstacles, where the obstructed distance between two points is defined as the length of the shortest path that connects them without crossing any obstacles. We propose efficient algorithms for the most important query types, namely, range search, nearest neighbors, e-distance joins and closest pairs, considering that both data objects and obstacles are indexed by R-trees. The effectiveness of the proposed solutions is verified through extensive experiments.

...read moreread less

Patent•DOI•

Processing data-stream join aggregates using skimmed sketches

[...]

Sumit Ganguly¹, Minos Garofalakis¹, Rajeev Rastogi¹•Institutions (1)

Alcatel-Lucent¹

29 Dec 2004

TL;DR: In this paper, an atomic sketch is formed as the inner product of the data-stream frequency vector and a random binary variable, from which the frequency values that exceed a predetermined threshold have been skimmed off and placed in a dense frequency vector.

...read moreread less

Abstract: A method of estimating an aggregate of a join over data-streams in real-time using skimmed sketches, that only examines each data element once and has a worst case space requirement of O(n 2 /J), where J is the size of the join and n is the number of data elements. The skimmed sketch is an atomic sketch, formed as the inner product of the data-stream frequency vector and a random binary variable, from which the frequency values that exceed a predetermined threshold have been skimmed off and placed in a dense frequency vector. The join size is estimated as the sum of the sub-joins of skimmed sketches and dense frequency vectors. The atomic sketches may be arranged in a hash structure so that processing a data element only requires updating a single sketch per hash table. This keeps the per-element overhead logarithmic in the domain and stream sizes.

...read moreread less

Book Chapter•DOI•

Semantic query routing and processing in p2p database systems: the ICS-FORTH SQPeer middleware

[...]

George Kokkinidis, Vassilis Christophides¹•Institutions (1)

University of Crete¹

14 Mar 2004

TL;DR: A fully-fledged framework for evaluating semantic queries over peer RDF/S bases (materialized or virtual) is missing and the ICS-FORTH SQPeer middleware for routing and processing RQL queries and RVL views is presented.

...read moreread less

Abstract: Peer-to-peer (P2P) computing is currently attracting enormous attention In P2P systems a very large number of autonomous computing nodes (the peers) pool together their resources and rely on each other for data and services More and more P2P data management systems rely nowadays on intensional (i.e schema) information for integrating and querying peer bases Such information can be easily captured by emerging Semantic Web languages such as RDF/S However, a fully-fledged framework for evaluating semantic queries over peer RDF/S bases (materialized or virtual) is missing In this paper we present the ICS-FORTH SQPeer middleware for routing and processing RQL queries and RVL views The novelty of SQPeer lies on the use of intensional active schemas for determining relevant peer bases, as well as, constructing distributed query plans In this context, we consider optimization opportunities for SQPeer query plans.

...read moreread less