Showing papers on "Tuple published in 2003"

PDF

Open Access

Proceedings Article•DOI•

[...]

Jan Chomicki¹, P. Godfrey², Jarek Gryz³, Dongming Liang³•Institutions (3)

University at Buffalo¹, College of William & Mary², York University³

05 Mar 2003

TL;DR: A skyline algorithm, SFS, based on presorting that is general, for use with any skyline query, efficient, and well behaved in a relational setting is proposed.

...read moreread less

Abstract: The skyline, or Pareto, operator selects those tuples that are not dominated by any others. Extending relational systems with the skyline operator would offer a basis for handling preference queries. Good algorithms are needed for skyline, however, to make this efficient in a relational setting. We propose a skyline algorithm, SFS, based on presorting that is general, for use with any skyline query, efficient, and well behaved in a relational setting.

...read moreread less

788 citations

Book Chapter•DOI•

Load shedding in a data stream manager

[...]

Nesime Tatbul¹, Uğur Çetintemel¹, Stan Zdonik¹, Mitch Cherniack², Michael Stonebraker³ - Show less +1 more•Institutions (3)

Brown University¹, Brandeis University², Massachusetts Institute of Technology³

09 Sep 2003

TL;DR: This paper examines a technique for dynamically inserting and removing drop operators into query plans as required by the current load, and addresses the problems of determining when load shedding is needed, where in the query plan to insert drops, and how much of the load should be shed at that point in the plan.

...read moreread less

Abstract: A Data Stream Manager accepts push-based inputs from a set of data sources, processes these inputs with respect to a set of standing queries, and produces outputs based on Quality-of-Service (QoS) specifications. When input rates exceed system capacity, the system will become overloaded and latency will deteriorate. Under these conditions, the system will shed load, thus degrading the answer, in order to improve the observed latency of the results. This paper examines a technique for dynamically inserting and removing drop operators into query plans as required by the current load. We examine two types of drops: the first drops a fraction of the tuples in a randomized fashion, and the second drops tuples based on the importance of their content. We address the problems of determining when load shedding is needed, where in the query plan to insert drops, and how much of the load should be shed at that point in the plan. We describe efficient solutions and present experimental evidence that they can bring the system back into the useful operating range with minimal degradation in answer quality.

...read moreread less

662 citations

Book Chapter•DOI•

Efficient IR-style keyword search over relational databases

[...]

Vagelis Hristidis¹, Luis Gravano², Yannis Papakonstantinou¹•Institutions (2)

University of California, San Diego¹, Columbia University²

09 Sep 2003

TL;DR: This paper adapts IR-style document-relevance ranking strategies to the problem of processing free-form keyword queries over RDBMSs, and develops query-processing strategies that build on a crucial characteristic of IR- style keyword search: only the few most relevant matches are generally of interest.

...read moreread less

Abstract: Applications in which plain text coexists with structured data are pervasive. Commercial relational database management systems (RDBMSs) generally provide querying capabilities for text attributes that incorporate state-of-the-art information retrieval (IR) relevance ranking strategies, but this search functionality requires that queries specify the exact column or columns against which a given list of keywords is to be matched. This requirement can be cumbersome and inflexible from a user perspective: good answers to a keyword query might need to be "assembled" -in perhaps unforeseen ways- by joining tuples from multiple relations. This observation has motivated recent research on free-form keyword search over RDBMSs. In this paper, we adapt IR-style document-relevance ranking strategies to the problem of processing free-form keyword queries over RDBMSs. Our query model can handle queries with both AND and OR semantics, and exploits the sophisticated single-column text-search functionality often available in commercial RDBMSs. We develop query-processing strategies that build on a crucial characteristic of IR-style keyword search: only the few most relevant matches -according to some definition of "relevance"- are generally of interest. Consequently, rather than computing all matches for a keyword query, which leads to inefficient executions, our techniques focus on the top-k matches for the query, for moderate values of k. A thorough experimental evaluation over real data shows the performance advantages of our approach.

...read moreread less

581 citations

Proceedings Article•DOI•

Robust and efficient fuzzy match for online data cleaning

[...]

Surajit Chaudhuri¹, Kris Ganjam¹, Venkatesh Ganti¹, Rajeev Motwani²•Institutions (2)

Microsoft¹, Stanford University²

09 Jun 2003

TL;DR: A new similarity function is proposed which overcomes limitations of commonly used similarity functions, and an efficient fuzzy match algorithm is developed which can effectively clean an incoming tuple if it fails to match exactly with any tuple in the reference relation.

...read moreread less

Abstract: To ensure high data quality, data warehouses must validate and cleanse incoming data tuples from external sources. In many situations, clean tuples must match acceptable tuples in reference tables. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation.A significant challenge in such a scenario is to implement an efficient and accurate fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any tuple in the reference relation. In this paper, we propose a new similarity function which overcomes limitations of commonly used similarity functions, and develop an efficient fuzzy match algorithm. We demonstrate the effectiveness of our techniques by evaluating them on real datasets.

...read moreread less

548 citations

Proceedings Article•DOI•

Evaluating window joins over unbounded streams

[...]

Jaewoo Kang¹, Jeffrey F. Naughton¹, Stratis D. Viglas¹•Institutions (1)

University of Wisconsin-Madison¹

05 Mar 2003

TL;DR: A unit-time-basis cost model is introduced to analyze the expected performance of algorithms for evaluating sliding window joins over pairs of unbounded streams and shows that asymmetric combinations of join algorithms can outperform symmetric join algorithm implementations.

...read moreread less

Abstract: We investigate algorithms for evaluating sliding window joins over pairs of unbounded streams. We introduce a unit-time-basis cost model to analyze the expected performance of these algorithms. Using this cost model, we propose strategies for maximizing the efficiency of processing joins in three scenarios. First, we consider the case where one stream is much faster than the other. We show that asymmetric combinations of join algorithms, (e.g., hash join on one input, nested-loops join on the other) can outperform symmetric join algorithm implementations. Second, we investigate the case where system resources are insufficient to keep up with the input streams. We show that we can maximize the number of join result tuples produced in this case by properly allocating computing resources across the two input streams. Finally, we investigate strategies for maximizing the number of result tuples produced when memory is limited, and show that proper memory allocation across the two input streams can result in significantly lower resource usage and/or more result tuples produced.

...read moreread less

417 citations

Proceedings Article•DOI•

Approximate join processing over data streams

[...]

Abhinandan Das¹, Johannes Gehrke¹, Mirek Riedewald¹•Institutions (1)

Cornell University¹

09 Jun 2003

TL;DR: In this work, the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources is considered, and the number of generated result tuples as the quality measure is considered.

...read moreread less

Abstract: We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding load in the form of dropping tuples from the data streams. We first discuss alternate architectural models for data stream join processing, and we survey suitable measures for the quality of an approximation of a set-valued query result. We then consider the number of generated result tuples as the quality measure, and we give optimal offline and fast online algorithms for it. In a thorough experimental study with synthetic and real data we show the efficacy of our solutions. For applications with demand for exact results we introduce a new Archive-metric which captures the amount of work needed to complete the join in case the streams are archived for later processing.

...read moreread less

310 citations

Book Chapter•DOI•

Operator scheduling in a data stream manager

[...]

Don Carney¹, Uğur Çetintemel¹, Alexander Rasin¹, Stan Zdonik¹, Mitch Cherniack², Michael Stonebraker³ - Show less +2 more•Institutions (3)

Brown University¹, Brandeis University², Massachusetts Institute of Technology³

09 Sep 2003

TL;DR: It is argued that a fine-grained scheduling approach in combination with various scheduling techniques (such as batching of operators and tuples) can significantly improve system efficiency by reducing various system overheads.

...read moreread less

Abstract: Many stream-based applications have sophisticated data processing requirements and real-time performance expectations that need to be met under high-volume, time-varying data streams. In order to address these challenges, we propose novel operator scheduling approaches that specify (1) which operators to schedule (2) in which order to schedule the operators, and (3) how many tuples to process at each execution step. We study our approaches in the context of the Aurora data stream manager. We argue that a fine-grained scheduling approach in combination with various scheduling techniques (such as batching of operators and tuples) can significantly improve system efficiency by reducing various system overheads. We also discuss application-aware extensions that make scheduling decisions according to per-application Quality of Service (QoS) specifications. Finally, we present prototype-based experimental results that characterize the efficiency and effectiveness of our approaches under various stream workloads and processing scenarios.

...read moreread less

299 citations

Proceedings Article•DOI•

Aurora: a data stream management system

[...]

Daniel J. Abadi¹, Don Carney², Uğur Çetintemel², Mitch Cherniack¹, Christian Convey², C. Erwin², E.F. Galvez¹, M. Hatoun², Anurag S. Maskey¹, Alexander Rasin², A. Singer², Michael Stonebraker³, Nesime Tatbul², Ying Xing², R. Yan², Stanley B. Zdonik² - Show less +12 more•Institutions (3)

Brandeis University¹, Brown University², Massachusetts Institute of Technology³

09 Jun 2003

TL;DR: This work proposes to demonstrate the Aurora system with its development environment and runtime system, with several example monitoring applications developed in consultation with defense, financial, and natural science communities, and shows the effect of various system alternatives on various workloads.

...read moreread less

Abstract: The Aurora system [1] is an experimental data stream management system with a fully functional prototype. It includes both a graphical development environment, and a runtime system. We propose to demonstrate the Aurora system with its development environment and runtime system, with several example monitoring applications developed in consultation with defense, financial, and natural science communities. We will also demonstrate the effect of various system alternatives on various workloads. For example, we will show how different scheduling algorithms affect tuple latency and internal queue lengths. We will use some of our visualization tools to accomplish this. Data Stream Management Aurora is a data stream management system for monitoring applications. Streams are continuous data feeds from such sources as sensors, satellites and stock feeds. Monitoring applications track the data from numerous streams, filtering them for signs of abnormal activity and processing them for purposes of aggregation, reduction and correlation. The management requirements for monitoring applications differ profoundly from those satisfied by a traditional DBMS: o A traditional DBMS assumes a passive model where most data processing results from humans issuing transactions and queries. Data stream management requires a more active approach, monitoring data feeds from unpredictable external sources (e.g., sensors) and alerting humans when abnormal activity is detected. o A traditional DBMS manages data that is currently in its tables. Data stream management often requires processing data that is bounded by some finite window of values, and not over an unbounded past. o A traditional DBMS provides exact answers to exact queries, and is blind to real-time deadlines. Data stream management often must respond to real-time deadlines (e.g., military applications monitoring positions of enemy platforms) and therefore must often provide reasonable approximations to queries. o A traditional query processor optimizes all queries in the same way (typically focusing on response time). A stream data manager benefits from application specific optimization criteria (QoS). o A traditional DBMS assumes pull-based queries to be the norm. Push-based data processing is the norm for a data stream management system. A Brief Summary of Aurora Aurora has been designed to deal with very large numbers of data streams. Users build queries out of a small set of operators (a.k.a. boxes). The current implementation provides a user interface for tapping into pre-existing inputs and network flows and for wiring boxes together to produces answers at the outputs. While it is certainly possible to accept input as declarative queries, we feel that for a very large number of such queries, the process of common sub-expression elimination is too difficult. An example of an Aurora network is given in Screen Shot 1. A simple stream is a potentially infinite sequence of tuples that all have the same stream ID. An arc carries multiple simple streams. This is important so that simple streams can be added and deleted from the system without having to modify the basic network. A query, then, is a sub-network that ends at a single output and includes an arbitrary number of inputs. Boxes can connect to multiple downstream boxes. All such path splits carry identical tuples. Multiple streams can be merged since some box types accept more than one input (e.g., Join, Union). We do not allow any cycles in an operator network. Each output is supplied with a Quality of Service (QoS) specification. Currently, QoS is captured by three functions (1) a latency graph, (2) a value-based graph, and (3) a loss-tolerance graph. The latency graph indicates how utility drops as an answer is delayed. The value-based graph shows which values of the output space are most important. The loss-tolerance graph is a simple way to describe how averse the application is to approximate answers. Tuples arrive at the input and are queued for processing. A scheduler selects a box with waiting tuples and executes that box on one or more of the input tuples. The output tuples of a box are queued at the input of the next box in sequence. In this way, tuples make their way from the inputs to the outputs. If the system is overloaded, QoS is adversely affected. In this case, we invoke a load shedder to strategically eliminate Aurora supports persistent storage in two different ways. First, when box queues consume more storage than available RAM, the system will spill tuples that are less likely to be needed soon to secondary storage. Second, ad hoc queries can be connected to (and disconnected from) any arc for which a connection point has been defined. A connection point stores a historical portion of a stream that has flowed on the arc. For example, one could define a connection point as the last hour’s worth of data that has been seen on a given arc. Any ad hoc query that connects to a connection point has access to the full stored history as well as any additional data that flows past while the query is connected.

...read moreread less

293 citations

Journal Article•DOI•

Watermarking relational data: framework, algorithms and analysis

[...]

Rakesh Agrawal¹, Peter J. Haas¹, Jerry Kiernan¹•Institutions (1)

IBM¹

01 Aug 2003

TL;DR: This paper presents an effective watermarking technique geared for relational data that is robust against various forms of malicious attacks as well as benign updates to the data and performs well enough to be used in real-world applications.

...read moreread less

Abstract: .We enunciate the need for watermarking database relations to deter data piracy, identify the characteristics of relational data that pose unique challenges for watermarking, and delineate desirable properties of a watermarking system for relational data. We then present an effective watermarking technique geared for relational data. This technique ensures that some bit positions of some of the attributes of some of the tuples contain specific values. The specific bit locations and values are algorithmically determined under the control of a secret key known only to the owner of the data. This bit pattern constitutes the watermark. Only if one has access to the secret key can the watermark be detected with high probability. Detecting the watermark requires access neither to the original data nor the watermark, and the watermark can be easily and efficiently maintained in the presence of insertions, updates, and deletions. Our analysis shows that the proposed technique is robust against various forms of malicious attacks as well as benign updates to the data. Using an implementation running on DB2, we also show that the algorithms perform well enough to be used in real-world applications.

...read moreread less

258 citations

Book Chapter•DOI•

Staircase join: teach a relational DBMS to watch its (axis) steps

[...]

Torsten Grust¹, Maurice van Keulen², Jens Teubner¹•Institutions (2)

University of Konstanz¹, University of Twente²

09 Sep 2003

TL;DR: This text proposes a local change to the database kernel, the staircase join, which encapsulates the necessary tree knowledge needed to improve XPath performance and reports on quite promising experiments with a staircase join enhanced main-memory database kernel.

...read moreread less

Abstract: Relational query processors derive much of their effectiveness from the awareness of specific table properties like sort order, size, or absence of duplicate tuples. This text applies (and adapts) this successful principle to database-supported XML and XPath processing: the relational system is made tree aware, i.e., tree properties like subtree size, intersection of paths, inclusion or disjointness of subtrees are made explicit. We propose a local change to the database kernel, the staircase join, which encapsulates the necessary tree knowledge needed to improve XPath performance. Staircase join operates on an XML encoding which makes this knowledge available at the cost of simple integer operations (e.g., +, ≤ ). We finally report on quite promising experiments with a staircase join enhanced main-memory database kernel.

...read moreread less

197 citations

Journal Article•DOI•

A logical framework for querying and repairing inconsistent databases

[...]

Gianluigi Greco, Sergio Greco, Ester Zumpano

01 Nov 2003-IEEE Transactions on Knowledge and Data Engineering

TL;DR: In this paper, a general logic framework for computing repairs and consistent answers over inconsistent databases is proposed, where different types of rules defining general integrity constraints, repair constraints (i.e., conditions on the insertion or deletion of atoms), and prioritized constraints are considered.

...read moreread less

Abstract: In this paper, we address the problem of managing inconsistent databases, i.e., databases violating integrity constraints. We propose a general logic framework for computing repairs and consistent answers over inconsistent databases. A repair for a possibly inconsistent database is a minimal set of insert and delete operations which makes the database consistent, whereas a consistent answer is a set of tuples derived from the database, satisfying all integrity constraints. In our framework, different types of rules defining general integrity constraints, repair constraints (i.e., rules defining conditions on the insertion or deletion of atoms), and prioritized constraints (i.e., rules defining priorities among updates and repairs) are considered. We propose a technique based on the rewriting of constraints into (prioritized) extended disjunctive rules with two different forms of negation (negation as failure and classical negation). The disjunctive program can be used for two different purposes: to compute "repairs" for the database and produce consistent answers, i.e., a maximal set of atoms which do not violate the constraints. We show that our technique is sound, complete (each preferred stable model defines a repair and each repair is derived from a preferred stable model), and more general than techniques previously proposed.

...read moreread less

Journal Article•DOI•

Compiling embedded languages

[...]

Conal Elliott¹, Sigbjorn Finne, Oege de Moor²•Institutions (2)

Microsoft¹, University of Oxford²

01 May 2003-Journal of Functional Programming

TL;DR: An implemented technique for producing optimizing compilers forDSELs, based on Kamin's idea of DSELs for program generation, using a data type of syntax for basic types, a set of smart constructors that perform rewriting over those types, some code motion transformations, and a back-end code generator.

...read moreread less

Abstract: Functional languages are particularly well-suited to the interpretive implementations of Domain-Specific Embedded Languages (DSELs). We describe an implemented technique for producing optimizing compilers for DSELs, based on Kamin's idea of DSELs for program generation. The technique uses a data type of syntax for basic types, a set of smart constructors that perform rewriting over those types, some code motion transformations, and a back-end code generator. Domain-specific optimization results from chains of domain-independent rewrites on basic types. New DSELs are defined directly in terms of the basic syntactic types, plus host language functions and tuples. This definition style makes compilers easy to write and, in fact, almost identical to the simplest embedded interpreters. We illustrate this technique with a language Pan for the computationally intensive domain of image synthesis and manipulation.

...read moreread less

Patent•

Efficient fuzzy match for evaluating data records

[...]

Surajit Chaudhuri¹, Kris Ganjam¹, Venkatesh Ganti¹, Rajeev Motwani¹•Institutions (1)

Microsoft¹

20 Jun 2003

TL;DR: In this article, a disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process is proposed.

...read moreread less

Abstract: To help ensure high data quality, data warehouses validate and clean, if needed incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.

...read moreread less

Proceedings Article•DOI•

Efficient maintenance of materialized top-k views

[...]

Ke Yi¹, Hai Yu¹, Jun Yang¹, Gangqiang Xia¹, Yuguo Chen¹ - Show less +1 more•Institutions (1)

Duke University¹

05 Mar 2003

TL;DR: This work proposes an algorithm that reduces the frequency of refills by maintaining a top-k' view instead of aTop-k view, where k' changes at runtime between k and some k/sub max//spl ges/k, and shows that in most practical cases, the algorithm can reduce the expected amortized cost of refill queries to O(1) while still keeping the view small.

...read moreread less

Abstract: We tackle the problem of maintaining materialized top-k views. Top-k queries, including MIN and MAX as important special cases, occur frequently in common database workloads. A top-k view can be materialized to improve query performance, but in general it is not self-maintainable unless it contains all tuples in the base table. Deletions and updates on the base table may cause tuples to leave the top-k view, resulting in expensive queries over the base table to "refill" the view. We propose an algorithm that reduces the frequency of refills by maintaining a top-k' view instead of a top-k view, where k' changes at runtime between k and some k/sub max//spl ges/k. We show that in most practical cases, our algorithm can reduce the expected amortized cost of refill queries to O(1) while still keeping the view small. The optimal value of k/sub max/ depends on the update pattern and the costs of querying the base table and updating the view. Compared with the simple approach of maintaining either the top-k view itself or a copy of the base table, our algorithm can provide orders-of-magnitude improvements in performance with appropriate k/sub max/ values. We show how to choose k/sub max/ dynamically to adapt to the actual system workload and performance at runtime, without requiring accurate prior knowledge.

...read moreread less

Book Chapter•DOI•

Condensed Representation of Database Repairs for Consistent Query Answering

[...]

Jef Wijsen¹•Institutions (1)

University of Mons¹

08 Jan 2003

TL;DR: The problem of query answering in the presence of inconsistency relative to this refined repair notion is solved and there exists a condensed representation of all repairs that permits computing trustable query answers.

...read moreread less

Abstract: Repairing a database means bringing the database in accordance with a given set of integrity constraints by applying modifications that are as small as possible. In the seminal work of Arenas et al. on query answering in the presence of inconsistency, the possible modifications considered are deletions and insertions of tuples. Unlike earlier work, we also allow tuple updates as a repair primitive. Update-based repairing is advantageous, because it allows rectifying an error within a tuple without deleting the tuple, thereby preserving other consistent values in the tuple. At the center of the paper is the problem of query answering in the presence of inconsistency relative to this refined repair notion. Given a query, a trustable answer is obtained by intersecting the query answers on all repaired versions of the database. The problem arising is that, in general, a database can be repaired in infinitely many ways. A positive result is that for conjunctive queries and full dependencies, there exists a condensed representation of all repairs that permits computing trustable query answers.

...read moreread less

Proceedings Article•DOI•

Ranked join indices

[...]

Panayiotis Tsaparas¹, Themis Palpanas¹, Yannis Kotidis², Nick Koudas², Divesh Srivastava² - Show less +1 more•Institutions (2)

University of Toronto¹, AT&T Labs²

05 Mar 2003

TL;DR: This work proposes a novel technique, which it refers to as ranked join index, to efficiently answer top-k join queries for arbitrary, user specified, preferences and a large class of scoring functions, which requires small space and provides guarantees for its performance.

...read moreread less

Abstract: A plethora of data sources contain data entities that could be ordered according to a variety of attributes associated with the entities. Such orderings result effectively in a ranking of the entities according to the values in the attribute domain. Commonly, users correlate such sources for query processing purposes through join operations. In query processing, it is desirable to incorporate user preferences towards specific attributes or their values. A way to incorporate such preferences is by utilizing scoring functions that combine user preferences and attribute values and return a numerical score for each tuple in the join result. Then, a target query, which we refer to as top-k join query, seeks to identify the k tuples in the join result with the highest scores. We propose a novel technique, which we refer to as ranked join index, to efficiently answer top-k join queries for arbitrary, user specified, preferences and a large class of scoring functions. Our rank join index requires small space (compared to the entire join result) and provides guarantees for its performance. Moreover, our proposal provides a graceful tradeoff between its space requirements and worst case search performance. We supplement our analytical results with a thorough experimental evaluation using a variety of real and synthetic data sets, demonstrating that, in comparison to other viable approaches, our technique offers significant performance benefits.

...read moreread less

Book Chapter•DOI•

The Klaim project: Theory and practice

[...]

Lorenzo Bettini, Viviana Bono, Rocco De Nicola, Gian Luigi Ferrari, Daniele Gorla, Michele Loreti, Eugenio Moggi, Rosario Pugliese, Emilio Tuosto, Betti Venneri - Show less +6 more

09 Feb 2003-Lecture Notes in Computer Science

TL;DR: A modal logic is presented that permits reasoning about behavioural properties of systems and various type systems that help in controlling agents movements and actions in Klaim.

...read moreread less

Abstract: Klaim (Kernel Language for Agents Interaction and Mobility) is an experimental language specifically designed to program distributed systems consisting of several mobile components that interact through multiple distributed tuple spaces. Klaim primitives allow programmers to distribute and retrieve data and processes to and from the nodes of a net. Moreover, localities are first-class citizens that can be dynamically created and communicated over the network. Components, both stationary and mobile, can explicitly refer and control the spatial structures of the network. This paper reports the experiences in the design and development of Klaim. Its main purpose is to outline the theoretical foundations of the main features of Klaim and its programming model. We also present a modal logic that permits reasoning about behavioural properties of systems and various type systems that help in controlling agents movements and actions. Extensions of the language in the direction of object oriented programming are also discussed together with the description of the implementation efforts which have lead to the current prototypes.

...read moreread less

Proceedings Article•DOI•

Toward a foundational typed assembly language

[...]

Karl Crary¹•Institutions (1)

Carnegie Mellon University¹

15 Jan 2003

TL;DR: TALT is the first formalized typed assembly language to provide any of these features, including heterogeneous tuples, disjoint sums, and a general account of addressing modes, which are shown by machine-checkable proofs.

...read moreread less

Abstract: We present the design of a typed assembly language called TALT that supports heterogeneous tuples, disjoint sums, and a general account of addressing modes. TALT also implements the von Neumann model in which programs are stored in memory, and supports relative addressing. Type safety for execution and for garbage collection are shown by machine-checkable proofs. TALT is the first formalized typed assembly language to provide any of these features.

...read moreread less

Proceedings Article•DOI•

Tuples on the air: a middleware for context-aware computing in dynamic networks

[...]

Marco Mamei, Franco Zambonelli, Letizia Leonardi

19 May 2003

TL;DR: TOTA ("Tuples On The Air"), a novel middleware for supporting adaptive context-aware application in dynamic network scenarios that propagates tuples across a network on the basis of application-specific patterns and adaptively re-shapes the resulting distributed structures accordingly to changes in the network topology.

...read moreread less

Abstract: We present TOTA ("Tuples On The Air"), a novel middleware for supporting adaptive context-aware application in dynamic network scenarios. The key idea in TOTA is to rely on spatially distributed tuples for both representing contextual information and supporting uncoupled and adaptive interactions between application components. The middleware propagates tuples across a network on the basis of application-specific patterns and adaptively re-shapes the resulting distributed structures accordingly to changes in the network topology. Application components can locally "sense" these structures and exploit them to acquire contextual information and carry on complex coordination activities in an adaptive way. Several examples show the effectiveness of the TOTA approach.

...read moreread less

Book Chapter•DOI•

Temporal slicing in the evaluation of XML queries

[...]

Dengfeng Gao¹, Richard T. Snodgrass¹•Institutions (1)

University of Arizona¹

09 Sep 2003

TL;DR: A temporal XML query language, τXQuery, is presented, in which valid time support is added to XQuery by minimally extending the syntax and semantics of X query by adopting a stratum approach which maps a τX query to a conventional XQuery.

...read moreread less

Abstract: As with relational data, XML data changes over time with the creation, modification, and deletion of XML documents. Expressing queries on time-varying (relational or XML) data is more difficult than writing queries on nontemporal data. In this paper, we present a temporal XML query language, τXQuery, in which we add valid time support to XQuery by minimally extending the syntax and semantics of XQuery. We adopt a stratum approach which maps a τXQuery query to a conventional XQuery. The paper focuses on how to perform this mapping, in particular, on mapping sequenced queries, which are by far the most challenging. The critical issue of supporting sequenced queries (in any query language) is time-slicing the input data while retaining period timestamping. Timestamps are distributed throughout an XML document, rather than uniformly in tuples, complicating the temporal slicing while also providing opportunities for optimization. We propose four optimizations of our initial maximally-fragmented time-slicing approach: selected node slicing, copy-based per-expression slicing, in-place per-expression slicing, and idiomatic slicing, each of which reduces the number of constant periods over which the query is evaluated. While performance tradeoffs clearly depend on the underlying XQuery engine, we argue that there are queries that favor each of the five approaches.

...read moreread less

Journal Article•DOI•

Computing complete answers to queries in the presence of limited access patterns

[...]

Chen Li¹•Institutions (1)

University of California, Irvine¹

01 Oct 2003

TL;DR: This article study the problem of computing the complete answer to a query, i.e., the answer that could be computed if all the tuples could be retrieved, and proposes a decision tree for guiding the process to compute thecomplete answer.

...read moreread less

Abstract: .In data applications such as information integration, there can be limited access patterns to relations, i.e., binding patterns require values to be specified for certain attributes in order to retrieve data from a relation. As a consequence, we cannot retrieve all tuples from these relations. In this article we study the problem of computing the complete answer to a query, i.e., the answer that could be computed if all the tuples could be retrieved. A query is stable if for any instance of the relations in the query, its complete answer can be computed using the access patterns permitted by the relations. We study the problem of testing stability of various classes of queries, including conjunctive queries, unions of conjunctive queries, and conjunctive queries with arithmetic comparisons. We give algorithms and complexity results for these classes of queries. We show that stability of datalog programs is undecidable, and give a sufficient condition for stability of datalog queries. Finally, we study data-dependent computability of the complete answer to a nonstable query, and propose a decision tree for guiding the process to compute the complete answer.

...read moreread less

Proceedings Article•DOI•

Constructing a virtual primary key for fingerprinting relational data

[...]

Yingjiu Li¹, Vipin Swarup², Sushil Jajodia¹•Institutions (2)

George Mason University¹, Mitre Corporation²

27 Oct 2003

TL;DR: A new fingerprinting scheme that does not depend on a primary key attribute is proposed that constructs virtual primary keys from the most significant bits of some of each tuple's attributes.

...read moreread less

Abstract: Agrawal and Kiernan's watermarking technique for database relations [1] and Li et al's fingerprinting extension [6] both depend critically on primary key attributes. Hence, those techniques cannot embed marks in database relations without primary key attributes. Further, the techniques are vulnerable to simple attacks that alter or delete the primary key attribute.This paper proposes a new fingerprinting scheme that does not depend on a primary key attribute. The scheme constructs virtual primary keys from the most significant bits of some of each tuple's attributes. The actual attributes that are used to construct then virtual primary key differ from tuple to tuple. Attribute selection is based on a secret key that is known to the merchant only. Further, the selection does not depend on an apriori ordering over the attributes, or on knowledge of the original relation or fingerprint codeword.The virtual primary keys are then used in fingerprinting as in previous work [6]. Rigorous analysis shows that, with high probability, only embedded fingerprints can be detected and embedded fingerprints cannot be modified or erased by a variety of attacks. Attacks include adding, deleting, shuffling, or modifying tuples or attributes (including a primary key attribute if one exists), guessing secret keys, and colluding with other recipients of a relation.

...read moreread less

Book Chapter•DOI•

Resource access and mobility control with dynamic privileges acquisition

[...]

Daniele Gorla¹, Rosario Pugliese¹•Institutions (1)

University of Florence¹

30 Jun 2003

TL;DR: Klaim is a process language that permits programming distributed systems made up of several mobile components interacting through multiple distributed tuple spaces to guarantee absence of run-time errors due to lack of privileges and state two type soundness results: one involves whole nets, the other is relative to subnets of larger nets.

...read moreread less

Abstract: µKlaim is a process language that permits programming distributed systems made up of several mobile components interacting through multiple distributed tuple spaces. We present the language and a type system for controlling the activities, e.g. access to resources and mobility, of the processes in a net. By dealing with privileges acquisition, the type system enables dynamic variations of security policies. We exploit a combination of static and dynamic type checking, and of inlined reference monitoring, to guarantee absence of run-time errors due to lack of privileges and state two type soundness results: one involves whole nets, the other is relative to subnets of larger nets.

...read moreread less

Journal Article•DOI•

Corner sequence - a P-admissible floorplan representation with a worst case linear-time packing scheme

[...]

Jai-Ming Lin¹, Yao-Wen Chang², Shih-Ping Lin¹•Institutions (2)

National Chiao Tung University¹, National Taiwan University²

01 Aug 2003-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A P-admissible representation, called corner sequence (CS), for nonslicing floorplans, which induces a generic worst case linear-time packing scheme that can also be applied to other representations.

...read moreread less

Abstract: Floorplanning/placement allocates a set of modules into a chip so that no two modules overlap and some specified objective is optimized. To facilitate floorplanning/placement, we need to develop an efficient and effective representation to model the geometric relationship among modules. In this paper, we present a P-admissible representation, called corner sequence (CS), for nonslicing floorplans. CS consists of two tuples that denote the packing sequence of modules and the corners to which the modules are placed. CS is very effective and simple for implementation. Also, it supports incremental update during packing. In particular, it induces a generic worst case linear-time packing scheme that can also be applied to other representations. Experimental results show that CS achieves very promising results for a set of commonly used MCNC benchmark circuits.

...read moreread less

Posted Content•

Providing Diversity in K-Nearest Neighbor Query Results

[...]

Anoop Jain¹, Parag Sarda¹, Jayant R. Haritsa¹•Institutions (1)

Indian Institute of Science¹

15 Oct 2003-arXiv: Databases

TL;DR: This paper proposes a user-tunable definition of diversity, and presents an algorithm, called MOTLEY, for producing a diverse result set as per this definition, and shows that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database.

...read moreread less

Abstract: Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN) queries return the K closest answers according to given distance metric in the database with respect to Q. In this scenario, it is possible that a majority of the answers may be very similar to some other, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value to the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, to produce the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation on real and synthetic data, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance.

...read moreread less

Book Chapter•DOI•

Distributed top-N query processing with possibly uncooperative local systems

[...]

Clement Yu, George Philip, Weiyi Meng¹•Institutions (1)

Binghamton University¹

09 Sep 2003

TL;DR: A method for constructing a facility where a user query is accepted at some site, suitable tuples from appropriate sites are retrieved and the results are merged and then presented to the user.

...read moreread less

Abstract: We consider the problem of processing top-N queries in a distributed environment with possibly uncooperative local database systems. For a given top-N query, the problem is to find the N tuples that satisfy the query the best but not necessarily completely in an efficient manner. Top-N queries are gaining popularity in relational databases and are expected to be very useful for e-commerce applications. Many companies provide the same type of goods and services to the public on the Web, and relational databases may be employed to manage the data. It is not feasible for a user to query a large number of databases. It is therefore desirable to provide a facility where a user query is accepted at some site, suitable tuples from appropriate sites are retrieved and the results are merged and then presented to the user. In this paper, we present a method for constructing the desired facility. Our method consists of two steps. The first step determines which databases are likely to contain the desired tuples for a given query so that the databases can be ranked based on their desirability with respect to the query. Four different techniques are introduced for this step with one requiring no cooperation from local systems. The second step determines how the ranked databases should be searched and what tuples from the searched databases should be returned. A new algorithm is proposed for this purpose. Experimental results are presented to compare different methods and very promising results are obtained using the method that requires no cooperation from local databases.

...read moreread less

Book Chapter•DOI•

Mr-SBC: A Multi-relational Naïve Bayes Classifier

[...]

Michelangelo Ceci, Annalisa Appice, Donato Malerba

22 Sep 2003

TL;DR: An extension of the naive Bayes classification method to the multi-relational setting, where training data are stored in several tables related by foreign key constraints and each example is represented by a set of related tuples rather than a single row as in the classical data mining setting is proposed.

...read moreread less

Abstract: In this paper we propose an extension of the naive Bayes classification method to the multi-relational setting. In this setting, training data are stored in several tables related by foreign key constraints and each example is represented by a set of related tuples rather than a single row as in the classical data mining setting. This work is characterized by three aspects. First, an integrated approach in the computation of the posterior probabilities for each class that make use of first order classification rules. Second, the applicability to both discrete and continuous attributes by means a supervised discretization. Third, the consideration of knowledge on the data model embedded in the database schema during the generation of classification rules. The proposed method has been implemented in the new system Mr-SBC, which is tightly integrated with a relational DBMS. Testing has been performed on two datasets and four benchmark tasks. Results on predictive accuracy and efficiency are in favour of Mr-SBC for the most complex tasks.

...read moreread less

Patent•

Byte stream organization with improved random and keyed access to information structures

[...]

Joshua S. Auerbach¹•Institutions (1)

IBM¹

13 Dec 2003

TL;DR: In this article, the concept of column-order has been extended to arbitrary nested tables by grouping all scalar information items that correspond to the same node in a tree representation of the schema.

...read moreread less

Abstract: The invention improves processing time when accessing information in a byte stream and avoids the step of deserializing unneeded portions of the byte stream when the byte stream encodes an information structure corresponding to a schema with arbitrarily nested lists and tuples. It facilitates efficient keyed access when lists of tuples represent tables with key columns by storing tables in nested column order, which extends the well-known concept of column-order so as to apply to arbitrarily nested tables. Using well-known offset calculation techniques within the nested lists that result from nested column order, the invention achieves greater efficiency by grouping together all scalar information items that correspond to the same node in a tree representation of the schema.

...read moreread less

Journal Article•DOI•

Symbolic user-defined periodicity in temporal relational databases

[...]

Paolo Terenziani¹•Institutions (1)

University of Eastern Piedmont¹

01 Feb 2003-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper proposes a high-level "symbolic" language for representing user-defined periodicity which seems to us more human-oriented than mathematical ones, and uses the domain of Gadia's temporal elements in order to define its properties and its extensional semantics.

...read moreread less

Abstract: Calendars and periodicity play a fundamental role in many applications. Recently, some commercial databases started to support user-defined periodicity in the queries in order to provide "a human-friendly way of handling time" (see, e.g., TimeSeries in Oracle 8). On the other hand, only few relational data models support user-defined periodicity in the data, mostly using "mathematical" expressions to represent periodicity. In this paper, we propose a high-level "symbolic" language for representing user-defined periodicity which seems to us more human-oriented than mathematical ones, and we use the domain of Gadia's temporal elements in order to define its properties and its extensional semantics. We then propose a temporal relational model which supports user-defined "symbolic" periodicity (e.g., to express "on the second Monday of each month") in the validity time of tuples and also copes with frame times (e.g., "from 1/1/98 to 28/2/98"). We define the temporal counterpart of the standard operators of the relational algebra, and we introduce new temporal operators and functions. We also prove that our temporal algebra is a consistent extension of the classical (atemporal) one. Moreover, we define both a fully symbolic evaluation method for the operators on the periodicities in the validity times of tuples, which is correct but not complete, and semisymbolic one, which is correct and complete, and study their computational complexity.

...read moreread less

Journal Article•

Property analysis of the aggregation operators for two-tuple linguistic information

[...]

Fan Zhi-ping¹•Institutions (1)

Northeastern University (China)¹

01 Jan 2003-Control and Decision

TL;DR: The results obtained enrich the existing analysis methods for aggregating two-tuple linguistic information and propose a new ordered weighted geometric (T-OWG) operator.

...read moreread less

Abstract: With respect to the problem of linguistic evaluation information discipline, the properties of the aggregation operators for two-tuple linguistic information are studied. The ordered weighted averaging (T-OWA) operator for two-tuple linguistic information is described. A new ordered weighted geometric (T-OWG) operator is proposed. Properties of T-OWA operator and T-OWG operator are also analyzed. The results obtained enrich the existing analysis methods for aggregating two-tuple linguistic information.

...read moreread less

Collapse