Showing papers on "Tuple published in 2001"

PDF

Open Access

Journal Article•DOI•

Approximate Query Processing Using Wavelets

[...]

Kaushik Chakrabarti¹, Minos Garofalakis², Rajeev Rastogi², Kyuseok Shim³•Institutions (3)

University of Illinois at Urbana–Champaign¹, Bell Labs², KAIST³

01 Sep 2001

TL;DR: The use of multi-dimensional wavelets are proposed as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications and a novel wavelet decomposition algorithm is proposed that can build wavelet-coefficient synopses of the data in an I/O-efficient manner.

...read moreread less

Abstract: Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data.

...read moreread less

556 citations

Proceedings Article•DOI•

LIME: a middleware for physical and logical mobility

[...]

Amy L. Murphy¹, Gian Pietro Picco², Gruia-Catalin Roman³•Institutions (3)

University of Rochester¹, Polytechnic University of Milan², Washington University in St. Louis³

16 Apr 2001

TL;DR: The model underlying LIME is illustrated, its current design and implementation is presented, and initial lessons learned in developing applications that involve physical mobility are discussed.

...read moreread less

Abstract: LIME is a middleware supporting the development of applications that exhibit physical mobility of hosts, logical mobility of agents, or both. LIME adapts a coordination perspective inspired by work on the Linda model. The context for computation, represented in Linda by a globally accessible, persistent tuple space, is represented in LIME by transient sharing of the tuple spaces carried by each individual mobile unit. Linda tuple spaces are also extended with a notion of location and with the ability to react to a given state. The hypothesis underlying our work is that the resulting model provides a minimalist set of abstractions that enable rapid and dependable development of mobile applications. In this paper, we illustrate the model underlying LIME, present its current design and implementation, and discuss initial lessons learned in developing applications that involve physical mobility.

...read moreread less

449 citations

Proceedings Article•DOI•

PREFER: a system for the efficient execution of multi-parametric ranked queries

[...]

Vagelis Hristidis¹, Nick Koudas², Yannis Papakonstantinou¹•Institutions (2)

University of California, San Diego¹, AT&T Labs²

01 May 2001

TL;DR: The results indicate that the proposed algorithms are superior in performance compared to other approaches, both in preprocessing (preparation of materialized views) as well as execution time.

...read moreread less

Abstract: Users often need to optimize the selection of objects by appropriately weighting the importance of multiple object attributes. Such optimization problems appear often in operations' research and applied mathematics as well as everyday life; e.g., a buyer may select a home as a weighted function of a number of attributes like its distance from office, its price, its area, etc.We capture such queries in our definition of preference queries that use a weight function over a relation's attributes to derive a score for each tuple. Database systems cannot efficiently produce the top results of a preference query because they need to evaluate the weight function over all tuples of the relation. PREFER answers preference queries efficiently by using materialized views that have been pre-processed and stored.We first show how the result of a preference query can be produced in a pipelined fashion using a materialized view. Then we show that excellent performance can be delivered given a reasonable number of materialized views and we provide an algorithm that selects a number of views to precompute and materialize given space constraints.We have implemented the algorithms proposed in this paper in a prototype system called PREFER, which operates on top of a commercial database management system. We present the results of a performance comparison, comparing our algorithms with prior approaches using synthetic datasets. Our results indicate that the proposed algorithms are superior in performance compared to other approaches, both in preprocessing (preparation of materialized views) as well as execution time.

...read moreread less

313 citations

Journal Article•DOI•

From tuple spaces to tuple centres

[...]

Andrea Omicini¹, Enrico Denti¹•Institutions (1)

University of Bologna¹

01 Nov 2001-Science of Computer Programming

TL;DR: It is shown how adopting a tuple centre for the coordination of a multiagent system can benefit both the system design and the overall system performance.

...read moreread less

228 citations

Proceedings Article•

Supporting Incremental Join Queries on Ranked Inputs

[...]

Apostol Natsev, Yuan-Chi Chang, John R. Smith, Chung-Sheng Li, Jeffrey Scott Vitter - Show less +1 more

11 Sep 2001

TL;DR: This paper investigates the problem of incremental joins of multiple ranked data sets when the join condition is a list of arbitrary user-defined predicates on the input tuples and proposes an algorithm that enables querying of ordered data sets by imposing arbitrary userdefined join predicates.

...read moreread less

Abstract: This paper investigates the problem of incremental joins of multiple ranked data sets when the join condition is a list of arbitrary user-defined predicates on the input tuples. This problem arises in many important applications dealing with ordered inputs and multiple ranked data sets, and requiring the top solutions. We use multimedia applications as the motivating examples but the problem is equally applicable to traditional database applications involving optimal resource allocation, scheduling, decision making, ranking, etc. We propose an algorithm that enables querying of ordered data sets by imposing arbitrary userdefined join predicates. The basic version of the algorithm does not use any random access but a variation can exploit available indexes for efficient random access based on the join predicates. A special case includes the join scenario considered by Fagin [1] for joins based on identical keys, and in that case, our algorithms perform as efficiently as Fagin’s. Our main contribution, however, is the generalization to join scenarios that were previously unsupported, including cases where random access in the algorithm is not possible due to lack of unique keys. In addition, can support multiple join levels, or nested join hierarchies, which are the norm for modeling multimedia data. We also give -approximation versions of both of the above algorithms. Finally, we give strong optimality results for some of the proposed algorithms, and we study their performance empirically.

...read moreread less

220 citations

Proceedings Article•

Storage and Querying of E-Commerce Data

[...]

Rakesh Agrawal¹, Amit Somani¹, Yirong Xu¹•Institutions (1)

IBM¹

11 Sep 2001

TL;DR: This work represents objects in a vertical format storing an object as a set of tuples, and creates a logical horizontal view of the vertical representation and transforms queries on this view to the vertical table.

...read moreread less

Abstract: New generation of e-commerce applications require data schemas that are constantly evolving and sparsely populated. The conventional horizontal row representation fails to meet these requirements. We represent objects in a vertical format storing an object as a set of tuples. Each tuple consists of an object identifier and attribute name-value pair. Schema evolution is now easy. However, writing queries against this format becomes cumbersome. We create a logical horizontal view of the vertical representation and transform queries on this view to the vertical table. We present alternative implementations and performance results that show the effectiveness of the vertical representation for sparse data. We also identify additional facilities needed in database systems to support these applications well.

...read moreread less

207 citations

Proceedings Article•DOI•

Clio: a semi-automatic tool for schema mapping

[...]

Mauricio A. Hernández¹, Renée J. Miller², Laura M. Haas¹•Institutions (2)

IBM¹, University of Toronto²

01 May 2001

TL;DR: Clio is demonstrated, a new semi-automated tool for creating schema mappings that employs a mapping-by-example paradigm that relies on the use of value correspondences describing how a value of a target attribute can be created from a set of values of source attributes.

...read moreread less

Abstract: We consider the integration requirements of modern data intensive applications including data warehousing, global information systems and electronic commerce. At the heart of these requirements lies the schema mapping problem in which a source (legacy) database must be mapped into a different, but xed, target schema. The goal of schema mapping is the discovery of a query or set of queries to map source databases into the new structure. We demonstrate Clio, a new semi-automated tool for creating schema mappings. Clio employs a mapping-by-example paradigm that relies on the use of value correspondences describing how a value of a target attribute can be created from a set of values of source attributes. A typical session with Clio starts with the user loading a source and a target schema into the system. These schemas are read from either an underlying Object-Relational database or from an XML le with an associated XML Schema. Users can then draw value correspondences mapping source attributes into target attributes. Clio's mapping engine incrementally produces the SQL queries that realize the mappings implied by the correspondences. Clio provides schema and data browsers and other feedback to allow users to understand the mapping produced. Entering and manipulating value correspondences can be done in two modes. In the Schema View mode, users see a representation of the source and target schema and create value correspondences by selecting schema objects from the source and mapping them to a target attribute. The alternative Data View mode o ers a WYSIWYG interface for the mapping process that displays example data for both the source and target tables [3]. Users may add and delete value correspondences from this view and immediately see the changes re ected in the resulting target tuples. Also, the Data View mode helps users navigate through alternative mappings, understanding the often subtle di erences between them. For example, in some cases, changing a join from an inner join to an outer join may dramatically change the resulting table. In other cases, the same change may have no e ect due to constraints that hold on the source

...read moreread less

160 citations

Proceedings Article•DOI•

Efficient evaluation of XML middle-ware queries

[...]

Mary Fernández¹, Atsuyuki Morishima², Dan Suciu³•Institutions (3)

AT&T Labs¹, University of Tsukuba², University of Washington³

01 May 2001

TL;DR: This work addresses the problem of efficiently constructing materialized XML views of relational databases by focusing on how to best choose the SQL queries, without having control over the target RDBMS.

...read moreread less

Abstract: We address the problem of efficiently constructing materialized XML views of relational databases. In our setting, the XML view is specified by a query in the declarative query language of a middle-ware system, called SilkRoute. The middle-ware system evaluates a query by sending one or more SQL queries to the target relational database, integrating the resulting tuple streams, and adding the XML tags. We focus on how to best choose the SQL queries, without having control over the target RDBMS.

...read moreread less

132 citations

Proceedings Article•

Intelligent Rollups in Multidimensional OLAP Data

[...]

Gayatri Sathe¹, Sunita Sarawagi•Institutions (1)

Indian Institute of Technology Bombay¹

11 Sep 2001

TL;DR: A new operator can automatically generalize from a specific problem case in detailed data and return the broadest context in which the problem occurs and a compact and easy-to-interpret summary of all possible maximal generalizations along various roll-up paths around the case is proposed.

...read moreread less

Abstract: In this paper we propose a new operator for advanced exploration of large multidimensional databases. The proposed operator can automatically generalize from a specific problem case in detailed data and return the broadest context in which the problem occurs. Such a functionality would be useful to an analyst who after observing a problem case, say a drop in sales for a product in a store, would like to find the exact scope of the problem. With existing tools he would have to manually search around the problem tuple trying to draw a pattern. This process is both tedious and imprecise. Our proposed operator can automate these manual steps and return in a single step a compact and easy-to-interpret summary of all possible maximal generalizations along various roll-up paths around the case. We present a fle xible cost-based framework that can generalize various kinds of behaviour (not simply drops) while requiring little additional customization from the user. We design an algorithm that can work efficiently on large multidimensional hierarchical data cubes so as to be usable in an interactive setting.

...read moreread less

119 citations

Journal Article•DOI•

Aggregation of imprecise and uncertain information in databases

[...]

Sally McClean¹, Bryan Scotney, Mary Shapcott•Institutions (1)

Ulster University¹

01 Nov 2001-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work considers the problem of aggregation using an imprecise probability data model that allows us to represent imprecision by partial probabilities and uncertainty using probability distributions to perform the operations necessary for knowledge discovery in databases.

...read moreread less

Abstract: Information stored in a database is often subject to uncertainty and imprecision. Probability theory provides a well-known and well understood way of representing uncertainty and may thus be used to provide a mechanism for storing uncertain information in a database. We consider the problem of aggregation using an imprecise probability data model that allows us to represent imprecision by partial probabilities and uncertainty using probability distributions. Most work to date has concentrated on providing functionality for extending the relational algebra with a view to executing traditional queries on uncertain or imprecise data. However, for imprecise and uncertain data, we often require aggregation operators that provide information on patterns in the data. Thus, while traditional query processing is tuple-driven, processing of uncertain data is often attribute-driven where we use aggregation operators to discover attribute properties. The aggregation operator that we define uses the Kullback-Leibler information divergence between the aggregated probability distribution and the individual tuple values to provide a probability distribution for the domain values of an attribute or group of attributes. The provision of such aggregation operators is a central requirement in furnishing a database with the capability to perform the operations necessary for knowledge discovery in databases.

...read moreread less

91 citations

Patent•

Parallel concatenated code with soft-in soft-out interactive turbo decoder

[...]

Kelly B. Cameron¹, Ba-Zhong Shen¹, Hau Thien Tran¹, Christopher R. Jones¹, Thomas A. Hughes¹ - Show less +1 more•Institutions (1)

Broadcom¹

12 Sep 2001

TL;DR: In parallel concatenated (Turbo) encoding and decoding, the input data tuples may be interleaved using a modulo scheme in which the interleaving is according to some method.

...read moreread less

Abstract: A method for parallel concatenated (Turbo) encoding and decoding. Turbo encoders receive a sequence of input data tuples and encode them. The input sequence may correspond to a sequence of an original data source, or to an already coded data sequence such as provided by a Reed-Solomon encoder. A turbo encoder generally comprises two or more encoders separated by one or more interleavers. The input data tuples may be interleaved using a modulo scheme in which the interleaving is according to some method (such as block or random interleaving) with the added stipulation that the input tuples may be interleaved only to interleaved positions having the same modulo-N (where N is an integer) as they have in the input data sequence. If all the input tuples are encoded by all encoders then output tuples can be chosen sequentially from the encoders and no tuples will be missed. If the input tuples comprise multiple bits, the bits may be interleaved independently to interleaved positions having the same modulo-N and the same bit position. This may improve the robustness of the code. A first encoder may have no interleaver or all encoders may have interleavers, whether the input tuple bits are interleaved independently or not. Modulo type interleaving also allows decoding in parallel.

...read moreread less

Book•

Implicit Parallel Programming in pH

[...]

Arvind, Rishiyur Nikhil

30 May 2001

TL;DR: This book discusses parallel programming from Sequential to Implicit Parallel Programming to Sequencing, Input/Output, and Side Effects, and Types and Type Checking.

...read moreread less

Abstract: Chapter 1 From Sequential to Implicit Parallel Programming Chapter 2 Functions and Reduction Chapter 3 Types and Type Checking Chapter 4 Rewrite Rules, Reduction Strategies, and Parallelism Chapter 5 Tuples and Algebraic Product Types Chapter 6 Lists and Algebraic Sum Types Chapter 7 Arrays: Fast Indexed Data Structures Chapter 8 Sequencing, Input/Output, and Side Effects Chapter 9 I-structures Chapter 10 M-structures: Mutable Synchronized State Chapter 11 Conclusion Appendix A An Introduction to the for pH

...read moreread less

Book Chapter•DOI•

Tuple-based technologies for coordination

[...]

Davide Rossi¹, Giacomo Cabri, Enrico Denti¹•Institutions (1)

University of Bologna¹

01 Mar 2001

TL;DR: By tuple-based technologies the authors refer to any coordination system that uses associative access to shared dataspaces for communication / synchronization purposes.

...read moreread less

Abstract: By tuple-based technologies we refer to any coordination system that uses associative access to shared dataspaces for communication / synchronization purposes.

...read moreread less

Proceedings Article•DOI•

A packet classification and filter management system

[...]

V. Srinivasan¹•Institutions (1)

Microsoft¹

22 Apr 2001

TL;DR: A new filter matching scheme called entry-pruned tuple search is presented and its advantages over previously presented algorithms are discussed, and an incremental update algorithm based on maintaining an event list that can be applied to many of the previously presented filter matching schemes which did not support incremental updates are presented.

...read moreread less

Abstract: Packet classification and fast filter matching have been an important field of research. Several algorithms have been proposed for fast packet classification. We first present a new filter matching scheme called entry-pruned tuple search and discuss its advantages over previously presented algorithms. We then show how this algorithm blends very well with an earlier packet classification algorithm that uses markers and precomputation, to give a blended entry-pruned tuple search with markers and precomputation (EPTSMP). We present performance measurements using several real-life filter databases. For a large real-life database of 1777 filters, our preprocessing times were close to 9 seconds; a lookup takes about 20 memory accesses and the data structure takes about 500 K bytes of memory. Then, we present scenarios that will require various programs/modules to automatically generate and add filters to a filter processing engine. We then consider issues in enabling this. We need policies that govern what filters can be added by different modules. We present our filter policy management architecture. We then show how to support fast filter updates. We present an incremental update algorithm based on maintaining an event list that can be applied to many of the previously presented filter matching schemes which did not support incremental updates. We then describe the event list based incremental update algorithm as it applies to EPTSMP. To stress the generality of the approach, we also describe how our update technique can be used with the packet classification technique based on crossproducing. We conclude with an outline of a hardware implementation of EPTSMP that can handle OC192 rates with 40 byte minimum packet lengths.

...read moreread less

Patent•

Enabling intra-partition parallelism for partition-based operations

[...]

Thierry Cruanes¹, Benoit Dageville¹, Patrick A. Amor¹•Institutions (1)

Business International Corporation¹

07 May 2001

TL;DR: In this paper, a technique for increasing the degree of parallelism without incurring overhead costs associated with inter-nodal communication for performing parallel operations is presented. But it does not address the overhead of inter-node communication.

...read moreread less

Abstract: Techniques are provided for increasing the degree of parallelism without incurring overhead costs associated with inter-nodal communication for performing parallel operations. One aspect of the invention is to distribute-phase partition-pairs of a parallel partition-wise operation on a pair of objects among the nodes of a database system. The -phase partition-pairs that are distributed to each node are further partitioned to form a new set of-phase partition-pairs. One -phase partition-pair from the set of new-phase partition-pairs is assigned to each slave process that is on a given node. In addition, a target object may be partitioned by applying an appropriate hash function to the tuples of the target object. The parallel operation is performed by broadcasting each tuple from a source table only to the group of slave processes that is working on the static partition to which the tuple is mapped.

...read moreread less

Proceedings Article•DOI•

Robust space transformations for distance-based operations

[...]

Edwin M. Knorr¹, Raymond T. Ng¹, Ruben H. Zamar¹•Institutions (1)

University of British Columbia¹

26 Aug 2001

TL;DR: The fundamental question that this paper addresses is: "What then is an appropriate space?" and this paper proposes using a robust space transformation called the Donoho-Stahel estimator, which says that in spite of frequent updates, the estimator does not lose its usefulness, or require re-computation.

...read moreread less

Abstract: For many KDD operations, such as nearest neighbor search, distance-based clustering, and outlier detection, there is an underlying k-D data space in which each tuple/object is represented as a point in the space. In the presence of differing scales, variability, correlation, and/or outliers, we may get unintuitive results if an inappropriate space is used.The fundamental question that this paper addresses is: "What then is an appropriate space?" We propose using a robust space transformation called the Donoho-Stahel estimator. In the first half of the paper, we show the key properties of the estimator. Of particular importance to KDD applications involving databases is the stability property, which says that in spite of frequent updates, the estimator does not: (a) change much, (b) lose its usefulness, or (c) require re-computation. In the second half, we focus on the computation of the estimator for high-dimensional databases. We develop randomized algorithms and evaluate how well they perform empirically. The novel algorithm we develop called the Hybrid-random algorithm is, in most cases, at least an order of magnitude faster than the Fixed-angle and Subsampling algorithms.

...read moreread less

Proceedings Article•DOI•

Type-indexed rows

[...]

Mark Shields¹, Erik Meijer²•Institutions (2)

Oregon Health & Science University¹, Utrecht University²

01 Jan 2001

TL;DR: This paper presents a novel variation, type-indexed rows, in which labels are discarded and elements are indexed by their type alone, and presents a type checking algorithm, and shows how λTIR may be implemented by a type-directed translation which replaces type- indexing by conventional natural-number indexing.

...read moreread less

Abstract: Record calculi use labels to distinguish between the elements of products and sums This paper presents a novel variation, type-indexed rows, in which labels are discarded and elements are indexed by their type alone The calculus, λTIR, can express tuples, recursive datatypes, monomorphic records, polymorphic extensible records, and closed-world style type-based overloading Our motivating application of λTIR, however, is to encode the "choice" types of XML, and the "unordered tuple" types of SGML Indeed, λTIR is the kernel of the language XMλ, a lazy functional language with direct support for XML types ("DTDs") and terms ("documents")The system is built from rows, equality constraints, insertion constraints and constrained, or qualified, parametric polymorphism The test for constraint satisfaction is complete, and for constraint entailment is only mildly incomplete We present a type checking algorithm, and show how λTIR may be implemented by a type-directed translation which replaces type-indexing by conventional natural-number indexing Though not presented in this paper, we have also developed a constraint simplification algorithm and type inference system

...read moreread less

Book Chapter•DOI•

On Answering Queries in the Presence of Limited Access Patterns

[...]

Chen Li¹, Edward Y. Chang²•Institutions (2)

Stanford University¹, University of California²

04 Jan 2001

TL;DR: This paper considers the following fundamental problem: can the authors compute the complete answer to a query by accessing the relations with legal patterns, and gives algorithms for solving the problem for various classes of queries, including conjunctive queries, unions of conjunction queries, and conj unctive queries with arithmetic comparisons.

...read moreread less

Abstract: . In information-integration systems, source relations often have limitations on access patterns to their data; i.e., when one must provide values for certain attributes of a relation in order to retrieve its tuples. In this paper we consider the following fundamental problem: can we compute the complete answer to a query by accessing the relations with legal patterns? The complete answer to a query is the answer that we could compute if we could retrieve all the tuples from the relations. We give algorithms for solving the problem for various classes of queries, including conjunctive queries, unions of conjunctive queries, and conjunctive queries with arithmetic comparisons. We prove the problem is undecidable for datalog queries. If the complete answer to a query cannot be computed, we often need to compute its maximal answer. The second problem we study is, given two conjunctive queries on relations with limited access patterns, how to test whether the maximal answer to the first query is contained in the maximal answer to the second one? We show this problem is decidable using the results of monadic programs.

...read moreread less

Book Chapter•DOI•

View-Based Query Answering and Query Containment over Semistructured Data

[...]

Diego Calvanese¹, Giuseppe De Giacomo¹, Maurizio Lenzerini¹, Moshe Y. Vardi²•Institutions (2)

Sapienza University of Rome¹, Rice University²

08 Sep 2001

TL;DR: A technique is presented to obtain view-based query answering algorithms that compute the whole set of tuples in the certain answer, instead of requiring to check each tuple separately.

...read moreread less

Abstract: The basic querying mechanism over semistructured data, namely regular path queries, asks for all pairs of objects that are connected by a path conforming to a regular expression. We consider conjunctive two-way regular path queries (C2RPQc's), which extend regular path queries with two features. First, they add the inverse operator, which allows for expressing navigations in the database that traverse the edges both backward and forward. Second, they allow for using conjunctions of atoms, where each atom specifies that a regular path query with inverse holds between two terms, where each term is either a variable or a constant. For such queries we address the problem of view-based query answering, which amounts to computing the result of a query only on the basis of a set of views. More specifically, we present the following results: (1) We exhibit a mutual reduction between query containment and the recognition problem for view-based query answering for C2RPQc's, i.e., checking whether a given tuple is in the certain answer to a query. Based on such a result, we can show that the problem of view-based query answering for C2RPQc's is EXPSPACE-complete. (2) By exploiting techniques based on alternating two-way automata we show that for the restricted class of tree two-way regular path queries (in which the links between variables form a tree), query containment and view-based query answering are, rather surprisingly, in PSPACE (and hence, PSPACE-complete). (3) We present a technique to obtain view-based query answering algorithms that compute the whole set of tuples in the certain answer, instead of requiring to check each tuple separately. The technique is parametric wrt the query language, and can be applied both to C2RPQc's and to tree-queries.

...read moreread less

Journal Article•DOI•

Optimizing large join queries using a graph-based approach

[...]

Chiang Lee¹, Chi-Sheng Shih, Yaw-Huei Chen•Institutions (1)

National Cheng Kung University¹

01 Mar 2001-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A graph-theoretic approach presented in the paper provides a sound mathematical basis for representing a query and searching for an execution plan and devise an algorithm that finds a near optimal execution plan using only polynomial time.

...read moreread less

Abstract: Although many query tree optimization strategies have been proposed in the literature, there still is a lack of a formal and complete representation of all possible permutations of query operations (i.e., execution plans) in a uniform manner. A graph-theoretic approach presented in the paper provides a sound mathematical basis for representing a query and searching for an execution plan. In this graph model, a node represents an operation and a directed edge between two nodes indicates the older of executing these two operations in an execution plan. Each node is associated with a weight and so is an edge. The weight is an expression containing optimization required parameters, such as relation size, tuple size, join selectivity factors. All possible execution plans are representable in this graph and each spanning tree of the graph becomes an execution plan. It is a general model which can be used in the optimizer of a DBMS for internal query representation. On the basis of this model, we devise an algorithm that finds a near optimal execution plan using only polynomial time. The algorithm is compared with a few other popular optimization methods. Experiments show that the proposed algorithm is superior to the others under most circumstances.

...read moreread less

Patent•

Estimation and use of access plan statistics

[...]

Abdo Esmail Abdo¹, Larry Wayne Loen¹•Institutions (1)

IBM¹

13 Dec 2001

TL;DR: In this paper, a prior statistic generated for a prior different selection criterion on the same one or more attributes of the relation, may be revalidated for use in processing the query, based upon a measure of the entropy of the attributes.

...read moreread less

Abstract: In processing a query including a selection criterion on one or more attributes of a relation, a prior statistic generated for a prior different selection criterion on the same one or more attributes of the relation, may be revalidated for use in processing the query, based upon a measure of the entropy of the one or more attributes of the relation. In this way, the re-validation of statistics may be performed more efficiently. Furthermore, attribute groups of a relation for which multi-dimensional indexes are to be formed, are identified by evaluating the correlation of attribute values within tuples of the relation and determining that the correlation of attribute values within tuples of the relation exceeds a threshold.

...read moreread less

Journal Article•DOI•

Efficient processing of nested Fuzzy SQL queries in a fuzzy database

[...]

Qi Yang¹, Weining Zhang, Chengwen Liu², Jing Wu³, Clement Yu³, Hiroshi Nakajima⁴, Naphtali Rishe⁵ - Show less +3 more•Institutions (5)

University of Wisconsin-Madison¹, DePaul University², University of Illinois at Chicago³, Omron⁴, Florida International University⁵

01 Nov 2001-IEEE Transactions on Knowledge and Data Engineering

TL;DR: In this paper, an extended merge-join is used to evaluate the unnested fuzzy queries, which significantly improves the performance of evaluating nested fuzzy queries. But the results are limited to a subset of nested queries.

...read moreread less

Abstract: In a fuzzy relational database where a relation is a fuzzy set of tuples and ill-known data are represented by possibility distributions, nested fuzzy queries can be expressed in the Fuzzy SQL language. Although it provides a very convenient way for users to express complex queries, a nested fuzzy query may be very inefficient to process with the naive evaluation method based on its semantics. In conventional databases, nested queries are unnested to improve the efficiency of their evaluation. In this paper, we extend the unnesting techniques to process several types of nested fuzzy queries. An extended merge-join is used to evaluate the unnested fuzzy queries. As shown by both theoretical analysis and experimental results, the unnesting techniques with the extended merge-join significantly improve the performance of evaluating nested fuzzy queries.

...read moreread less

Book Chapter•DOI•

A General Scheme for Multiple Lower Bound Computation in Constraint Optimization

[...]

Rina Dechter¹, Kalev Kask¹, Javier Larrosa²•Institutions (2)

University of California, Irvine¹, Polytechnic University of Catalonia²

26 Nov 2001

TL;DR: Preliminary experiments on Max-CSP show that using MBTE(z) to guide dynamic variable and value orderings in branch and bound yields a dramatic reduction in the search space and, for some classes of problems, this reduction is highly cost-effective producing significant time savings and is competitive against specialized algorithms for Max- CSP.

...read moreread less

Abstract: Computing lower bounds to the best-cost extension of a tuple is an ubiquous task in constraint optimization A particular case of special interest is the computation of lower bounds to all singleton tuples, since it permits domain pruning in Branch and Bound algorithms In this paper we introduce MCTE(z), a general algorithm which allows the computation of lower bounds to arbitrary sets of tasks Its time and accuracy grows as a function of z allowing a controlled tradeoff between lower bound accuracy and time and space to fit available resources Subsequently, a specialization of MCTE(z) called MBTE(z) is tailored to computing lower bounds to singleton tuples Preliminary experiments on Max-CSP show that using MBTE(z) to guide dynamic variable and value orderings in branch and bound yields a dramatic reduction in the search space and, for some classes of problems, this reduction is highly cost-effective producing significant time savings and is competitive against specialized algorithms for Max-CSP

...read moreread less

Book Chapter•DOI•

The inference problem and updates in relational databases

[...]

Csilla Farkas¹, Tyrone S. Toland¹, Caroline M. Eastman¹•Institutions (1)

University of South Carolina¹

15 Jul 2001

TL;DR: This paper presents a mechanism, called Update Consolidator (UpCon) that propagates updates to the user's history file to ensure that no query is rejected based on outdated data, and proposes a Cardinality Inference Detection module, that generates all data that can be disclosed via cardinality based attacks.

...read moreread less

Abstract: In this paper, we extend the Disclosure Monitor (DiMon) security mechanism (Brodsky et al. [1]) to prevent illegal inferences via database constraints in dynamic databases. We study updates from two perspectives: 1) updates on tuples that were previously released to a user may cause that tuple to be "outdated", thus providing greater freedom for releasing new tuples; 2) observation of changes in released tuples may create cardinality based inferences, which are not indicated by database dependencies. We present a mechanism, called Update Consolidator (UpCon) that propagates updates to the user's history file to ensure that no query is rejected based on outdated data. We also propose a Cardinality Inference Detection (CID) module, that generates all data that can be disclosed via cardinality based attacks. We show that UpCon and CID, when integrated into the DiMon architecture, guarantee confidentiality (completeness property of the data-dependent disclosure inference algorithm) and maximal availability (soundness property of the data-dependent disclosure inference algorithm) even in the presence of updates.

...read moreread less

Product Recommendation Systems: A New Direction

[...]

Derek Bridge¹•Institutions (1)

University College Cork¹

01 Jan 2001

TL;DR: An ordered type is one that has a non-trivial partial order of its values that may be useful in product recommendation and a distinction is drawn at this point between ordered types and unordered types.

...read moreread less

Abstract: This paper is about content-based product recommender systems. In product recommendation, a customer is presented with a selection of products from a product catalogue. Content-based approaches (in contradistinction to, e.g., collaborative approaches) select products by matching product descriptions from the catalogue with descriptions of customer preferences and requirements. We will refer to each product description as a case, , and we will refer to the product catalogue as a case base, CB. We assume a set of attributes, , and, for each , a projection function, , which obtains a value for the attribute from the case. For example, price returns the value of case ’s price attribute. This formulation, using projection functions, has the advantage of being agnostic about the actual underlying representation of the cases. They might, for example, be stored as tuples in a relational database, objects in an object-oriented database, or XML documents; all of these can support projection functions. It also allows the possibility of what one might call virtual attributes, where the value returned is not directly stored but is, instead, computed or inferred from what is stored. This is useful, for example, when the case base stores only ‘technical’ data (e.g. a car’s fuel-tank capacity, fuel consumption and top speed) but product selection requires ‘lifestyle’ attributes (e.g. the sportiness of the car). The projection functions for the lifestyle attributes would infer their values from the technical data. The values returned by a projection function will be of some particular type. For example, for a holiday case base, transport might have type train plane car coach ; season might have type Jan Feb Dec ; price might have some suitable set of numbers as its type. To simplify this paper, we will draw a distinction at this point between ordered types and unordered types. We will say that an ordered type is one that has a non-trivial partial order of its values that may be useful in product recommendation. price is an example: since its type is numeric, the values are ordered by the usual ordering of the numbers ( ).

...read moreread less

Journal Article•DOI•

[...]

K. Selçuk Candan¹, Wen-Syan Li¹•Institutions (1)

NEC¹

01 Feb 2001-Knowledge and Information Systems

TL;DR: The essential multimedia retrieval semantics are described, these with the known approaches are compared, and a semantics which captures the retrieval requirements in multimedia databases is proposed.

...read moreread less

Abstract: A multimedia database query consists of a set of fuzzy and boolean (or crisp) predicates, constants, variables, and conjunction, disjunction, and negation operators. The fuzzy predicates are evaluated based on different media criteria, such as color, shape, layout, keyword. Since media-based evaluation yields similarity values, results to such a query is defined as an ordered set. Since many multimedia applications require partial matches, query results also include tuples which do not satisfy all predicates. Hence, any fuzzy semantics which extends the boolean semantics of conjunction in a straight forward manner may not be desirable for multimedia databases. In this paper, we focus on the problem of ‘given a multimedia query which consists of multiple fuzzy and crisp predicates, how to provide the user with a meaningful overall ranking.’ More specifically, we study the problem of merging similarity values in queries with multiple fuzzy predicates. We describe the essential multimedia retrieval semantics, compare these with the known approaches, and propose a semantics which captures the retrieval requirements in multimedia databases.

...read moreread less

Patent•

Incremental refresh of materialized views for many-to-many relationships

[...]

Nimar S. Arora¹•Institutions (1)

Business International Corporation¹

02 Oct 2001

TL;DR: In this paper, a framework for incrementally refreshing a materialized view is provided, based on a query that references a projected table and another set of base tables, where the query projects the columns of the projected table.

...read moreread less

Abstract: A framework for the incrementally refreshing a materialized view is provided. The materialized view is based on a query that references a projected table and another set of base tables. The query projects the columns of the projected table. To refresh the materialized view, a set of tuples is computed that identify rows to delete, insert, or otherwise modify in the materialized view in order to refresh it. The set of tuples is computed by computing a set of intersections, (1) one for the intersection between the query and the change log of the projected table, and (2) at least one other between the equijoin of the change log for one of the other base tables and the projected table. The query may define an equijoin between the projected table and at least one base table based on equijoin conditions that define a many-to-many relationship or a one-to-many relationship.

...read moreread less

Patent•

Method and apparatus for performing hash join

[...]

Gang Luo¹, Curt J. Ellmann¹, Jeffrey F. Naughton¹•Institutions (1)

NCR Corporation¹

26 Apr 2001

TL;DR: A parallel hash ripple join algorithm as mentioned in this paper partitions tuples of two relations for localized processing, and at each processing node, the tuples are further partitioned such that join operations may be performed as tuples were redistributed to each node during the partitioning.

...read moreread less

Abstract: A parallel hash ripple join algorithm partitions tuples of two relations for localized processing. The algorithm is non-blocking and may be performed in a parallel, multi-processor environment. At each processing node, the tuples are further partitioned such that join operations may be performed as tuples are redistributed to each node during the partitioning.

...read moreread less

Journal Article•DOI•

Discovering Maximal Generalized Decision Rules Through Horizontal and Vertical Data Reduction

[...]

Xiaohua Hu¹, Nick Cercone²•Institutions (2)

State Street Corporation¹, University of Waterloo²

01 Nov 2001

TL;DR: A method to learn maximal generalized decision rules from databases by integrating discretization, generalization and rough set feature selection, which can dramatically reduce the feature space and improve learning accuracy is presented.

...read moreread less

Abstract: We present a method to learn maximal generalized decision rules from databases by integrating discretization, generalization and rough set feature selection Our method reduces the data horizontally and vertically In the first phase, discretization and generalization are integrated and the numeric attributes are discretized into a few intervals The primitive values of symbolic attributes are replaced by high level concepts and some obvious superfluous or irrelevant symbolic attributes are also eliminated Horizontal reduction is accomplished by merging identical tuples after the substitution of an attribute value by its higher level value in a pre-defined concept hierarchy for symbolic attributes, or the discretization of continuous (or numeric) attributes This phase greatly decreases the number of tuples in the database In the second phase, a novel context-sensitive feature merit measure is used to rank the features, a subset of relevant attributes is chosen based on rough set theory and the merit values of the features A reduced table is obtained by removing those attributes which are not in the relevant attributes subset and the data set is further reduced vertically without destroying the interdependence relationships between classes and the attributes Then rough set-based value reduction is further performed on the reduced table and all redundant condition values are dropped Finally, tuples in the reduced table are transformed into a set of maximal generalized decision rules The experimental results on UCI data sets and a real market database demonstrate that our method can dramatically reduce the feature space and improve learning accuracy

...read moreread less

Proceedings Article•DOI•

Database selection for processing k nearest neighbors queries in distributed environments

[...]

Clement Yu, Prasoon Sharma, Weiyi Meng¹, Yan Qin•Institutions (1)

Binghamton University¹

01 Jan 2001

TL;DR: This paper focuses on the processing of the structured component of a distributed query, consisting of a text component and a structured component in distributed environments, for digital library queries.

...read moreread less

Abstract: We consider the processing of digital library queries, consisting of a text component and a structured component in distributed environments The text component can be processed using techniques given in previous papers such as [7, 8, 11] In this paper, we concentrate on the processing of the structured component of a distributed query Histograms are constructed and algorithms are given to provide estimates of the desirabilities of the databases with respect to the given query Databases are selected in descending order of desirability An algorithm is also given to select tuples from the selected databases Experimental results are given to show that the techniques provided here are effective and efficient

...read moreread less