Topic

Tuple

About: Tuple is a research topic. Over its lifetime, 6513 publications have been published within this topic, receiving 146057 citations. The topic is also known as: tuple & ordered tuplet.


Papers
Journal Article
01 Aug 2018
TL;DR: This work presents SkinnerDB, a novel database management system that is designed from the ground up for reliable optimization and robust performance, and it is claimed that its execution strategies are the first to provide comparable formal guarantees.
Abstract: Robust query optimization becomes illusory in the presence of correlated predicates or user-defined functions. Occasionally, the query optimizer will choose join orders whose execution time is many orders of magnitude higher than necessary. We present SkinnerDB, a novel database management system that is designed from the ground up for reliable optimization and robust performance. SkinnerDB implements several adaptive query processing strategies based on reinforcement learning. We divide the execution of a query into small time periods in which different join orders are executed. Thereby, we converge to optimal join orders with regret bounds, meaning that the expected difference between actual execution time and time for an optimal join order is bounded. To the best of our knowledge, our execution strategies are the first to provide comparable formal guarantees. SkinnerDB can be used as a layer on top of any existing database management system. We use optimizer hints to force existing systems to try out different join orders, carefully restricting execution time per join order and data batch via timeouts. We choose timeouts according to an iterative scheme that balances execution time over different timeouts to guarantee bounded regret. Alternatively, SkinnerDB can be used as a standalone system, featuring an execution engine that is tailored to the requirements of join order learning. In particular, we use a specialized multi-way join algorithm and a concise tuple representation to facilitate fast switches between join orders. In our demonstration, we let participants experiment with different query types and databases. We visualize the learning process and compare against baselines.

37 citations
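
The idea in the SkinnerDB abstract of running different join orders in small time periods and learning which one works best can be illustrated with a toy bandit loop. This is a minimal sketch, not SkinnerDB's implementation: the abstract does not name the learning algorithm, so the sketch assumes the standard UCB1 rule, and the join-order names and per-slice progress values are hypothetical.

```python
import math

def run_time_sliced(progress_per_slice, num_slices):
    """Pick one join order per time slice with UCB1 (assumed rule).

    progress_per_slice maps a join-order name to the (hypothetical)
    fraction of work it completes in one slice; this is the reward.
    Returns how many slices each join order received.
    """
    orders = list(progress_per_slice)
    pulls = {o: 0 for o in orders}
    total_reward = {o: 0.0 for o in orders}

    for t in range(1, num_slices + 1):
        untried = [o for o in orders if pulls[o] == 0]
        if untried:
            # Try every join order once before comparing them.
            choice = untried[0]
        else:
            # UCB1: average reward plus an exploration bonus.
            choice = max(
                orders,
                key=lambda o: total_reward[o] / pulls[o]
                + math.sqrt(2 * math.log(t) / pulls[o]),
            )
        pulls[choice] += 1
        total_reward[choice] += progress_per_slice[choice]
    return pulls

# Hypothetical join orders over relations R, S, T.
rates = {"R-S-T": 0.9, "S-T-R": 0.2, "T-R-S": 0.05}
pulls = run_time_sliced(rates, 200)
```

In this toy setting most slices end up going to the join order with the highest progress rate, while the slower orders are still sampled occasionally.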

01 Jan 2002
TL;DR: This paper considers the Best operator, which can be used to smoothly embed preferences in queries of relational algebra, and studies general properties of this operator and presents a practical algorithm for its computation.
Abstract: Dealing with user preferences is becoming a widespread issue in novel data-intensive application domains, such as electronic catalogs, e-commerce, multimedia databases, and real estate. Given a set of preferences, an important problem is to efficiently determine which are the “best” objects, according to such preferences. In this paper we assume that preferences are expressed in a qualitative way over the tuples of a relation schema (e.g., I prefer product A to product B), which is quite natural from the user's point of view and also includes, as a proper subcase, quantitative preferences defined by means of a scoring function. Starting from an analysis of basic properties of (qualitative) preferences, we consider the Best operator, which can be used to smoothly embed preferences in queries of relational algebra. We study general properties of this operator and present a practical algorithm for its computation. We show how the algorithm improves the simple nested-loops approach and can lead to faster response times.

37 citations
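
The simple nested-loops baseline that the Best-operator abstract mentions can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the preference is assumed to be a strict (irreflexive) relation, and the `(name, price)` schema and the `cheaper` preference are hypothetical.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, int]  # hypothetical schema: (product name, price)

def best(rows: List[Row], prefers: Callable[[Row, Row], bool]) -> List[Row]:
    """Return the rows to which no other row is preferred.

    Nested loops: every candidate is compared against every row, so the
    cost is quadratic in the number of rows.
    """
    result = []
    for candidate in rows:
        dominated = any(prefers(other, candidate) for other in rows)
        if not dominated:
            result.append(candidate)
    return result

def cheaper(a: Row, b: Row) -> bool:
    """Hypothetical qualitative preference: a is preferred to b if it costs less."""
    return a[1] < b[1]

products = [("A", 30), ("B", 20), ("C", 20), ("D", 45)]
best_products = best(products, cheaper)
```

Here `best_products` holds the two rows priced at 20, since neither is preferred to the other.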

Book Chapter
01 Apr 2010
TL;DR: This paper formally defines the new problem of top-k skyline computation, proposes an intelligent method to resolve this problem, and conducts a set of experiments to show the effectiveness and efficiency of the proposed algorithm.
Abstract: The problem of top-k skyline computation has attracted considerable research attention in the past few years. Given a dataset, a top-k skyline returns the k “most interesting” skyline tuples based on some kind of preference specified by the user. We extend the concept of top-k skyline to a so-called top-k combinatorial skyline query (k-CSQ). In contrast to the existing top-k skyline query (which mainly finds the interesting skyline tuples), a k-CSQ finds the interesting skyline tuples from various kinds of combinations of the given tuples. The k-CSQ is an important tool for areas such as decision making, market analysis, business planning, and quantitative economics research. In this paper, we will formally define this new problem, propose an intelligent method to resolve this problem, and also conduct a set of experiments to show the effectiveness and efficiency of the proposed algorithm.

36 citations
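
As background for the top-k skyline abstract above, the plain (non-combinatorial) top-k skyline can be sketched as below. This does not implement the k-CSQ over combinations of tuples. It assumes that smaller attribute values are better and that "most interesting" means "dominates the most tuples", which is only one possible user preference; the hotel data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points: List[Point]) -> List[Point]:
    """Tuples that no other tuple dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def top_k_skyline(points: List[Point], k: int) -> List[Point]:
    """Rank skyline tuples by how many tuples they dominate (assumed preference)."""
    def score(p: Point) -> int:
        return sum(dominates(p, q) for q in points)
    return sorted(skyline(points), key=score, reverse=True)[:k]

# Hypothetical hotels as (price, distance to beach); lower is better for both.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9), (55, 9)]
sky = skyline(hotels)
top2 = top_k_skyline(hotels, 2)
```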

Proceedings Article
29 Mar 2009
TL;DR: This paper introduces definitions and algorithms for building histogram- and Haar wavelet-based synopses on probabilistic data and shows that this approach clearly outperforms simple ideas, such as building summaries for samples drawn from the data distribution, while taking equal or less time.
Abstract: There is a growing realization that uncertain information is a first-class citizen in modern database management. As such, we need techniques to correctly and efficiently process uncertain data in database systems. In particular, data reduction techniques that can produce concise, accurate synopses of large probabilistic relations are crucial. Similar to their deterministic relation counterparts, such compact probabilistic data synopses can form the foundation for human understanding and interactive data exploration, probabilistic query planning and optimization, and fast approximate query processing in probabilistic database systems. In this paper, we introduce definitions and algorithms for building histogram- and Haar wavelet-based synopses on probabilistic data. The core problem is to choose a set of histogram bucket boundaries or wavelet coefficients to optimize the accuracy of the approximate representation of a collection of probabilistic tuples under a given error metric. For a variety of different error metrics, we devise efficient algorithms that construct optimal or near-optimal size-B histogram and wavelet synopses. This requires careful analysis of the structure of the probability distributions, and novel extensions of known dynamic programming-based techniques for the deterministic domain. Our experiments show that this approach clearly outperforms simple ideas, such as building summaries for samples drawn from the data distribution, while taking equal or less time.

36 citations
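
The bucket-boundary problem described in the abstract above builds on the classic dynamic program for deterministic histograms. The sketch below is a simplification, not the paper's algorithm: it assumes each probabilistic tuple is an `(item, probability)` pair that exists independently, collapses the data to expected frequencies, and then runs the standard sum-squared-error dynamic program on those expectations. All names and data are hypothetical.

```python
from typing import List, Tuple

def expected_frequencies(tuples: List[Tuple[int, float]], domain: int) -> List[float]:
    """Expected count of each item when tuple (item, p) exists with probability p."""
    freqs = [0.0] * domain
    for item, p in tuples:
        freqs[item] += p
    return freqs

def v_optimal_buckets(freqs: List[float], num_buckets: int) -> List[Tuple[int, int]]:
    """Bucket boundaries [start, end) minimising total squared error.

    Assumes 1 <= num_buckets <= len(freqs).
    """
    n = len(freqs)
    prefix, prefix_sq = [0.0], [0.0]
    for f in freqs:
        prefix.append(prefix[-1] + f)
        prefix_sq.append(prefix_sq[-1] + f * f)

    def sse(i: int, j: int) -> float:
        # Squared error of representing freqs[i:j] by their mean.
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    cost[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(1, n + 1):
            for i in range(b - 1, j):
                c = cost[b - 1][i] + sse(i, j)
                if c < cost[b][j]:
                    cost[b][j], cut[b][j] = c, i

    # Walk the cut table backwards to recover the boundaries.
    bounds, j = [], n
    for b in range(num_buckets, 0, -1):
        i = cut[b][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]

prob_tuples = [(0, 0.9), (1, 0.5), (1, 0.5), (2, 1.0), (2, 1.0), (2, 1.0),
               (3, 1.0), (3, 1.0), (3, 0.9)]
freqs = expected_frequencies(prob_tuples, domain=4)
buckets = v_optimal_buckets(freqs, num_buckets=2)
```

With two buckets, the low-frequency items 0 and 1 are grouped separately from the high-frequency items 2 and 3.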

Journal Article
TL;DR: The proposed summary data model, enforcing the disjointness constraint, alleviates the intractable problem without loss of information and provides for efficient operations, including summary data search, derivation, insertion, and deletion.
Abstract: A data model and an access method for summary data management are presented. Summary data, represented as a trinary tuple (statistical function, category, summary), are metaknowledge summarized by a statistical function of a category of individual information typically stored in a conventional database. For instance, (average-income, female engineer with 10 years' experience and master's degree, $45000) is a summary datum. The computational complexity of the derivability problem has been found intractable in general, and the proposed summary data model, enforcing the disjointness constraint, alleviates the intractable problem without loss of information. In order to store, manage, and access summary data, a multidimensional access method called summary data (SD) tree is proposed. By preserving the category hierarchy, the SD tree provides for efficient operations, including summary data search, derivation, insertion, and deletion.

36 citations
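
The (statistical function, category, summary) tuple from the abstract above, and the way disjoint categories make derivation straightforward, can be sketched as follows. This is an illustration only, not the paper's data model or SD tree: categories are modelled as sets of hypothetical atomic labels, and only the derivation of a count over a union of disjoint categories is shown.

```python
from typing import FrozenSet, List, NamedTuple, Optional

class SummaryDatum(NamedTuple):
    function: str             # statistical function, e.g. "count"
    category: FrozenSet[str]  # atomic category labels (hypothetical encoding)
    summary: float            # summarized value

def derive_count(target: FrozenSet[str],
                 stored: List[SummaryDatum]) -> Optional[float]:
    """Derive the count for `target` by summing stored counts.

    Works only if stored count categories inside `target` are pairwise
    disjoint and together cover it; returns None when they do not cover it.
    """
    covered: set = set()
    total = 0.0
    for datum in stored:
        if datum.function != "count" or not datum.category <= target:
            continue
        if covered & datum.category:
            raise ValueError("stored categories overlap; sum would double count")
        covered |= datum.category
        total += datum.summary
    return total if covered == target else None

stored = [
    SummaryDatum("count", frozenset({"female_engineer"}), 120),
    SummaryDatum("count", frozenset({"male_engineer"}), 300),
    SummaryDatum("average-income", frozenset({"female_engineer"}), 45000),
]
engineers = derive_count(frozenset({"female_engineer", "male_engineer"}), stored)
unknown = derive_count(frozenset({"female_engineer", "teacher"}), stored)
```

Because the stored categories do not overlap, the engineer count is simply the sum of the two stored counts, whereas the second query is not derivable from the stored data.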


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Time complexity
36K papers, 879.5K citations
85% related
Server
79.5K papers, 1.4M citations
83% related
Scalability
50.9K papers, 931.6K citations
83% related
Polynomial
52.6K papers, 853.1K citations
81% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    203
2022    459
2021    210
2020    285
2019    306
2018    266