Topic

Tuple

About: Tuple is a research topic. Over its lifetime, 6513 publications have been published within this topic, receiving 146057 citations. The topic is also known as: tuple & ordered tuplet.


Papers
01 Jan 2002
TL;DR: This paper considers the Best operator, which can be used to smoothly embed preferences in queries of relational algebra, and studies general properties of this operator and presents a practical algorithm for its computation.
Abstract: Dealing with user preferences is becoming a widespread issue in novel data-intensive application domains, such as electronic catalogs, e-commerce, multimedia databases, and real estates. Given a set of preferences, an important problem is to efficiently determine which are the “best” objects, according to such preferences. In this paper we assume that preferences are expressed in a qualitative way over the tuples of a relation schema (e.g., I prefer product A to product B), which is quite natural from the user point of view and also includes, as a proper subcase, quantitative preferences defined by means of a scoring function. Starting from an analysis of basic properties of (qualitative) preferences, we consider the Best operator, which can be used to smoothly embed preferences in queries of relational algebra. We study general properties of this operator and present a practical algorithm for its computation. We show how the algorithm improves the simple nested-loops approach and can lead to faster response times.
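
As an illustration of the nested-loops baseline the abstract mentions (not the paper's improved algorithm), the following minimal sketch assumes that the "best" tuples are those to which no other tuple is preferred. The function name `best`, the `PREFERENCES` pairs, and the product data are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, int]  # hypothetical schema: (product name, price)


def best(rows: Iterable[Row], prefers: Callable[[Row, Row], bool]) -> List[Row]:
    """Nested-loops baseline: keep each tuple t for which no tuple u is preferred to t."""
    rows = list(rows)
    return [t for t in rows if not any(prefers(u, t) for u in rows)]


# Assumed qualitative preferences as (better, worse) pairs, e.g. "I prefer A to B".
PREFERENCES = {("A", "B"), ("B", "C")}


def prefers(u: Row, t: Row) -> bool:
    """True if the product in u is explicitly preferred to the product in t."""
    return (u[0], t[0]) in PREFERENCES


products = [("A", 100), ("B", 120), ("C", 90), ("D", 110)]
result = best(products, prefers)
print(result)  # [('A', 100), ('D', 110)]
```

Product D survives because it is incomparable to every other product; a qualitative preference relation does not have to rank all tuples against each other.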

37 citations

Book Chapter
01 Apr 2010
TL;DR: This paper formally defines the new problem of top-k skyline computation, proposes an intelligent method to resolve this problem, and conducts a set of experiments to show the effectiveness and efficiency of the proposed algorithm.
Abstract: The problem of top-k skyline computation has attracted considerable research attention in the past few years. Given a dataset, a top-k skyline returns k “most interesting” skyline tuples based on some kind of preference specified by the user. We extend the concept of top-k skyline to a so-called top-k combinatorial skyline query (k-CSQ). In contrast to the existing top-k skyline query (which is mainly to find the interesting skyline tuples), a k-CSQ is to find the interesting skyline tuples from various kinds of combinations of the given tuples. The k-CSQ is an important tool for areas such as decision making, market analysis, business planning, and quantitative economics research. In this paper, we will formally define this new problem, propose an intelligent method to resolve this problem, and also conduct a set of experiments to show the effectiveness and efficiency of the proposed algorithm.
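
To make the plain top-k skyline notion concrete (not the k-CSQ method proposed in the paper), here is a rough sketch. It assumes that smaller values are better in every dimension and that the user preference is given as a scoring function; `dominates`, `skyline`, `top_k_skyline`, and the hotel data are hypothetical names and values.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Quadratic-time skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k_skyline(points: List[Point], k: int, score: Callable[[Point], float]) -> List[Point]:
    """Rank skyline points by the user's score (lower is better) and keep the first k."""
    return sorted(skyline(points), key=score)[:k]


# Hypothetical hotels as (price, distance to the beach).
hotels = [(50, 8), (80, 2), (60, 5), (90, 3), (70, 6)]
print(skyline(hotels))                # [(50, 8), (80, 2), (60, 5)]
print(top_k_skyline(hotels, 2, sum))  # [(50, 8), (60, 5)]
```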

36 citations

Proceedings Article
29 Mar 2009
TL;DR: This paper introduces definitions and algorithms for building histogram- and Haar wavelet-based synopses on probabilistic data and shows that this approach clearly outperforms simple ideas, such as building summaries for samples drawn from the data distribution, while taking equal or less time.
Abstract: There is a growing realization that uncertain information is a first-class citizen in modern database management. As such, we need techniques to correctly and efficiently process uncertain data in database systems. In particular, data reduction techniques that can produce concise, accurate synopses of large probabilistic relations are crucial. Similar to their deterministic relation counterparts, such compact probabilistic data synopses can form the foundation for human understanding and interactive data exploration, probabilistic query planning and optimization, and fast approximate query processing in probabilistic database systems. In this paper, we introduce definitions and algorithms for building histogram- and Haar wavelet-based synopses on probabilistic data. The core problem is to choose a set of histogram bucket boundaries or wavelet coefficients to optimize the accuracy of the approximate representation of a collection of probabilistic tuples under a given error metric. For a variety of different error metrics, we devise efficient algorithms that construct optimal or near optimal size B histogram and wavelet synopses. This requires careful analysis of the structure of the probability distributions, and novel extensions of known dynamic programming-based techniques for the deterministic domain. Our experiments show that this approach clearly outperforms simple ideas, such as building summaries for samples drawn from the data distribution, while taking equal or less time.

36 citations

Journal Article
TL;DR: The proposed summary data model, enforcing the disjointness constraint, alleviates the intractable problem without loss of information and provides for efficient operations, including summary data search, derivation, insertion, and deletion.
Abstract: A data model and an access method for summary data management are presented. Summary data, represented as a trinary tuple (statistical function, category, summary), are metaknowledge summarized by a statistical function of a category of individual information typically stored in a conventional database. For instance, (average-income, female engineer with 10 years' experience and master's degree, $45000) is a summary datum. The computational complexity of the derivability problem has been found intractable in general, and the proposed summary data model, enforcing the disjointness constraint, alleviates the intractable problem without loss of information. In order to store, manage, and access summary data, a multidimensional access method called summary data (SD) tree is proposed. By preserving the category hierarchy, the SD tree provides for efficient operations, including summary data search, derivation, insertion, and deletion.

36 citations

Proceedings Article
01 Jan 2008
TL;DR: This talk considers association rule mining on arbitrary relational databases by combining pairs of queries which could reveal interesting properties in the database, and considers a new pattern class consisting of conjunctive queries over relational databases, called simple conjunctive queries, and defines associations using the well known notion of query containment.
Abstract: The discovery of recurring patterns in databases is one of the main topics in data mining and many efficient solutions have been developed for relatively simple classes of patterns and data collections. Indeed, most frequent pattern mining or association rule mining algorithms work on so called transaction databases. Not only for itemsets, but also for more complex patterns such as trees, graphs, or arbitrary relational structures, databases consisting of a set of transactions are used. For example, in the tree case [2], every transaction in the database contains a tree, and the presented algorithm tries to find all frequent subtrees occurring within all such transactions. For all these pattern classes, specialized algorithms exist to discover them efficiently. The motivation for these works is the potentially high business value of the discovered patterns [1]. Unfortunately, many relational databases are not suited to be converted into a transactional format and even if this would be possible, a lot of information implicitly encoded in the relational model would be lost after conversion. In this talk we consider association rule mining on arbitrary relational databases by combining pairs of queries which could reveal interesting properties in the database. Intuitively, we pose two queries on the database such that the second query is more specific than the first query. Then, if the number of tuples in the output of both queries is almost the same, this could reveal a potentially interesting discovery. To illustrate, consider the well known Internet Movie Database containing almost all possible information about movies, actors and everything related to that, and consider the following queries: first, we ask for all actors that have starred in a movie of the genre ‘drama’; then, we ask for all actors that have starred in a movie of the genre ‘drama’, but that also starred in a (possibly different) movie of the genre ‘comedy’. Now suppose the answer to the first query consists of 1000 actors, and the answer to the second query consists of 900 actors. Obviously, these answers do not necessarily reveal any significant insights by themselves, but when combined, they reveal the potentially interesting pattern that actors starring in ‘drama’ movies typically (with a probability of 90%) also star in a ‘comedy’ movie. Of course, this pattern could also have been found by first preprocessing the database, and creating a transaction for each actor containing the set of all genres of movies he or she appeared in. Similarly, a pattern like: 77% of the movies starring Ben Affleck, also star Matt Damon, could be found by posing the query asking for all movies starring Ben Affleck, and the query asking for all movies starring both Ben Affleck and Matt Damon. Again, this could also be found using frequent set mining methods, but this time, the database should have been differently preprocessed in order to find this pattern. Furthermore, it is even impossible to preprocess the database only once in such a way that the above two patterns would be found by frequent set mining, as they are essentially counting a different type of transactions. Indeed, we are counting actors in the first example, and movies in the second example. In general, we are looking for pairs of queries Q1, Q2, such that Q1 asks for a set of tuples satisfying a certain condition and Q2 asks for those tuples satisfying a more specific condition. When it turns out that the size of the output of Q2 is close to the size of the output of Q1, we learned that most of the tuples in the output of Q1 actually satisfy a more specific condition, as specified in Q2. Clearly, such findings could reveal interesting patterns in the given database. Towards this goal, we consider a new pattern class consisting of conjunctive queries over relational databases, called simple conjunctive queries, and define associations using the well known notion of query containment. We propose a completely novel algorithm, Conqueror, efficiently generating and pruning the search space of all simple conjunctive queries. We illustrate that next to many different kinds of interesting patterns, our algorithm is also able to discover functional dependencies, inclusion dependencies, but also their variants, such as the very recently studied conditional functional dependencies, which turn out to be very useful for data cleaning purposes.
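
The query-pair idea from the abstract can be sketched on toy data as follows. This is only an illustration of comparing the output sizes of a general query Q1 and a more specific query Q2; it is not the Conqueror algorithm, and the `appearances` relation, the actor names, and the helper `actors_in` are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical relation: one (actor, genre) tuple per movie appearance.
appearances: List[Tuple[str, str]] = [
    ("ann", "drama"), ("ann", "comedy"),
    ("bob", "drama"), ("bob", "comedy"),
    ("cy", "drama"),
    ("dee", "comedy"),
]


def actors_in(rows: List[Tuple[str, str]], genre: str) -> Set[str]:
    """Actors with at least one appearance in a movie of the given genre."""
    return {actor for actor, g in rows if g == genre}


# Q1: actors who starred in a drama.
q1 = actors_in(appearances, "drama")
# Q2 (more specific): actors who starred in a drama and also in a comedy.
q2 = q1 & actors_in(appearances, "comedy")

# The ratio |Q2| / |Q1| plays the role of the 900/1000 = 90% figure in the abstract.
confidence = len(q2) / len(q1)
print(sorted(q1), sorted(q2), round(confidence, 2))
```

Because Q2 only adds a condition to Q1, its output is always a subset of the output of Q1, so the ratio lies between 0 and 1.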

36 citations


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Time complexity
36K papers, 879.5K citations
85% related
Server
79.5K papers, 1.4M citations
83% related
Scalability
50.9K papers, 931.6K citations
83% related
Polynomial
52.6K papers, 853.1K citations
81% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    205
2022    461
2021    210
2020    286
2019    313
2018    269