Topic

Tuple

About: Tuple is a research topic. Over its lifetime, 6,513 publications have been published within this topic, receiving 146,057 citations. The topic is also known as: tuple and ordered tuplet.

Papers

Proceedings Article•DOI•

Practical lineage tracing in data warehouses

Yingwei Cui¹, Jennifer Widom¹•Institutions (1)

Stanford University¹

01 Feb 2000

TL;DR: A lineage tracing package for relational views with aggregation is implemented in the WHIPS data warehousing system prototype at Stanford, and a number of schemes for storing auxiliary views that enable consistent and efficient lineage tracing in a multi-source data warehouse are proposed.

Abstract: We consider the view data lineage problem in a warehousing environment: for a given data item in a materialized warehouse view, we want to identify the set of source data items that produced the view item. We formalize the problem and we present a lineage tracing algorithm for relational views with aggregation. Based on our tracing algorithm, we propose a number of schemes for storing auxiliary views that enable consistent and efficient lineage tracing in a multi-source data warehouse. We report on a performance study of the various schemes, identifying which schemes perform best in which settings. Based on our results, we have implemented a lineage tracing package in the WHIPS data warehousing system prototype at Stanford. With this package, users can select view tuples of interest, then efficiently "drill through" to examine the exact source tuples that produced the view tuples of interest.

171 citations
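
The lineage problem stated in the abstract above can be illustrated with a minimal sketch. It assumes a single in-memory source relation and one aggregation view (a sum per group); the relation `SALES`, its columns, and the function names are hypothetical, and this is not the tracing algorithm or the auxiliary-view schemes from the paper.

```python
from collections import defaultdict

# Hypothetical source relation: (store, item, amount) tuples.
SALES = [
    ("s1", "pen", 3),
    ("s1", "ink", 5),
    ("s2", "pen", 2),
]


def total_by_store(source):
    """Materialize the view: SELECT store, SUM(amount) ... GROUP BY store."""
    totals = defaultdict(int)
    for store, _item, amount in source:
        totals[store] += amount
    return sorted(totals.items())


def trace_lineage(view_tuple, source):
    """Return the source tuples that produced one tuple of the view above.

    For a group-by aggregate, these are the source tuples in the same group.
    """
    store, _total = view_tuple
    return [row for row in source if row[0] == store]


view = total_by_store(SALES)
lineage = trace_lineage(("s1", 8), SALES)
```

Here the view tuple `("s1", 8)` traces back to the two `s1` rows of `SALES`. Re-scanning the source on every trace is the simplest option; the abstract's auxiliary views are proposed precisely to make this step consistent and efficient across multiple sources.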

Book Chapter•DOI•

Probabilistic ranking of database query results

Surajit Chaudhuri¹, Gautam Das¹, Vagelis Hristidis², Gerhard Weikum•Institutions (2)

Microsoft¹, University of Miami²

31 Aug 2004

TL;DR: This work adapts and applies principles of probabilistic models from Information Retrieval for structured data to solve the problem of ranking answers to a database query when many tuples are returned.

Abstract: We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlations. Our ranking functions can be further customized for different applications. We present results of preliminary experiments which demonstrate the efficiency as well as the quality of our ranking system.

171 citations

Proceedings Article•DOI•

Efficient Computation of Diverse Query Results

Erik Vee¹, Utkarsh Srivastava¹, Jayavel Shanmugasundaram¹, P. Bhat¹, Sihem Amer Yahia¹, +1 more•Institutions (1)

Yahoo!¹

07 Apr 2008

TL;DR: In this paper, the problem of efficiently computing diverse query results in online shopping applications was studied, where users specify queries through a form interface that allows a mix of structured and content-based selection conditions.

Abstract: We study the problem of efficiently computing diverse query results in online shopping applications, where users specify queries through a form interface that allows a mix of structured and content-based selection conditions. Intuitively, the goal of diverse query answering is to return a representative set of top-k answers from all the tuples that satisfy the user selection condition. For example, if a user is searching for Honda cars and we can only display five results, we wish to return cars from five different Honda models, as opposed to returning cars from only one or two Honda models. A key contribution of this paper is to formally define the notion of diversity, and to show that existing score based techniques commonly used in web applications are not sufficient to guarantee diversity. Another contribution of this paper is to develop novel and efficient query processing techniques that guarantee diversity. Our experimental results using Yahoo! Autos data show that our proposed techniques are scalable and efficient.

171 citations
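
The Honda example in the abstract above can be sketched as a simple round-robin over groups. This is only an illustration of the intuition (spread the k answers across distinct models); the function `diverse_top_k` and the sample rows are hypothetical, and the sketch is neither the paper's formal definition of diversity nor its query processing technique.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for groups that have run out of rows


def diverse_top_k(rows, k, key):
    """Pick up to k rows, taking one row per group before repeating a group."""
    if k <= 0:
        return []
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    picked = []
    # Each "layer" holds the i-th row of every group.
    for layer in zip_longest(*groups.values(), fillvalue=_MISSING):
        for row in layer:
            if row is _MISSING:
                continue
            picked.append(row)
            if len(picked) == k:
                return picked
    return picked


# Hypothetical matching tuples: (model, listing id).
cars = [("Civic", 1), ("Civic", 2), ("Civic", 3), ("Accord", 4), ("CR-V", 5)]
top3 = diverse_top_k(cars, 3, key=lambda car: car[0])
```

With three slots, the sketch returns one Civic, one Accord, and one CR-V rather than three Civics, which is the behaviour the abstract asks for. It ignores scores entirely, so combining it with a ranking function would need further design.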

Proceedings Article•DOI•

Sketching probabilistic data streams

Graham Cormode¹, Minos Garofalakis²•Institutions (2)

AT&T Labs¹, Yahoo!²

11 Jun 2007

TL;DR: These algorithms offer strong randomized estimation guarantees while using only sublinear space in the size of the stream(s), and rely on novel, concise streaming sketch synopses that extend conventional sketching ideas to the probabilistic streams setting.

Abstract: The management of uncertain, probabilistic data has recently emerged as a useful paradigm for dealing with the inherent unreliabilities of several real-world application domains, including data cleaning, information integration, and pervasive, multi-sensor computing. Unlike conventional data sets, a set of probabilistic tuples defines a probability distribution over an exponential number of possible worlds (i.e., "grounded", deterministic databases). This "possible-worlds" interpretation allows for clean query semantics but also raises hard computational problems for probabilistic database query processors. To further complicate matters, in many scenarios (e.g., large-scale process and environmental monitoring using multiple sensor modalities), probabilistic data tuples arrive and need to be processed in a streaming fashion; that is, using limited memory and CPU resources and without the benefit of multiple passes over a static probabilistic database. Such probabilistic data streams raise a host of new research challenges for stream-processing engines that, to date, remain largely unaddressed. In this paper, we propose the first space- and time-efficient algorithms for approximating complex aggregate queries (including the number of distinct values and join/self-join sizes) over probabilistic data streams. Following the possible-worlds semantics, such aggregates essentially define probability distributions over the space of possible aggregation results, and our goal is to characterize such distributions through efficient approximations of their key moments (such as expectation and variance). Our algorithms offer strong randomized estimation guarantees while using only sublinear space in the size of the stream(s), and rely on novel, concise streaming sketch synopses that extend conventional sketching ideas to the probabilistic streams setting. Our experimental results verify the effectiveness of our approach.

170 citations
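
The possible-worlds semantics described in the abstract above can be made concrete on a tiny input. The sketch below assumes each tuple is a `(value, probability)` pair that appears independently with the given probability; that model, the sample `stream`, and the function names are assumptions for illustration. It computes the expected number of distinct values exactly, once in closed form and once by enumerating every world, so it is not a sublinear-space sketch of the kind the paper proposes.

```python
from itertools import product
from math import prod

# Hypothetical probabilistic tuples: (value, probability of being present).
stream = [("a", 0.5), ("a", 0.5), ("b", 0.2)]


def expected_distinct_closed_form(tuples):
    """E[#distinct] = sum over values of P(at least one copy is present)."""
    all_absent = {}
    for value, p in tuples:
        all_absent[value] = all_absent.get(value, 1.0) * (1.0 - p)
    return sum(1.0 - q for q in all_absent.values())


def expected_distinct_by_enumeration(tuples):
    """Average the distinct count over all 2^n possible worlds."""
    total = 0.0
    for world in product([False, True], repeat=len(tuples)):
        weight = prod(
            p if present else 1.0 - p
            for (_value, p), present in zip(tuples, world)
        )
        distinct = {v for (v, _p), present in zip(tuples, world) if present}
        total += weight * len(distinct)
    return total


closed = expected_distinct_closed_form(stream)
brute = expected_distinct_by_enumeration(stream)
```

Both functions agree on this input, but the enumeration visits 2^n worlds, which is the exponential blow-up the abstract refers to and the reason approximate streaming synopses are of interest.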

Proceedings Article•DOI•

PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions

Jun Zhang¹, Xiaokui Xiao¹, Xing Xie²•Institutions (2)

Nanyang Technological University¹, Microsoft²

14 Jun 2016

TL;DR: PrivTree is a histogram construction algorithm that adopts hierarchical decomposition but completely eliminates the dependency on a pre-defined h, and exploits a new analysis on the Laplace distribution, which enables it to use only a constant amount of noise in deciding whether a sub-domain should be split, without worrying about the recursion depth of splitting.

Abstract: Given a set D of tuples defined on a domain Omega, we study differentially private algorithms for constructing a histogram over Omega to approximate the tuple distribution in D. Existing solutions for the problem mostly adopt a hierarchical decomposition approach, which recursively splits Omega into sub-domains and computes a noisy tuple count for each sub-domain, until all noisy counts are below a certain threshold. This approach, however, requires that we (i) impose a limit h on the recursion depth in the splitting of Omega and (ii) set the noise in each count to be proportional to h. The choice of h is a serious dilemma: a small h makes the resulting histogram too coarse-grained, while a large h leads to excessive noise in the tuple counts used in deciding whether sub-domains should be split. Furthermore, h cannot be directly tuned based on D; otherwise, the choice of h itself reveals private information and violates differential privacy. To remedy the deficiency of existing solutions, we present PrivTree, a histogram construction algorithm that adopts hierarchical decomposition but completely eliminates the dependency on a pre-defined h. The core of PrivTree is a novel mechanism that (i) exploits a new analysis on the Laplace distribution and (ii) enables us to use only a constant amount of noise in deciding whether a sub-domain should be split, without worrying about the recursion depth of splitting. We demonstrate the application of PrivTree in modelling spatial data, and show that it can be extended to handle sequence data (where the decision in sub-domain splitting is not based on tuple counts but a more sophisticated measure). Our experiments on a variety of real datasets show that PrivTree considerably outperforms the state of the art in terms of data utility.

169 citations
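
The depth-limited hierarchical decomposition that the abstract above attributes to existing solutions can be sketched as follows. This is a baseline illustration only, not PrivTree: it assumes a one-dimensional domain `[lo, hi)` split by bisection, and the names `decompose` and `laplace`, the noise scale `h / epsilon`, and the sample data are all assumptions. The privacy calibration is not verified here.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def decompose(points, lo, hi, h, threshold, noise):
    """Recursively bisect [lo, hi); return leaves as (lo, hi, noisy_count).

    Splitting stops when the depth budget h is used up or the noisy count
    is at or below the threshold.
    """
    noisy = sum(lo <= x < hi for x in points) + noise()
    if h == 0 or noisy <= threshold:
        return [(lo, hi, noisy)]
    mid = (lo + hi) / 2
    return (decompose(points, lo, mid, h - 1, threshold, noise)
            + decompose(points, mid, hi, h - 1, threshold, noise))


data = [0.1, 0.2, 0.3, 0.9]

# Noise-free run, to show the shape of the decomposition deterministically.
exact_leaves = decompose(data, 0.0, 1.0, h=2, threshold=2, noise=lambda: 0.0)

# Noisy run: the scale grows with the depth limit h (assumed h / epsilon).
h, epsilon = 2, 1.0
rng = random.Random(0)
noisy_leaves = decompose(data, 0.0, 1.0, h=h, threshold=2,
                         noise=lambda: laplace(h / epsilon, rng))
```

Because the noise scale is tied to `h`, raising the depth limit to get a finer histogram also makes every split decision noisier, which is the dilemma the abstract describes.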

Performance

Metrics

Papers: 7,188

Citations: 157,520

No. of papers in the topic in previous years
Year	Papers
2023	203
2022	459
2021	210
2020	285
2019	306
2018	266