scispace - formally typeset
Topic

Tuple

About: Tuple is a research topic. Over the lifetime, 6513 publications have been published within this topic, receiving 146057 citations. The topic is also known as: tuple & ordered tuplet.


Papers
Proceedings ArticleDOI
01 Mar 2010
TL;DR: Wang et al. develop a data publishing technique that ensures ε-differential privacy while providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a range.
Abstract: Privacy preserving data publishing has attracted considerable research interest in recent years. Among the existing solutions, ε-differential privacy provides one of the strongest privacy guarantees. Existing data publishing methods that achieve ε-differential privacy, however, offer little data utility. In particular, if the output dataset is used to answer count queries, the noise in the query answers can be proportional to the number of tuples in the data, which renders the results useless. In this paper, we develop a data publishing technique that ensures ε-differential privacy while providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a range. The core of our solution is a framework that applies wavelet transforms on the data before adding noise to it. We present instantiations of the proposed framework for both ordinal and nominal data, and we provide a theoretical analysis on their privacy and utility guarantees. In an extensive experimental study on both real and synthetic data, we show the effectiveness and efficiency of our solution.
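As a hedged illustration of the baseline this paper improves on (not of the wavelet framework itself): the standard Laplace mechanism adds noise of scale 1/ε to each count, and answering a range-count by summing many such noisy cells accumulates noise across the range. The function names below are illustrative, not from the paper.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from a zero-mean Laplace distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float) -> float:
    # A single count query has sensitivity 1, so Laplace(1/epsilon)
    # noise suffices for epsilon-differential privacy. A range-count
    # answered by summing k such noisy cells accumulates k independent
    # noise terms -- the utility problem the wavelet framework targets.
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)  # deterministic for the example
answer = noisy_count(100, epsilon=1.0)
```

Since Laplace noise of scale b has variance 2b², summing n noisy cells gives variance 2n/ε²; transforming the data before adding noise is what keeps the per-range error small in the paper's approach.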

302 citations

Book ChapterDOI
09 Sep 2003
TL;DR: It is argued that a fine-grained scheduling approach in combination with various scheduling techniques (such as batching of operators and tuples) can significantly improve system efficiency by reducing various system overheads.
Abstract: Many stream-based applications have sophisticated data processing requirements and real-time performance expectations that need to be met under high-volume, time-varying data streams. In order to address these challenges, we propose novel operator scheduling approaches that specify (1) which operators to schedule (2) in which order to schedule the operators, and (3) how many tuples to process at each execution step. We study our approaches in the context of the Aurora data stream manager. We argue that a fine-grained scheduling approach in combination with various scheduling techniques (such as batching of operators and tuples) can significantly improve system efficiency by reducing various system overheads. We also discuss application-aware extensions that make scheduling decisions according to per-application Quality of Service (QoS) specifications. Finally, we present prototype-based experimental results that characterize the efficiency and effectiveness of our approaches under various stream workloads and processing scenarios.
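The batching idea above can be sketched in a few lines. This is a minimal illustration, not Aurora's actual scheduler; the max-queue selection policy and names are assumptions for the example. Each step picks the operator with the most waiting tuples and processes up to `batch_size` of them in one invocation, amortizing per-call overhead across the batch.

```python
from collections import deque

def schedule_step(queues, operators, batch_size=4):
    # Pick the operator with the most waiting tuples (illustrative policy).
    op_name = max(queues, key=lambda name: len(queues[name]))
    q = queues[op_name]
    # Dequeue a batch of tuples and run the operator once per tuple;
    # a real system would invoke the operator once on the whole batch.
    batch = [q.popleft() for _ in range(min(batch_size, len(q)))]
    return op_name, [operators[op_name](t) for t in batch]

queues = {"filter": deque([1, 2, 3, 4, 5]), "double": deque([10])}
operators = {"filter": lambda t: t if t > 2 else None,
             "double": lambda t: t * 2}
name, outputs = schedule_step(queues, operators)  # "filter" has the longest queue
```

A fine-grained scheduler makes this selection per step, so both which operator runs and how many tuples it consumes become tunable knobs, exactly the decision space the paper studies.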

299 citations

Proceedings ArticleDOI
29 Mar 2009
TL;DR: This work addresses a novel spatial keyword query called the m-closest keywords (mCK) query, which aims to find the spatially closest tuples that match m user-specified keywords, and introduces a new index called the bR*-tree, an extension of the R*-tree.
Abstract: This work addresses a novel spatial keyword query called the m-closest keywords (mCK) query. Given a database of spatial objects, each tuple is associated with some descriptive information represented in the form of keywords. The mCK query aims to find the spatially closest tuples which match m user-specified keywords. Given a set of keywords from a document, the mCK query can be very useful in geotagging the document by comparing the keywords to other geotagged documents in a database. To answer mCK queries efficiently, we introduce a new index called the bR*-tree, which is an extension of the R*-tree. Based on the bR*-tree, we exploit a priori-based search strategies to effectively reduce the search space. We also propose two monotone constraints, namely the distance mutex and keyword mutex, as our a priori properties to facilitate effective pruning. Our performance study demonstrates that our search strategy is indeed efficient in reducing query response time and demonstrates remarkable scalability in terms of the number of query keywords, which is essential for our main application of searching by document.
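To make the query semantics concrete, here is a naive mCK baseline: the exhaustive search that the bR*-tree index and mutex-based pruning are designed to avoid. Among all tuple groups that jointly cover the m query keywords, it returns the group with the smallest diameter. The object representation ((x, y), keyword set) is an illustrative assumption, not the paper's data model.

```python
import math
from itertools import combinations

def mck_brute_force(objects, keywords):
    required = set(keywords)
    best_group, best_diam = None, math.inf
    # At most m tuples are needed, since each must contribute a keyword.
    for size in range(1, len(required) + 1):
        for group in combinations(objects, size):
            covered = set().union(*(kws for _, kws in group))
            if not required <= covered:
                continue  # group does not match all m keywords
            # Diameter = maximum pairwise distance within the group.
            diam = max((math.dist(p, q)
                        for (p, _), (q, _) in combinations(group, 2)),
                       default=0.0)
            if diam < best_diam:
                best_group, best_diam = group, diam
    return best_group, best_diam

objects = [((0, 0), {"cafe"}), ((1, 0), {"park"}), ((5, 5), {"park"})]
group, diam = mck_brute_force(objects, ["cafe", "park"])
```

The combinatorial blow-up of this search over group candidates is what motivates indexing keywords inside the spatial index and pruning whole subtrees with the distance and keyword mutex constraints.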

298 citations

Proceedings ArticleDOI
09 Jun 2003
TL;DR: This work proposes to demonstrate the Aurora system with its development environment and runtime system, with several example monitoring applications developed in consultation with defense, financial, and natural science communities, and shows the effect of various system alternatives on various workloads.
Abstract: The Aurora system [1] is an experimental data stream management system with a fully functional prototype. It includes both a graphical development environment and a runtime system. We propose to demonstrate the Aurora system with its development environment and runtime system, with several example monitoring applications developed in consultation with defense, financial, and natural science communities. We will also demonstrate the effect of various system alternatives on various workloads. For example, we will show how different scheduling algorithms affect tuple latency and internal queue lengths. We will use some of our visualization tools to accomplish this.

Data Stream Management. Aurora is a data stream management system for monitoring applications. Streams are continuous data feeds from such sources as sensors, satellites and stock feeds. Monitoring applications track the data from numerous streams, filtering them for signs of abnormal activity and processing them for purposes of aggregation, reduction and correlation. The management requirements for monitoring applications differ profoundly from those satisfied by a traditional DBMS:

- A traditional DBMS assumes a passive model where most data processing results from humans issuing transactions and queries. Data stream management requires a more active approach: monitoring data feeds from unpredictable external sources (e.g., sensors) and alerting humans when abnormal activity is detected.
- A traditional DBMS manages data that is currently in its tables. Data stream management often requires processing data that is bounded by some finite window of values, not an unbounded past.
- A traditional DBMS provides exact answers to exact queries and is blind to real-time deadlines. Data stream management must often respond to real-time deadlines (e.g., military applications monitoring positions of enemy platforms) and therefore must often provide reasonable approximations to queries.
- A traditional query processor optimizes all queries in the same way (typically focusing on response time). A stream data manager benefits from application-specific optimization criteria (QoS).
- A traditional DBMS assumes pull-based queries to be the norm. Push-based data processing is the norm for a data stream management system.

A Brief Summary of Aurora. Aurora has been designed to deal with very large numbers of data streams. Users build queries out of a small set of operators (a.k.a. boxes). The current implementation provides a user interface for tapping into pre-existing inputs and network flows and for wiring boxes together to produce answers at the outputs. While it is certainly possible to accept input as declarative queries, we feel that for a very large number of such queries, the process of common sub-expression elimination is too difficult. An example of an Aurora network is given in Screen Shot 1.

A simple stream is a potentially infinite sequence of tuples that all have the same stream ID. An arc carries multiple simple streams. This is important so that simple streams can be added to and deleted from the system without modifying the basic network. A query, then, is a sub-network that ends at a single output and includes an arbitrary number of inputs. Boxes can connect to multiple downstream boxes; all such path splits carry identical tuples. Multiple streams can be merged, since some box types accept more than one input (e.g., Join, Union). We do not allow any cycles in an operator network.

Each output is supplied with a Quality of Service (QoS) specification. Currently, QoS is captured by three functions: (1) a latency graph, (2) a value-based graph, and (3) a loss-tolerance graph. The latency graph indicates how utility drops as an answer is delayed. The value-based graph shows which values of the output space are most important. The loss-tolerance graph is a simple way to describe how averse the application is to approximate answers.
Tuples arrive at the input and are queued for processing. A scheduler selects a box with waiting tuples and executes that box on one or more of the input tuples. The output tuples of a box are queued at the input of the next box in sequence. In this way, tuples make their way from the inputs to the outputs. If the system is overloaded, QoS is adversely affected; in this case, we invoke a load shedder to strategically eliminate tuples.

Aurora supports persistent storage in two different ways. First, when box queues consume more storage than available RAM, the system will spill tuples that are less likely to be needed soon to secondary storage. Second, ad hoc queries can be connected to (and disconnected from) any arc for which a connection point has been defined. A connection point stores a historical portion of a stream that has flowed on the arc. For example, one could define a connection point as the last hour's worth of data that has been seen on a given arc. Any ad hoc query that connects to a connection point has access to the full stored history as well as any additional data that flows past while the query is connected.
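The connection-point idea above can be sketched as a bounded history buffer on an arc. The class name and time-based eviction policy here are illustrative assumptions, not taken from the Aurora implementation.

```python
from collections import deque

class ConnectionPoint:
    def __init__(self, history_seconds):
        self.history_seconds = history_seconds
        self.buffer = deque()  # (timestamp, tuple) pairs, oldest first

    def push(self, ts, tup):
        self.buffer.append((ts, tup))
        # Evict tuples that have aged out of the history window.
        while self.buffer and ts - self.buffer[0][0] > self.history_seconds:
            self.buffer.popleft()

    def scan(self):
        # An ad hoc query connected here sees the full stored history,
        # plus whatever flows past while it stays connected.
        return [tup for _, tup in self.buffer]

cp = ConnectionPoint(history_seconds=3600)  # "the last hour's worth of data"
cp.push(0.0, ("stream-1", 42))
cp.push(7200.0, ("stream-1", 43))  # the first tuple has aged out
```

A query attached to `cp` would call `scan()` for the stored history and then continue receiving newly pushed tuples, mirroring the connect/disconnect semantics described in the abstract.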

293 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: The authors address binary visual question answering (VQA) on abstract scenes by converting the question to a tuple that concisely summarizes the visual concept to be detected in the image.
Abstract: The complex compositional structure of language makes problems at the intersection of vision and language challenging. But language also provides a strong prior that can result in good superficial performance, without the underlying models truly understanding the visual content. This can hinder progress in pushing the state of the art in the computer vision aspects of multi-modal AI. In this paper, we address binary Visual Question Answering (VQA) on abstract scenes. We formulate this problem as visual verification of concepts inquired in the questions. Specifically, we convert the question to a tuple that concisely summarizes the visual concept to be detected in the image. If the concept can be found in the image, the answer to the question is "yes", and otherwise "no". Abstract scenes play two roles: (1) they allow us to focus on the high-level semantics of the VQA task as opposed to the low-level recognition problems, and perhaps more importantly, (2) they provide us the modality to balance the dataset such that language priors are controlled, and the role of vision is essential. In particular, we collect fine-grained pairs of scenes for every question, such that the answer to the question is "yes" for one scene, and "no" for the other, for the exact same question. Indeed, language priors alone do not perform better than chance on our balanced dataset. Moreover, our proposed approach matches the performance of a state-of-the-art VQA approach on the unbalanced dataset, and outperforms it on the balanced dataset.
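A toy sketch of the question-to-tuple step described above: the question is summarized as a concise concept tuple, and the answer is "yes" iff that tuple is verified in the scene. The keyword-matching extraction and the relation/stopword lists below are illustrative stand-ins for the paper's learned summarization, not their actual method.

```python
RELATIONS = {"near", "on", "under", "holding"}  # assumed toy vocabulary
STOPWORDS = {"is", "the", "a", "an"}

def question_to_tuple(question):
    words = question.lower().rstrip("?").split()
    # Pick out one relation word and the two remaining content words.
    relation = next((w for w in words if w in RELATIONS), None)
    content = [w for w in words if w not in STOPWORDS and w not in RELATIONS]
    primary = content[0] if content else None
    secondary = content[1] if len(content) > 1 else None
    return (primary, relation, secondary)

def answer(question, scene_concepts):
    # Visual verification: "yes" iff the concept tuple is in the scene.
    return "yes" if question_to_tuple(question) in scene_concepts else "no"

scene = {("dog", "near", "table")}  # concepts detected in one abstract scene
```

A balanced pair in the paper's sense would be two scenes for the same question, one containing the tuple and one not, so a language-only model cannot do better than chance.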

290 citations


Network Information
Related Topics (5)
- Graph (abstract data type): 69.9K papers, 1.2M citations (86% related)
- Time complexity: 36K papers, 879.5K citations (85% related)
- Server: 79.5K papers, 1.4M citations (83% related)
- Scalability: 50.9K papers, 931.6K citations (83% related)
- Polynomial: 52.6K papers, 853.1K citations (81% related)
Performance Metrics
Number of papers in the topic in previous years:

Year   Papers
2023   203
2022   459
2021   210
2020   285
2019   306
2018   266