SciSpace (formerly Typeset)
Topic

Tuple

About: Tuple is a research topic. Over its lifetime, 6,513 publications on this topic have received 146,057 citations in total. The topic is also known as: tuple and ordered tuplet.


Papers
Journal Article
01 Mar 1976
TL;DR: RARES is designed to enhance the performance of an optimizing relational query interface by supporting important high-level optimization techniques; it can perform tuple selection at the storage device and can also provide a mechanism for efficient sorting.
Abstract: The design and motivation for a rotating associative relational store (RARES) is described. RARES is designed to enhance the performance of an optimizing relational query interface by supporting important high level optimization techniques. In particular, it can perform tuple selection operations at the storage device and also can provide a mechanism for efficient sorting. Like other designs for rotating associative stores, RARES contains search logic which is attached to the heads of a rotating head-per-track storage device. RARES is distinct from other designs in that it utilizes a novel “orthogonal” storage layout. This layout allows a high output rate of selected tuples even when a sort order in the stored relation must be preserved. As in certain other designs, RARES can usually output a tuple as soon as it is found to satisfy the selection criteria. However, relative to these designs, the orthogonal layout allows an order of magnitude reduction in the capacity of storage local to the search logic.

168 citations
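As a rough software analogy only, the selection behaviour described in the RARES abstract (emit each tuple as soon as it satisfies the selection criteria while preserving the stored sort order) can be sketched as a streaming filter. RARES itself is hardware, with search logic attached to the heads of a head-per-track device, and its "orthogonal" storage layout is not modeled here; the function name `select_on_scan` and the `(id, name, salary)` tuple layout are illustrative assumptions.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical tuple layout: (id, name, salary); the relation is stored sorted by id.
Row = Tuple[int, str, float]


def select_on_scan(track: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Evaluate the selection predicate on each tuple as it passes the "head".

    Tuples are examined strictly in stored order and are never buffered, so a
    qualifying tuple is emitted immediately and the stored sort order carries
    over to the output stream.
    """
    for row in track:
        if predicate(row):
            yield row


relation = [(1, "ann", 10.0), (2, "bob", 55.0), (3, "cy", 70.0), (4, "di", 20.0)]
print(list(select_on_scan(relation, lambda r: r[2] > 50.0)))
# prints [(2, 'bob', 55.0), (3, 'cy', 70.0)]
```

Because nothing is buffered, the output needs no re-sorting. In the hardware design, the difficulty the orthogonal layout addresses is sustaining this property at a high output rate with little storage local to the search logic.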

Journal Article
TL;DR: This paper introduces Map-Join-Reduce, a system that extends and improves the MapReduce runtime framework to efficiently process complex data analysis tasks on large clusters, and presents a new data processing strategy that performs filtering-join-aggregation tasks in two successive MapReduce jobs.
Abstract: Data analysis is an important functionality in cloud computing which allows a huge amount of data to be processed over very large clusters. MapReduce is recognized as a popular way to handle data in the cloud environment due to its excellent scalability and good fault tolerance. However, compared to parallel databases, MapReduce is slower when it is used to perform complex data analysis tasks that require the joining of multiple data sets in order to compute certain aggregates. A common concern is whether MapReduce can be improved to produce a system with both scalability and efficiency. In this paper, we introduce Map-Join-Reduce, a system that extends and improves the MapReduce runtime framework to efficiently process complex data analysis tasks on large clusters. We first propose a filtering-join-aggregation programming model, a natural extension of MapReduce's filtering-aggregation programming model. Then, we present a new data processing strategy which performs filtering-join-aggregation tasks in two successive MapReduce jobs. The first job applies filtering logic to all the data sets in parallel, joins the qualified tuples, and pushes the join results to the reducers for partial aggregation. The second job combines all partial aggregation results and produces the final answer. The advantage of our approach is that we join multiple data sets in one go and thus avoid frequent checkpointing and shuffling of intermediate results, a major performance bottleneck in most of the current MapReduce-based systems. We benchmark our system against Hive, a state-of-the-art MapReduce-based data warehouse, on a 100-node cluster on Amazon EC2 using the TPC-H benchmark. The results show that our approach significantly boosts the performance of complex analysis queries.

168 citations
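The two-job filtering-join-aggregation strategy from the Map-Join-Reduce abstract can be illustrated with a minimal single-process simulation. This is not Hadoop and not the paper's runtime. The customer/order schema, the functions `job1` and `job2`, the modulo partitioning on the join key, and the SUM aggregate are all assumptions made for the sketch, and only two data sets are joined.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical inputs: customers are (cust_id, nation), orders are (cust_id, amount).
Customer = Tuple[int, str]
Order = Tuple[int, float]
Partial = Dict[str, float]


def job1(customers: List[Customer], orders: List[Order],
         cust_pred: Callable[[str], bool], order_pred: Callable[[float], bool],
         num_partitions: int = 2) -> List[Partial]:
    """Job 1: filter every input, join qualified tuples, partially aggregate."""
    # Map side: filter each data set and route surviving tuples by join key.
    cust_parts = [defaultdict(list) for _ in range(num_partitions)]
    order_parts = [defaultdict(list) for _ in range(num_partitions)]
    for cust_id, nation in customers:
        if cust_pred(nation):
            cust_parts[cust_id % num_partitions][cust_id].append(nation)
    for cust_id, amount in orders:
        if order_pred(amount):
            order_parts[cust_id % num_partitions][cust_id].append(amount)
    # Reduce side: join within each partition, then SUM(amount) per nation.
    partials: List[Partial] = []
    for cust_side, order_side in zip(cust_parts, order_parts):
        partial: Dict[str, float] = defaultdict(float)
        for cust_id, nations in cust_side.items():
            for nation in nations:
                for amount in order_side.get(cust_id, []):
                    partial[nation] += amount
        partials.append(dict(partial))
    return partials


def job2(partials: List[Partial]) -> Partial:
    """Job 2: combine the partial aggregates into the final answer."""
    final: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for nation, value in partial.items():
            final[nation] += value
    return dict(final)


customers = [(1, "FR"), (2, "DE"), (3, "FR")]
orders = [(1, 100.0), (1, 5.0), (2, 50.0), (3, 20.0), (4, 999.0)]
print(job2(job1(customers, orders, lambda n: True, lambda a: a >= 10.0)))
# prints {'DE': 50.0, 'FR': 120.0}
```

The join results never leave job 1 as a separate materialized data set. Only the small per-partition partial aggregates reach job 2, which is the intermediate-shuffle saving the abstract attributes to the approach.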

Journal Article
TL;DR: A model is presented that reduces the evaluation of aggregate queries to the problem of selecting qualifying tuples and the grouping of these tuples into collections on which an aggregate function is to be applied.
Abstract: Spatiotemporal databases are becoming increasingly common. Typically, applications modeling spatiotemporal objects need to process vast amounts of data. In such cases, generating aggregate information from the data set is more useful than individually analyzing every entry. In this paper, we study the most relevant techniques for the evaluation of aggregate queries on spatial, temporal, and spatiotemporal data. We also present a model that reduces the evaluation of aggregate queries to the problem of selecting qualifying tuples and the grouping of these tuples into collections on which an aggregate function is to be applied. This model gives us a framework that allows us to analyze and compare the different existing techniques for the evaluation of aggregate queries. At the same time, it allows us to identify opportunities for research on types of aggregate queries that have not been studied.

167 citations
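The model from this abstract splits an aggregate query into three steps: select qualifying tuples, group them into collections, and apply an aggregate function to each collection. A minimal sketch of that decomposition follows, assuming a hypothetical `Obs` point-observation tuple, a rectangular space-time window as the selection, grid cells as the grouping, and `sum` as the aggregate. None of these choices come from the paper; they only make the three steps concrete.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, NamedTuple


class Obs(NamedTuple):
    """Hypothetical spatiotemporal tuple: a point (x, y) observed at time t."""
    x: float
    y: float
    t: int
    value: float


def aggregate_query(data: Iterable[Obs],
                    select: Callable[[Obs], bool],
                    group: Callable[[Obs], Hashable],
                    agg: Callable[[List[float]], float]) -> Dict[Hashable, float]:
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for obs in data:
        if select(obs):                           # 1. select qualifying tuples
            groups[group(obs)].append(obs.value)  # 2. group them into collections
    return {key: agg(values) for key, values in groups.items()}  # 3. aggregate each


def in_window(o: Obs) -> bool:
    """Selection: inside the square [0, 10] x [0, 10] during times 0..5."""
    return 0 <= o.x <= 10 and 0 <= o.y <= 10 and 0 <= o.t <= 5


def grid_cell(o: Obs) -> Hashable:
    """Grouping: 5 x 5 grid cells."""
    return (int(o.x // 5), int(o.y // 5))


data = [Obs(1, 1, 0, 2.0), Obs(2, 3, 1, 3.0), Obs(7, 8, 2, 4.0),
        Obs(20, 20, 1, 9.0), Obs(1, 1, 9, 5.0)]
print(aggregate_query(data, in_window, grid_cell, sum))
# prints {(0, 0): 5.0, (1, 1): 4.0}
```

Swapping `sum` for `max` or `len` changes only step 3. This separation is what lets the model compare techniques by which step they accelerate.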

Journal Article
01 Aug 2009
TL;DR: This paper presents Glacier, a component library and compositional compiler that transforms continuous queries into logic circuits by composing library components on an operator-level basis.
Abstract: Taking advantage of many-core, heterogeneous hardware for data processing tasks is a difficult problem. In this paper, we consider the use of FPGAs for data stream processing as coprocessors in many-core architectures. We present Glacier, a component library and compositional compiler that transforms continuous queries into logic circuits by composing library components on an operator-level basis. In the paper, we consider selection, aggregation, grouping, as well as windowing operators, and discuss their design as modular elements. We also show how significant performance improvements can be achieved by inserting the FPGA into the system's data path (e.g., between the network interface and the host CPU). Our experiments show that queries on the FPGA can process streams at more than one million tuples per second and that they can do this directly from the network, removing much of the overhead of transferring the data to a conventional CPU.

167 citations
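Glacier compiles operators into FPGA logic circuits, which Python cannot show. The operator-level composition the abstract describes (modular selection and windowing/aggregation elements wired into a query plan) can still be modeled in software as chained generators. The `(symbol, price)` tuple schema, the operator names `select_op` and `window_sum_op`, and the count-based sliding-window semantics are assumptions for this sketch, not Glacier's component interfaces.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, Tuple

# Hypothetical stream tuple: (symbol, price).
Tick = Tuple[str, float]


def select_op(stream: Iterable[Tick], pred: Callable[[Tick], bool]) -> Iterator[Tick]:
    """Selection operator: pass through only tuples that satisfy pred."""
    for tick in stream:
        if pred(tick):
            yield tick


def window_sum_op(stream: Iterable[Tick], size: int) -> Iterator[float]:
    """Count-based sliding window: after each tuple, emit the sum of the last `size` prices."""
    window: Deque[float] = deque(maxlen=size)
    total = 0.0
    for _, price in stream:
        if len(window) == size:
            total -= window[0]  # oldest value, about to be evicted by append()
        window.append(price)
        total += price
        yield total


ticks = [("A", 1.0), ("B", 10.0), ("A", 2.0), ("A", 3.0), ("B", 5.0), ("A", 4.0)]
# Compose operators the way a query plan would wire them together.
print(list(window_sum_op(select_op(ticks, lambda t: t[0] == "A"), size=2)))
# prints [1.0, 3.0, 5.0, 7.0]
```

Each operator keeps only constant-size state and processes one tuple per step. That property is what makes such operators a natural fit for a fixed hardware pipeline.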

Patent
Navin Kabra, Jignesh M. Patel, Jie-Bing Yu, Biswadeep Nag, Jian-Jun Chen
22 Dec 1999
TL;DR: The patent proposes a C++ class (referred to as the "dispatcher") that takes an SQL query, which is optimized and parallelized, and starts its parallel execution.
Abstract: A method, apparatus, and an article of manufacture for parallel execution of SQL operations from within user defined functions. One or more embodiments of the invention provide the user defined function (UDF) with a C++ class (hereinafter referred to as “dispatcher”) that can take an SQL query and start parallel execution of the query. The query is optimized and parallelized. The dispatcher executes the query, sets up the communication links between the various operators in the query, and ensures that all the results are sent back to the data-server that originated the query request. Further, the dispatcher merges the results of the parallel execution and produces a single stream of tuples that is fed to the calling UDF. To provide the single stream to the calling UDF, one or more embodiments of the invention utilize a class that provides the UDF with a simple and easy-to-use interface to access the results of the nested SQL execution.

167 citations
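The dispatcher pattern in this patent abstract runs a query in parallel and merges the results into a single stream of tuples for the calling UDF. It can be sketched with a thread pool. This is not the patent's C++ class and involves no SQL engine: the `Dispatcher` class, the per-partition `fragment` callable, and the hypothetical `odd_ids` fragment are illustrative assumptions. The merge simply concatenates partition results in completion order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Tuple[int, str]
Fragment = Callable[[Sequence[Row]], List[Row]]


class Dispatcher:
    """Hypothetical stand-in for the patent's "dispatcher": run one query
    fragment per data partition in parallel and expose a single tuple stream."""

    def __init__(self, partitions: Sequence[Sequence[Row]], max_workers: int = 4) -> None:
        self.partitions = partitions
        self.max_workers = max_workers

    def execute(self, fragment: Fragment) -> Iterator[Row]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = [pool.submit(fragment, part) for part in self.partitions]
            # Merge: hand rows to the caller partition by partition, in completion order.
            for future in as_completed(futures):
                yield from future.result()


def odd_ids(part: Sequence[Row]) -> List[Row]:
    """Hypothetical query fragment: keep rows whose id is odd."""
    return [row for row in part if row[0] % 2 == 1]


partitions = [[(1, "a"), (2, "b")], [(3, "c")], [(4, "d"), (5, "e")]]
for row in Dispatcher(partitions).execute(odd_ids):
    print(row)  # arrival order depends on which partition finishes first
```

The caller sees only a plain iterator, which matches the "simple and easy-to-use interface" idea. If the caller needs a global order, each partition's output would have to be sorted and combined with an ordered merge (for example `heapq.merge`) instead of completion order.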


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Time complexity: 36K papers, 879.5K citations, 85% related
Server: 79.5K papers, 1.4M citations, 83% related
Scalability: 50.9K papers, 931.6K citations, 83% related
Polynomial: 52.6K papers, 853.1K citations, 81% related
Performance Metrics
Number of papers on this topic in previous years:
Year   Papers
2023   203
2022   459
2021   210
2020   285
2019   306
2018   266