Open Access Proceedings Article

Fast Computation of Sparse Datacubes

Kenneth A. Ross, +1 more
- pp 116-125
TLDR
This work proposes a novel algorithm for the fast computation of datacubes over sparse relations, and demonstrates the efficiency of the algorithm using synthetic, benchmark and real-world data sets.
Abstract
Datacube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. Real-world data is frequently sparse, and hence efficiently computing datacubes over large sparse relations is important. We show that current techniques for computing datacubes over sparse relations do not scale well with the number of CUBE BY attributes, especially when the relation is much larger than main memory. We propose a novel algorithm for the fast computation of datacubes over sparse relations, and demonstrate the efficiency of our algorithm using synthetic, benchmark and real-world data sets. When the relation fits in memory, our technique performs multiple in-memory sorts, and does not incur any I/O beyond the input of the relation and the output of the datacube itself. When the relation does not fit in memory, a divide-and-conquer strategy divides the problem of computing the datacube into several simpler computations of sub-datacubes. Often, all but one of the sub-datacubes can be computed in memory and our in-memory solution applies. In that case, the total I/O overhead is linear in the number of CUBE BY attributes. We demonstrate with an implementation that the CPU cost of our algorithm is dominated by the I/O cost for sparse relations.
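
To make the terms in the abstract concrete, the following is a minimal Python sketch of what a datacube is and why partitioning on one attribute splits the work into independent pieces. It is not the authors' algorithm: it naively sorts once per group-by, and the names `datacube`, `cube_by_partition`, `must_include`, and the sample relation are all hypothetical, chosen only for illustration.

```python
from itertools import combinations, groupby


def datacube(rows, dims, measure, must_include=None):
    """Naive datacube: SUM(measure) for every subset of `dims`.

    Returns {attrs: {key_tuple: total}}. If `must_include` is given,
    only group-bys containing that attribute are computed.
    Illustrative only; one sort per group-by, no sharing of sort work.
    """
    cube = {}
    for k in range(len(dims), -1, -1):
        for attrs in combinations(dims, k):
            if must_include is not None and must_include not in attrs:
                continue

            def key(row, attrs=attrs):
                return tuple(row[a] for a in attrs)

            # Sort in memory, then aggregate each run of equal keys.
            ordered = sorted(rows, key=key)
            cube[attrs] = {
                group: sum(r[measure] for r in run)
                for group, run in groupby(ordered, key=key)
            }
    return cube


def cube_by_partition(rows, dims, measure, part_attr):
    """Compute the group-bys that contain `part_attr`, one partition at a time.

    Rows with different `part_attr` values can never fall into the same
    group, so each partition is processed independently and the results
    are simply unioned. Group-bys without `part_attr` need a separate pass.
    """
    partitions = {}
    for row in rows:
        partitions.setdefault(row[part_attr], []).append(row)

    merged = {}
    for part in partitions.values():
        sub_cube = datacube(part, dims, measure, must_include=part_attr)
        for attrs, groups in sub_cube.items():
            merged.setdefault(attrs, {}).update(groups)
    return merged


# Hypothetical sparse relation: 3 tuples in a 2 x 2 x 2 attribute space.
rows = [
    {"store": "A", "item": "pen", "year": 1997, "sales": 3},
    {"store": "A", "item": "ink", "year": 1997, "sales": 5},
    {"store": "B", "item": "pen", "year": 1996, "sales": 2},
]
dims = ("store", "item", "year")
cube = datacube(rows, dims, "sales")
partial = cube_by_partition(rows, dims, "sales", "store")
```

With three CUBE BY attributes the full cube has 2^3 = 8 group-bys, and `partial` holds the four that include `store`, each identical to the corresponding entry of `cube`. The sketch only shows why such a split is valid; how the sub-datacubes are chosen and how sort work is shared to keep I/O low is the contribution of the paper itself.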


Citations
Book

Data Mining: Concepts and Techniques

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Data Mining: Concepts and Techniques (2nd edition)

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], and Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].
Journal ArticleDOI

Bottom-up computation of sparse and Iceberg CUBE

TL;DR: The pruning in BUC, combined with an efficient sort method, enables BUC to outperform all previous algorithms for sparse CUBEs, even for computing entire CUBEs, and to dramatically improve Iceberg-CUBE computation.
Proceedings Article

Hierarchical Document Clustering Using Frequent Itemsets

TL;DR: This paper proposes to use the notion of frequent itemsets, which comes from association rule mining, for document clustering, and shows that this method outperforms best existing methods in terms of both clustering accuracy and scalability.
Proceedings Article

Efficient computation of the skyline cube

TL;DR: Two novel algorithms, Bottom-Up and Top-Down algorithms, are proposed to compute SKYCUBE efficiently and it is shown that new algorithms significantly outperform the naive ones.
References
Journal ArticleDOI

Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS

TL;DR: The data cube operator as discussed by the authors generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.
Proceedings ArticleDOI

Implementing data cubes efficiently

TL;DR: In this article, a lattice framework is used to express dependencies among views, and greedy algorithms are presented to determine a good set of views to materialize, within a small constant factor of optimal.
Proceedings Article

On the Computation of Multidimensional Aggregates

TL;DR: In this article, the authors present fast algorithms for computing a collection of group bys, which is equivalent to the union of a number of standard group-by operations, and show how the structure of CUBE computation can be viewed in terms of a hierarchy of groupby operations.
Proceedings ArticleDOI

An array-based algorithm for simultaneous multidimensional aggregates

TL;DR: This paper proposes an array-based algorithm to compute the Cube operator for multidimensional OLAP (MOLAP) systems, which store their data in sparse arrays rather than in tables.
ReportDOI

Edited synoptic cloud reports from ships and land stations over the globe, 1982--1991

TL;DR: In this paper, the synoptic weather reports for the entire globe for the 10-year period from December 1981 through November 1991 have been processed, edited, and rewritten to provide a data set designed for use in cloud analyses.