Fast Computation of Sparse Datacubes

Open Access Proceedings Article

Kenneth A. Ross, +1 more

- pp 116-125

TLDR

This work proposes a novel algorithm for the fast computation of datacubes over sparse relations, and demonstrates the efficiency of the algorithm using synthetic, benchmark and real-world data sets.

Abstract:

Datacube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. Real-world data is frequently sparse, and hence efficiently computing datacubes over large sparse relations is important. We show that current techniques for computing datacubes over sparse relations do not scale well with the number of CUBE BY attributes, especially when the relation is much larger than main memory. We propose a novel algorithm for the fast computation of datacubes over sparse relations, and demonstrate the efficiency of our algorithm using synthetic, benchmark and real-world data sets. When the relation fits in memory, our technique performs multiple in-memory sorts, and does not incur any I/O beyond the input of the relation and the output of the datacube itself. When the relation does not fit in memory, a divide-and-conquer strategy divides the problem of computing the datacube into several simpler computations of sub-datacubes. Often, all but one of the sub-datacubes can be computed in memory and our in-memory solution applies. In that case, the total I/O overhead is linear in the number of CUBE BY attributes. We demonstrate with an implementation that the CPU cost of our algorithm is dominated by the I/O cost for sparse relations.
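To make concrete what a datacube query produces, the sketch below computes every group-by over a small in-memory relation, using one sort and one aggregating scan per group-by. It is a naive illustration, not the algorithm proposed in the paper: the function name `datacube`, the sample relation, and the choice of SUM as the aggregate are all assumptions made for the example.

```python
from itertools import combinations, groupby


def datacube(rows, dims, measure):
    """Return {attrs: {group_key: SUM(measure)}} for every subset of dims.

    Hypothetical helper for illustration only; it performs a separate
    in-memory sort for each of the 2**len(dims) group-bys.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for attrs in combinations(dims, k):
            # Bind attrs as a default argument so each key function
            # projects onto its own attribute subset.
            def key(row, attrs=attrs):
                return tuple(row[a] for a in attrs)

            # Sorting brings equal keys together, so one scan aggregates them.
            ordered = sorted(rows, key=key)
            cube[attrs] = {
                group: sum(row[measure] for row in members)
                for group, members in groupby(ordered, key=key)
            }
    return cube


# Assumed sample relation: 3 tuples out of many possible (store, item, year)
# combinations, which is what makes it sparse.
rows = [
    {"store": "A", "item": "pen", "year": 1997, "sales": 3},
    {"store": "A", "item": "ink", "year": 1997, "sales": 5},
    {"store": "B", "item": "pen", "year": 1998, "sales": 2},
]

cube = datacube(rows, ("store", "item", "year"), "sales")
print(cube[("store",)])  # totals per store
print(cube[()])          # grand total
```

With three CUBE BY attributes the result holds 2^3 = 8 group-bys, from the grand total `()` up to the full `("store", "item", "year")` grouping. The number of group-bys doubles with each added attribute, which is why scaling with the number of CUBE BY attributes is the central concern of the abstract.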

Citations

Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques (2nd edition)

Bottom-up computation of sparse and Iceberg CUBE

Hierarchical Document Clustering Using Frequent Itemsets

Efficient computation of the skyline cube

References

Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS

Implementing data cubes efficiently

On the Computation of Multidimensional Aggregates

An array-based algorithm for simultaneous multidimensional aggregates

Edited synoptic cloud reports from ships and land stations over the globe, 1982--1991
