Implementing data cubes efficiently

doi:10.1145/233269.233333

Proceedings Article

Vol. 25, Iss. 2, pp. 205-216

TLDR

A lattice framework expresses dependencies among views, and greedy algorithms that operate on this lattice select a good set of views to materialize, performing within a small constant factor of optimal.

Abstract:

Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
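
As a rough illustration of the greedy selection described in the abstract, the following Python sketch assumes a hypercube lattice in which each view is a set of group-by dimensions, a view can answer any query whose dimensions are a subset of its own, and the cost of answering a query is the row count of the smallest materialized view that can answer it. The function name `greedy_select`, the benefit definition, and all row counts are illustrative assumptions, not details taken from the abstract.

```python
def greedy_select(sizes, top, k):
    """Greedily pick up to k views to materialize in addition to the top view.

    sizes maps each view (a frozenset of dimensions) to an assumed row count.
    Returns the selected views and the total cost of answering every view once.
    """
    selected = {top}
    # cost[w] = rows scanned to answer w from the cheapest selected ancestor
    cost = {w: sizes[top] for w in sizes}
    for _ in range(k):
        best, best_benefit = None, 0
        for v in sizes:
            if v in selected:
                continue
            # v can answer w when w groups by a subset of v's dimensions
            benefit = sum(max(cost[w] - sizes[v], 0) for w in sizes if w <= v)
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:
            break  # no remaining view reduces the total cost
        selected.add(best)
        for w in sizes:
            if w <= best:
                cost[w] = min(cost[w], sizes[best])
    return selected, sum(cost.values())


# Hypothetical row counts for a cube over dimensions p, s, c ("" = grand total).
rows = {"psc": 100, "ps": 20, "pc": 90, "sc": 80, "p": 10, "s": 5, "c": 30, "": 1}
sizes = {frozenset(name): n for name, n in rows.items()}

selected, total = greedy_select(sizes, top=frozenset("psc"), k=2)
print(sorted("".join(sorted(v)) for v in selected), total)
```

With these made-up sizes the sketch first picks the small `ps` view, because it cheaply answers four of the eight group-bys, and then `c`. Views such as `pc` that are nearly as large as the top view offer little benefit and are skipped.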

Citations

Book

Data Mining: Concepts and Techniques

Jiawei Han, +2 more

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Proceedings Article

Maximizing the spread of influence through a social network

David Kempe, +2 more

TL;DR: An analysis framework based on submodular functions shows that a natural greedy strategy obtains a solution that is provably within 63% of optimal for several classes of models, and suggests a general approach for reasoning about the performance guarantees of algorithms for these types of influence problems in social networks.

Journal Article

An overview of data warehousing and OLAP technology

Surajit Chaudhuri, +1 more

TL;DR: An overview of data warehousing and OLAP technologies, with an emphasis on their new requirements, is provided, based on a tutorial presented at the VLDB Conference, 1996.

Data Mining: Concepts and Techniques (2nd edition)

Jiawei Han, +1 more

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], and Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].

Journal Article

Data mining: an overview from a database perspective

Ming-Syan Chen, +2 more

- 01 Dec 1996 -

IEEE Transactions on Knowledge and Data ...

TL;DR: In this paper, a survey of the available data mining techniques is provided and a comparative study of such techniques is presented, based on a database researcher's point-of-view.

References

Journal Article

A threshold of ln n for approximating set cover

Uriel Feige

- 01 Jul 1998 -

Journal of the ACM

TL;DR: It is proved that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms.

Journal Article

Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS

Jim Gray, +3 more

TL;DR: The data cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.

Journal Article

Query evaluation techniques for large databases

Goetz Graefe

- 01 Jun 1993 -

ACM Computing Surveys

TL;DR: This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.

Proceedings Article

Index selection for OLAP

Himanshu Gupta, +3 more

TL;DR: The authors give algorithms that automate the selection of summary tables and indexes, and present a family of algorithms of increasing time complexities, and prove strong performance bounds for them.

Proceedings Article

Aggregate-Query Processing in Data Warehousing Environments

Ashish Gupta, +2 more

TL;DR: Generalized projections are introduced that capture aggregations, groupbys, duplicate-eliminating projections (distinct), and duplicate-preserving projections in a common unified framework, and powerful query rewrite rules for aggregate queries are developed that unify and extend rewrite rules previously known in the literature.

Related Papers (5)

Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS

Index selection for OLAP

An overview of data warehousing and OLAP technology

Selection of Views to Materialize in a Data Warehouse

On the Computation of Multidimensional Aggregates