Proceedings Article

Implementing data cubes efficiently

Venky Harinarayan, +2 more
- Vol. 25, Iss: 2, pp 205-216
TLDR
In this article, a lattice framework is used to express dependencies among views, and greedy algorithms are presented that determine a good set of views to materialize, performing within a small constant factor of optimal.
Abstract
Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
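The selection problem described in the abstract can be illustrated with a minimal sketch of a benefit-driven greedy choice over a hypercube lattice. This is not taken from the paper's text: it assumes a simple cost model in which answering a query costs the row count of the smallest materialized view that contains the query's group-by attributes. The function names (`hypercube_views`, `greedy_select`), the dimension letters `p`, `s`, `c`, and all row counts are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def hypercube_views(dimensions):
    """All group-by subsets of the dimensions (the hypercube lattice)."""
    dims = tuple(dimensions)
    return [frozenset(c) for r in range(len(dims) + 1)
            for c in combinations(dims, r)]


def greedy_select(sizes, k):
    """Pick up to k views beyond the top view, greedily by total benefit.

    sizes maps each view (a frozenset of dimensions) to its row count.
    Assumed cost model: a query on view w can be answered from a
    materialized view v only if w is a subset of v, at a cost of sizes[v].
    """
    top = max(sizes, key=len)              # full group-by, always materialized
    cost = {w: sizes[top] for w in sizes}  # current cheapest way to answer w
    chosen = [top]
    for _ in range(k):
        best, best_benefit = None, 0
        for v in sizes:
            if v in chosen:
                continue
            # Benefit of v: total cost reduction over every view v can answer.
            benefit = sum(max(0, cost[w] - sizes[v])
                          for w in sizes if w <= v)
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:                   # no remaining view helps any query
            break
        chosen.append(best)
        for w in sizes:
            if w <= best:
                cost[w] = min(cost[w], sizes[best])
    return chosen, sum(cost.values())


# Hypothetical row counts for a three-dimensional cube (not from the paper).
sizes = {
    frozenset("psc"): 100,
    frozenset("ps"): 50,
    frozenset("pc"): 100,
    frozenset("sc"): 80,
    frozenset("p"): 20,
    frozenset("s"): 10,
    frozenset("c"): 30,
    frozenset(): 1,
}
chosen, total = greedy_select(sizes, k=2)
```

With these made-up sizes, the sketch first picks the `ps` view and then the `c` view, lowering the summed query cost from 800 (everything answered from the top view) to 510. The view `pc` is never chosen because it is as large as the top view and therefore saves nothing, which mirrors the abstract's point that materializing every cell is not always worthwhile.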


Citations
Book

Data Mining: Concepts and Techniques

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Proceedings Article

Maximizing the spread of influence through a social network

TL;DR: An analysis framework based on submodular functions shows that a natural greedy strategy obtains a solution that is provably within 63% of optimal for several classes of models, and suggests a general approach for reasoning about the performance guarantees of algorithms for these types of influence problems in social networks.
Journal Article

An overview of data warehousing and OLAP technology

TL;DR: An overview of data warehousing and OLAP technologies, with an emphasis on their new requirements, is provided, based on a tutorial presented at the VLDB Conference, 1996.

Data Mining: Concepts and Techniques (2nd edition)

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], and Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].
Journal Article

Data mining: an overview from a database perspective

TL;DR: This paper surveys the available data mining techniques and presents a comparative study of them from a database researcher's point of view.
References
Journal Article

A threshold of ln n for approximating set cover

TL;DR: It is proved that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms.
Journal Article

Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS

TL;DR: The data cube operator as discussed by the authors generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.
Journal Article

Query evaluation techniques for large databases

TL;DR: This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Proceedings Article

Index selection for OLAP

TL;DR: The authors give algorithms that automate the selection of summary tables and indexes, present a family of algorithms of increasing time complexity, and prove strong performance bounds for them.
Proceedings Article

Aggregate-Query Processing in Data Warehousing Environments

TL;DR: Generalized projections are introduced that capture aggregations, group-bys, duplicate-eliminating projections (distinct), and duplicate-preserving projections in a common unified framework, and powerful query rewrite rules for aggregate queries are developed that unify and extend rewrite rules previously known in the literature.