Dynamic query path selection from lattice of cuboids using memory hierarchy
07 Jul 2013-pp 000049-000054
TL;DR: The focus of this research work is to dynamically identify the most cost effective path within the lattice structure of cuboids to minimize the query access time having the knowledge of existing cuboid location at different memory elements.
Abstract: Data warehouse represents multi-dimensional data suitable for analytical processing and logically data are organized in the form of data cube or cuboid. Data warehouse actually represents a business theme which is called fact table. The cuboid that identifies the complete fact table is called base cuboid. The all possible combination of the cuboids that could be generated from base cuboid corresponds to lattice structure. A lattice consists of numbers of cuboids. In real life, all these cuboids may not be important for business analysis. Thus all of them are not always called during business processing. The cuboids that are referred in different applications are fetched from diverse memory hierarchy such as cache memory, primary memory and secondary memory. The different execution speed of the respective memory element is taken into account which forms a memory hierarchy. The focus of this research work is to dynamically identify the most cost effective path within the lattice structure of cuboids to minimize the query access time having the knowledge of existing cuboid location at different memory elements.
TL;DR: This research work dynamically finds the most cost effective path from the lattice structure of cuboids based on concept hierarchy to minimize the query access time.
Abstract: Analytical processing on multi-dimensional data is performed over data warehouse. This, in general, is presented in the form of cuboids. The central theme of the data warehouse is represented in the form of fact table. A fact table is built from the related dimension tables. The cuboid that corresponds to the fact table is called base cuboid. All possible combination of the cuboids could be generated from base cuboid using successive roll-up operations and this corresponds to a lattice structure. Some of the dimensions may have a concept hierarchy in terms of multiple granularities of data. This means a dimension is represented in more than one abstract form. Typically, neither all the cuboids nor all the concept hierarchy are required for a specific business processing. These cuboids are resided in different layers of memory hierarchy like cache memory, primary memory, secondary memory, etc. This research work dynamically finds the most cost effective path from the lattice structure of cuboids based on concept hierarchy to minimize the query access time. The knowledge of location of cuboids at different memory elements is used for the purpose.
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. *Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data
01 Jun 1996
TL;DR: In this article, a lattice framework is used to express dependencies among views and greedy algorithms are presented to determine a good set of views to materialize, with a small constant factor of optimal.
Abstract: Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
03 Sep 1996
TL;DR: In this article, the authors present fast algorithms for computing a collection of group bys, which is equivalent to the union of a number of standard group-by operations, and show how the structure of CUBE computation can be viewed in terms of a hierarchy of groupby operations.
Abstract: At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group bys. We focus on a special case of the aggregation problem - computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hashbased grouping methods with several .optimizations, like combining common operations across multiple groupbys, caching, and using pre-computed group-by8 for computing other groupbys. Empirical evaluation shows that the resulting algorithms give much better performance compared to straightforward meth
25 Aug 1997
TL;DR: The technique is proposed reduces the soluticn space by considering only the relevant elements of the multidimensional lattice whose elements represent the solution space of the problem.
Abstract: A multidimensional database is a data repository that supports the efficient execution of complex business decision queries. Query response can be significantly improved by storing an appropriate set of materialized views. These views are selected from the multidimensional lattice whose elements represent the solution space of the problem. Several techniques have been proposed in the past to perform the selection of materialized views for databases with a reduced number of dimensions. When the number and complexity of dimensions increase, the proposed techniques do not scale well. The technique we are proposing reduces the soluticn space by considering only the relevant elements of the multidimensional lattice. An additional statistical analysis allows a further reduction of the solution space.
TL;DR: This paper discusses the cgmCUBE Project, a multi-year effort to design and implement aMulti-processor platform for data cube generation that targets the relational database model (ROLAP), and discusses new algorithmic and system optimizations relating to a thorough optimization of the underlying sequential cube construction method.
Abstract: On-line Analytical Processing (OLAP) has become one of the most powerful and prominent technologies for knowledge discovery in VLDB (Very Large Database) environments. Central to the OLAP paradigm is the data cube, a multi-dimensional hierarchy of aggregate values that provides a rich analytical model for decision support. Various sequential algorithms for the efficient generation of the data cube have appeared in the literature. However, given the size of contemporary data warehousing repositories, multi-processor solutions are crucial for the massive computational demands of current and future OLAP systems. In this paper we discuss the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP). More specifically, we discuss new algorithmic and system optimizations relating to (1) a thorough optimization of the underlying sequential cube construction method and (2) a detailed and carefully engineered cost model for improved parallel load balancing and faster sequential cube construction. These optimizations were key in allowing us to build a prototype that is able to produce data cube output at a rate of over one TeraByte per hour.