Dynamic query path selection from lattice of cuboids using memory hierarchy

doi:10.1109/ISCC.2013.6754923

Proceedings Article•DOI•

Santanu Roy, Soumya Sen¹, Anirban Sarkar², Nabendu Chaki³, Narayan C. Debnath⁴•Institutions (4)

Information Technology University¹, National Institute of Technology, Durgapur², University of Calcutta³, Winona State University⁴

07 Jul 2013, pp. 49-54

TL;DR: This research work dynamically identifies the most cost-effective path within the lattice of cuboids to minimize query access time, given knowledge of where each existing cuboid resides in the memory hierarchy.

Abstract: A data warehouse represents multi-dimensional data suitable for analytical processing, and the data are logically organized in the form of data cubes, or cuboids. A data warehouse represents a business theme, which is called the fact table. The cuboid that identifies the complete fact table is called the base cuboid. All possible combinations of cuboids that can be generated from the base cuboid form a lattice structure, so a lattice consists of a number of cuboids. In real life, not all of these cuboids are important for business analysis, so not all of them are called during business processing. The cuboids that are referred to in different applications are fetched from different levels of the memory hierarchy, such as cache memory, primary memory and secondary memory. The different access speeds of these memory elements, which together form the memory hierarchy, are taken into account. The focus of this research work is to dynamically identify the most cost-effective path within the lattice structure of cuboids to minimize query access time, given knowledge of where the existing cuboids are located among the different memory elements.
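The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the cost model (row count multiplied by a per-level access cost), the `ACCESS_COST` values, and the function names `cheapest_source` and `rollup_path` are all assumptions made for illustration. The sketch picks the cheapest resident cuboid that contains every dimension the query needs, then lists the roll-up steps from that cuboid down to the target.

```python
# Hypothetical relative per-row access costs for each memory level (assumed values).
ACCESS_COST = {"cache": 1, "primary": 10, "secondary": 1000}


def cheapest_source(target, resident):
    """Return (cost, cuboid) for the cheapest resident cuboid that can answer `target`.

    target:   frozenset of dimension names requested by the query.
    resident: dict mapping frozenset(dimensions) -> (row_count, memory_level).
    """
    candidates = [
        (rows * ACCESS_COST[level], sorted(dims))
        for dims, (rows, level) in resident.items()
        if target <= dims  # only a superset cuboid can be rolled up to the target
    ]
    if not candidates:
        raise ValueError("no resident cuboid can answer the query")
    cost, dims = min(candidates)
    return cost, frozenset(dims)


def rollup_path(source, target):
    """List the cuboids visited when dropping one surplus dimension at a time."""
    path = [source]
    current = set(source)
    for dim in sorted(source - target):
        current.discard(dim)
        path.append(frozenset(current))
    return path


# Illustrative data: three resident cuboids over dimensions A, B, C.
resident = {
    frozenset("ABC"): (1000, "secondary"),
    frozenset("AB"): (200, "primary"),
    frozenset("A"): (20, "secondary"),
}
cost, source = cheapest_source(frozenset("A"), resident)
path = rollup_path(source, frozenset("A"))
```

With these assumed numbers, the small cuboid `A` on secondary storage is more expensive to read than the larger cuboid `AB` in primary memory, which is the kind of trade-off the memory hierarchy introduces.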

Citations

Journal Article•DOI•

Dynamic discovery of query path on the lattice of cuboids using hierarchical data granularity and storage hierarchy

Soumya Sen¹, Santanu Roy², Anirban Sarkar³, Nabendu Chaki⁴, Narayan C. Debnath⁵•Institutions (5)

Information Technology University¹, Future Institute of Engineering and Management², National Institute of Technology, Durgapur³, University of Calcutta⁴, Winona State University⁵

01 Jul 2014-Journal of Computational Science

TL;DR: This research work dynamically finds the most cost-effective path in the lattice structure of cuboids, based on concept hierarchy, to minimize the query access time.

13 citations

Journal Article•DOI•

AFARTICA: A Frequent Item-Set Mining Method Using Artificial Cell Division Algorithm

Saubhik Paladhi¹, Sankhadeep Chatterjee², Takaaki Goto³, Soumya Sen²•Institutions (3)

Kalyani Government Engineering College¹, University of Calcutta², Toyo University³

01 Jul 2019-Journal of Database Management

4 citations

Book Chapter•DOI•

A Novel Indexing Scheme Over Lattice of Cuboids and Concept Hierarchy in Data Warehouse

Бобомурадов Б.К., Рузиева С.Г.¹•Institutions (1)

University of Calcutta¹

01 Jan 2022

TL;DR: In this paper, a secondary index scheme is proposed over the lattice of cuboids by analyzing the existing query set, and a novel methodology is proposed to rank the dimensions based on the usage of the cuboids and corresponding dimensions.

Abstract: In a data warehouse, the lattice of cuboids is very important, as it represents all combinations of dimensions for a particular business application. The structure is large: for N dimensions, the total number of cuboids is 2^N. If the dimensions maintain concept hierarchies, the number of cuboids is larger still. Hence, the search time is quite high if the data warehouse maintains cuboids in the form of a lattice. Here, a secondary index scheme is proposed over the lattice of cuboids by analyzing the existing query set. A novel methodology is proposed to rank the dimensions based on the usage of the cuboids and corresponding dimensions. A secondary index is created based on this ranking, and it improves the search time significantly. Both the case study and experimental results show the efficacy of the proposed method.
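The cuboid count mentioned in the abstract can be checked with a short sketch. The function names and the example dimensions are hypothetical. The first function enumerates one cuboid per subset of the dimensions, which gives 2^N cuboids. The second uses the standard counting argument for concept hierarchies, in which a dimension with L levels contributes L + 1 choices (its L levels plus the fully aggregated "all" level); this formula is a textbook result and is not taken from the abstract itself.

```python
from itertools import combinations
from math import prod


def lattice_of_cuboids(dimensions):
    """Enumerate every cuboid (subset of dimensions), base cuboid first, apex last."""
    dims = sorted(dimensions)
    return [
        frozenset(subset)
        for k in range(len(dims), -1, -1)
        for subset in combinations(dims, k)
    ]


def cuboid_count_with_hierarchies(levels_per_dimension):
    """Count cuboids when dimension i has levels_per_dimension[i] hierarchy levels."""
    return prod(levels + 1 for levels in levels_per_dimension)


lattice = lattice_of_cuboids(["item", "location", "time"])
```

For three dimensions without hierarchies the lattice has 8 cuboids, and `cuboid_count_with_hierarchies([1, 1, 1])` gives the same number, since a single level per dimension reduces the product to 2^N.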

References

Book•

Data Mining: Concepts and Techniques

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving, and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Proceedings Article•DOI•

Implementing data cubes efficiently

Venky Harinarayan¹, Anand Rajaraman¹, Jeffrey D. Ullman¹•Institutions (1)

Stanford University¹

01 Jun 1996

TL;DR: In this article, a lattice framework is used to express dependencies among views, and greedy algorithms are presented that determine a good set of views to materialize, performing within a small constant factor of optimal.

Abstract: Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
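The greedy selection described in this abstract can be sketched as follows. This is one common benefit-based formulation under an assumed linear cost model (answering a query from a view costs that view's row count); it is not claimed to match the paper's algorithm in every detail. The function name `greedy_select` and the example sizes are hypothetical.

```python
def greedy_select(sizes, k):
    """Greedily pick up to k views to materialize in addition to the top view.

    sizes: dict mapping frozenset(dimensions) -> row count for every view in the
           lattice; the view with the most dimensions is treated as the top view.
    """
    top = max(sizes, key=len)
    selected = {top}  # the top view is assumed to be always materialized

    def current_cost(view):
        # Cheapest already-selected view from which `view` can be computed.
        return min(sizes[s] for s in selected if view <= s)

    for _ in range(k):
        best_view, best_benefit = None, 0
        for candidate in sizes:
            if candidate in selected:
                continue
            # Benefit: total cost reduction over all views derivable from candidate.
            benefit = sum(
                max(0, current_cost(w) - sizes[candidate])
                for w in sizes
                if w <= candidate
            )
            if benefit > best_benefit:
                best_view, best_benefit = candidate, benefit
        if best_view is None:  # no remaining view gives a positive benefit
            break
        selected.add(best_view)
    return selected


# Illustrative two-dimensional lattice with assumed row counts.
sizes = {
    frozenset("AB"): 100,
    frozenset("A"): 20,
    frozenset("B"): 90,
    frozenset(): 1,
}
one_extra = greedy_select(sizes, 1)
two_extra = greedy_select(sizes, 2)
```

With these assumed sizes, view `A` is chosen first because it reduces the cost of answering both `A` and the empty (fully aggregated) view, while `B` is nearly as large as the top view and offers little benefit.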

1,499 citations

Proceedings Article•

On the Computation of Multidimensional Aggregates

Sameet Agarwal, Rakesh Agrawal, Prasad M. Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi

03 Sep 1996

TL;DR: In this article, the authors present fast algorithms for computing a collection of group-bys, focusing on the CUBE operator, which is equivalent to the union of a number of standard group-by operations, and show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations.

Abstract: At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem: computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give much better performance compared to straightforward methods.
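The statement that CUBE is equivalent to a union of standard group-bys can be illustrated with a naive sketch. It computes every group-by directly from the raw rows, so it corresponds to the straightforward baseline rather than to the optimized algorithms of the paper. The function names, the SUM aggregate, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def group_by(rows, attrs, measure):
    """SUM `measure` over `rows`, grouped by the tuple of values of `attrs`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[a] for a in attrs)] += row[measure]
    return dict(totals)


def cube(rows, attrs, measure):
    """Compute one group-by per subset of `attrs`; their union is the CUBE result."""
    return {
        subset: group_by(rows, subset, measure)
        for k in range(len(attrs), -1, -1)
        for subset in combinations(attrs, k)
    }


# Illustrative fact rows (hypothetical data).
rows = [
    {"item": "pen", "city": "Oslo", "sales": 3},
    {"item": "pen", "city": "Rome", "sales": 4},
    {"item": "ink", "city": "Oslo", "sales": 5},
]
result = cube(rows, ("item", "city"), "sales")
```

Each key of `result` is one node of the group-by hierarchy. An optimized implementation would compute a coarser group-by such as `("item",)` from the already computed `("item", "city")` result instead of rescanning the raw rows.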

608 citations

Proceedings Article•

Materialized Views Selection in a Multidimensional Database

Elena Baralis, Stefano Paraboschi, Ernest Teniente

25 Aug 1997

TL;DR: The proposed technique reduces the solution space by considering only the relevant elements of the multidimensional lattice, whose elements represent the solution space of the problem.

Abstract: A multidimensional database is a data repository that supports the efficient execution of complex business decision queries. Query response can be significantly improved by storing an appropriate set of materialized views. These views are selected from the multidimensional lattice whose elements represent the solution space of the problem. Several techniques have been proposed in the past to perform the selection of materialized views for databases with a reduced number of dimensions. When the number and complexity of dimensions increase, the proposed techniques do not scale well. The technique we are proposing reduces the solution space by considering only the relevant elements of the multidimensional lattice. An additional statistical analysis allows a further reduction of the solution space.

396 citations

Journal Article•DOI•

The cgmCUBE project: Optimizing parallel data cube generation for ROLAP

Frank Dehne¹, Todd Eavis², Andrew Rau-Chaplin³•Institutions (3)

Carleton University¹, Concordia University², Dalhousie University³

01 Jan 2006-Distributed and Parallel Databases

TL;DR: This paper discusses the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP), and discusses new algorithmic and system optimizations relating to a thorough optimization of the underlying sequential cube construction method.

Abstract: On-line Analytical Processing (OLAP) has become one of the most powerful and prominent technologies for knowledge discovery in VLDB (Very Large Database) environments. Central to the OLAP paradigm is the data cube, a multi-dimensional hierarchy of aggregate values that provides a rich analytical model for decision support. Various sequential algorithms for the efficient generation of the data cube have appeared in the literature. However, given the size of contemporary data warehousing repositories, multi-processor solutions are crucial for the massive computational demands of current and future OLAP systems. In this paper we discuss the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP). More specifically, we discuss new algorithmic and system optimizations relating to (1) a thorough optimization of the underlying sequential cube construction method and (2) a detailed and carefully engineered cost model for improved parallel load balancing and faster sequential cube construction. These optimizations were key in allowing us to build a prototype that is able to produce data cube output at a rate of over one TeraByte per hour.

48 citations