Proceedings ArticleDOI

Optimal Space and Time Complexity Analysis on the Lattice of Cuboids Using Galois Connections for Data Warehousing

24 Nov 2009-pp 1271-1275
TL;DR: An optimal aggregation and counter-aggregation (drill-down) methodology is proposed for the multidimensional data cube: aggregation is performed on smaller cuboids after partitioning them according to the cardinality of the individual dimensions.
Abstract: In this paper, an optimal aggregation and counter-aggregation (drill-down) methodology is proposed for the multidimensional data cube. The main idea is to aggregate on smaller cuboids after partitioning them according to the cardinality of the individual dimensions. Based on the operations that make these partitions, a Galois Connection is identified for formal analysis, which allows us to guarantee the soundness of the storage-space and time-complexity optimizations for the abstraction and concretization functions defined on the lattice structure. Our contribution can be seen as an application of the Abstract Interpretation framework to OLAP operations on the multidimensional data model.
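To make the lattice-and-Galois-connection idea concrete, here is a minimal sketch (hypothetical dimension names and partition; not the paper's implementation): cuboids are modelled as sets of dimensions ordered by inclusion, and a pair of monotone maps between the full lattice and its low-cardinality fragment is checked against the Galois condition.

```python
from itertools import combinations

# Hypothetical dimensions; the full powerset of DIMS is the lattice of cuboids,
# ordered by set inclusion.
DIMS = frozenset({"time", "item", "location", "supplier"})

def cuboids(dims):
    """Enumerate every cuboid in the lattice (all subsets of dims)."""
    for k in range(len(dims) + 1):
        for combo in combinations(sorted(dims), k):
            yield frozenset(combo)

# Assumed partition by cardinality: KEPT holds the low-cardinality dimensions.
KEPT = frozenset({"time", "item"})

def alpha(c):
    """Abstraction: roll up by discarding dimensions outside KEPT."""
    return c & KEPT

def gamma(a):
    """Concretization: the most detailed cuboid that abstracts to `a`."""
    return a | (DIMS - KEPT)

# Galois condition: alpha(c) <= a  iff  c <= gamma(a), for all cuboids.
assert all((alpha(c) <= a) == (c <= gamma(a))
           for c in cuboids(DIMS) for a in cuboids(KEPT))
print("Galois connection verified on", 2 ** len(DIMS), "cuboids")
```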
Citations
Journal ArticleDOI
TL;DR: This research work dynamically finds the most cost-effective path through the lattice of cuboids, based on a concept hierarchy, to minimize query access time.

13 citations

Proceedings ArticleDOI
19 Feb 2011
TL;DR: A new algorithm is proposed that uses a dynamic data structure which shrinks over time, resulting in better space utilization and reduced computation time, and the approach offers a formal analysis of concept hierarchies in an abstract interpretation framework.
Abstract: This paper proposes a new methodology for the efficient implementation of OLAP operations using concept hierarchies of attributes in a data warehouse. The different granularities associated with a particular dimension, and the hierarchy among them, may be represented as a lattice. The focus is to move up (roll-up) and down (drill-down) within the lattice structure using an algorithm with optimal time complexity. In this paper, a new algorithm is proposed that uses a dynamic data structure which shrinks over time, resulting in better space utilization and reduced computation time. A Galois Connection is identified on this lattice structure, with well-defined abstraction and concretization functions based on the concept hierarchy. The contribution offers a formal analysis using concept hierarchies in an abstract interpretation framework.
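As an illustration of moving up such a lattice, the sketch below uses an assumed concept hierarchy (day → month → quarter → year) and toy data; it is not the authors' algorithm. Each roll-up step re-aggregates and replaces the finer table, so the structure shrinks over time as the abstract describes.

```python
from collections import defaultdict

HIERARCHY = ["day", "month", "quarter", "year"]  # finest to coarsest (assumed)

def parent_of(level, value):
    """Map a key at `level` to its parent key at the next-coarser level."""
    if level == "day":
        y, m, d = value
        return (y, m)                    # day -> month
    if level == "month":
        y, m = value
        return (y, (m - 1) // 3 + 1)     # month -> quarter
    if level == "quarter":
        y, q = value
        return (y,)                      # quarter -> year
    raise ValueError("already at the coarsest level")

def roll_up(table, level):
    """One roll-up step: re-aggregate and *replace* the finer table, so the
    structure shrinks as we climb; drill-down would re-read a finer cuboid."""
    coarser = defaultdict(int)
    for key, measure in table.items():
        coarser[parent_of(level, key)] += measure
    return dict(coarser), HIERARCHY[HIERARCHY.index(level) + 1]

# Toy facts at day granularity: (year, month, day) -> sales.
sales = {(2009, 1, 5): 10, (2009, 1, 9): 7, (2009, 4, 2): 3}
table, level = sales, "day"
while level != "year":
    table, level = roll_up(table, level)
print(level, table)   # year granularity: {(2009,): 20}
```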

9 citations


Additional excerpts

  • ...In contrast to the proposed work in [1], a particular dimension has been considered for which the “concept hierarchy” prevails....


Book ChapterDOI
TL;DR: A new methodology for efficient implementation forms a lattice on the query parameters, which helps to correlate the different query parameters and in turn forms association rules among them.
Abstract: This research work is on optimizing the number of query parameters required to recommend an e-learning platform. This paper proposes a new methodology for efficient implementation by forming a lattice on the query parameters. This lattice structure helps to correlate the different query parameters, which in turn form association rules among them. The proposed methodology is conceptualized on an e-learning platform, with the objective of formulating an effective recommendation system that determines associations between the various products offered by the platform by analyzing a minimal set of query parameters.
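The sketch below illustrates the idea under assumed names (a toy query log and a made-up confidence threshold; not the chapter's method): subsets of query parameters, ordered by inclusion, form the lattice, and support counts over it yield candidate association rules.

```python
from itertools import permutations

# Hypothetical log: each entry is the set of parameters a user queried with.
query_log = [
    {"topic", "level", "duration"},
    {"topic", "level"},
    {"topic", "duration"},
    {"topic", "level", "price"},
]

def support(paramset):
    """Fraction of logged queries containing every parameter in the set."""
    return sum(paramset <= q for q in query_log) / len(query_log)

# Association rules x -> y between single parameters, filtered by confidence.
params = set().union(*query_log)
for x, y in permutations(sorted(params), 2):
    s_xy, s_x = support({x, y}), support({x})
    if s_x and s_xy / s_x >= 0.6:   # assumed confidence threshold
        print(f"{x} -> {y}  (confidence {s_xy / s_x:.2f}, support {s_xy:.2f})")
```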

8 citations

Book ChapterDOI
01 Jan 2019
TL;DR: In this paper, the authors propose a data warehouse model that integrates the existing parameters of loan-disbursement decisions and also incorporates newly identified concepts to give priority to customers who do not have any old credit history.
Abstract: Disbursement of loans is an important decision-making process for corporates such as banks and NBFCs (Non-Banking Financial Companies) that offer loans. The business involves several parameters, and the data associated with these parameters are generated from heterogeneous data sources and belong to different business verticals. Hence, decision-making in loan scenarios is critical, and the outcome involves resolving issues such as whether to grant the loan at all and, if sanctioned, the maximum amount. In this paper we consider the traditional parameters of the loan-sanction process, and along with these we identify a special case of the Indian credit-lending scenario in which people having old loans with a good repayment history get priority. This limits the business opportunities for banks, NBFCs, and other loan-disbursement organizations, as potentially good customers with no loan history are treated with lower priority. In this research work we propose a data warehouse model that integrates the existing parameters of loan-disbursement decisions and also incorporates the newly identified concepts to give priority to customers who do not have any old credit history.

5 citations

Book ChapterDOI
01 Jan 2021
TL;DR: This paper proposes an algorithm for cuboid materialization from a source cuboid to a target cuboid in an optimal way, such that the intermediate cuboids consume less space and take less time to generate: each selected cuboid has the least number of rows among the valid cuboids available at that step, found by sorting them on the product of the cardinalities of their dimensions.
Abstract: In the field of business intelligence, we require the analysis of multidimensional data, with the need for it to be fast and interactive. Data warehousing and OLAP approaches have been developed for this purpose, in which the data is viewed in the form of a multidimensional data cube that allows interactive analysis at various levels of abstraction, presented graphically. In a data cube, the need may arise to materialize a particular cuboid given that some other cuboid is presently materialized. In this paper, we propose an algorithm for cuboid materialization, starting from a source cuboid and proceeding to the target cuboid in an optimal way, such that the intermediate cuboids consume less space and require less time to generate. The algorithm ensures that these cuboids have the least number of rows compared to the other valid cuboids available for selection, by sorting them on the product of the cardinalities of the dimensions present in each cuboid.
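A minimal sketch of the greedy selection the abstract describes, with assumed dimension names and cardinalities: starting from the source cuboid, drop one dimension per step, always keeping the intermediate cuboid whose estimated row count (the product of its dimensions' cardinalities) is smallest.

```python
from math import prod

# Assumed cardinalities for a toy schema.
CARD = {"time": 365, "item": 1000, "location": 50, "supplier": 20}

def est_rows(cuboid):
    """Upper bound on a cuboid's size: product of its dimensions' cardinalities."""
    return prod(CARD[d] for d in cuboid)

def materialization_path(source, target):
    assert target <= source, "target must be derivable from source"
    path, current = [frozenset(source)], frozenset(source)
    while current != target:
        # Among dimensions still allowed to go, drop the one whose removal
        # minimizes the estimated size of the intermediate cuboid.
        candidates = [current - {d} for d in current - target]
        current = min(candidates, key=est_rows)
        path.append(current)
    return path

for step in materialization_path({"time", "item", "location", "supplier"},
                                 frozenset({"item"})):
    print(sorted(step), "~", est_rows(step), "rows")
```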

5 citations

References
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving, and there is a constant need for new techniques and tools to help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia, and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations


"Optimal Space and Time Complexity A..." refers background in this paper

  • ...A partial order amongst the cuboids is observed in the sense that there exists cuboid Ci that may or may not be derived from some other cuboid Cj. The resulting Poset, in fact, would form a lattice of cuboids [7]....


Proceedings Article
01 Jul 1998
TL;DR: Two new algorithms for solving this problem, fundamentally different from the known algorithms, are presented; empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Abstract: We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.
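For reference, here is a compact, illustrative rendering of the level-wise idea underlying this paper (toy transactions and an assumed threshold; not the paper's optimized algorithms): a k-itemset is considered only if all of its (k-1)-subsets were frequent at the previous level.

```python
from itertools import combinations

# Toy transaction database and an assumed absolute support threshold.
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk", "butter"}]
MIN_SUP = 2

def frequent_itemsets(transactions, min_sup):
    items = sorted(set().union(*transactions))
    level = [frozenset({i}) for i in items
             if sum(i in t for t in transactions) >= min_sup]
    while level:
        yield from level
        prev = set(level)
        # Candidate generation: join frequent k-itemsets differing by one item.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        # Pruning via the Apriori property: all (k-1)-subsets must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, len(c) - 1))}
        level = [c for c in candidates
                 if sum(c <= t for t in transactions) >= min_sup]

for s in frequent_itemsets(transactions, MIN_SUP):
    print(sorted(s))
```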

10,863 citations

Proceedings ArticleDOI
01 Jan 1977
TL;DR: In this paper, the abstract interpretation of programs is used to describe computations in another universe of abstract objects, so that the results of abstract execution give some information on the actual computations.
Abstract: A program denotes computations in some universe of objects. Abstract interpretation of programs consists in using that denotation to describe computations in another universe of abstract objects, so that the results of abstract execution give some information on the actual computations. An intuitive example (which we borrow from Sintzoff [72]) is the rule of signs. The text -1515 * 17 may be understood to denote computations on the abstract universe {(+), (-), (±)} where the semantics of arithmetic operators is defined by the rule of signs. The abstract execution -1515 * 17 → -(+) * (+) → (-) * (+) → (-), proves that -1515 * 17 is a negative number. Abstract interpretation is concerned by a particular underlying structure of the usual universe of computations (the sign, in our example). It gives a summary of some facets of the actual executions of a program. In general this summary is simple to obtain but inaccurate (e.g. -1515 + 17 → -(+) + (+) → (-) + (+) → (±)). Despite its fundamentally incomplete results abstract interpretation allows the programmer or the compiler to answer questions which do not need full knowledge of program executions or which tolerate an imprecise answer, (e.g. partial correctness proofs of programs ignoring the termination problems, type checking, program optimizations which are not carried in the absence of certainty about their feasibility, …).
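The rule-of-signs example from this abstract is easy to replay as code; the sketch below (with 0 folded into "+" for brevity) reproduces both the precise multiplication and the deliberately imprecise addition.

```python
# Abstract values: "+", "-", and the top element "±" (sign unknown).
TOP = "±"

def abs_mul(a, b):
    """Rule of signs for multiplication: exact on definite signs."""
    if TOP in (a, b):
        return TOP
    return "+" if a == b else "-"

def abs_add(a, b):
    """Addition preserves a shared sign; mixed signs lose all information."""
    return a if a == b else TOP

def alpha(n):
    """Abstraction of a concrete integer to its sign (0 treated as '+')."""
    return "+" if n >= 0 else "-"

# -1515 * 17  ->  (-) * (+)  ->  (-)   : provably negative
print(abs_mul(alpha(-1515), alpha(17)))   # '-'
# -1515 + 17  ->  (-) + (+)  ->  (±)   : sound but imprecise
print(abs_add(alpha(-1515), alpha(17)))   # '±'
```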

6,829 citations


"Optimal Space and Time Complexity A..." refers background in this paper

  • ...In figure 1, there are four different attributes labeled as 1, 2, 3, and 4....


Proceedings Article
03 Sep 1996
TL;DR: In this article, the authors present fast algorithms for computing a collection of group-bys, which is equivalent to the union of a number of standard group-by operations, and show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations.
Abstract: At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem: computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give much better performance compared to straightforward methods.
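A toy sketch of this hierarchy-of-group-bys view (assumed data; none of the paper's sort- or hash-based optimizations): every subset of the dimension list gets a group-by, and each coarser group-by is derived from an already-computed finer parent instead of rescanning the base table.

```python
from collections import defaultdict

DIMS = ("year", "product", "region")
# Finest group-by (all dimensions), standing in for the base fact table.
base = {("2009", "laptop", "east"): 5, ("2009", "laptop", "west"): 3,
        ("2010", "phone", "east"): 4}

def coarsen(table, dims, drop):
    """Derive the group-by without `drop` from the group-by on `dims`."""
    out, pos = defaultdict(int), dims.index(drop)
    for key, measure in table.items():
        out[key[:pos] + key[pos + 1:]] += measure
    return dict(out)

# Walk the hierarchy top-down: each group-by on k dims yields its k children.
cube = {DIMS: base}
for k in range(len(DIMS), 0, -1):
    for dims in [d for d in cube if len(d) == k]:
        for drop in dims:
            child = tuple(d for d in dims if d != drop)
            if child not in cube:
                cube[child] = coarsen(cube[dims], dims, drop)

print(cube[("year",)])   # {('2009',): 8, ('2010',): 4}
print(cube[()])          # grand total: {(): 12}
```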

608 citations

Book ChapterDOI
12 Jul 2001
TL;DR: An ad-hoc grouping hierarchy based on the spatial index at the finest spatial granularity is constructed and incorporated into the lattice model, and efficient methods to process arbitrary aggregations are presented.
Abstract: Spatial databases store information about the position of individual objects in space. In many applications however, such as traffic supervision or mobile communications, only summarized data, like the number of cars in an area or phones serviced by a cell, is required. Although this information can be obtained from transactional spatial databases, its computation is expensive, rendering online processing inapplicable. Driven by the non-spatial paradigm, spatial data warehouses can be constructed to accelerate spatial OLAP operations. In this paper we consider the star-schema and we focus on the spatial dimensions. Unlike the non-spatial case, the groupings and the hierarchies can be numerous and unknown at design time, therefore the well-known materialization techniques are not directly applicable. In order to address this problem, we construct an ad-hoc grouping hierarchy based on the spatial index at the finest spatial granularity. We incorporate this hierarchy in the lattice model and present efficient methods to process arbitrary aggregations. We finally extend our technique to moving objects by employing incremental update methods.
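As a rough illustration of the finest-granularity grouping idea (an assumed regular grid standing in for the spatial index; not the paper's structure): counts are kept per cell at the finest level, and each coarser level of the ad-hoc hierarchy is produced by merging 2x2 blocks of cells, so aggregates never touch the raw points.

```python
from collections import defaultdict

# Toy object positions and an assumed finest-level cell width.
points = [(3, 1), (3, 2), (10, 9), (11, 9), (12, 14)]
CELL = 4

# Counts at the finest grid granularity.
finest = defaultdict(int)
for x, y in points:
    finest[(x // CELL, y // CELL)] += 1

def coarsen(level):
    """Merge 2x2 blocks of cells into one cell of the next-coarser level."""
    up = defaultdict(int)
    for (cx, cy), n in level.items():
        up[(cx // 2, cy // 2)] += n
    return up

coarser = coarsen(finest)
print(dict(finest))    # per-cell object counts at the finest granularity
print(dict(coarser))   # the same counts, one level up the hierarchy
```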

367 citations


"Optimal Space and Time Complexity A..." refers background in this paper

  • ...There has been work on the efficiency of OLAP operations on spatial data in a data warehouse [9]....


  • ...There have been a number of works on computational aspects of roll-up, drill-down and other OLAP operations [1, 2, 3, 6, 8, 9, 10, 11]....
