Dynamic discovery of query path on the lattice of cuboids using hierarchical data granularity and storage hierarchy
01 Jul 2014 - Journal of Computational Science (Elsevier), Vol. 5, Iss. 4, pp. 675-683
TL;DR: This research work dynamically finds the most cost-effective path through the lattice of cuboids, based on concept hierarchy, to minimize query access time.
Abstract: Analytical processing on multi-dimensional data is performed over a data warehouse. The data is generally presented in the form of cuboids. The central theme of the data warehouse is represented as a fact table, which is built from the related dimension tables. The cuboid that corresponds to the fact table is called the base cuboid. All possible combinations of cuboids can be generated from the base cuboid through successive roll-up operations, and together they form a lattice structure. Some dimensions may have a concept hierarchy with multiple granularities of data, meaning that a dimension is represented at more than one level of abstraction. Typically, a specific business process requires neither all the cuboids nor all levels of the concept hierarchies. The cuboids reside in different layers of the memory hierarchy, such as cache memory, primary memory, and secondary memory. This research work dynamically finds the most cost-effective path through the lattice of cuboids, based on the concept hierarchy, to minimize query access time. Knowledge of where each cuboid is located in the memory hierarchy is used for this purpose.
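The abstract describes answering queries from cuboids held at different levels of the memory hierarchy. The Python sketch below illustrates that idea under assumptions that do not come from the paper:

- Each cuboid is a set of grouping dimensions.
- A query can be answered by any materialized cuboid whose dimensions are a superset of the query's, i.e. an ancestor in the lattice.
- A candidate's cost is its row count multiplied by a made-up per-row access cost for its memory level (`ACCESS_COST`).

The helper names `lattice` and `cheapest_source` are hypothetical, and the sketch ignores concept-hierarchy levels.

```python
from itertools import combinations

# Hypothetical per-row access cost of each memory level (arbitrary units, not from the paper).
ACCESS_COST = {"cache": 1, "primary": 10, "secondary": 100}


def lattice(dims):
    """Every cuboid as a frozenset of grouping dimensions: base cuboid first, apex () last."""
    return [frozenset(c) for k in range(len(dims), -1, -1) for c in combinations(dims, k)]


def cheapest_source(query, materialized):
    """Return (cuboid, cost) for the cheapest materialized ancestor of `query`.

    `materialized` maps a frozenset of dimensions to (row_count, memory_level).
    A cuboid can answer the query by rolling up if it groups by a superset of
    the query's dimensions."""
    query = frozenset(query)
    candidates = [
        (dims, rows * ACCESS_COST[level])
        for dims, (rows, level) in materialized.items()
        if query <= dims
    ]
    if not candidates:
        raise ValueError("no materialized cuboid can answer this query")
    return min(candidates, key=lambda pair: pair[1])


materialized = {
    frozenset({"time", "item", "location"}): (1_000_000, "secondary"),  # base cuboid
    frozenset({"time", "item"}): (50_000, "primary"),
    frozenset({"item"}): (2_000, "secondary"),
}
print(len(lattice(["time", "item", "location"])))  # 8 cuboids for 3 dimensions
print(cheapest_source({"item"}, materialized))
print(cheapest_source({"time"}, materialized))
```

With these invented numbers, the `{item}` query is answered from the small on-disk `{item}` cuboid (cost 200,000). The in-memory `{time, item}` cuboid would cost 500,000, so the example shows size and location trading off.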
TL;DR: The proposed MOTH system exploits coarse-grained, fully and partially reused-based opportunities among queries. It takes non-equal tuple sizes and non-uniform data distribution into account to avoid repeated computations and reduce multi-query execution time.
Abstract: Multi-query optimization in Big Data has become a promising research direction due to the popularity of massive data analytical systems (e.g., MapReduce and Flink). A multi-query is translated into jobs, which are routinely submitted with similar tasks to the underlying Big Data analytical systems. These similar tasks are complicated and add computation overhead. Therefore, several techniques have been proposed for exploiting shared tasks in Big Data multi-query optimization (e.g., MRShare and Relaxed MRShare). These techniques are heavily tailored to relaxed optimizing factors of fine-grained reused-based opportunities. In Big Data multi-query optimization, the existing fine-grained techniques consider only equal tuple sizes and uniform data distribution. These assumptions do not hold for real-world distributed applications, which depend on coarse-grained reused-based opportunities such as non-equal tuple sizes and non-uniform data distribution. These two issues require more attention in Big Data multi-query optimization, in order to minimize the data read from or written back to Big Data infrastructures (e.g., Hadoop). In this paper, the Multi-Query Optimization using Tuple Size and Histogram (MOTH) system is proposed to consider the granularity of the reused-based opportunities. The proposed MOTH system exploits coarse-grained, fully and partially reused-based opportunities among queries, taking non-equal tuple sizes and non-uniform data distribution into account, to avoid repeated computations. Within MOTH, a combined technique is introduced for estimating coarse-grained reused-based opportunities horizontally and vertically. The horizontal estimation of non-equal tuple sizes is done by extracting column-level metadata. The vertical estimation of non-uniform data distribution uses pre-computed row-level histograms. In addition, the MOTH system estimates coarse-grained reused-based opportunities while considering slow storage (i.e., limited physical resources or fewer allocated virtualized resources), so that the costs of reused results are estimated accurately. A cost-based heuristic algorithm is then introduced to select the best reused-based opportunity and generate an efficient multi-query execution plan. Because partial reused-based opportunities are considered, extra computations are needed to retrieve the non-derived results. A partial reused-based optimizer has therefore been tailored and added to MOTH to reformulate the generated multi-query plan and improve the shared partial queries. Experimental results of the proposed MOTH system on the TPC-H benchmark show that multi-query execution time is reduced by considering the granularity of the reused results.
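The horizontal and vertical estimation described above can be illustrated with a small Python sketch. It is not MOTH's implementation and rests on three assumptions:

- Vertical estimate: an equi-width histogram, with values spread uniformly inside each bucket.
- Horizontal estimate: fixed per-column byte widths taken from metadata.
- Cost: a made-up rule that multiplies estimated bytes by a per-byte read cost, so slow storage raises the price of reusing a stored result.

`Histogram`, `estimated_bytes`, and `pick_cheapest` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Histogram:
    """Equi-width histogram: bucket i covers [lo + i*width, lo + (i+1)*width)."""
    lo: float
    width: float
    counts: list

    def rows_at_most(self, value):
        """Estimated number of rows with column value <= `value`, assuming
        values are spread uniformly within each bucket."""
        total = 0.0
        for i, count in enumerate(self.counts):
            start = self.lo + i * self.width
            end = start + self.width
            if value >= end:
                total += count  # whole bucket qualifies
            elif value > start:
                total += count * (value - start) / self.width  # partial bucket
        return total


def estimated_bytes(hist, upper_bound, column_widths):
    """Vertical estimate (qualifying rows) times horizontal estimate (tuple width in bytes)."""
    return hist.rows_at_most(upper_bound) * sum(column_widths.values())


def pick_cheapest(options):
    """Hypothetical cost rule: options maps a plan name to (bytes, seconds_per_byte);
    slower storage shows up as a larger seconds_per_byte."""
    return min(options, key=lambda name: options[name][0] * options[name][1])


hist = Histogram(lo=0, width=10, counts=[100, 50, 10])
reused = estimated_bytes(hist, 15, {"key": 4, "comment": 20})
print(reused)  # 3000.0
print(pick_cheapest({"reuse": (reused, 0.01), "recompute": (100_000, 0.001)}))
```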
01 Dec 2015
TL;DR: A suitable data warehouse schema is proposed that comprises the required dimensions along with their concept hierarchies. The lattice of cuboids is then constructed to carry out OLAP processing from all possible business perspectives.
Abstract: Each mobile device represents the digital footprint of its owner, and mobile location data is stored in telecom operators' databases in the form of Call Detail Records (CDRs). A CDR holds the precise identity of the cell tower to which the owner is connected at any given time, so the number of mobile devices within a region over a time period can be calculated. In addition, the International Mobile Equipment Identity (IMEI) number uniquely identifies every mobile device. Part of the IMEI, known as the Type Allocation Code (TAC), identifies the make and model of the device and hence its manufacturer. Combining these, it is possible to analyze business information about the mobile penetration of companies in a defined region. This supports comparing localized market share with other companies, as well as among different models of the same company. To model the problem and analyze the huge volume of CDR data, analytical processing is carried out using a data warehouse. We propose a suitable data warehouse schema that comprises the required dimensions along with their concept hierarchies, and describe the ETL processing used to build the data warehouse. Finally, the lattice of cuboids is constructed to carry out OLAP processing from all possible business perspectives.
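The device-counting and TAC steps described above can be sketched in Python. The sketch assumes CDRs are reduced to `(imei, region)` pairs, and the TAC-to-manufacturer table is invented for illustration. That the TAC is the first 8 digits of the 15-digit IMEI is standard. The names `tac` and `market_share` are hypothetical, and the sketch does not validate the IMEI's Luhn check digit.

```python
from collections import Counter, defaultdict

# Hypothetical TAC -> manufacturer table; real TACs are allocated by the GSMA.
TAC_TO_MAKER = {"11111111": "MakerA", "22222222": "MakerB"}


def tac(imei):
    """The Type Allocation Code is the first 8 digits of a 15-digit IMEI."""
    if len(imei) != 15 or not imei.isdigit():
        raise ValueError(f"not a 15-digit IMEI: {imei!r}")
    return imei[:8]


def market_share(cdrs):
    """cdrs: iterable of (imei, region) pairs taken from call detail records.

    Each device is counted once per region, however many CDRs it has, and the
    result is {region: {maker: share_of_devices}}."""
    devices = {(imei, region) for imei, region in cdrs}
    counts = defaultdict(Counter)
    for imei, region in devices:
        counts[region][TAC_TO_MAKER.get(tac(imei), "unknown")] += 1
    return {
        region: {maker: n / sum(makers.values()) for maker, n in makers.items()}
        for region, makers in counts.items()
    }


cdrs = [
    ("111111110000001", "R1"),
    ("111111110000001", "R1"),  # same device, second call: counted once
    ("222222220000002", "R1"),
    ("111111110000003", "R2"),
]
print(market_share(cdrs))
```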
TL;DR: A new methodology for efficient implementation forms a lattice on query parameters. This lattice helps to correlate the different query parameters, which in turn form association rules among them.
Abstract: This research work optimizes the number of query parameters required for recommendation on an e-learning platform. This paper proposes a new methodology for efficient implementation by forming a lattice on query parameters. This lattice structure helps to correlate the different query parameters, which in turn form association rules among them. The proposed methodology is conceptualized on an e-learning platform. Its objective is an effective recommendation system that determines associations between the various products offered by the platform by analyzing the minimum set of query parameters.
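The paper's own lattice construction is not given here, so the following Python sketch only illustrates the general idea, under these assumptions:

- Each user session is treated as the set of query parameters it used.
- Sets of parameters are ordered by inclusion, which forms a lattice.
- Standard support and confidence thresholds turn frequent parameter pairs into association rules.

The parameter names, thresholds, and the helpers `support` and `pairwise_rules` are hypothetical.

```python
from itertools import combinations


def support(itemset, sessions):
    """Fraction of sessions that use every parameter in `itemset`."""
    return sum(1 for s in sessions if itemset <= s) / len(sessions)


def pairwise_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Return rules (x, y, support, confidence) meaning "sessions that use x also use y".

    Only pairs of parameters are examined; the pair {x, y} sits one level above
    {x} and {y} in the lattice of parameter sets ordered by inclusion."""
    params = sorted(set().union(*sessions))
    rules = []
    for a, b in combinations(params, 2):
        pair = support({a, b}, sessions)
        if pair < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = pair / support({x}, sessions)
            if confidence >= min_confidence:
                rules.append((x, y, pair, confidence))
    return rules


sessions = [
    {"course", "level"},
    {"course", "level", "price"},
    {"course", "price"},
    {"level"},
]
print(pairwise_rules(sessions))  # [('price', 'course', 0.5, 1.0)]
```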
01 Jan 2019
TL;DR: In this paper, the authors propose a data warehouse model that integrates the existing parameters of loan disbursement decisions. It also incorporates newly identified concepts to give priority to customers who do not have any previous credit history.
Abstract: Loan disbursement is an important decision-making process for corporations that offer loans, such as banks and NBFCs (Non-Banking Financial Companies). The business involves several parameters. The data associated with these parameters is generated from heterogeneous data sources and belongs to different business verticals. Hence, decision-making in loan scenarios is critical, and the outcome involves resolving questions such as whether to grant the loan and, if it is sanctioned, what the highest amount should be. In this paper we consider the traditional parameters of the loan sanction process. We also identify a special case of the Indian credit lending scenario, in which people with old loans and a good repayment history get priority. This limits business opportunities for banks, NBFCs, and other loan disbursement organizations, because potentially good customers with no loan history are given lower priority. In this research work we propose a data warehouse model that integrates the existing parameters of loan disbursement decisions. It also incorporates the newly identified concepts to give priority to customers who do not have any previous credit history.
01 Jan 2019
TL;DR: This paper proposes processing this huge volume of transactional data with an ETL process and then constructing a data warehouse (DW). The DW enables the POS provider to apply analytical processing for business gain.
Abstract: Point of Sale (POS) terminals are provided by banking and financial institutions to perform cashless transactions. Over time, owing to various conveniences, the use of digital money and online card transactions has increased manyfold. After each successful payment transaction at a POS terminal, a transaction log is sent to the POS terminal provider. The log carries payment-related financial data such as date and time, amount, authorization service provider, cardholder’s bank, merchant identifier, store identifier, and terminal number. This data is useful for analytical processing that benefits the business. This paper proposes processing this huge volume of transactional data with an ETL process and then constructing a data warehouse (DW). The DW enables the POS provider to employ analytical processing for business gain, such as:
* knowing its own market share and its market position with respect to card payments;
* geographic location-wise business profiling;
* segmenting its own and competitors’ customers based on monthly card usage, monthly amount spent, and similar measures.
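The monthly-usage segmentation mentioned above could be sketched in Python as a roll-up of transaction logs from day to month, followed by banding. The `(card_id, date, amount)` record layout and the spend bands are assumptions for illustration, not the paper's schema. `monthly_spend` and `segment` are hypothetical names.

```python
from collections import defaultdict


def monthly_spend(transactions):
    """Roll transactions up from day to month.

    transactions: iterable of (card_id, 'YYYY-MM-DD', amount).
    Returns {(card_id, 'YYYY-MM'): total_amount}."""
    totals = defaultdict(float)
    for card, date, amount in transactions:
        totals[(card, date[:7])] += amount  # 'YYYY-MM-DD'[:7] -> 'YYYY-MM'
    return dict(totals)


def segment(total, bands=((0, "low"), (500, "medium"), (2000, "high"))):
    """Hypothetical spend bands: label of the highest threshold not exceeding `total`."""
    label = bands[0][1]
    for threshold, name in bands:
        if total >= threshold:
            label = name
    return label


tx = [
    ("c1", "2019-01-05", 300.0),
    ("c1", "2019-01-20", 300.0),
    ("c2", "2019-01-07", 50.0),
    ("c1", "2019-02-01", 2500.0),
]
for (card, month), total in monthly_spend(tx).items():
    print(card, month, total, segment(total))
```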
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Advances in data mining technology have made extensive data collection much easier, but the field is still evolving. There is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. The third edition of Data Mining: Concepts and Techniques continues the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets. It also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic. It presents proven algorithms and sound implementations ready to be used against live data, either directly or with strategic modification. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
01 Jun 1996
TL;DR: In this article, a lattice framework is used to express dependencies among views. Greedy algorithms are presented that determine a good set of views to materialize, performing within a small constant factor of optimal.
Abstract: Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
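The greedy selection described above can be sketched in Python for a small hypercube lattice. The sketch assumes a linear cost model: answering a view costs the row count of the smallest materialized view it can be computed from. A candidate's benefit is assumed to be the total cost reduction over the views it can answer, and each round picks the view with the largest benefit. The view sizes are invented, and `query_cost`, `benefit`, and `greedy_select` are hypothetical names. This is a sketch of the technique, not the paper's code.

```python
def query_cost(view, materialized, size):
    """Linear cost model: answering `view` scans its smallest materialized ancestor."""
    return min(size[m] for m in materialized if view <= m)


def benefit(candidate, materialized, views, size):
    """Total cost reduction over all views that `candidate` can answer (w <= candidate)."""
    return sum(
        max(0, query_cost(w, materialized, size) - size[candidate])
        for w in views
        if w <= candidate
    )


def greedy_select(size, k):
    """Materialize the top view, then add k more views greedily by benefit.

    `size` maps each view (frozenset of dimensions) to its row count; view u can
    answer view v when v is a subset of u."""
    views = list(size)
    chosen = [max(views, key=len)]  # the top (base) view is always materialized
    for _ in range(k):
        best = max(
            (v for v in views if v not in chosen),
            key=lambda v: benefit(v, chosen, views, size),
        )
        chosen.append(best)
    return chosen


sizes = {"abc": 100, "ab": 50, "ac": 75, "bc": 80, "a": 20, "b": 30, "c": 40, "": 1}
size = {frozenset(name): rows for name, rows in sizes.items()}
picked = greedy_select(size, 2)
print(["".join(sorted(v)) or "()" for v in picked])  # ['abc', 'ab', 'c']
```

In the first round `ab` wins with benefit 4 × (100 − 50) = 200. Once `ab` is materialized, `c` gains more (70) than `a` (60), because `ab` already serves `a` cheaply.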
03 Sep 1996
TL;DR: In this article, the authors present fast algorithms for computing a collection of group-bys, focusing on the CUBE operator. The CUBE operator is equivalent to the union of a number of standard group-by operations, and the authors show how the structure of CUBE computation can be viewed as a hierarchy of group-by operations.
Abstract: At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem: computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations. These include combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give much better performance than straightforward methods.
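The CUBE operator described above computes SUM over every combination of the grouping attributes. The Python sketch below does this with hash-based grouping. As a simplified stand-in for the paper's optimizations, each group-by other than the base one is computed from its smallest already-computed parent rather than from the raw rows. The row layout and the `cube` helper are hypothetical, and this is not the paper's algorithm.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """SUM(measure) for every group-by over `dims`, keyed by the tuple of grouping attributes.

    Only the base group-by scans the raw rows; each smaller group-by is
    aggregated from its smallest parent group-by that has one more attribute."""
    base = tuple(dims)
    agg = defaultdict(float)
    for r in rows:
        agg[tuple(r[d] for d in base)] += r[measure]
    results = {base: dict(agg)}
    for k in range(len(dims) - 1, -1, -1):
        for child in combinations(dims, k):
            parents = [p for p in results if len(p) == k + 1 and set(child) <= set(p)]
            parent = min(parents, key=lambda p: len(results[p]))  # fewest groups to rescan
            idx = [parent.index(d) for d in child]
            agg = defaultdict(float)
            for key, value in results[parent].items():
                agg[tuple(key[i] for i in idx)] += value
            results[child] = dict(agg)
    return results


rows = [
    {"item": "pen", "city": "A", "sales": 10.0},
    {"item": "pen", "city": "B", "sales": 5.0},
    {"item": "ink", "city": "A", "sales": 2.0},
]
for group_by, groups in cube(rows, ["item", "city"], "sales").items():
    print(group_by, groups)
```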
TL;DR: A theorem is presented characterizing the hierarchical structure of formal fuzzy concepts arising in a given formal fuzzy context, and the Dedekind–MacNeille completion of a partial fuzzy order is described. The results provide foundations for formal concept analysis of vague data.
Abstract: The theory of concept lattices (i.e. hierarchical structures of concepts in the sense of the Port-Royal school) is approached from the point of view of fuzzy logic. The notions of partial order, lattice order, and formal concept are generalized to the fuzzy setting. A theorem is presented characterizing the hierarchical structure of formal fuzzy concepts arising in a given formal fuzzy context. Also, as an application of the present approach, the Dedekind–MacNeille completion of a partial fuzzy order is described. The approach and results provide foundations for formal concept analysis of vague data. The propositions “object x has attribute y”, which form the input data to formal concept analysis, may now also take intermediate truth values, which better reflects reality.
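The formal fuzzy concepts mentioned above can be illustrated with the standard fuzzy derivation operators. This Python sketch fixes one particular truth structure, the Łukasiewicz residuum a → b = min(1, 1 - a + b) on [0, 1], whereas the paper works more generally. The tiny context `I` and the helper names are made up. A pair (A, B) is a formal fuzzy concept when A↑ = B and B↓ = A. For any fuzzy set of objects A, the pair (A↑↓, A↑) is such a concept.

```python
def residuum(a, b):
    """Łukasiewicz implication: a -> b = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def up(A, I, attributes):
    """A↑(y) = min over objects x of (A(x) -> I(x, y))."""
    return {y: min(residuum(A[x], I[x][y]) for x in A) for y in attributes}


def down(B, I, objects):
    """B↓(x) = min over attributes y of (B(y) -> I(x, y))."""
    return {x: min(residuum(B[y], I[x][y]) for y in B) for x in objects}


def is_concept(A, B, I):
    """(A, B) is a formal fuzzy concept iff A↑ = B and B↓ = A."""
    return up(A, I, list(B)) == B and down(B, I, list(A)) == A


# Made-up fuzzy context: degree to which object x has attribute y.
I = {"x1": {"y1": 1.0, "y2": 0.5}, "x2": {"y1": 0.5, "y2": 0.0}}
A = {"x1": 1.0, "x2": 0.0}
B = up(A, I, ["y1", "y2"])
closed = down(B, I, ["x1", "x2"])  # A↑↓
print(B, closed)
print(is_concept(A, B, I), is_concept(closed, B, I))  # False True
```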
25 Aug 1997
TL;DR: The proposed technique reduces the solution space by considering only the relevant elements of the multidimensional lattice, whose elements represent the solution space of the problem.
Abstract: A multidimensional database is a data repository that supports the efficient execution of complex business decision queries. Query response can be significantly improved by storing an appropriate set of materialized views. These views are selected from the multidimensional lattice whose elements represent the solution space of the problem. Several techniques have been proposed in the past to select materialized views for databases with a small number of dimensions. When the number and complexity of dimensions increase, these techniques do not scale well. The technique we propose reduces the solution space by considering only the relevant elements of the multidimensional lattice. An additional statistical analysis allows a further reduction of the solution space.
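The abstract does not define which lattice elements count as relevant. One plausible reading, assumed here and not confirmed by the source, keeps only views that are either a workload query or the least upper bound of a group of workload queries. In a hypercube lattice, the least upper bound is the union of the grouping attributes. The Python sketch below (hypothetical `candidate_views`) enumerates that set. It is exponential in the number of queries, so it only illustrates the pruning idea.

```python
from itertools import combinations


def candidate_views(queries):
    """Assumed pruning rule: keep each workload query plus the least upper bound
    (union of grouping attributes, in a hypercube lattice) of every group of two
    or more queries. All other lattice elements are discarded."""
    queries = [frozenset(q) for q in queries]
    candidates = set(queries)
    for k in range(2, len(queries) + 1):
        for group in combinations(queries, k):
            candidates.add(frozenset().union(*group))
    return candidates


cands = candidate_views([{"a"}, {"b"}, {"a", "c"}])
print(sorted("".join(sorted(v)) for v in cands))  # ['a', 'ab', 'abc', 'ac', 'b']
```

A four-dimensional hypercube over a, b, c, d has 16 views, but only 5 candidates remain for this workload.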