Book Chapter

Association Based Multi-attribute Analysis to Construct Materialized View

TL;DR: An algorithm is proposed that generates a materialized view by considering the frequencies of multiple attributes at a time, taken from a database, with the help of the Apriori algorithm; it supports scalability as well as flexibility.
Abstract: Data analysis is an inherent part of business, used to identify interesting patterns underlying a data set. In modern applications the data are usually huge, and searching them with lower time complexity is always a subject of interest. These data are mostly stored in tables based on the relational model and fetched using SQL queries, so query response time is an important quality factor for this type of system. Materialized view formation is the most common way of improving query execution speed across industries, and different approaches have been applied over time to generate materialized views. However, few attempts have been made to construct materialized views with the help of association-based mining algorithms, and none of the existing association-based methods measures the performance of the views in terms of both Hit-Miss ratio and view size scalability. This paper proposes an algorithm that generates a materialized view by considering the frequencies of multiple attributes at a time, taken from a database, with the help of the Apriori algorithm. The Apriori algorithm is used to generate frequent attribute sets, which are then considered for materialization. Moreover, by varying the support count and changing the sizes of the frequent attribute sets, the proposed methodology supports scalability as well as flexibility. Experimental results are given to show improved results over existing inter-attribute analysis based materialized view formation.
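The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each query is reduced to the set of attributes it references, runs a plain Apriori pass over those sets, and treats a query as a "hit" when all of its attributes are in the view. The function names, the sample queries, and the rule of picking the largest frequent set as the view are all hypothetical choices made for illustration.

```python
from itertools import combinations

def frequent_attribute_sets(queries, min_support):
    """Apriori over query attribute sets: {frozenset(attrs): support count}."""
    transactions = [frozenset(q) for q in queries]
    # Level 1: count how many queries reference each single attribute.
    counts = {}
    for t in transactions:
        for attr in t:
            key = frozenset([attr])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        # Join step: merge frequent (k-1)-sets whose union has exactly k items.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

def hit_ratio(queries, view_attrs):
    """Fraction of queries answerable from the view (assumed hit definition)."""
    view = set(view_attrs)
    return sum(1 for q in queries if set(q) <= view) / len(queries)

# Hypothetical query workload: each entry lists the attributes one query uses.
QUERIES = [
    {"id", "name", "salary"},
    {"id", "name"},
    {"name", "salary"},
    {"id", "name", "salary"},
    {"dept"},
]

freq = frequent_attribute_sets(QUERIES, min_support=2)
# Assumed selection rule: materialize the largest (then most supported) set.
view = max(freq, key=lambda s: (len(s), freq[s]))
print(sorted(view), hit_ratio(QUERIES, view))
```

With this workload, a support count of 2 yields a three-attribute view, while raising it to 3 leaves only two-attribute frequent sets, which is one way the support count could regulate view size.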
Citations
Journal Article
01 Apr 2021
TL;DR: The authors propose a new association rule mining technique for quick decision-making that performs better than the Apriori algorithm, one of the most popular approaches to association rule mining.
Abstract: Recommendation systems are now inherent to many business applications for making important business decisions. These systems are built on historical data, such as sales data or customer feedback. Customer feedback is very important for any organization as it reflects the views and sentiment of the customers. Online systems allow customers to purchase products at a glance from any e-commerce website. Generally, potential buyers check the reviews of a product to make an informed purchase decision. In this work, we attempt to build a recommendation model that finds the influence of one product on another, so that if a user purchases the influential product, the recommender system can recommend the influenced products to that user. In this paper, the recommendation system has been built based on association rule mining. We propose a new association rule mining technique for quick decision-making, and it gives better performance than the Apriori algorithm, which is one of the most popular approaches for association rule mining. The entire framework has been developed in the Neo4j graph data model, both for modelling the data from raw text files and for performing the analysis. We used real-life customer feedback data from Amazon for the experiments.

6 citations

Book Chapter
18 Sep 2018
TL;DR: This paper introduces and tests a storage alternative that goes against current data normalization premises, where storage space is no longer a concern, and proposes a new concept system, called SINGLE, in which query execution time must be entirely predictable, independently of query complexity.
Abstract: Over the past decades, several new concepts have emerged to organize and query data over large Data Warehouse (DW) systems with the same primary objective, that is, to optimize processing speed. More recently, with the rise of the Big Data concept, storage cost has lowered significantly and performance (random accesses) has increased, particularly with modern SSD disks. This paper introduces and tests a storage alternative that goes against current data normalization premises, where storage space is no longer a concern. By de-normalizing the entire data schema (transparently to the user), a new concept system called SINGLE is proposed, in which query execution time must be entirely predictable, independently of query complexity. The proposed data model also allows easy partitioning and distributed processing to enable execution parallelism, boosting performance, as happens in MapReduce. The TPC-H benchmark is used to evaluate storage space and query performance. Results show predictable performance when compared with approaches based on a normalized relational schema and with MapReduce-oriented approaches.

2 citations

Journal Article
TL;DR: In this paper, the authors propose an approach to construct weighted materialized views in a distributed environment by applying association mining techniques in a non-binary data space.
Abstract: Materialized views are heavily used to speed up the query response time of any data-centric application. In the literature, the construction and dynamic maintenance of materialized views are carried out in a Binary Data Space where all attributes are given the same weight. Considering different weights may be particularly significant when similar queries are fired from multiple sites in a distributed environment, as taking into account the number of accesses to the different attribute values makes it possible to tune the materialized views accordingly. The methodology for constructing weighted materialized views introduced in this paper applies association mining techniques in a Non-Binary Data Space in distributed environments. The allocation of the views to the operating sites is also considered, to suit use in distributed databases. Experimental results show the superiority of the proposed methodology on three benchmark datasets in terms of query Hit-Miss ratio and regulation of the view size under the varying requirements of practical applications.
Book Chapter
01 Jan 2021
TL;DR: In this article, the authors propose an approach to construct weighted materialized views by applying association mining techniques in a Non-Binary Data Space.
Abstract: Materialized views are heavily used to speed up the query response time of any data-centric application. In the literature, the construction and dynamic maintenance of materialized views are carried out in a Binary Data Space where all attributes are given the same weight. Considering different weights may be particularly significant when similar queries are posed by multiple users, as taking into account the number of accesses to the different attribute values makes it possible to tune the materialized views accordingly. The methodology for constructing weighted materialized views introduced in this paper applies association mining techniques in a Non-Binary Data Space. The proposed algorithm has been verified by simulation experiments with two benchmark datasets using practical transactional queries. The experimental results show the superiority of our proposal in terms of query Hit-Miss ratio and flexibility in extending the view size according to the requirements of practical applications.
References
Proceedings Article
01 Jul 1998
TL;DR: Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Abstract: We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.

10,863 citations

Journal Article
TL;DR: A novel frequent-pattern tree (FP-tree) structure is proposed, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and an efficient FP-tree-based mining method, FP-growth, is developed for mining the complete set of frequent patterns by pattern fragment growth.
Abstract: Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns. In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, FP-tree which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods.
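The compression step described in this abstract can be sketched as follows. This is a simplified illustration of FP-tree construction only: it omits the header table, node links, and the FP-growth mining phase, and the class name, function names, and sample transactions are hypothetical.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, and child nodes keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # First scan: support count of each item (each item once per transaction).
    freq = Counter(item for t in transactions for item in set(t))
    keep = {item for item, c in freq.items() if c >= min_support}
    root = Node(None)
    # Second scan: insert each transaction's frequent items, most frequent
    # first (ties broken by item name), so that common prefixes share nodes.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in keep),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root

def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())

# Hypothetical transactions; "d" is infrequent and is dropped from the tree.
TRANSACTIONS = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "c"], ["d"]]
tree = build_fp_tree(TRANSACTIONS, min_support=2)
print(count_nodes(tree))  # 4 nodes hold the 10 frequent item occurrences
```

Because every transaction here starts with the same most frequent item, the shared prefix is stored once with a count, which is the compression the abstract refers to.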

2,567 citations

Proceedings Article
01 Jun 1997
TL;DR: A new algorithm for finding large itemsets which uses fewer passes over the data than classic algorithms, and yet uses fewer candidate itemsets than methods based on sampling and a new way of generating “implication rules” which are normalized based on both the antecedent and the consequent.
Abstract: We consider the problem of analyzing market-basket data and present several important contributions. First, we present a new algorithm for finding large itemsets which uses fewer passes over the data than classic algorithms, and yet uses fewer candidate itemsets than methods based on sampling. We investigate the idea of item reordering, which can improve the low-level efficiency of the algorithm. Second, we present a new way of generating “implication rules,” which are normalized based on both the antecedent and the consequent and are truly implications (not simply a measure of co-occurrence), and we show how they produce more intuitive results than other methods. Finally, we show how different characteristics of real data, as opposed to synthetic data, can dramatically affect the performance of the system and the form of the results.

2,149 citations

Proceedings Article
01 Jun 1996
TL;DR: In this article, a lattice framework is used to express dependencies among views, and greedy algorithms are presented that determine a good set of views to materialize, performing within a small constant factor of optimal.
Abstract: Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
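A greedy selection over a hypercube lattice, in the spirit of the abstract, might look like the sketch below. It assumes a linear cost model in which answering a query on view w costs the size of the smallest materialized view that w can be derived from, and that w is derivable from v when w's grouping attributes are a subset of v's. The view sizes are made-up numbers and the function name is hypothetical.

```python
def greedy_select(sizes, k):
    """Greedily pick up to k views to materialize besides the top view.

    sizes maps frozenset(grouping attributes) -> estimated row count.
    """
    # Assumption: the view over all attributes is always materialized.
    top = max(sizes, key=len)
    selected = [top]

    def cost(w):
        # Cheapest already materialized view that w can be computed from.
        return min(sizes[v] for v in selected if w <= v)

    def benefit(v):
        # Total saving over every view derivable from v if v were added.
        return sum(max(0, cost(w) - sizes[v]) for w in sizes if w <= v)

    for _ in range(k):
        candidates = [v for v in sizes if v not in selected]
        if not candidates:
            break
        best = max(candidates, key=benefit)
        if benefit(best) <= 0:
            break
        selected.append(best)
    return selected[1:]

# Hypothetical sizes for the 8 views of a 3-dimensional cube over a, b, c.
SIZES = {
    frozenset("abc"): 100,
    frozenset("ab"): 50,
    frozenset("ac"): 75,
    frozenset("bc"): 100,
    frozenset("a"): 20,
    frozenset("b"): 30,
    frozenset("c"): 40,
    frozenset(): 1,
}

chosen = greedy_select(SIZES, 2)
print([sorted(v) for v in chosen])  # [['a', 'b'], ['c']]
```

In the first round the view on (a, b) saves 50 rows for each of its four derivable views, a benefit of 200; after that, the view on (c) has the largest remaining benefit of 70, so it is picked second.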

1,499 citations

Proceedings Article
24 Aug 1998
TL;DR: The ability to rewrite a large class of queries based on a small set of MVs is supported by using Dimensions, losslessness of joins, functional dependency, column equivalence, join derivability, joinback and aggregate rollup.
Abstract: Oracle Materialized Views (MVs) are designed for data warehousing and replication. For data warehousing, MVs based on inner/outer equijoins with optional aggregation can be refreshed on transaction boundaries, on demand, or periodically. Refreshes are optimized for bulk loads and can use a multi-MV scheduler. MVs based on subqueries on remote tables support bidirectional replication. Optimization with MVs includes transparent query rewrite based on a cost-based selection method. The ability to rewrite a large class of queries based on a small set of MVs is supported by using Dimensions (a new Oracle object), losslessness of joins, functional dependency, column equivalence, join derivability, joinback and aggregate rollup.

181 citations