Proceedings ArticleDOI

Mining high dimensional association rules by generating large frequent k-dimension set

18 Jul 2012-pp 58-63
TL;DR: The proposed method is shown to be better at generating association rules, achieving faster computation and more concise rules at the same time.
Abstract: Association rule mining aims at generating association rules between sets of items in a database. Nowadays, owing to advances in database technology, data are increasingly represented in high-dimensional spaces. However, generating association rules from high-dimensional data is tedious, because such databases contain many different dimensions or attributes. In this paper, a method for generating association rules from large high-dimensional data is proposed. It consists of three steps: 1) pre-processing and generalizing the database; 2) generating the large frequent k-dimension sets using a user-supplied support value, which is more feasible than the traditional approach; and 3) generating strong association rules using confidence. Experiments show that the mining algorithm is elegant and efficient, obtaining faster computation and more concise rules at the same time, and that the proposed method is better at generating association rules.
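
The three-step method summarized in the abstract — frequent k-dimension sets filtered by a user-supplied support value, then strong rules filtered by confidence — can be sketched roughly as follows. The transactions, thresholds, and helper names are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

# toy generalized transactions; each item is an attribute:value pair
transactions = [
    {"age:30-39", "income:high", "buys:pc"},
    {"age:30-39", "income:high", "buys:pc"},
    {"age:20-29", "income:low"},
    {"age:30-39", "income:high"},
]
min_support, min_confidence = 0.5, 0.7
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item of the set."""
    return sum(itemset <= t for t in transactions) / n

# step 2: level-wise generation of large frequent k-dimension sets
items = sorted({i for t in transactions for i in t})
frequent = {}
candidates = [frozenset([i]) for i in items]
k = 1
while candidates:
    survivors = [s for s in candidates if support(s) >= min_support]
    for s in survivors:
        frequent[s] = support(s)
    # candidate (k+1)-sets as unions of surviving k-sets
    candidates = list({a | b for a in survivors for b in survivors
                       if len(a | b) == k + 1})
    k += 1

# step 3: strong rules X -> Y with confidence supp(X ∪ Y) / supp(X)
rules = []
for s, sup in frequent.items():
    for r in range(1, len(s)):
        for lhs in map(frozenset, combinations(sorted(s), r)):
            conf = sup / frequent[lhs]
            if conf >= min_confidence:
                rules.append((set(lhs), set(s - lhs), round(conf, 2)))
```

Because support is anti-monotone, every subset of a frequent set is itself frequent, so the confidence lookup `frequent[lhs]` in step 3 never misses.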
Citations

Proceedings ArticleDOI
01 Aug 2016
TL;DR: A variant of the Apriori algorithm is proposed that uses QR decomposition to reduce the dimensions of the data, thereby reducing the complexity of the traditional Apriori algorithm.
Abstract: Apriori is one of the best algorithms for learning association rules. Due to the explosion of data, the storage and retrieval mechanisms in various database paradigms have revolutionized the technologies and methodologies used in the architecture. As a result, the database is not only utilized for mere information retrieval but also to infer the analytical aspect of data. Therefore it is essential to find association rules from high dimensional data because the correlation amongst the attributes can help in gaining deeper insight into the data and help in decision making, recommendations as well as reorganizing the data for effective retrieval. The traditional Apriori algorithm is computationally expensive and infeasible with high dimensional datasets. Hence we propose a variant of Apriori algorithm using the concept of QR decomposition for reducing the dimensions thereby reducing the complexity of the traditional Apriori algorithm.
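
One plausible reading of the QR step — not the paper's exact procedure — is to use the diagonal of R as a significance score: a column of the data matrix whose diagonal entry in R is near zero is numerically a combination of earlier columns and adds little independent information, so it can be dropped before mining. The matrix and threshold below are invented for illustration.

```python
import numpy as np

# rows = transactions, columns = attributes; column 2 is an exact copy
# of column 0, so it carries no independent information
X = np.array([
    [1.0, 0.0, 1.0, 3.0],
    [2.0, 1.0, 2.0, 0.5],
    [3.0, 0.0, 3.0, 2.0],
    [4.0, 1.0, 4.0, 1.0],
])
Q, R = np.linalg.qr(X)
# |R[j, j]| measures the part of column j orthogonal to earlier columns
independence = np.abs(np.diag(R))
keep = [j for j in range(X.shape[1]) if independence[j] > 1e-6]
# the redundant column 2 is pruned; Apriori then runs on X[:, keep]
```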

23 citations

Proceedings ArticleDOI
01 Nov 2014
TL;DR: A comprehensive study shows that the approach outperforms the traditional Apriori algorithm, obtaining faster computation while generating more concise rules.
Abstract: At present, due to developments in database technology, large volumes of data are produced by everyday operations, introducing the need to represent the data as high-dimensional datasets. Discovering frequent determinant patterns and association rules from these high-dimensional datasets has become very tedious, since such databases contain a large number of different attributes; mining them generates an extremely large number of redundant rules, which makes the algorithms inefficient and exceeds main memory. In this paper, a new association rule mining approach is presented that efficiently discovers frequent determinant patterns and association rules from high-dimensional datasets. The proposed approach adapts the conventional Apriori algorithm and devises a new CApriori algorithm to prune the generated frequent determinant sets effectively. A frequent determinant set is selected by first comparing its value with a conviction threshold and then with a support threshold; this double comparison eliminates redundancy and generates strong association rules. To improve the mining process, the algorithm also uses a compressed data structure, f_list, constructed from feature attributes selected with a Heuristic Fitness Function (HFF) and a heuristic discretization algorithm, and a Count Array (CA), devised as a one-dimensional triple-array pair set, to minimize main-memory utilization. A comprehensive study shows that the approach outperforms the traditional Apriori algorithm, obtaining faster computation while generating more concise rules. The mining methodology is further ascertained to be better at generating strong association rules from high-dimensional databases.
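
The double comparison described in the abstract — a conviction check followed by a support check — can be sketched with the standard conviction measure, conv(X -> Y) = (1 - supp(Y)) / (1 - conf(X -> Y)). The thresholds and example supports below are illustrative; the paper's exact CApriori pruning values are not given here.

```python
def conviction(supp_y, conf_xy):
    """Standard conviction of a rule X -> Y; higher means Y fails less
    often than chance when X holds."""
    if conf_xy >= 1.0:
        return float("inf")  # rule is never violated
    return (1.0 - supp_y) / (1.0 - conf_xy)

def is_strong(supp_xy, supp_x, supp_y, conv_min=1.2, supp_min=0.3):
    """Double comparison: conviction threshold first, then support."""
    conf = supp_xy / supp_x
    return conviction(supp_y, conf) >= conv_min and supp_xy >= supp_min

# example: supp(X)=0.5, supp(Y)=0.6, supp(X ∪ Y)=0.45 gives conf=0.9
# and conviction (1 - 0.6) / (1 - 0.9) = 4.0, so the rule survives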

12 citations

Journal ArticleDOI
TL;DR: An information theoretic method together with the concept of QR decomposition is employed to represent the data in its proper substructure form without losing its semantics, by identifying significant attributes in large and high dimensional datasets.
Abstract: This paper presents a new computational approach to discovering interesting relations between variables, called association rules, in large and high-dimensional datasets. State-of-the-art techniques are computationally expensive for reasons such as high dimensionality, the generation of huge numbers of candidate sets, and multiple database scans. Moreover, many of the discovered patterns are obvious, redundant, or uninteresting to the user. The aim of this paper is therefore to improve the Apriori algorithm to find association rules pertaining only to the important attributes of high-dimensional data. We employ an information-theoretic method together with QR decomposition to represent the data in its proper substructure form without losing its semantics, by identifying significant attributes. Experiments on real datasets and comparison with an existing technique reveal that the proposed strategy is consistently faster and statistically comparable with the Apriori algorithm in terms of rules generated and time complexity.

4 citations

Proceedings ArticleDOI
01 Nov 2015
TL;DR: The novel FP-Table algorithm is proposed to solve the problem of mining frequent patterns from massive transaction data, along with two optimization methods, table compression and twice-scanning the database, which improve efficiency given the large amount of sparse data in the FP-Table.
Abstract: For massive and varied trading data, transaction mining algorithms are very useful for finding relationships among correlated elements. The novel FP-Table algorithm is proposed in this paper to solve the problem of mining frequent patterns from massive transaction data. The FP-Table algorithm integrates a hash table into the FP-Growth algorithm, using a two-dimensional table to record the frequency count of each item pair and then building the Hash-T table and the FP-Table. Two optimization methods, table compression and twice-scanning the database, are proposed to improve the algorithm's efficiency, since the FP-Table otherwise holds a large amount of sparse data. A case study and experiments on transaction mining in a mobile electricity market show that the optimized FP-Table algorithm is efficient in practice.
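
The pair-counting core of the idea — one database scan filling a two-dimensional table of item-pair frequencies, kept sparse by a hash table — might look as follows; the transactions are invented for illustration.

```python
from collections import Counter
from itertools import combinations

transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "c"],
    ["a", "c"],
]

# sparse hash-backed stand-in for the two-dimensional pair table
pair_counts = Counter()
for t in transactions:
    # one cell per unordered item pair, filled in a single scan
    for x, y in combinations(sorted(set(t)), 2):
        pair_counts[(x, y)] += 1
```

Sorting each transaction's items first ensures a pair is always keyed the same way, so (x, y) and (y, x) share one cell.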

Cites methods from "Mining high dimensional association..."

  • ...Prasanna and Seetha [7] present a method for generating association rules from large high dimensional data, which can obtain more rapid computing speed and sententious rules....


References
Proceedings ArticleDOI
01 Jun 1993
TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Abstract: We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.

15,645 citations

Proceedings Article
01 Jul 1998
TL;DR: Two new algorithms for solving this problem, fundamentally different from the known algorithms, are presented; empirical evaluation shows that they outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Abstract: We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.

10,863 citations

Book ChapterDOI
10 Jan 1999
TL;DR: The effect of dimensionality on the "nearest neighbor" problem is explored, and it is shown that under a broad set of conditions, as dimensionality increases, the Distance to the nearest data point approaches the distance to the farthest data point.
Abstract: We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!
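
The claimed effect is easy to reproduce numerically. The sketch below (arbitrary sample sizes and dimensions, not the authors' experiment) compares the ratio of the farthest to the nearest distance in low and high dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def spread(dim, n_points=1000):
    """Ratio of farthest to nearest distance from a random query point
    to n_points uniform points; 1.0 would mean all points equidistant."""
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(data - query, axis=1)
    return dists.max() / dists.min()

low, high = spread(2), spread(500)
# in high dimension the ratio collapses toward 1: nearest and farthest
# neighbors become nearly indistinguishable
```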

2,012 citations


"Mining high dimensional association..." refers background in this paper

  • ...This work was subsequently extended to finding association rules over multidimensional dataset....


Proceedings ArticleDOI
01 Jun 1996
TL;DR: This work deals with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary and introduces measures of partial completeness which quantify the information lost due to partitioning.
Abstract: We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary. We introduce measures of partial completeness which quantify the information lost due to partitioning. A direct application of this technique can generate too many similar rules. We tackle this problem by using a "greater-than-expected-value" interest measure to identify the interesting rules in the output. We give an algorithm for mining such quantitative association rules. Finally, we describe the results of using this approach on a real-life dataset.
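
The fine-partition-then-merge idea can be sketched as follows: start from narrow intervals over a quantitative attribute and merge adjacent ones until each partition reaches a minimum support. The attribute values, bin width, and threshold are illustrative, not from the paper.

```python
# toy quantitative attribute (e.g. age) and a minimum support count
ages = [23, 25, 31, 34, 35, 38, 41, 44, 52, 58]
min_count = 3
# fine partitions: [20,30), [30,40), [40,50), [50,60)
fine_bins = [(lo, lo + 10) for lo in range(20, 60, 10)]

def count(lo, hi):
    """Number of values falling in the half-open interval [lo, hi)."""
    return sum(lo <= a < hi for a in ages)

# greedily absorb adjacent fine bins until each merged bin has support
merged, lo = [], fine_bins[0][0]
for _, hi in fine_bins:
    if count(lo, hi) >= min_count or hi == fine_bins[-1][1]:
        merged.append((lo, hi))
        lo = hi
```

With these numbers the underfull [20,30) merges into [20,40), and [40,50) with [50,60) into [40,60), each now meeting the support floor; the partial-completeness measures in the abstract quantify how much rule information such merging loses.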

1,697 citations


"Mining high dimensional association..." refers background in this paper

  • ...It requires a common data representation known as high dimensional data set....
