Mining generalized association rules

doi:10.1016/S0167-739X(97)00019-8

Journal Article•DOI•

Ramakrishnan Srikant¹, Rakesh Agrawal¹•Institutions (1)

IBM¹

01 Nov 1997-Future Generation Computer Systems (Elsevier Science Publishers B. V.)-Vol. 13, Iss: 2, pp 161-180

TL;DR: A new interest-measure for rules which uses the information in the taxonomy is presented, and given a user-specified “minimum-interest-level”, this measure prunes a large number of redundant rules.
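The summary above does not give the formula for the interest measure, so the following is only a minimal sketch of one way such a taxonomy-based measure can work. It assumes that the support expected for a specialized itemset is the ancestor itemset's support scaled by each specialized item's share of the ancestor item it replaces, and that a rule is kept only when its actual support is at least the minimum interest level times that expectation. The function names and the numbers are hypothetical, and the confidence side of the measure is left out.

```python
from math import prod


def expected_support(ancestor_support, item_supports, ancestor_item_supports):
    """Support expected for a specialized itemset, given its ancestor itemset.

    Assumption: the ancestor's support is scaled by the share that each
    specialized item has of the ancestor item it replaces.
    """
    ratios = [s / a for s, a in zip(item_supports, ancestor_item_supports)]
    return ancestor_support * prod(ratios)


def is_interesting(actual_support, expected, min_interest):
    """True if the actual support is at least min_interest times the expectation."""
    return actual_support >= min_interest * expected


# Hypothetical supports: {milk, cereal} 8%, milk alone 20%, skim milk alone 5%.
expected = expected_support(0.08, [0.05], [0.20])

# A {skim milk, cereal} support close to the expectation adds no information.
redundant = not is_interesting(0.021, expected, min_interest=1.1)
```

With these numbers the expected support of {skim milk, cereal} is about 2%, so an observed 2.1% would be pruned at a minimum interest level of 1.1, while an observed 3% would be kept.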

About: This article was published in Future Generation Computer Systems on 1997-11-01. It has received 1790 citations to date. The article focuses on the topics: Association rule learning & Database transaction.

Citations

Book•

Data Mining: Concepts and Techniques

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal Article•DOI•

Top 10 algorithms in data mining

Xindong Wu¹, Vipin Kumar², J. Ross Quinlan, Joydeep Ghosh³, Qiang Yang⁴, Hiroshi Motoda⁵, Geoffrey J. McLachlan⁶, Angus S. K. Ng⁷, Bing Liu⁸, Philip S. Yu⁹, Zhi-Hua Zhou¹⁰, Michael Steinbach², David J. Hand¹¹, Dan Steinberg¹²•Institutions (12)

University of Vermont¹, University of Minnesota², University of Texas at Austin³, Hong Kong University of Science and Technology⁴, Osaka University⁵, University of Queensland⁶, Griffith University⁷, University of Illinois at Chicago⁸, IBM⁹, Nanjing University¹⁰, Imperial College London¹¹, University of Salford¹²

19 Dec 2007-Knowledge and Information Systems

TL;DR: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART.

Abstract: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.

4,944 citations

Cites background from "Mining generalized association rule..."

...The major ones include the followings: (1) incorporating taxonomy in items [72]: Use of taxonomy makes it possible to extract frequent itemsets that are expressed by higher concepts even when use of the base level concepts produces only infrequent itemsets....
[...]

Posted Content•

Principles of data mining

David J. Hand, Heikki Mannila, Padhraic Smyth

01 Jan 2001

TL;DR: This paper gives a lightning overview of data mining and its relation to statistics, with particular emphasis on tools for the detection of adverse drug reactions.

Abstract: The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.

3,765 citations

Book Chapter•DOI•

Mining Sequential Patterns: Generalizations and Performance Improvements

Ramakrishnan Srikant¹,², Rakesh Agrawal¹•Institutions (2)

IBM¹, University of Wisconsin-Madison²

25 Mar 1996

TL;DR: This work adds time constraints that specify a minimum and/or maximum time period between adjacent elements in a pattern, and relax the restriction that the items in an element of a sequential pattern must come from the same transaction.

Abstract: The problem of mining sequential patterns was recently introduced in [3]. We are given a database of sequences, where each sequence is a list of transactions ordered by transaction-time, and each transaction is a set of items. The problem is to discover all sequential patterns with a user-specified minimum support, where the support of a pattern is the number of data-sequences that contain the pattern. An example of a sequential pattern is “5% of customers bought ‘Foundation’ and ‘Ringworld’ in one transaction, followed by ‘Second Foundation’ in a later transaction”. We generalize the problem as follows. First, we add time constraints that specify a minimum and/or maximum time period between adjacent elements in a pattern. Second, we relax the restriction that the items in an element of a sequential pattern must come from the same transaction, instead allowing the items to be present in a set of transactions whose transaction-times are within a user-specified time window. Third, given a user-defined taxonomy (is-a hierarchy) on items, we allow sequential patterns to include items across all levels of the taxonomy.
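The first generalization in this abstract, time constraints between adjacent elements, can be illustrated with a small containment check. This is a minimal sketch and not the authors' algorithm: it ignores the sliding time window and the taxonomy, it assumes that a gap must be strictly greater than `min_gap` and at most `max_gap`, and the names `contains` and `support` are hypothetical.

```python
def contains(sequence, pattern, min_gap=0, max_gap=float("inf")):
    """Check whether a data-sequence contains a pattern under gap constraints.

    sequence: list of (time, set_of_items) pairs sorted by time.
    pattern:  list of item sets that must match transactions in order.
    Assumption: consecutive matches need min_gap < time difference <= max_gap.
    """
    def match(p_idx, prev_time):
        if p_idx == len(pattern):
            return True  # every pattern element has been matched
        for time, items in sequence:
            if prev_time is not None:
                gap = time - prev_time
                if gap <= min_gap or gap > max_gap:
                    continue  # too close, too far, or not later in time
            # Try this transaction; backtrack if the rest cannot be matched.
            if pattern[p_idx] <= items and match(p_idx + 1, time):
                return True
        return False

    return match(0, None)


def support(database, pattern, **gaps):
    """Number of data-sequences in the database that contain the pattern."""
    return sum(1 for seq in database if contains(seq, pattern, **gaps))
```

For a customer who bought {Foundation, Ringworld} at time 1 and {Second Foundation} at time 5, the pattern from the abstract matches with no constraints but fails once `max_gap` is set to 3, because the two purchases are 4 time units apart.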

2,973 citations

Cites background from "Mining generalized association rule..."

...The interest measure introduced in [6] also carries over and can be used to prune such redundant patterns....
[...]
...The problem of finding association rules when there is a user-defined taxonomy on items has been addressed in [6] [4]....
[...]
...The ideas presented in [6] for discovering association rules with taxonomies carry over to the current problem....
[...]

Data Mining: Concepts and Techniques (2nd edition)

Jiawei Han, Micheline Kamber

01 Jan 2006

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].

Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Bratko, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

2,591 citations

References

Proceedings Article•DOI•

Mining association rules between sets of items in large databases

Rakesh Agrawal¹, Tomasz Imielinski², Arun N. Swami¹•Institutions (2)

IBM¹, Rutgers University²

01 Jun 1993

TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.

Abstract: We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.

15,645 citations

"Mining generalized association rule..." refers background or methods in this paper

...We can then run any of the algorithms for mining association rules [1] [2] [5] [6] [7] on the extended transactions to get generalized association rules....
[...]
...Earlier work on association rules [1] [2] [5] [6] [7] did not consider the presence of taxonomies and restricted the items in association rules to the leaf-level items in the taxonomy....
[...]
...Given the frequent itemsets, the algorithm in [1] [2] can be used to generate rules....
[...]
...In earlier papers [1] [2], itemsets with minimum support were called large itemsets....
[...]
...the taxonomy T, a possibility not entertained by the formalism introduced in [1]....
[...]
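The excerpts above mention running an ordinary association-rule algorithm on "extended transactions". A minimal sketch of that idea follows, assuming an extended transaction is the original transaction plus every ancestor of its items in the taxonomy. The taxonomy, the item names, and the helper names are hypothetical, and only pair supports are counted to keep the example short.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy stored as child -> parent (is-a) edges.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}


def ancestors(item, parent=PARENT):
    """All ancestors of an item, found by walking is-a edges up to the root."""
    result = set()
    while item in parent:
        item = parent[item]
        result.add(item)
    return result


def extend(transaction, parent=PARENT):
    """Extended transaction: the original items plus all of their ancestors."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, parent)
    return extended


def pair_support(transactions):
    """Fraction of transactions that contain each pair of items."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: c / len(transactions) for pair, c in counts.items()}


raw = [{"jacket", "hiking boots"}, {"ski pants", "hiking boots"}, {"shirt"}, {"shoes"}]
generalized = pair_support([extend(t) for t in raw])
```

In this toy data no leaf-level pair appears in more than a quarter of the transactions, yet the pair (hiking boots, outerwear) appears in half of the extended ones. This naive version also counts pairs made of an item and its own ancestor, such as (jacket, outerwear), which carry no information and would normally be filtered out.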

Proceedings Article•

Fast algorithms for mining association rules

Rakesh Agrawal, Ramakrishnan Srikant

01 Jul 1998

TL;DR: Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.

Abstract: We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.
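The abstract does not describe the algorithms themselves. As a rough illustration of the level-wise idea usually associated with the name Apriori, the sketch below builds candidate k-itemsets from frequent (k-1)-itemsets, discards candidates that have an infrequent subset, and counts the rest in one pass per level. It is a simplified sketch under those assumptions, not the authors' implementation: the join step here is a plain pairwise union rather than a prefix-based join, and `min_support` is taken to be an absolute count.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions for this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The prune step relies on the fact that an itemset can only be frequent if all of its subsets are frequent, which is what keeps the number of counted candidates small at each level.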

10,863 citations

Proceedings Article•

Fast Algorithms for Mining Association Rules in Large Databases

Rakesh Agrawal, Ramakrishnan Srikant

12 Sep 1994

10,454 citations

Book•

The Probabilistic Method

Joel Spencer

01 Jan 1991

TL;DR: A particular set of problems - all dealing with “good” colorings of an underlying set of points relative to a given family of sets - is explored.

Abstract: The use of randomness is now an accepted tool in Theoretical Computer Science but not everyone is aware of the underpinnings of this methodology in Combinatorics - particularly, in what is now called the Probabilistic Method as developed primarily by Paul Erdős over the past half century. Here I will explore a particular set of problems - all dealing with “good” colorings of an underlying set of points relative to a given family of sets. A central point will be the evolution of these problems from the purely existential proofs of Erdős to the algorithmic aspects of much interest to this audience.

6,594 citations