Home
/
Authors
/
Arun N. Swami

Author

Arun N. Swami

Other affiliations: Stanford University, Xerox, LinkedIn ...read more

Bio: Arun N. Swami is an academic researcher from IBM. The author has contributed to research in topics: Query optimization & Association rule learning. The author has an hindex of 27, co-authored 38 publications receiving 21445 citations. Previous affiliations of Arun N. Swami include Stanford University & Xerox.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Mining association rules between sets of items in large databases

[...]

Rakesh Agrawal¹, Tomasz Imielinski², Arun N. Swami¹•Institutions (2)

IBM¹, Rutgers University²

01 Jun 1993

TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.

...read moreread less

Abstract: We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.

...read moreread less

15,645 citations

Book Chapter•DOI•

[...]

Rakesh Agrawal¹, Christos Faloutsos¹, Arun N. Swami¹•Institutions (1)

IBM¹

13 Oct 1993

TL;DR: An indexing method for time sequences for processing similarity queries using R * -trees to index the sequences and efficiently answer similarity queries and provides experimental results which show that the method is superior to search based on sequential scanning.

...read moreread less

Abstract: We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R * -trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.

...read moreread less

2,082 citations

Journal Article•DOI•

Database mining: a performance perspective

[...]

Rakesh Agrawal¹, Tomasz Imielinski¹, Arun N. Swami¹•Institutions (1)

IBM¹

01 Dec 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The authors' perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology is presented and an algorithm for classification obtained by combining the basic rule discovery operations is given.

...read moreread less

Abstract: The authors' perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology is presented. Three classes of database mining problems involving classification, associations, and sequences are described. It is argued that these problems can be uniformly viewed as requiring discovery of rules embedded in massive amounts of data. A model and some basic operations for the process of rule discovery are described. It is shown how the database mining problems considered map to this model, and how they can be solved by using the basic operations proposed. An example is given of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm is efficient in discovering classification rules and has accuracy comparable to ID3, one of the best current classifiers. >

...read moreread less

1,539 citations

Proceedings Article•DOI•

Clustering association rules

[...]

B. Lent¹, Arun N. Swami¹, Jennifer Widom¹•Institutions (1)

Stanford University¹

07 Apr 1997

TL;DR: The authors present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS, and measure the quality of the segmentation generated by ARCS using the minimum description length (MDL) principle of encoding the clusters on several databases.

...read moreread less

Abstract: The authors consider the problem of clustering two-dimensional association rules in large databases. They present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segment the data. They measure the quality of the segmentation generated by ARCS using the minimum description length (MDL) principle of encoding the clusters on several databases including noise and errors. Scale-up experiments show that ARCS, using the BitOp algorithm, scales linearly with the amount of data.

...read moreread less

419 citations

Proceedings Article•DOI•

Set-oriented mining for association rules in relational databases

[...]

Maurice A. W. Houtsma, Arun N. Swami

06 Mar 1995

TL;DR: This paper shows that at least some aspects of data mining can be carried out by using general query languages such as SQL, rather than by developing specialized black-box algorithms.

...read moreread less

Abstract: Describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss the optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. SETM uses only simple database primitives, viz. sorting and merge-scan join. SETM is simple, fast and stable over the range of parameter values. The major contribution of this paper is that it shows that at least some aspects of data mining can be carried out by using general query languages such as SQL, rather than by developing specialized black-box algorithms. The set-oriented nature of SETM facilitates the development of extensions. >

...read moreread less

320 citations

1
2
3
4
…
5
6
7
8

Collapse

Cited by

PDF

Open Access

More filters

Book•

Data Mining: Concepts and Techniques

[...]

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

...read moreread less

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. *Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data

...read moreread less

23,600 citations

Book•

Data Mining: Practical Machine Learning Tools and Techniques

[...]

Ian H. Witten, Eibe Frank, Mark Hall

25 Oct 1999

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.

...read moreread less

Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research. *Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects *Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods *Includes downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks-in an updated, interactive interface. Algorithms in toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization

...read moreread less

20,196 citations

Proceedings Article•DOI•

Mining association rules between sets of items in large databases

[...]

Rakesh Agrawal¹, Tomasz Imielinski², Arun N. Swami¹•Institutions (2)

IBM¹, Rutgers University²

01 Jun 1993

...read moreread less

15,645 citations

Proceedings Article•

Fast algorithms for mining association rules

[...]

Rakesh Agrawal, Ramakrishnan Srikant

01 Jul 1998

TL;DR: Two new algorithms for solving thii problem that are fundamentally different from the known algorithms are presented and empirical evaluation shows that these algorithms outperform theknown algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.

...read moreread less

Abstract: We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving thii problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.

...read moreread less

10,863 citations

Proceedings Article•

Fast Algorithms for Mining Association Rules in Large Databases

[...]

Rakesh Agrawal, Ramakrishnan Srikant

12 Sep 1994

10,454 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse