Author

Sridhar Ramaswamy

Other affiliations: Telcordia Technologies

Bio: Sridhar Ramaswamy is an academic researcher from Alcatel-Lucent. The author has contributed to research in topics: Association rule learning & Data set. The author has an h-index of 8 and has co-authored 10 publications receiving 2,004 citations. Previous affiliations of Sridhar Ramaswamy include Telcordia Technologies.

Papers

Journal Article•DOI•

Efficient algorithms for mining outliers from large data sets

Sridhar Ramaswamy, Rajeev Rastogi¹, Kyuseok Shim²•Institutions (2)

Bell Labs¹, KAIST²

16 May 2000

TL;DR: A novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor is proposed and the top n points in this ranking are declared to be outliers.

Abstract: In this paper, we propose a novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor. We rank each point on the basis of its distance to its kth nearest neighbor and declare the top n points in this ranking to be outliers. In addition to developing relatively straightforward solutions to finding such outliers based on the classical nested-loop join and index join algorithms, we develop a highly efficient partition-based algorithm for mining outliers. This algorithm first partitions the input data set into disjoint subsets, and then prunes entire partitions as soon as it is determined that they cannot contain outliers. This results in substantial savings in computation. We present the results of an extensive experimental study on real-life and synthetic data sets. The results from a real-life NBA database highlight and reveal several expected and unexpected aspects of the database. The results from a study on synthetic data sets demonstrate that the partition-based algorithm scales well with respect to both data set size and data set dimensionality.
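The ranking described in this abstract can be illustrated with a minimal brute-force sketch in the spirit of the nested-loop approach. It is not the paper's partition-based algorithm, and the function names, the Euclidean metric, and the sample points are assumptions made for illustration only.

```python
import heapq
import math


def kth_nn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor, excluding itself."""
    dists = [math.dist(points[i], q) for j, q in enumerate(points) if j != i]
    # nsmallest returns the k smallest distances in ascending order.
    return heapq.nsmallest(k, dists)[-1]


def top_n_outliers(points, k, n):
    """Rank every point by its k-th nearest-neighbor distance; return the top n indices."""
    scores = [(kth_nn_distance(points, i, k), i) for i in range(len(points))]
    return [i for _, i in heapq.nlargest(n, scores)]


# Hypothetical data: four tightly clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
outliers = top_n_outliers(points, k=2, n=1)
print(outliers)
```

This sketch compares every pair of points, so its cost grows quadratically with the data set size; the abstract's partition-based algorithm exists precisely to avoid that work by pruning partitions that cannot contain outliers.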

1,871 citations

Journal Article•DOI•

Selectivity estimation in spatial databases

Swarup Acharya¹, Viswanath Poosala¹, Sridhar Ramaswamy¹•Institutions (1)

Alcatel-Lucent¹

01 Jun 1999

TL;DR: This paper examines selectivity estimation in the context of Geographic Information Systems, which manage spatial data such as points, lines, poly-lines and polygons, and identifies a BSP-based partitioning that consistently provides the most accurate selectivity estimates for spatial queries.

Abstract: Selectivity estimation of queries is an important and well-studied problem in relational database systems. In this paper, we examine selectivity estimation in the context of Geographic Information Systems, which manage spatial data such as points, lines, poly-lines and polygons. In particular, we focus on point and range queries over two-dimensional rectangular data. We propose several techniques based on using spatial indices, histograms, binary space partitionings (BSPs), and the novel notion of spatial skew. Our techniques carefully partition the input rectangles into subsets and approximate each partition accurately. We present a detailed experimental study comparing the proposed techniques and the best known sampling and parametric techniques. We evaluate them using synthetic as well as real-life TIGER datasets. Based on our experiments, we identify a BSP based partitioning that we call Min-Skew which consistently provides the most accurate selectivity estimates for spatial queries. The Min-Skew partitioning can be constructed efficiently, occupies very little space, and provides accurate selectivity estimates over a broad range of spatial queries.
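As a rough illustration of how a partition-based estimate can answer a range query, the sketch below sums each bucket's count scaled by the fraction of the bucket that the query covers. It assumes objects are point-like and spread uniformly within each bucket, which is a simplification; it does not reproduce the Min-Skew construction, and the `Bucket` type, function names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Axis-aligned bucket with the number of objects assigned to it."""
    x1: float
    y1: float
    x2: float
    y2: float
    count: int


def overlap_area(b, qx1, qy1, qx2, qy2):
    """Area of the intersection between bucket b and the query rectangle."""
    w = min(b.x2, qx2) - max(b.x1, qx1)
    h = min(b.y2, qy2) - max(b.y1, qy1)
    return max(w, 0.0) * max(h, 0.0)


def estimate_count(buckets, qx1, qy1, qx2, qy2):
    """Estimate how many objects fall in the query, assuming uniformity per bucket."""
    total = 0.0
    for b in buckets:
        area = (b.x2 - b.x1) * (b.y2 - b.y1)
        if area > 0:
            total += b.count * overlap_area(b, qx1, qy1, qx2, qy2) / area
    return total


# Hypothetical partitioning: a dense bucket next to a sparse one.
buckets = [Bucket(0, 0, 10, 10, 100), Bucket(10, 0, 20, 10, 20)]
estimate = estimate_count(buckets, 5, 0, 15, 10)
print(estimate)
```

The accuracy of such an estimate depends on how uniform the data is inside each bucket, which is why the choice of partitioning matters.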

235 citations

Proceedings Article•DOI•

OODB indexing by class-division

Sridhar Ramaswamy¹, Paris C. Kanellakis²•Institutions (2)

Telcordia Technologies¹, Brown University²

22 May 1995

TL;DR: This work develops a technique called indexing by class-division (CD), which the authors believe can be used as a practical alternative to CH, and presents an optimized implementation and experimental validation of CD's average-case performance.

Abstract: Indexing a class hierarchy, in order to efficiently search or update the objects of a class according to a (range of) value(s) of an attribute, impacts OODB performance heavily. For this indexing problem, most systems use the class hierarchy index (CH) technique of [15] implemented using B+-trees. Other techniques, such as those of [14, 18, 31], can lead to improved average-case performance but involve the implementation of new data structures. As a special form of external dynamic two-dimensional range searching, this OODB indexing problem is solvable within reasonable worst-case bounds [12]. Based on this insight, we have developed a technique, called indexing by class-division (CD), which we believe can be used as a practical alternative to CH. We present an optimized implementation and experimental validation of CD's average-case performance. The main advantages of the CD technique are: (1) CD is an extension of CH that provides a significant speed-up over CH for a wide spectrum of range queries--this speed-up is at least linear in the number of classes queried for uniform data and larger otherwise; and (2) CD queries, updates and concurrent use are implementable using existing B+-tree technology. The basic idea of class-division involves a time-space tradeoff, and CD requires some space and update overhead in comparison to CH. In practice, this overhead is a small factor (2 to 3) and, in the worst case, is bounded by the depth of the hierarchy and the logarithm of its size.

49 citations

Patent•

Data mining using cyclic association rules

Banu Ozden¹, Sridhar Ramaswamy¹, Abraham Silberschatz¹•Institutions (1)

Alcatel-Lucent¹

16 Feb 1999

TL;DR: A system and method for discovering association rules that display regular cyclic variation over time is disclosed. It exploits the interaction between association rules and time through a technique called cycle pruning, which reduces the amount of time needed to find cyclic association rules.

Abstract: A system and method for discovering association rules that display regular cyclic variation over time is disclosed. Such association rules may apply over daily, weekly or monthly (or other) cycles of sales data or the like. A first technique, referred to as the sequential algorithm, treats association rules and cycles relatively independently. Based on the interaction between association rules and time, we employ a new technique called cycle pruning, which reduces the amount of time needed to find cyclic association rules. A second algorithm, the interleaved algorithm, uses cycle pruning and other optimization techniques for discovering cyclic association rules with reduced overhead.
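To make the idea of a cyclic rule concrete, the sketch below takes a record of whether a rule held in each time unit and reports every (length, offset) pair for which the rule held in all units at that offset. Representing a cycle as a (length, offset) pair and the function name are assumptions for illustration; the sketch covers only cycle detection for a single rule, in the spirit of treating rules and cycles independently, and does not implement cycle pruning or the interleaved algorithm.

```python
def find_cycles(holds, max_length):
    """Return (length, offset) pairs such that the rule holds in every
    time unit i with i % length == offset."""
    cycles = []
    for length in range(1, max_length + 1):
        for offset in range(length):
            units = holds[offset::length]
            # A cycle requires at least one unit and no unit where the rule fails.
            if units and all(units):
                cycles.append((length, offset))
    return cycles


# Hypothetical record: the rule holds on every other time unit.
holds = [True, False, True, False, True, False]
cycles = find_cycles(holds, max_length=3)
print(cycles)
```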

26 citations

Patent•

Method for identifying outliers in large data sets

Sridhar Ramaswamy¹, Rajeev Rastogi¹, Kyuseok Shim¹•Institutions (1)

Alcatel-Lucent¹

18 Nov 1999

TL;DR: A new method for identifying a predetermined number of data points of interest in a large data set is proposed, in which the data points of interest are ranked in relation to the distance to their neighboring points.

Abstract: A new method for identifying a predetermined number of data points of interest in a large data set is disclosed. The data points of interest are ranked in relation to the distance to their neighboring points. The method employs partition-based detection algorithms to partition the data points and then compute upper and lower bounds for each partition. These bounds are then used to eliminate those partitions that cannot contain any of the predetermined number of data points of interest. The data points of interest are then computed from the remaining partitions that were not eliminated. The present method eliminates a significant number of data points from consideration as the points of interest, thereby resulting in substantial savings in computational expense compared to conventional methods employed to identify such points.

18 citations

Cited by

Journal Article•DOI•

Anomaly detection: A survey

Varun Chandola¹, Arindam Banerjee¹, Vipin Kumar¹•Institutions (1)

University of Minnesota¹

30 Jul 2009-ACM Computing Surveys

TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.

Abstract: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

9,627 citations

Journal Article•DOI•

LOF: identifying density-based local outliers

Markus M. Breunig¹, Hans-Peter Kriegel¹, Raymond T. Ng², Jörg Sander¹•Institutions (2)

Ludwig Maximilian University of Munich¹, University of British Columbia²

16 May 2000

TL;DR: This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.

Abstract: For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful, but can otherwise not be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms that our approach of finding local outliers can be practical.

5,248 citations

Journal Article•DOI•

A Survey of Outlier Detection Methodologies

Victoria J. Hodge¹, Jim Austin¹•Institutions (1)

University of York¹

01 Oct 2004-Artificial Intelligence Review

TL;DR: A survey of contemporary techniques for outlier detection is introduced; it identifies their respective motivations and distinguishes their advantages and disadvantages in a comparative review.

Abstract: Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and, as such, purify the data for processing. The original outlier detection methods were arbitrary, but now principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

3,235 citations

Book Chapter•DOI•

A Survey of Clustering Data Mining Techniques

Pavel Berkhin¹•Institutions (1)

Yahoo!¹

01 Jan 2006

TL;DR: This survey concentrates on clustering algorithms from a data mining perspective as a data modeling technique that provides for concise summaries of the data.

Abstract: Clustering is the division of data into groups of similar objects. In clustering, some details are disregarded in exchange for data simplification. Clustering can be viewed as a data modeling technique that provides for concise summaries of the data. Clustering is therefore related to many disciplines and plays an important role in a broad range of applications. The applications of clustering usually deal with large datasets and data with many attributes. Exploration of such data is a subject of data mining. This survey concentrates on clustering algorithms from a data mining perspective.

3,047 citations

Journal Article•DOI•

An overview of anomaly detection techniques: Existing solutions and latest technological trends

Animesh Patcha¹, Jung-Min Park¹•Institutions (1)

Virginia Tech¹

01 Aug 2007-Computer Networks

TL;DR: This paper provides a comprehensive survey of anomaly detection systems and hybrid intrusion detection systems of the recent past and present and discusses recent technological trends in anomaly detection and identifies open problems and challenges in this area.

1,433 citations
