Efficient algorithms for mining outliers from large data sets

doi:10.1145/335191.335437

Journal Article

Sridhar Ramaswamy, +2 more

- Vol. 29, Iss: 2, pp 427-438

TLDR

A novel formulation for distance-based outliers is proposed: each point is ranked by its distance to its kth nearest neighbor, and the top n points in this ranking are declared to be outliers.

Abstract:

In this paper, we propose a novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor. We rank each point on the basis of its distance to its kth nearest neighbor and declare the top n points in this ranking to be outliers. In addition to developing relatively straightforward solutions to finding such outliers based on the classical nested-loop join and index join algorithms, we develop a highly efficient partition-based algorithm for mining outliers. This algorithm first partitions the input data set into disjoint subsets, and then prunes entire partitions as soon as it is determined that they cannot contain outliers. This results in substantial savings in computation. We present the results of an extensive experimental study on real-life and synthetic data sets. The results from a real-life NBA database highlight and reveal several expected and unexpected aspects of the database. The results from a study on synthetic data sets demonstrate that the partition-based algorithm scales well with respect to both data set size and data set dimensionality.
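
The ranking described in the abstract can be illustrated with a minimal nested-loop sketch. This is not the authors' code and not the partition-based algorithm; it only shows the outlier definition itself. The function names `kth_nn_distance` and `top_n_outliers` are hypothetical, and Euclidean distance is an assumption, since the abstract does not name a metric.

```python
import heapq
import math


def kth_nn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor, excluding itself.

    Assumption: Euclidean distance (the abstract does not fix a metric).
    """
    dists = (math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    # nsmallest returns the k smallest distances in ascending order,
    # so the last element is the distance to the k-th nearest neighbor.
    return heapq.nsmallest(k, dists)[-1]


def top_n_outliers(points, k, n):
    """Indices of the n points with the largest k-th-nearest-neighbor distance.

    Nested-loop baseline: O(N^2) distance computations, no pruning.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = [(kth_nn_distance(points, i, k), i) for i in range(len(points))]
    # Rank points by score, highest first, and keep the top n indices.
    return [i for _, i in heapq.nlargest(n, scores)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(pts, k=2, n=1))  # [4]
```

In this toy data set the point (10, 10) is far from the four clustered points, so its distance to its 2nd nearest neighbor is the largest and it is ranked first. The quadratic cost of this baseline is what the pruning of whole partitions, as described in the abstract, is meant to avoid.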

Citations

Journal Article

Anomaly detection: A survey

Varun Chandola, +2 more

- 30 Jul 2009 -

ACM Computing Surveys

TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.

Journal Article

LOF: identifying density-based local outliers

Markus M. Breunig, +3 more

TL;DR: This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.

Journal Article

A Survey of Outlier Detection Methodologies

Victoria J. Hodge, +1 more

- 01 Oct 2004 -

Artificial Intelligence Review

TL;DR: This survey introduces contemporary techniques for outlier detection, identifies their respective motivations, and distinguishes their advantages and disadvantages in a comparative review.

Book Chapter

A Survey of Clustering Data Mining Techniques

Pavel Berkhin

TL;DR: This survey concentrates on clustering algorithms from a data mining perspective as a data modeling technique that provides for concise summaries of the data.

Journal Article

An overview of anomaly detection techniques: Existing solutions and latest technological trends

Animesh Patcha, +1 more

- 01 Aug 2007 -

Computer Networks

TL;DR: This paper provides a comprehensive survey of anomaly detection systems and hybrid intrusion detection systems of the recent past and present, discusses recent technological trends in anomaly detection, and identifies open problems and challenges in this area.

References

Book

Algorithms for clustering data

Anil K. Jain, +1 more

Computational geometry: an introduction

Franco P. Preparata, +1 more

TL;DR: This book offers a coherent treatment, at the graduate textbook level, of the field that has come to be known in the last decade or so as computational geometry.

Journal Article

LOF: identifying density-based local outliers

Markus M. Breunig, +3 more

TL;DR: This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.

Proceedings Article

The R*-tree: an efficient and robust access method for points and rectangles

Norbert Beckmann, +3 more

TL;DR: The R*-tree is designed to incorporate a combined optimization of the area, margin, and overlap of each enclosing rectangle in the directory, and it clearly outperforms the existing R-tree variants.

Related Papers (5)

LOF: identifying density-based local outliers

Algorithms for Mining Distance-Based Outliers in Large Datasets

Anomaly detection: A survey

Outliers in Statistical Data

Identification of outliers
