Ranking outliers using symmetric neighborhood relationship

Journal Article (Open Access)

Wen Jin, +3 more

- 01 Jan 2006 -

Lecture Notes in Computer Science

- pp 577-593

TL;DR

In this article, the authors propose a measure of local outliers based on a symmetric neighborhood relationship, which considers both the neighbors and the reverse neighbors of an object when estimating its density distribution.

Abstract:

Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. So far, the density distribution at the location of an object has been estimated from the density distribution of its k-nearest neighbors [2,11]. However, when an outlier lies where the density distributions in its neighborhood differ significantly, for example an object from a sparse cluster close to a denser cluster, this can lead to a wrong estimate. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers discovered are more meaningful. To compute such local outliers efficiently, we develop several mining algorithms that detect the top-n outliers under our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient to compute but also more effective at ranking outliers.
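The abstract does not spell out the measure, so the following is only an illustrative sketch, modelled on the INFLO ("influenced outlierness") score as we understand it from the full paper; treat the exact density estimate as an assumption. For each object p, its symmetric neighborhood (influence space) is taken to be the union of its k nearest neighbors and its reverse k-nearest neighbors, i.e. the objects that count p among their own k nearest neighbors. Density is approximated as 1/kdist(p), and the score is the mean density of the influence space divided by p's own density, so values well above 1 suggest an outlier. The helper names `knn`, `inflo_scores` and `top_n_outliers` are hypothetical, and the brute-force O(n²) neighbor search is a stand-in for the more efficient top-n algorithms the paper develops.

```python
import heapq
import math
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]


def knn(points: List[Point], k: int) -> Tuple[List[List[int]], List[float]]:
    """Brute-force k-nearest neighbors of every point (hypothetical helper).

    Returns each point's neighbor indices and its k-distance, i.e. the
    distance to its k-th nearest neighbor. Ties are broken by index.
    """
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < len(points)")
    neighbors, kdist = [], []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        neighbors.append([j for _, j in nearest])
        kdist.append(nearest[-1][0])
    return neighbors, kdist


def inflo_scores(points: List[Point], k: int) -> List[float]:
    """INFLO-style outlierness score for every point (sketch, not the paper's code)."""
    neighbors, kdist = knn(points, k)
    # Assumption: density is the inverse k-distance. A k-distance of 0
    # (k or more duplicates of a point) is not handled and raises ZeroDivisionError.
    density = [1.0 / d for d in kdist]
    # Reverse neighbors: q is a reverse neighbor of p if p is in kNN(q).
    reverse: List[Set[int]] = [set() for _ in points]
    for q, nbrs in enumerate(neighbors):
        for p in nbrs:
            reverse[p].add(q)
    scores = []
    for p in range(len(points)):
        # Symmetric neighborhood ("influence space"): neighbors plus reverse neighbors.
        influence = set(neighbors[p]) | reverse[p]
        mean_density = sum(density[o] for o in influence) / len(influence)
        scores.append(mean_density / density[p])
    return scores


def top_n_outliers(points: List[Point], k: int, n: int) -> List[int]:
    """Indices of the n highest-scoring points, most outlying first."""
    scores = inflo_scores(points, k)
    return heapq.nlargest(n, range(len(points)), key=lambda i: scores[i])


if __name__ == "__main__":
    # A small dense grid, a sparser grid, and one isolated point (index 18).
    dense = [(0.1 * x, 0.1 * y) for x in range(3) for y in range(3)]
    sparse = [(5.0 + x, float(y)) for x in range(3) for y in range(3)]
    points = dense + sparse + [(20.0, 20.0)]
    print(top_n_outliers(points, k=3, n=1))  # prints [18]
```

Including reverse neighbors means an object's neighborhood also contains the objects that regard it as a neighbor, which is how a measure of this kind tries to avoid the misestimation near a denser cluster that the abstract describes. Points inside either grid score close to 1 here, whatever the grid's absolute density.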

Citations

Book

Outlier Analysis

Charu C. Aggarwal

TL;DR: Outlier Analysis is a comprehensive exposition of the field as understood by data mining experts, statisticians, and computer scientists; the content is deliberately simplified so that students and practitioners can also benefit.

Journal Article

Neighborhood rough set based heterogeneous feature subset selection

Qinghua Hu, +3 more

- 20 Sep 2008 -

Information Sciences

TL;DR: A neighborhood rough set model is introduced to deal with the problem of heterogeneous feature subset selection, and experimental results show that the neighborhood-model-based method is more flexible in dealing with heterogeneous data.

Journal Article

A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data

Markus Goldstein, +1 more

- 19 Apr 2016 -

PLOS ONE

TL;DR: This paper aims to be a new well-founded basis for unsupervised anomaly detection research by publishing the source code and the datasets, and it reveals the strengths and weaknesses of the different approaches for the first time.

Proceedings Article

Angle-based outlier detection in high-dimensional data

Hans-Peter Kriegel, +2 more

TL;DR: This paper proposes a novel approach named ABOD (Angle-Based Outlier Detection), together with some variants, that assesses the variance in the angles between the difference vectors from a point to the other points, and shows that ABOD performs especially well on high-dimensional data.

Journal Article

A survey on unsupervised outlier detection in high-dimensional numerical data

Arthur Zimek, +2 more

- 01 Oct 2012 -

Statistical Analysis and Data Mining

TL;DR: This survey article discusses some important aspects of the ‘curse of dimensionality’ in detail and surveys specialized algorithms for outlier detection from both categories.

References

Book

Data Mining: Concepts and Techniques

Jiawei Han, +2 more

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.

Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

TL;DR: DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.

Proceedings Article

CURE: an efficient clustering algorithm for large databases

Sudipto Guha, +2 more

TL;DR: This work proposes a new clustering algorithm called CURE that is more robust to outliers and identifies clusters with non-spherical shapes and wide variances in size; it also demonstrates that random sampling and partitioning enable CURE not only to outperform existing algorithms but also to scale well to large databases without sacrificing clustering quality.

Proceedings Article

Efficient and Effective Clustering Methods for Spatial Data Mining

Raymond T. Ng, +1 more

TL;DR: The analysis and experiments show that, with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms.

Related Papers (5)

Toward accurate and efficient outlier detection in high dimensional and large data sets

New Approach Based on Square Neighborhood to Detect Outliers

Learning to Locate Relative Outliers

Efficiently mining regional outliers in spatial data

A graph-based approach to detect abnormal spatial points and regions