Open Access Journal Article

Ranking outliers using symmetric neighborhood relationship

TLDR
In this article, the authors propose a measure of local outliers based on a symmetric neighborhood relationship, which considers both the neighbors and the reverse neighbors of an object when estimating its density distribution.
Abstract
Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when an outlier lies where the density distributions in the neighborhood differ significantly, for example when objects from a sparse cluster are close to a denser cluster, this may result in a wrong estimate. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect the top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.
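
The idea in the abstract can be illustrated with a small sketch. The abstract does not give the scoring formula, so everything concrete here is an assumption made for illustration: density is taken as the inverse of the distance to the k-th nearest neighbor, the reverse neighbors of an object are the objects that list it among their own k nearest neighbors, and the score is the average density over the union of neighbors and reverse neighbors divided by the object's own density. The function names are hypothetical, and the brute-force search stands in for the efficient mining algorithms the abstract mentions.

```python
import math


def knn_with_kdist(points, k):
    """Brute force: for each point, its k nearest neighbor indices and k-distance."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        k_distance = dists[k - 1][0]
        result.append((neighbors, k_distance))
    return result


def symmetric_outlier_scores(points, k):
    """Score each point using both its neighbors and its reverse neighbors.

    Assumption (not from the abstract): density = 1 / k-distance, and
    score = mean density over (kNN union reverse kNN) / own density.
    Requires len(points) > k.
    """
    knn = knn_with_kdist(points, k)
    # Guard against a zero k-distance caused by duplicate points.
    density = [1.0 / max(k_dist, 1e-12) for _, k_dist in knn]

    # reverse[j] holds every i that has j among its k nearest neighbors.
    reverse = [set() for _ in points]
    for i, (neighbors, _) in enumerate(knn):
        for j in neighbors:
            reverse[j].add(i)

    scores = []
    for i, (neighbors, _) in enumerate(knn):
        symmetric_neighborhood = set(neighbors) | reverse[i]
        mean_density = sum(density[j] for j in symmetric_neighborhood) / len(
            symmetric_neighborhood
        )
        scores.append(mean_density / density[i])
    return scores


def top_n_outliers(points, k, n):
    """Indices of the n points with the highest scores, most outlying first."""
    scores = symmetric_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    # Four points on a unit square plus one distant point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(data, k=2, n=1))
```

In this toy data set the distant point has no reverse neighbors at all, so its neighborhood consists only of dense square points and its score is far above 1, while the square points score close to or below 1.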


Citations
Book

Outlier Analysis

TL;DR: Outlier Analysis is a comprehensive exposition, as understood by data mining experts, statisticians and computer scientists, and emphasis was placed on simplifying the content, so that students and practitioners can also benefit.
Journal Article

Neighborhood rough set based heterogeneous feature subset selection

TL;DR: A neighborhood rough set model is introduced to deal with the problem of heterogeneous feature subset selection, and experimental results show that the neighborhood-model-based method is more flexible in dealing with heterogeneous data.
Journal Article

A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.

TL;DR: This paper aims to be a new well-founded basis for unsupervised anomaly detection research by publishing the source code and the datasets, and reveals the strengths and weaknesses of the different approaches for the first time.
Proceedings Article

Angle-based outlier detection in high-dimensional data

TL;DR: This paper proposes a novel approach named ABOD (Angle-Based Outlier Detection) and some variants assessing the variance in the angles between the difference vectors of a point to the other points and shows ABOD to perform especially well on high-dimensional data.
Journal Article

A survey on unsupervised outlier detection in high-dimensional numerical data

TL;DR: This survey article discusses some important aspects of the ‘curse of dimensionality’ in detail and surveys specialized algorithms for outlier detection from both categories.
References
Book

Data Mining: Concepts and Techniques

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
Proceedings Article

A density-based algorithm for discovering clusters in large spatial Databases with Noise

TL;DR: DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.
Proceedings Article

CURE: an efficient clustering algorithm for large databases

TL;DR: This work proposes a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes and wide variances in size, and demonstrates that random sampling and partitioning enable CURE to not only outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.
Proceedings Article

Efficient and Effective Clustering Methods for Spatial Data Mining

TL;DR: The analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms.