Journal ArticleDOI

On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study

TLDR
This paper presents an extensive experimental study of the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose, and provides a characterization of the datasets themselves.
Abstract
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly-used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly-proposed outlier detection methods improve over established methods. In this paper, we perform an extensive experimental study on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we provide a characterization of the datasets themselves, and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly-used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for the evaluation of outlier detection results.


Citations
Journal ArticleDOI

Deep Learning for Anomaly Detection: A Review

TL;DR: This paper presents a survey of deep anomaly detection with a comprehensive taxonomy, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods.
Journal ArticleDOI

Deep Learning for Anomaly Detection: A Review

TL;DR: This article surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods and discusses how they address the aforementioned challenges.
Journal Article

Ranking outliers using symmetric neighborhood relationship

TL;DR: In this article, the authors proposed a measure on local outliers based on a symmetric neighborhood relationship, which considers both neighbors and reverse neighbors of an object when estimating its density distribution.
Journal ArticleDOI

A Unifying Review of Deep and Shallow Anomaly Detection

TL;DR: This review aims to identify the common underlying principles and the assumptions that are often made implicitly by various methods in deep learning, and draws connections between classic “shallow” and novel deep approaches and shows how this relation might cross-fertilize or extend both directions.
Journal ArticleDOI

Progress in Outlier Detection Techniques: A Survey

TL;DR: This survey presents a comprehensive and organized review of the progress of outlier detection methods from 2000 to 2019 and groups them by technique, such as distance-, clustering-, density-, ensemble-, and learning-based methods.
References
Journal ArticleDOI

The meaning and use of the area under a receiver operating characteristic (ROC) curve.

James A. Hanley, +1 more - 01 Apr 1982
TL;DR: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.
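
The probabilistic reading of the ROC area described above can be illustrated with a minimal sketch. It is not code from the cited article: the function name `roc_auc_pairwise`, the convention of counting ties as half a win, and the toy scores and labels are all assumptions made for illustration. Here label 1 plays the role of the "diseased" subject (or, in outlier detection, the outlier).

```python
from itertools import product


def roc_auc_pairwise(scores, labels):
    """Estimate ROC AUC as the fraction of (positive, negative) pairs
    in which the positive receives the higher score.

    Ties count as half a win (an assumed, though common, convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: higher score means "more suspicious".
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
auc = roc_auc_pairwise(scores, labels)
print(auc)  # 6 of the 8 positive/negative pairs are ordered correctly: 0.75
```

The pairwise loop is quadratic in the number of objects, which is acceptable for a sketch; rank-based formulations give the same value more efficiently.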
Journal Article

Statistical Comparisons of Classifiers over Multiple Data Sets

TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
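
As a rough illustration of the Friedman test mentioned above, the sketch below ranks methods within each dataset, averages the ranks, and computes the Friedman chi-square statistic. It is a simplified sketch under stated assumptions: the helper names `average_ranks` and `friedman_statistic` are hypothetical, the scores are made-up numbers, higher scores are assumed to be better, and no tie correction or p-value computation is included.

```python
def average_ranks(results):
    """results[i][j] is the score of method j on dataset i (higher is better).

    Returns the mean rank of each method over all datasets; rank 1 is best
    and tied methods share the average of the ranks they span.
    """
    k = len(results[0])
    totals = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            # Extend the group while the next method has an equal score.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1
            for t in range(i, j + 1):
                totals[order[t]] += mean_rank
            i = j + 1
    return [t / len(results) for t in totals]


def friedman_statistic(results):
    """Friedman chi-square statistic (without tie correction)."""
    n, k = len(results), len(results[0])
    ranks = average_ranks(results)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


# Made-up scores for 3 hypothetical methods on 4 hypothetical datasets.
results = [
    [0.90, 0.80, 0.70],
    [0.85, 0.75, 0.80],
    [0.95, 0.90, 0.60],
    [0.70, 0.65, 0.60],
]
ranks = average_ranks(results)
stat = friedman_statistic(results)
print(ranks, stat)
```

Under the null hypothesis the statistic is approximately chi-square distributed with k - 1 degrees of freedom, although that approximation is only reliable when the numbers of datasets and methods are not too small.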
Journal ArticleDOI

Anomaly detection: A survey

TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.
Journal ArticleDOI

LOF: identifying density-based local outliers

TL;DR: This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.
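
The idea of assigning each object a degree of outlierness can be sketched with a small, brute-force LOF computation. This is an illustrative sketch rather than the authors' implementation: the function name `lof_scores` is hypothetical, Euclidean distance is assumed, the toy points are made up, and duplicate points (which lead to infinite densities) are not handled robustly.

```python
import math


def lof_scores(points, k):
    """Brute-force local outlier factor for a small list of points.

    Scores near 1 indicate density similar to the neighbors; scores well
    above 1 indicate that a point is sparser than its neighborhood.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # The neighborhood keeps every point tied at the k-distance.
        neigh.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(max(k_dist[j], dist[i][j]) for j in neigh[i]) / len(neigh[i])
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


# Made-up data: four corners of a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
print(scores)
```

In this toy example the four square corners receive a score of 1, while the distant point receives a much larger score, which matches the intuition that LOF compares a point's local density with that of its neighbors.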