On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study

doi:10.1007/S10618-015-0444-8

Guilherme Oliveira Campos and 7 co-authors

Data Mining and Knowledge Discovery, Vol. 30, Iss. 4, pp. 891-927, 01 Jul 2016

TLDR

An extensive experimental study of the performance of a representative set of standard k-nearest-neighborhood-based methods for unsupervised outlier detection across a wide variety of datasets prepared for this purpose, which also provides a characterization of the datasets themselves.
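To make "k nearest neighborhood-based" concrete, here is a minimal sketch of one of the simplest such scores. Each point is scored by its distance to its k-th nearest neighbor, the kNN-distance outlier score of Ramaswamy, Rastogi, and Shim (2000). The function name `knn_outlier_scores` is hypothetical. The sketch assumes a Euclidean metric and uses a brute-force O(n²) search for illustration. It is not code taken from the study.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    A larger score means the point is more isolated, i.e. more outlying.
    The point itself is excluded from its own neighbor list.
    Brute force: O(n^2) distance computations, for illustration only.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores: List[float] = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])  # k-th smallest distance
    return scores


# Four clustered points and one far-away point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(knn_outlier_scores(data, k=2))
```

With k = 2, each clustered point scores 1.0. The isolated point (5, 5) receives the largest score, √41 ≈ 6.4. The choice of k is one of the parameter choices whose impact the study examines.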

Abstract:

The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly-used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly-proposed outlier detection methods improve over established methods. In this paper, we perform an extensive experimental study on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we provide a characterization of the datasets themselves, and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly-used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for the evaluation of outlier detection results.
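The abstract refers to commonly used evaluation measures and to adaptations of them. The following hedged sketch computes three quantities:

- ROC AUC, as the probability that a randomly chosen outlier is scored above a randomly chosen inlier, with ties counted as one half.
- Precision at n.
- A chance-adjusted precision at n, using the generic correction (value − expected)/(maximum − expected). Here the expected precision of a uniformly random ranking is |O|/N for |O| outliers among N points.

The function names are hypothetical. The adjustment shown is a generic chance correction and may differ in detail from the adaptations proposed in the paper.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a random outlier > score of a random inlier).

    labels: 1 marks a ground-truth outlier, 0 an inlier. Ties count 1/2.
    Brute force over all outlier/inlier pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_n(scores: Sequence[float], labels: Sequence[int], n: int) -> float:
    """Fraction of true outliers among the n highest-scored points.

    Ties in score are broken by input order (Python's sort is stable).
    """
    if not 1 <= n <= len(scores):
        raise ValueError("n must satisfy 1 <= n <= number of points")
    ranked: List[int] = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:n]) / n


def adjusted_precision_at_n(scores: Sequence[float], labels: Sequence[int], n: int) -> float:
    """Chance-adjusted P@n: (P@n - expected) / (maximum - expected).

    expected = |O| / N is the mean P@n of a uniformly random ranking;
    maximum = min(n, |O|) / n is the best achievable P@n.
    """
    total = len(labels)
    num_outliers = sum(labels)
    expected = num_outliers / total
    maximum = min(n, num_outliers) / n
    if maximum == expected:
        raise ValueError("adjustment undefined: maximum equals expected value")
    return (precision_at_n(scores, labels, n) - expected) / (maximum - expected)


scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0]  # two outliers among six points
print(roc_auc(scores, labels), adjusted_precision_at_n(scores, labels, n=2))
```

In this example, ROC AUC is 7/8 = 0.875. Precision at n = 2 is 0.5, which the adjustment maps to 0.25. Under the adjustment, a random ranking scores about 0 on average and a perfect ranking scores 1. This makes precision values easier to compare across datasets with different outlier rates.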
