Open Access, Proceedings Article

Robust random cut forest based anomaly detection on streams

TLDR
The paper investigates a robust random cut data structure that can serve as a sketch or synopsis of the input stream, and shows how that sketch can be updated efficiently in a dynamic data stream.
Abstract
In this paper we focus on the anomaly detection problem for dynamic data streams through the lens of random cut forests. We investigate a robust random cut data structure that can be used as a sketch or synopsis of the input stream. We provide a plausible definition of non-parametric anomalies based on the influence of an unseen point on the remainder of the data, i.e., the externality imposed by that point. We show how the sketch can be efficiently updated in a dynamic data stream. We demonstrate the viability of the algorithm on publicly available real data.
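The idea of scoring a point by its influence on the remainder of the data can be illustrated with a minimal sketch. It is not the paper's algorithm or its exact anomaly definition. It assumes that each cut picks a dimension with probability proportional to that dimension's range and then a uniform cut value, that all points are distinct, and that the target is not already in the sample. It also rebuilds each tree from scratch instead of updating it incrementally. The names `random_cut`, `displacement`, and `displacement_score` are hypothetical. The score is the size of the target's sibling subtree, that is, the number of points pushed one level deeper when the target is included.

```python
import random


def random_cut(points, rng):
    """Pick a cut: dimension weighted by its range (an assumption of this
    sketch), then a cut value drawn uniformly inside that range."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    ranges = [hi - lo for lo, hi in zip(lows, highs)]
    dim = rng.choices(range(dims), weights=ranges)[0]
    cut = rng.uniform(lows[dim], highs[dim])
    return dim, cut


def displacement(points, target, rng):
    """Follow `target` down one random cut tree built over points + [target]
    and return the size of its sibling subtree once it is isolated."""
    current = list(points) + [target]
    while True:
        dim, cut = random_cut(current, rng)
        left = [p for p in current if p[dim] <= cut]
        right = [p for p in current if p[dim] > cut]
        if not left or not right:
            continue  # degenerate cut on the boundary; draw another one
        same, other = (left, right) if target[dim] <= cut else (right, left)
        if len(same) == 1:
            # The target is now a leaf; `other` is its sibling subtree.
            return len(other)
        current = same


def displacement_score(points, target, num_trees=100, seed=0):
    """Average the sibling-subtree size over several independent trees."""
    rng = random.Random(seed)
    total = sum(displacement(points, target, rng) for _ in range(num_trees))
    return total / num_trees
```

With a tight cluster as the sample, a far-away target tends to be cut off near the root and so receives a score close to the sample size, while a target inside the cluster is usually isolated deep in the tree and receives a small score.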

