Journal ArticleDOI

Graph based anomaly detection and description: a survey

TL;DR: This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs, and gives a general framework for the algorithms categorized under various settings.
Abstract: Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have come into focus only recently. As objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised versus (semi-)supervised approaches, for static versus dynamic graphs, for attributed versus plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the 'why', of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.
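Among the unsupervised methods for static plain graphs that the survey categorizes, a representative example is OddBall-style egonet analysis [Akoglu et al. 2010], which fits a power law between each node's egonet size and edge count and flags nodes whose egonets deviate from it. A minimal sketch of that idea (the function name and the planted clique are our illustration, not taken from the survey):

```python
import numpy as np
import networkx as nx

def egonet_scores(G):
    """OddBall-style scoring: fit E_i ~ c * N_i^theta across all egonets
    (N_i = node count, E_i = edge count of node i's egonet) and score each
    node by its log-scale residual from the fitted power law."""
    nodes = list(G)
    N = np.array([G.degree(v) + 1 for v in nodes])
    E = np.array([G.subgraph([v, *G[v]]).number_of_edges() for v in nodes])
    logN, logE = np.log(N), np.log(np.maximum(E, 1))
    theta, logc = np.polyfit(logN, logE, 1)       # least-squares fit in log-log space
    return dict(zip(nodes, np.abs(logE - (logc + theta * logN))))

G = nx.barabasi_albert_graph(300, 2, seed=0)
G.add_edges_from((300 + i, 300 + j) for i in range(12) for j in range(i))  # planted clique
scores = egonet_scores(G)
print(sorted(scores, key=scores.get, reverse=True)[:5])  # clique members rank high
```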
Citations
Journal ArticleDOI
TL;DR: This survey aims to document the state of anomaly detection in high dimensional big data by representing the unique challenges using a triangular model of vertices: the problem, techniques/algorithms, and tools (big data applications/frameworks).
Abstract: Anomaly detection in high dimensional data is becoming a fundamental research problem that has various applications in the real world. However, many existing anomaly detection techniques fail to retain sufficient accuracy due to so-called "big data", characterised by high-volume and high-velocity data generated by a variety of sources. This phenomenon of having both problems together can be referred to as the "curse of big dimensionality," which affects existing techniques in terms of both performance and accuracy. To address this gap and to understand the core problem, it is necessary to identify the unique challenges brought by anomaly detection with both high dimensionality and big data problems. Hence, this survey aims to document the state of anomaly detection in high dimensional big data by representing the unique challenges using a triangular model of vertices: the problem (big dimensionality), techniques/algorithms (anomaly detection), and tools (big data applications/frameworks). Works that fall directly into any of the vertices, or are closely related to them, are taken into consideration for review. Furthermore, the limitations of traditional approaches and current strategies of high dimensional data are discussed along with recent techniques and applications on big data required for the optimization of anomaly detection.
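One concrete way to see the "curse of big dimensionality" the abstract refers to is distance concentration (the classic observation of Beyer et al., 1999): as dimensionality grows, the gap between a query's nearest and farthest neighbors shrinks, so distance-based outlier scores lose contrast. A quick illustration on synthetic data (the toy setup is ours):

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1_000, 10_000):
    X = rng.normal(size=(500, d))          # 500 random points
    q = rng.normal(size=d)                 # one query point
    dist = np.linalg.norm(X - q, axis=1)
    print(f"d={d:>6}: nearest/farthest ratio = {dist.min() / dist.max():.3f}")
    # the ratio climbs toward 1.0 as d grows: all points look equally far away
```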

159 citations

Proceedings ArticleDOI
19 Aug 2017
TL;DR: Most existing NRL methods are summarized into a unified two-step framework, consisting of proximity matrix construction and dimension reduction, and the Network Embedding Update (NEU) algorithm is proposed, which implicitly approximates higher-order proximities with a theoretical approximation bound.
Abstract: Many Network Representation Learning (NRL) methods have been proposed to learn vector representations for vertices in a network recently. In this paper, we summarize most existing NRL methods into a unified two-step framework, including proximity matrix construction and dimension reduction. We focus on the analysis of proximity matrix construction step and conclude that an NRL method can be improved by exploring higher order proximities when building the proximity matrix. We propose Network Embedding Update (NEU) algorithm which implicitly approximates higher order proximities with theoretical approximation bound and can be applied on any NRL methods to enhance their performances. We conduct experiments on multi-label classification and link prediction tasks. Experimental results show that NEU can make a consistent and significant improvement over a number of NRL methods with almost negligible running time on all three publicly available datasets. The source code of this paper can be obtained from https://github.com/thunlp/NEU.
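As we read the abstract, the core of NEU is a cheap matrix update that folds higher-order neighborhood averages back into a pre-trained embedding. A sketch of that update (the weights 0.5 and 0.25 are the defaults we recall from the paper; treat them, and the function name, as assumptions):

```python
import numpy as np

def neu(R, A, lam1=0.5, lam2=0.25):
    """One NEU-style enhancement step: mix first- and second-order
    neighborhood averages of the embedding R back into R.
    A must be the row-normalized adjacency matrix."""
    AR = A @ R
    return R + lam1 * AR + lam2 * (A @ AR)

# usage with any pre-trained embedding R (n x d) and raw adjacency `adj` (n x n)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
A = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1)   # row-normalize
R = np.random.default_rng(0).normal(size=(3, 4))          # stand-in embedding
R_enhanced = neu(R, A)
```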

152 citations


Cites background from "Graph based anomaly detection and description: a survey"

  • ...Researchers have made great effort on developing machine learning algorithms for various network applications, e.g. vertex classification [Sen et al., 2008], tag recommendation [Tu et al., 2014], anomaly detection [Akoglu et al., 2015] and link prediction [Liben-Nowell and Kleinberg, 2007]....

Journal ArticleDOI
John Boaz Lee, Ryan A. Rossi, Sungchul Kim, Nesreen K. Ahmed, Eunyee Koh
TL;DR: This work conducts a comprehensive and focused survey of the literature on the emerging field of graph attention models and introduces three intuitive taxonomies to group existing work.
Abstract: Graph-structured data arise naturally in many different application domains. By representing data as graphs, we can capture entities (i.e., nodes) as well as their relationships (i.e., edges) with each other. Many useful insights can be derived from graph-structured data as demonstrated by an ever-growing body of work focused on graph mining. However, in the real-world, graphs can be both large—with many complex patterns—and noisy, which can pose a problem for effective graph mining. An effective way to deal with this issue is to incorporate “attention” into graph mining solutions. An attention mechanism allows a method to focus on task-relevant parts of the graph, helping it to make better decisions. In this work, we conduct a comprehensive and focused survey of the literature on the emerging field of graph attention models. We introduce three intuitive taxonomies to group existing work. These are based on problem setting (type of input and output), the type of attention mechanism used, and the task (e.g., graph classification, link prediction). We motivate our taxonomies through detailed examples and use each to survey competing approaches from a unique standpoint. Finally, we highlight several challenges in the area and discuss promising directions for future work.
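The best-known instance of the attention mechanism this survey catalogues is the GAT layer of Velickovic et al. (2018): each node scores its neighbors with a learned function, softmax-normalizes the scores over its neighborhood, and aggregates neighbor features with those weights. A dense numpy sketch (randomly initialized parameters stand in for learned ones):

```python
import numpy as np

def gat_layer(H, A, W, a, slope=0.2):
    """Single-head GAT-style attention.  H: (n, d) node features,
    A: (n, n) 0/1 adjacency *including self-loops*, W: (d, d') weights,
    a: (2*d',) attention vector.  e_ij = LeakyReLU(a . [W h_i || W h_j])."""
    Z = H @ W
    n = Z.shape[0]
    e = np.full((n, n), -np.inf)                     # -inf masks non-edges
    for i in range(n):
        for j in range(n):
            if A[i, j]:
                s = a @ np.concatenate([Z[i], Z[j]])
                e[i, j] = s if s > 0 else slope * s  # LeakyReLU
    att = np.exp(e - e.max(axis=1, keepdims=True))   # masked softmax:
    att /= att.sum(axis=1, keepdims=True)            # exp(-inf) = 0 off-edges
    return att @ Z                                   # attention-weighted aggregation

rng = np.random.default_rng(0)
A = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)  # self-loops included
out = gat_layer(rng.normal(size=(3, 4)), A,
                rng.normal(size=(4, 8)), rng.normal(size=16))
```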

139 citations


Cites background from "Graph based anomaly detection and description: a survey"

  • ...Data that can be naturally modeled as graphs are found in a wide variety of domains including the world-wide web [Albert et al. 1999], bioinformatics [Pei et al. 2005], neuroscience [Lee et al. 2017], chemoinformatics [Duvenaud et al. 2015], social networks [Backstrom and Leskovec 2011], scientific…

Journal ArticleDOI
TL;DR: In this article, the authors propose a method that aims at detecting evolutionary changes in the configuration of a complex system, and generates intervals accordingly, by looking for peaks in the similarity between sets of events on consecutive time intervals of data.
Abstract: Most complex systems are intrinsically dynamic in nature. The evolution of a dynamic complex system is typically represented as a sequence of snapshots, where each snapshot describes the configuration of the system at a particular instant of time. This is often done by using constant intervals but a better approach would be to define dynamic intervals that match the evolution of the system’s configuration. To this end, we propose a method that aims at detecting evolutionary changes in the configuration of a complex system, and generates intervals accordingly. We show that evolutionary timescales can be identified by looking for peaks in the similarity between the sets of events on consecutive time intervals of data. Tests on simple toy models reveal that the technique is able to detect evolutionary timescales of time-varying data both when the evolution is smooth as well as when it changes sharply. This is further corroborated by analyses of several real datasets. Our method is scalable to extremely large datasets and is computationally efficient. This allows a quick, parameter-free detection of multiple timescales in the evolution of a complex system.
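Our reading of the abstract's core step, sketched below: for each candidate interval length, chop the event stream into consecutive windows of that length and measure how similar the event sets of adjacent windows are (Jaccard similarity here); interval lengths where this similarity peaks are the candidate timescales. The function and the choice of similarity are ours, not necessarily the paper's:

```python
import numpy as np

def window_similarity(events, deltas):
    """events: time-sorted list of (timestamp, event_id) pairs.
    Returns {delta: mean Jaccard similarity between the event sets of
    consecutive length-delta windows}; peaks suggest natural timescales."""
    t0 = events[0][0]
    span = events[-1][0] - t0
    result = {}
    for delta in deltas:
        n_win = int(span // delta) + 1
        windows = [set() for _ in range(n_win)]
        for t, ev in events:
            windows[int((t - t0) // delta)].add(ev)
        sims = [len(a & b) / len(a | b)
                for a, b in zip(windows, windows[1:]) if a | b]
        result[delta] = float(np.mean(sims)) if sims else 0.0
    return result
```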

138 citations

Journal ArticleDOI
Wei Wang, Yaoyao Shang, Yongzhong He, Yidong Li, Jiqiang Liu
TL;DR: This work proposes BotMark, an automated model that detects botnets through hybrid analysis of flow-based and graph-based network traffic behaviors, achieving superior detection accuracy; to evaluate it, the authors collect a very large volume of network traffic by simulating 5 newly propagated botnets in a real computing environment.
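We have not reproduced BotMark's actual features here; the snippet below only illustrates the general shape of such a hybrid detector, where a flow-based score and a graph-based score for each host are fused into one decision (the weights and threshold are placeholders, not values from the paper):

```python
def fuse_scores(flow_score, graph_score, w=0.5, threshold=0.8):
    """Illustrative hybrid detection (not BotMark itself): combine a
    per-host anomaly score from a flow-based view with one from a
    graph-based view, both assumed normalized to [0, 1], and flag
    hosts whose combined score exceeds a threshold."""
    score = w * flow_score + (1 - w) * graph_score
    return score, score >= threshold
```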

134 citations

References
Journal ArticleDOI
04 Jun 1998 - Nature
TL;DR: Simple models of networks that can be tuned through this middle ground (regular networks 'rewired' to introduce increasing amounts of disorder) are explored, and these systems are found to be highly clustered, like regular lattices, yet to have small characteristic path lengths, like random graphs.
Abstract: Networks of coupled dynamical systems have been used to model biological oscillators, Josephson junction arrays, excitable media, neural networks, spatial games, genetic control networks and many other self-organizing systems. Ordinarily, the connection topology is assumed to be either completely regular or completely random. But many biological, technological and social networks lie somewhere between these two extremes. Here we explore simple models of networks that can be tuned through this middle ground: regular networks 'rewired' to introduce increasing amounts of disorder. We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them 'small-world' networks, by analogy with the small-world phenomenon (popularly known as 'six degrees of separation'). The neural network of the worm Caenorhabditis elegans, the power grid of the western United States, and the collaboration graph of film actors are shown to be small-world networks. Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. In particular, infectious diseases spread more easily in small-world networks than in regular lattices.
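The rewiring experiment is easy to reproduce with networkx's built-in generator: clustering stays lattice-like while the average path length collapses once even a little disorder is introduced. A quick check (the parameter values are our choices):

```python
import networkx as nx

n, k = 1000, 10                      # ring lattice: n nodes, k neighbors each
for p in (0.0, 0.01, 0.1, 1.0):      # rewiring probability
    # the "connected" variant retries until the rewired graph is connected,
    # so average_shortest_path_length is always defined
    G = nx.connected_watts_strogatz_graph(n, k, p, seed=42)
    print(f"p={p:<5} clustering={nx.average_clustering(G):.3f} "
          f"avg path length={nx.average_shortest_path_length(G):.2f}")
# small-world regime: p ~ 0.01 keeps high clustering but already short paths
```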

39,297 citations

Book
01 Jan 1983

34,729 citations

Journal ArticleDOI
15 Oct 1999 - Science
TL;DR: A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
Abstract: Systems as diverse as genetic networks or the World Wide Web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature was found to be a consequence of two generic mechanisms: (i) networks expand continuously by the addition of new vertices, and (ii) new vertices attach preferentially to sites that are already well connected. A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
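The two mechanisms in the abstract are a few lines of code: grow the vertex set, and attach each newcomer to existing vertices sampled proportionally to their current degree. A sketch (the repeated-node list is the standard trick for degree-proportional sampling; the function is our illustration):

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow a Barabasi-Albert-style graph: each new vertex attaches m
    edges to existing vertices chosen with probability proportional to
    their current degree.  Returns the edge list."""
    rng = random.Random(seed)
    edges = []
    repeated = []                    # vertex v appears here degree(v) times
    targets = list(range(m))         # seed vertices for the first newcomer
    for v in range(m, n):
        edges.extend((v, t) for t in targets)
        repeated.extend(targets)     # each target gained one edge endpoint
        repeated.extend([v] * m)     # the newcomer gained m endpoints
        chosen = set()               # sample m distinct degree-weighted targets
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges

edges = preferential_attachment(10_000, 3)
# the resulting degree distribution has a power-law tail with exponent ~3
```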

33,771 citations

Book
01 Jan 1970
TL;DR: This is a complete revision of a classic, seminal, and authoritative book that has been the model for most books on the topic written since 1970, focusing on practical techniques throughout rather than a rigorous mathematical treatment of the subject.
Abstract: From the Publisher: This is a complete revision of a classic, seminal, and authoritative book that has been the model for most books on the topic written since 1970. It focuses on practical techniques throughout, rather than a rigorous mathematical treatment of the subject. It explores the building of stochastic (statistical) models for time series and their use in important areas of application: forecasting; model specification, estimation, and checking; transfer function modeling of dynamic relationships; modeling the effects of intervention events; and process control. Features sections on: recently developed methods for model specification, such as canonical correlation analysis and the use of model selection criteria; results on testing for unit root nonstationarity in ARIMA processes; the state space representation of ARMA models and its use for likelihood estimation and forecasting; score tests for model checking; and deterministic components and structural components in time series models and their estimation based on regression-time series model methods.
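The Box-Jenkins workflow the abstract describes (specify, estimate, check, forecast) survives almost verbatim in modern libraries. A minimal sketch using statsmodels on simulated data (the model orders and coefficients are our toy choices):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# simulate an ARMA(1, 1) process: x_t = 0.7 x_{t-1} + e_t + 0.4 e_{t-1}
rng = np.random.default_rng(0)
e = rng.normal(size=600)
x = np.zeros(600)
for t in range(1, 600):
    x[t] = 0.7 * x[t - 1] + e[t] + 0.4 * e[t - 1]

fit = ARIMA(x, order=(1, 0, 1)).fit()  # specify p=1, d=0, q=1 and estimate
print(fit.params)                      # ar.L1 ~ 0.7, ma.L1 ~ 0.4
print(fit.forecast(steps=5))           # out-of-sample forecasts
```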

19,748 citations