Journal Article

Graph based anomaly detection and description: a survey

TL;DR: This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs, and gives a general framework for the algorithms categorized under various settings.
Abstract: Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have recently come into focus. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised versus (semi-)supervised approaches, for static versus dynamic graphs, for attributed versus plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the 'why', of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.
Citations
Journal Article
TL;DR: A survey of such knowledge graph refinement approaches, with a dual look at both the methods being proposed as well as the evaluation methodologies used.
Abstract: In recent years, different Web knowledge graphs, both free and commercial, have been created. While Google coined the term "Knowledge Graph" in 2012, there are also a few openly available knowledge graphs, with DBpedia, YAGO, and Freebase being among the most prominent ones. Those graphs are often constructed from semi-structured knowledge, such as Wikipedia, or harvested from the web with a combination of statistical and linguistic methods. The results are large-scale knowledge graphs that try to make a good trade-off between completeness and correctness. In order to further increase the utility of such knowledge graphs, various refinement methods have been proposed, which try to infer and add missing knowledge to the graph, or identify erroneous pieces of information. In this article, we provide a survey of such knowledge graph refinement approaches, with a dual look at both the methods being proposed as well as the evaluation methodologies used.

915 citations


Cites methods from "Graph based anomaly detection and d..."

  • ...In particular, for many of the methods applied in the works discussed above – such as outlier detection or association rule mining – graph-based variants have been proposed in the literature [2,43]....


Journal Article
TL;DR: Fast AnoGAN (f‐AnoGAN), a generative adversarial network (GAN) based unsupervised learning approach capable of identifying anomalous images and image segments, that can serve as imaging biomarker candidates is presented.

777 citations

Journal Article
19 Apr 2016 - PLOS ONE
TL;DR: This paper aims to be a new well-founded basis for unsupervised anomaly detection research by publishing the source code and the datasets, and reveals the strengths and weaknesses of the different approaches for the first time.
Abstract: Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, computational effort, the impact of parameter settings as well as the global/local anomaly detection behavior are outlined. In conclusion, we give advice on algorithm selection for typical real-world tasks.

737 citations


Cites methods from "Graph based anomaly detection and d..."

  • ...This also holds true in anomaly detection and there exist many algorithms for detecting anomalies in graphs [30], in sequences and time series [31] and for addressing spatial data [32]....


Journal Article
TL;DR: A comprehensive review specifically on the emerging field of graph convolutional networks, which is one of the most prominent graph deep learning models, is conducted and several open challenges are presented and potential directions for future research are discussed.
Abstract: Graphs naturally appear in numerous application domains, ranging from social analysis and bioinformatics to computer vision. The unique capability of graphs enables capturing the structural relations among data, and thus allows harvesting more insights compared to analyzing data in isolation. However, it is often very challenging to solve the learning problems on graphs, because (1) many types of data are not originally structured as graphs, such as images and text data, and (2) for graph-structured data, the underlying connectivity patterns are often complex and diverse. On the other hand, representation learning has achieved great success in many areas. Thereby, a potential solution is to learn the representation of graphs in a low-dimensional Euclidean space, such that the graph properties can be preserved. Although tremendous efforts have been made to address the graph representation learning problem, many of them still suffer from their shallow learning mechanisms. Deep learning models on graphs (e.g., graph neural networks) have recently emerged in machine learning and other related areas, and demonstrated superior performance in various problems. In this survey, despite numerous types of graph neural networks, we conduct a comprehensive review specifically on the emerging field of graph convolutional networks, which is one of the most prominent graph deep learning models. First, we group the existing graph convolutional network models into two categories based on the types of convolutions and highlight some graph convolutional network models in detail. Then, we categorize different graph convolutional networks according to the areas of their applications. Finally, we present several open challenges in this area and discuss potential directions for future research.

562 citations


Cites background from "Graph based anomaly detection and d..."

  • ...Graphs naturally arise in many real-world applications, including social analysis [1], fraud detection [2, 3], traffic prediction [4], computer vision [5], and many more....


Journal Article
TL;DR: A comprehensive survey of deep anomaly detection with a comprehensive taxonomy is presented in this paper, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods.
Abstract: Anomaly detection, a.k.a. outlier detection or novelty detection, has been a lasting yet active research area in various research communities for several decades. There are still some unique problem complexities and challenges that require advanced approaches. In recent years, deep learning enabled anomaly detection, i.e., deep anomaly detection, has emerged as a critical direction. This article surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods. We review their key intuitions, objective functions, underlying assumptions, advantages, and disadvantages and discuss how they address the aforementioned challenges. We further discuss a set of possible future opportunities and new perspectives on addressing the challenges.

560 citations

References
Journal Article
16 May 2000
TL;DR: This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.
Abstract: For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful, but can otherwise not be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms that our approach of finding local outliers can be practical.
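To make the idea of a per-object "degree of being an outlier" concrete, the following is a minimal brute-force sketch of the standard LOF computation (k-distance, reachability distance, local reachability density). It is an illustration, not the authors' implementation: the function name `lof_scores`, the Euclidean metric, and the assumptions that `k` is smaller than the number of points and that no two points coincide are all choices made here.

```python
import math

def lof_scores(points, k):
    """Local outlier factor of every point (brute force, O(n^2) distances).

    Assumes k < len(points) and no duplicate points (hypothetical helper).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # distance from each point to its k-th nearest neighbor
    neighbors = []   # indices within the k-distance (ties included)
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        neighbors.append([j for j in order if dist[i][j] <= kd])

    def reach_dist(i, j):
        # Reachability distance of i with respect to j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / len(neighbors[i])
        lrd.append(1.0 / mean_reach)

    # LOF: average neighbor density divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Five points in a tight cluster plus one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(data, k=3)
```

Scores near 1 indicate a point whose density matches that of its neighbors; the isolated point at (5, 5) receives a clearly larger score than the clustered points.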

5,248 citations

Journal Article

3,888 citations


"Graph based anomaly detection and d..." refers methods in this paper

  • ...The research directions in this category include: algebraic connectivity ([Fiedler, 1973] [Wilson and Zhu, 2008]), a spectral method that has been studied thoroughly; an SVM-based approach on global feature vectors [Li et al., 2011a]; social networks similarity [Macindoe and Richards, 2010] which is…...


  • ...The research directions in this category include: algebraic connectivity (Fiedler 1973; Wilson and Zhu 2008), a spectral method that has been studied thoroughly; an SVM-based approach on global feature vectors (Li et al....


Journal Article
TL;DR: A new method of computation which takes into account who chooses as well as how many choose is presented, which introduces the concept of attenuation in influence transmitted through intermediaries.
Abstract: For the purpose of evaluating status in a manner free from the deficiencies of popularity contest procedures, this paper presents a new method of computation which takes into account who chooses as well as how many choose. It is necessary to introduce, in this connection, the concept of attenuation in influence transmitted through intermediaries.

3,386 citations


"Graph based anomaly detection and d..." refers background in this paper

  • ...The slightly more complex Katz measure Katz [1953] counts all the paths weighted inversely proportional to the path length....
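As a rough illustration of the Katz measure mentioned above, the sketch below sums, for every node pair, the number of connecting walks of each length, damped by an attenuation factor raised to the walk length. The function name `katz_scores`, the parameter `beta`, the exponential form of the damping, and the truncation at `max_len` are assumptions made for this example; the infinite series only converges when `beta` is smaller than the reciprocal of the adjacency matrix's largest eigenvalue, which the truncation sidesteps.

```python
def katz_scores(adj, beta=0.1, max_len=10):
    """Truncated Katz proximity between all node pairs.

    adj is a square 0/1 adjacency matrix given as a list of lists.
    Entry [i][j] of the result is sum over l = 1..max_len of
    beta**l * (number of walks of length l from i to j).
    Hypothetical helper for illustration only.
    """
    n = len(adj)
    total = [[0.0] * n for _ in range(n)]
    # power holds adj**l; start from the identity (l = 0).
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    weight = 1.0
    for _ in range(max_len):
        # Advance to the next walk length: power <- power @ adj.
        power = [
            [sum(power[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        weight *= beta
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * power[i][j]
    return total

# Path graph 0 - 1 - 2: adjacent nodes should score higher than distant ones.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
katz = katz_scores(path, beta=0.1, max_len=10)
```

In this toy graph, nodes 0 and 1 are directly linked and so score higher than nodes 0 and 2, which are joined only through the intermediary node 1.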


Journal Article
TL;DR: In this article, a spatial scan statistic for the detection of clusters in a multi-dimensional point process is proposed, where the area of the scanning window is allowed to vary, and the baseline process may be any inhomogeneous Poisson process or Bernoulli process with intensity proportional to some known function.
Abstract: The scan statistic is commonly used to test if a one dimensional point process is purely random, or if any clusters can be detected. Here it is simultaneously extended in three directions: (i) a spatial scan statistic for the detection of clusters in a multi-dimensional point process is proposed, (ii) the area of the scanning window is allowed to vary, and (iii) the baseline process may be any inhomogeneous Poisson process or Bernoulli process with intensity proportional to some known function. The main interest is in detecting clusters not explained by the baseline process. These methods are illustrated on an epidemiological data set, but there are other potential areas of application as well.
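The variable-window idea can be sketched in a simplified one-dimensional setting. The example below slides windows of several widths over a row of cells and scores each with the Poisson log-likelihood ratio commonly used for scan statistics, comparing the observed count against the count expected from a known baseline. The function names, the 1D grid, and the requirement that every baseline value is strictly positive are assumptions made here; the significance testing that a full analysis would need (for example by Monte Carlo replication) is omitted.

```python
import math

def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for a window with observed count c and
    expected count e, out of `total` events overall. Returns 0 when the
    window does not contain more events than expected."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside

def scan_1d(counts, baseline, max_width):
    """Return (best score, (start, end)) over all windows of 1..max_width
    cells; end is exclusive. Hypothetical helper for illustration."""
    total = sum(counts)
    scale = total / sum(baseline)  # expected counts sum to the observed total
    best_score, best_window = 0.0, None
    n = len(counts)
    for start in range(n):
        for width in range(1, max_width + 1):
            end = start + width
            if end > n:
                break
            c = sum(counts[start:end])
            e = scale * sum(baseline[start:end])
            score = poisson_llr(c, e, total)
            if score > best_score:
                best_score, best_window = score, (start, end)
    return best_score, best_window

# Uniform baseline with an excess of events in cells 3 and 4.
counts = [2, 1, 2, 9, 8, 1, 2, 2]
baseline = [1] * 8
score, window = scan_1d(counts, baseline, max_width=3)
```

With a non-uniform `baseline` (for instance, population per cell), a window is flagged only when its count exceeds what the baseline already explains, which matches the abstract's focus on clusters not explained by the baseline process.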

3,314 citations