Author

Guilherme Oliveira Campos

Bio: Guilherme Oliveira Campos is an academic researcher from Universidade Federal de Minas Gerais. The author has contributed to research in topics: Anomaly detection & Bipartite graph. The author has an h-index of 4, co-authored 7 publications receiving 427 citations. Previous affiliations of Guilherme Oliveira Campos include University of São Paulo & University of Southern Denmark.

Papers
Journal ArticleDOI
TL;DR: An extensive experimental study on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose, which also provides a characterization of the datasets themselves.
Abstract: The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly-used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly-proposed outlier detection methods improve over established methods. In this paper, we perform an extensive experimental study on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we provide a characterization of the datasets themselves, and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly-used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for the evaluation of outlier detection results.

552 citations
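
The study above essentially runs many kNN-based detectors over labeled benchmark datasets and compares them with ranking measures such as ROC AUC. A minimal Python sketch of that kind of benchmark loop, using scikit-learn detectors and a synthetic labeled dataset (both are illustrative stand-ins, not the paper's actual methods or data):

import numpy as np
from sklearn.neighbors import NearestNeighbors, LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 2)),    # inliers
               rng.uniform(-6.0, 6.0, size=(15, 2))])  # planted outliers
y = np.r_[np.zeros(300), np.ones(15)]                  # ground truth: 1 = outlier

k = 10

# kNN outlier score: distance to the k-th nearest neighbor (larger = more outlying).
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)        # +1 because each point is its own 1-NN
kdist = nn.kneighbors(X)[0][:, -1]

# LOF: scikit-learn stores the negated factor, so flip the sign to get a score.
lof = LocalOutlierFactor(n_neighbors=k).fit(X)
lof_score = -lof.negative_outlier_factor_

for name, score in (("kNN distance", kdist), ("LOF", lof_score)):
    print(name, "ROC AUC:", round(roc_auc_score(y, score), 3))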

Book ChapterDOI
03 Jun 2018
TL;DR: This work proposes a boosting strategy for combining individual results into an ensemble, showing improvements on benchmark datasets, and designs smaller ensembles out of a wealth of possible ensemble members to improve the diversity and accuracy of the ensemble.
Abstract: Ensemble techniques have been applied to the unsupervised outlier detection problem in some scenarios. Challenges are the generation of diverse ensemble members and the combination of individual results into an ensemble. For the latter challenge, some methods tried to design smaller ensembles out of a wealth of possible ensemble members, to improve the diversity and accuracy of the ensemble (relating to the ensemble selection problem in classification). We propose a boosting strategy for combinations showing improvements on benchmark datasets.

23 citations
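
The two challenges named in the abstract are member diversity and result combination. A hedged sketch of one plausible combination step: rank-normalize each member's score vector so scales are comparable, then greedily keep members that are least correlated with the ensemble built so far. The correlation criterion is an illustrative stand-in for a diversity measure, not the chapter's actual selection rule:

import numpy as np

def rank_normalize(s):
    # Map scores to [0, 1] by rank so members with different scales are comparable.
    return np.argsort(np.argsort(s)) / (len(s) - 1)

def greedy_diverse_ensemble(members, n_select):
    members = [rank_normalize(m) for m in members]
    selected = [members.pop(0)]                  # seed with the first member
    while members and len(selected) < n_select:
        combined = np.mean(selected, axis=0)
        # Keep the candidate whose scores correlate least with the ensemble so far.
        corrs = [abs(np.corrcoef(combined, m)[0, 1]) for m in members]
        selected.append(members.pop(int(np.argmin(corrs))))
    return np.mean(selected, axis=0)             # combine kept members by averaging

members = [np.random.default_rng(i).random(100) for i in range(6)]
ensemble_score = greedy_diverse_ensemble(members, n_select=3)
print(ensemble_score[:5])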

01 Jan 2016
TL;DR: An extensive experimental study on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose, which also provides a characterization of the datasets themselves and discusses their suitability as outlier detection benchmark sets.
Abstract: The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly-used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly-proposed outlier detection methods improve over established methods. We performed an extensive experimental study [1] on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we provide a characterization of the datasets themselves, and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly-used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for the evaluation of outlier detection results. We present the results from our previous publication [1] as well as additional observations and measures added to the online repository.

7 citations
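
Among the measure adaptations the abstract alludes to are chance-adjusted versions of ranking measures such as precision at n. A sketch of one such adjustment, under the assumption that a random ranking should score near 0 and a perfect ranking 1 (consult the paper and repository for the exact definitions used there):

import numpy as np

def precision_at_n(y_true, scores, n):
    # Fraction of true outliers among the n highest-scored points.
    top_n = np.argsort(scores)[::-1][:n]
    return float(np.asarray(y_true)[top_n].mean())

def adjusted_precision_at_n(y_true, scores, n):
    # Adjust for chance: a random ranking scores ~0, a perfect ranking scores 1.
    p = precision_at_n(y_true, scores, n)
    expected = float(np.mean(y_true))            # expected precision of a random ranking
    return (p - expected) / (1.0 - expected)

y = np.r_[np.zeros(95), np.ones(5)]              # 5 outliers among 100 points
s = np.random.default_rng(1).random(100)         # an (uninformative) score vector
print(round(adjusted_precision_at_n(y, s, n=5), 3))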

Proceedings ArticleDOI
25 Jun 2018
TL;DR: It is shown that assessing the similarity between graphs can guide the choice of effective combinations, as less similar graphs are complementary with respect to the outlier information they provide and lead to better outlier detection.
Abstract: Various previous works proposed techniques to detect outliers in graph data. Usually, some complex dataset is modeled as a graph and a technique for detecting outliers in graphs is applied. The impact of the graph model on the outlier detection capabilities of any method has so far been ignored. Here we assess the impact of the graph model on outlier detection performance and the gains that may be achieved by using multiple graph models and combining the results obtained by these models. We show that assessing the similarity between graphs can guide the choice of effective combinations, as less similar graphs are complementary with respect to the outlier information they provide and lead to better outlier detection.

6 citations
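
The pipeline described above can be pictured as: derive several graph models from the same data, score outliers on each, and combine the results, using graph similarity to decide which models complement each other. A sketch with two illustrative graph models (kNN and epsilon-neighborhood), a toy degree-based detector, and edge-set Jaccard as the similarity; none of these choices are claimed to match the paper's:

import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

X = np.random.default_rng(2).normal(size=(200, 2))

g_knn = kneighbors_graph(X, n_neighbors=5)       # graph model 1: kNN graph
g_eps = radius_neighbors_graph(X, radius=0.4)    # graph model 2: epsilon graph

def edge_set(g):
    rows, cols = g.nonzero()
    return {frozenset((int(a), int(b))) for a, b in zip(rows, cols) if a != b}

def degree_outlier_score(g):
    # Toy detector: weakly connected vertices look more outlying.
    deg = np.asarray(g.sum(axis=1)).ravel()
    return 1.0 / (1.0 + deg)

edges = [edge_set(g_knn), edge_set(g_eps)]
similarity = len(edges[0] & edges[1]) / len(edges[0] | edges[1])  # Jaccard on edges

scores = [degree_outlier_score(g_knn), degree_outlier_score(g_eps)]
# Less similar graphs carry complementary outlier information, so combine them.
combined = np.mean(scores, axis=0) if similarity < 0.5 else scores[0]
print("graph similarity:", round(similarity, 3))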

Proceedings Article
01 Jan 2018
TL;DR: This paper proposes a boosting strategy to solve the ensemble selection problem, called BoostSelect, and evaluates it over a large benchmark of datasets for outlier detection, showing improvements over baseline approaches.
Abstract: Ensemble techniques have been applied to the unsupervised outlier detection problem in some scenarios. Challenges are the generation of diverse ensemble members and the combination of individual results into an ensemble. For the latter challenge, some methods tried to design smaller ensembles out of a wealth of possible ensemble members, to improve the diversity and accuracy of the ensemble (relating to the ensemble selection problem in classification). In this paper, we propose a boosting strategy to solve the ensemble selection problem, called BoostSelect. We evaluate BoostSelect over a large benchmark of datasets for outlier detection, showing improvements over baseline approaches.

5 citations
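
The published BoostSelect algorithm is defined in the paper; the sketch below only conveys the boosting flavor of ensemble selection: derive a pseudo-target from all members, then repeatedly add the candidate that best fixes the points the current ensemble still misranks, upweighting those points after each round. All details here are assumptions for illustration:

import numpy as np

def rank_normalize(s):
    return np.argsort(np.argsort(s)) / (len(s) - 1)

def boost_flavored_select(members, n_select, top_fraction=0.05):
    members = [rank_normalize(m) for m in members]
    consensus = np.mean(members, axis=0)
    # Pseudo-target: points the full ensemble ranks in the top fraction.
    target = (consensus >= np.quantile(consensus, 1.0 - top_fraction)).astype(float)
    weights = np.ones_like(target)               # per-point weights, boosted on errors
    selected, pool = [], list(members)
    for _ in range(min(n_select, len(pool))):
        # Add the candidate with the smallest weighted disagreement with the target.
        gaps = [np.sum(weights * np.abs(m - target)) for m in pool]
        selected.append(pool.pop(int(np.argmin(gaps))))
        combined = np.mean(selected, axis=0)
        weights *= 1.0 + np.abs(combined - target)   # upweight still-misranked points
    return np.mean(selected, axis=0)

members = [np.random.default_rng(i).random(50) for i in range(8)]
print(boost_flavored_select(members, n_select=3)[:5])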


Cited by
Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, covering algorithmic and structural questions and touching on newer models, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Journal ArticleDOI
TL;DR: A survey of deep anomaly detection with a comprehensive taxonomy is presented in this paper, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods.
Abstract: Anomaly detection, a.k.a. outlier detection or novelty detection, has been a lasting yet active research area in various research communities for several decades. There are still some unique problem complexities and challenges that require advanced approaches. In recent years, deep learning enabled anomaly detection, i.e., deep anomaly detection, has emerged as a critical direction. This article surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods. We review their key intuitions, objective functions, underlying assumptions, advantages, and disadvantages and discuss how they address the aforementioned challenges. We further discuss a set of possible future opportunities and new perspectives on addressing the challenges.

560 citations
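
One fine-grained category such surveys cover is reconstruction-based anomaly detection: train a model to reconstruct normal data and flag points with high reconstruction error. A compact stand-in using scikit-learn's MLPRegressor as a bottleneck autoencoder (a real deep detector would use a dedicated deep-learning framework; the data here is synthetic):

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X_train = rng.normal(0.0, 1.0, size=(500, 8))            # assumed mostly normal data
X_test = np.vstack([rng.normal(0.0, 1.0, size=(20, 8)),  # normal test points
                    rng.normal(6.0, 1.0, size=(5, 8))])  # 5 planted anomalies

# A 4-unit bottleneck forces the network to learn a compressed view of normality.
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
ae.fit(X_train, X_train)                                 # learn to reproduce the input

reconstruction = ae.predict(X_test)
score = np.mean((X_test - reconstruction) ** 2, axis=1)  # anomaly score = reconstruction error
print(np.round(score, 2))                                # the last 5 scores should be largest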

01 Jan 1981
TL;DR: A mathematical statistics textbook covering probability, random variables, special distributions, estimation, hypothesis testing, two-sample inference, goodness-of-fit tests, regression, analysis of variance, randomized block designs, and nonparametric statistics.
Abstract: Table of contents:
1. Introduction: 1.1 An Overview; 1.2 Some Examples; 1.3 A Brief History; 1.4 A Chapter Summary
2. Probability: 2.1 Introduction; 2.2 Sample Spaces and the Algebra of Sets; 2.3 The Probability Function; 2.4 Conditional Probability; 2.5 Independence; 2.6 Combinatorics; 2.7 Combinatorial Probability; 2.8 Taking a Second Look at Statistics (Monte Carlo Techniques)
3. Random Variables: 3.1 Introduction; 3.2 Binomial and Hypergeometric Probabilities; 3.3 Discrete Random Variables; 3.4 Continuous Random Variables; 3.5 Expected Values; 3.6 The Variance; 3.7 Joint Densities; 3.8 Transforming and Combining Random Variables; 3.9 Further Properties of the Mean and Variance; 3.10 Order Statistics; 3.11 Conditional Densities; 3.12 Moment-Generating Functions; 3.13 Taking a Second Look at Statistics (Interpreting Means); Appendix 3.A.1 MINITAB Applications
4. Special Distributions: 4.1 Introduction; 4.2 The Poisson Distribution; 4.3 The Normal Distribution; 4.4 The Geometric Distribution; 4.5 The Negative Binomial Distribution; 4.6 The Gamma Distribution; 4.7 Taking a Second Look at Statistics (Monte Carlo Simulations); Appendix 4.A.1 MINITAB Applications; Appendix 4.A.2 A Proof of the Central Limit Theorem
5. Estimation: 5.1 Introduction; 5.2 Estimating Parameters: The Method of Maximum Likelihood and the Method of Moments; 5.3 Interval Estimation; 5.4 Properties of Estimators; 5.5 Minimum-Variance Estimators: The Cramér-Rao Lower Bound; 5.6 Sufficient Estimators; 5.7 Consistency; 5.8 Bayesian Estimation; 5.9 Taking a Second Look at Statistics (Beyond Classical Estimation); Appendix 5.A.1 MINITAB Applications
6. Hypothesis Testing: 6.1 Introduction; 6.2 The Decision Rule; 6.3 Testing Binomial Data (H0: p = p0); 6.4 Type I and Type II Errors; 6.5 A Notion of Optimality: The Generalized Likelihood Ratio; 6.6 Taking a Second Look at Statistics (Statistical Significance versus "Practical" Significance)
7. Inferences Based on the Normal Distribution: 7.1 Introduction; 7.2 Comparing (Ȳ − μ)/(σ/√n) and (Ȳ − μ)/(S/√n); 7.3 Deriving the Distribution of (Ȳ − μ)/(S/√n); 7.4 Drawing Inferences About μ; 7.5 Drawing Inferences About σ²; 7.6 Taking a Second Look at Statistics (Type II Error); Appendix 7.A.1 MINITAB Applications; Appendix 7.A.2 Some Distribution Results for Ȳ and S²; Appendix 7.A.3 A Proof that the One-Sample t Test is a GLRT; Appendix 7.A.4 A Proof of Theorem 7.5.2
8. Types of Data: A Brief Overview: 8.1 Introduction; 8.2 Classifying Data; 8.3 Taking a Second Look at Statistics (Samples Are Not "Valid"!)
9. Two-Sample Inferences: 9.1 Introduction; 9.2 Testing H0: μX = μY; 9.3 Testing H0: σ²X = σ²Y (The F Test); 9.4 Binomial Data: Testing H0: pX = pY; 9.5 Confidence Intervals for the Two-Sample Problem; 9.6 Taking a Second Look at Statistics (Choosing Samples); Appendix 9.A.1 A Derivation of the Two-Sample t Test (A Proof of Theorem 9.2.2); Appendix 9.A.2 MINITAB Applications
10. Goodness-of-Fit Tests: 10.1 Introduction; 10.2 The Multinomial Distribution; 10.3 Goodness-of-Fit Tests: All Parameters Known; 10.4 Goodness-of-Fit Tests: Parameters Unknown; 10.5 Contingency Tables; 10.6 Taking a Second Look at Statistics (Outliers); Appendix 10.A.1 MINITAB Applications
11. Regression: 11.1 Introduction; 11.2 The Method of Least Squares; 11.3 The Linear Model; 11.4 Covariance and Correlation; 11.5 The Bivariate Normal Distribution; 11.6 Taking a Second Look at Statistics (How Not to Interpret the Sample Correlation Coefficient); Appendix 11.A.1 MINITAB Applications; Appendix 11.A.2 A Proof of Theorem 11.3.3
12. The Analysis of Variance: 12.1 Introduction; 12.2 The F Test; 12.3 Multiple Comparisons: Tukey's Method; 12.4 Testing Subhypotheses with Contrasts; 12.5 Data Transformations; 12.6 Taking a Second Look at Statistics (Putting the Subject of Statistics Together: the Contributions of Ronald A. Fisher); Appendix 12.A.1 MINITAB Applications; Appendix 12.A.2 A Proof of Theorem 12.2.2; Appendix 12.A.3 The Distribution of [SSTR/(k−1)]/[SSE/(n−k)] When H1 Is True
13. Randomized Block Designs: 13.1 Introduction; 13.2 The F Test for a Randomized Block Design; 13.3 The Paired t Test; 13.4 Taking a Second Look at Statistics (Choosing between a Two-Sample t Test and a Paired t Test); Appendix 13.A.1 MINITAB Applications
14. Nonparametric Statistics: 14.1 Introduction; 14.2 The Sign Test; 14.3 Wilcoxon Tests; 14.4 The Kruskal-Wallis Test; 14.5 The Friedman Test; 14.6 Testing for Randomness; 14.7 Taking a Second Look at Statistics (Comparing Parametric and Nonparametric Procedures); Appendix 14.A.1 MINITAB Applications
Back matter: Appendix: Statistical Tables; Answers to Selected Odd-Numbered Questions; Bibliography; Index

524 citations

Journal ArticleDOI
TL;DR: This article surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods and discusses how they address the aforementioned challenges.
Abstract: Anomaly detection, a.k.a. outlier detection, has been a lasting yet active research area in various research communities for several decades. There are still some unique problem complexities and challenges that require advanced approaches. In recent years, deep learning enabled anomaly detection, i.e., deep anomaly detection, has emerged as a critical direction. This paper reviews the research of deep anomaly detection with a comprehensive taxonomy of detection methods, covering advancements in three high-level categories and 11 fine-grained categories of the methods. We review their key intuitions, objective functions, underlying assumptions, advantages and disadvantages, and discuss how they address the aforementioned challenges. We further discuss a set of possible future opportunities and new perspectives on addressing the challenges.

385 citations

Journal Article
TL;DR: In this article, the authors propose a measure of local outlierness based on a symmetric neighborhood relationship, which considers both the neighbors and reverse neighbors of an object when estimating its density distribution.
Abstract: Mining outliers in a database is to find exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers that have a density distribution significantly different from their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when outliers are in locations where the density distributions in the neighborhood are significantly different, for example, in the case of objects from a sparse cluster close to a denser cluster, this may result in a wrong estimation. To avoid this problem, here we propose a simple but effective measure of local outlierness based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in the computation but also more effective in ranking outliers.

321 citations
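
A sketch of the symmetric-neighborhood idea described above: estimate each point's density from its k-distance, then relate it to the densities over both its k-nearest neighbors and its reverse k-nearest neighbors. The exact formula in the paper may differ; this only illustrates the mechanism:

import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.default_rng(4).normal(size=(150, 2))
k = 8

nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dist, idx = nn.kneighbors(X)                     # column 0 is the point itself
density = 1.0 / dist[:, -1]                      # density estimate: 1 / k-distance

knn = [set(map(int, row[1:])) for row in idx]    # k-nearest neighbors of each point
rknn = [set() for _ in range(len(X))]            # reverse k-nearest neighbors
for p, neighbors in enumerate(knn):
    for q in neighbors:
        rknn[q].add(p)

scores = np.empty(len(X))
for p in range(len(X)):
    influence = knn[p] | rknn[p]                 # symmetric neighborhood of p
    # Outlierness: average neighborhood density relative to p's own density.
    scores[p] = np.mean([density[q] for q in influence]) / density[p]

print("top-5 outlier indices:", np.argsort(scores)[::-1][:5])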