Journal ArticleDOI

Support Vector Data Description

01 Jan 2004-Machine Learning (Kluwer Academic Publishers)-Vol. 54, Iss: 1, pp 45-66
TL;DR: The Support Vector Data Description (SVDD) is presented, which obtains a spherically shaped boundary around a dataset and, analogous to the Support Vector Classifier, can be made flexible by using other kernel functions.
Abstract: Data domain description concerns the characterization of a data set. A good description covers all target data but includes no superfluous space. The boundary of a dataset can be used to detect novel data or outliers. We will present the Support Vector Data Description (SVDD), which is inspired by the Support Vector Classifier. It obtains a spherically shaped boundary around a dataset and, analogous to the Support Vector Classifier, it can be made flexible by using other kernel functions. The method is made robust against outliers in the training set and is capable of tightening the description by using negative examples. We show characteristics of the Support Vector Data Descriptions using artificial and real data.
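For reference, the spherical description summarized in the abstract can be written down directly; the following is the standard SVDD optimization problem (a LaTeX sketch: a and R are the sphere's center and radius, the ξ_i are slack variables, and C trades off volume against errors):

    % SVDD primal: the smallest sphere (center a, radius R) around the data,
    % with slack variables xi_i that allow some outliers at cost C.
    \min_{R,\,a,\,\xi}\; R^2 + C \sum_i \xi_i
    \quad \text{s.t.} \quad \|x_i - a\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0

    % Kernelized dual: replacing inner products by a kernel K(x_i, x_j)
    % makes the boundary flexible, analogous to the Support Vector Classifier.
    \max_{\alpha}\; \sum_i \alpha_i K(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)
    \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i = 1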

Citations
Journal ArticleDOI
TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.
Abstract: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

9,627 citations

Journal ArticleDOI
25 Jan 2010-Analyst
TL;DR: The increasing interest in Support Vector Machines (SVMs) over the past 15 years is described, including the application of Support Vector Regression to multivariate calibration and why it is useful when there are outliers and non-linearities.
Abstract: The increasing interest in Support Vector Machines (SVMs) over the past 15 years is described. Methods are illustrated using simulated case studies, and 4 experimental case studies, namely mass spectrometry for studying pollution, near infrared analysis of food, thermal analysis of polymers and UV/visible spectroscopy of polyaromatic hydrocarbons. The basis of SVMs as two-class classifiers is shown with extensive visualisation, including learning machines, kernels and penalty functions. The influence of the penalty error and radial basis function radius on the model is illustrated. Multiclass implementations including one vs. all, one vs. one, fuzzy rules and Directed Acyclic Graph (DAG) trees are described. One-class Support Vector Domain Description (SVDD) is described and contrasted to conventional two- or multi-class classifiers. The use of Support Vector Regression (SVR) is illustrated including its application to multivariate calibration, and why it is useful when there are outliers and non-linearities.
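The multiclass schemes named in the abstract (one vs. all, one vs. one) map directly onto standard library wrappers. Below is a minimal Python sketch using scikit-learn; the dataset and parameter values are illustrative assumptions, not the paper's case studies:

    # Minimal sketch: one-vs-one vs. one-vs-rest multiclass SVMs.
    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # One-vs-one trains a binary SVM per pair of classes;
    # one-vs-rest trains one SVM per class against all the others.
    ovo = OneVsOneClassifier(SVC(kernel="rbf", C=1.0)).fit(X, y)
    ovr = OneVsRestClassifier(SVC(kernel="rbf", C=1.0)).fit(X, y)

    print(ovo.predict(X[:3]), ovr.predict(X[:3]))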

1,899 citations

Journal ArticleDOI
TL;DR: This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation, without employing any distance or density measure, making it fundamentally different from all existing methods.
Abstract: Anomalies are data points that are few and different. As a result of these properties, we show that anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure, fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time-complexity and a small memory requirement and (ii) to deal with the effects of swamping and masking effectively. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high-dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample.
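As a concrete illustration of the isolation idea, here is a hedged Python sketch using scikit-learn's IsolationForest; the data and parameter values (100 trees, subsample size 256) are assumptions for illustration, not the paper's experimental setup:

    # Isolation-based anomaly detection: no distance or density computation.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(0)
    X_train = rng.normal(size=(500, 2))        # mostly normal points
    X_test = rng.uniform(-6, 6, size=(10, 2))  # candidate anomalies

    # Small per-tree subsamples keep time and memory low.
    forest = IsolationForest(n_estimators=100, max_samples=256,
                             random_state=0).fit(X_train)

    labels = forest.predict(X_test)        # +1 = inlier, -1 = anomaly
    scores = forest.score_samples(X_test)  # lower = more anomalous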

1,266 citations

Proceedings ArticleDOI
24 Aug 2008
TL;DR: This paper shows that models trained using the new methods perform better than the current state-of-the-art biased SVM method for learning from positive and unlabeled examples, and applies them to solve a real-world problem: identifying protein records that should be included in an incomplete specialized molecular biology database.
Abstract: The input to an algorithm that learns a binary classifier normally consists of two sets of examples, where one set consists of positive examples of the concept to be learned, and the other set consists of negative examples. However, it is often the case that the available training data are an incomplete set of positive examples and a set of unlabeled examples, some of which are positive and some of which are negative. The problem solved in this paper is how to learn a standard binary classifier given a nontraditional training set of this nature. Under the assumption that the labeled examples are selected randomly from the positive examples, we show that a classifier trained on positive and unlabeled examples predicts probabilities that differ by only a constant factor from the true conditional probabilities of being positive. We show how to use this result in two different ways to learn a classifier from a nontraditional training set. We then apply these two new methods to solve a real-world problem: identifying protein records that should be included in an incomplete specialized molecular biology database. Our experiments in this domain show that models trained using the new methods perform better than the current state-of-the-art biased SVM method for learning from positive and unlabeled examples.
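The constant-factor result lends itself to a short sketch: train any probabilistic classifier to separate labeled positives from unlabeled examples, estimate the constant on held-out labeled positives, and divide. The Python below is a hedged illustration; the choice of logistic regression and the split sizes are assumptions, not the paper's setup:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def fit_pu(X, s):
        """X: features; s[i] = 1 if example i is a labeled positive, else 0."""
        X_tr, X_va, s_tr, s_va = train_test_split(X, s, test_size=0.2,
                                                  random_state=0)
        g = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)
        # c = p(labeled | positive), estimated as the mean predicted
        # probability over held-out labeled positives.
        c = g.predict_proba(X_va[s_va == 1])[:, 1].mean()
        return g, c

    def predict_positive_proba(g, c, X):
        # The true conditional probability differs by the constant factor c:
        # p(y=1 | x) = p(s=1 | x) / c, clipped into [0, 1].
        return np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)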

1,007 citations

Journal ArticleDOI
TL;DR: A hybrid model in which an unsupervised DBN is trained to extract generic underlying features and a one-class SVM is trained on the features learned by the DBN; it delivers accuracy comparable to a deep autoencoder while being scalable and computationally efficient.
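The hybrid pipeline is straightforward to sketch: unsupervised feature learning followed by a one-class SVM on the learned features. In the hedged Python below, scikit-learn's BernoulliRBM stands in for a single DBN layer; that substitution and all parameter values are assumptions, not the paper's architecture:

    import numpy as np
    from sklearn.neural_network import BernoulliRBM
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import OneClassSVM

    X = np.random.RandomState(0).rand(1000, 64)   # placeholder training data

    scaler = MinMaxScaler().fit(X)                # RBM expects values in [0, 1]
    rbm = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20,
                       random_state=0).fit(scaler.transform(X))

    features = rbm.transform(scaler.transform(X))  # unsupervised features
    ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(features)

    # Score new data: +1 = inside the description, -1 = anomaly.
    is_normal = ocsvm.predict(rbm.transform(scaler.transform(X[:5])))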

876 citations


Cites methods from "Support Vector Data Description"

  • ...For the explanation below, two of the most common 1SVM algorithms are chosen, a hypersphere-based 1SVM (known as Support Vector Data Description (SVDD)) by Tax and Duin [8], and a plane-based 1SVM (PSVM) by Schölkopf et al. [31], see Fig....

  • ...The parameters of SVM-based methods are selected via a grid-search, with ν ∈ (0, 1) and σ ∈ (1, 1) for SVDD [8], and γ ∈ (2^(−15); 2^(−13); ....

  • ...This is extended from the hypersphere-based one-class SVM approach proposed by Tax and Duin [8]....

  • ...Further, Tax and Duin have shown that the hyperplane-based one-class SVM becomes a special case of the (equivalent) hypersphere-based scheme when used with a radial basis kernel....

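A short derivation makes the equivalence noted in the last excerpt concrete (a sketch using the SVDD dual given earlier; s is the Gaussian kernel width):

    % For the Gaussian kernel the self-similarity is constant:
    K(x, x) = \exp\left(-\|x - x\|^2 / s^2\right) = 1
    \;\Rightarrow\;
    \sum_i \alpha_i K(x_i, x_i) = \sum_i \alpha_i = 1 .

    % The linear term of the SVDD dual is then a constant, so maximizing
    % the dual is equivalent to
    \min_{\alpha}\; \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)
    \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i = 1 ,
    % which is the dual of the plane-based one-class SVM.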

References
Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Setting of the learning problem, consistency of learning processes, bounds on the rate of convergence of learning processes, controlling the generalization ability of learning processes, constructing learning algorithms, and what is important in learning theory?
Abstract: Setting of the learning problem, consistency of learning processes, bounds on the rate of convergence of learning processes, controlling the generalization ability of learning processes, constructing learning algorithms, and what is important in learning theory?

40,147 citations


"Support Vector Data Description" refers methods in this paper

  • ...This is identical to the approach which is used in Schölkopf, Burges, and Vapnik (1995) to estimate the VC-dimension of a classifier (which is bounded by the diameter of the smallest sphere enclosing the data)....

Book
Vladimir Vapnik
01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations


"Support Vector Data Description" refers background or methods in this paper

  • ...Several kernel functions have been proposed for the Support Vector Classifier (Vapnik, 1998; Smola, Schölkopf, & Müller, 1998)....

  • ...In contrast to the Support Vector Classifier, the Support Vector Data Description using a polynomial kernel suffers from the large influence of the norms of the object vectors, but it shows promising results for the Gaussian kernel....

  • ...For that the notion of essential support vectors has to be introduced (Vapnik, 1998)....

  • ...Vapnik argued that in order to solve a problem, one should not try to solve a more general problem as an intermediate step (Vapnik, 1998)....

  • ...The classifiers are a Gaussian-density based linear classifier (called Bayes), a Parzen classifier and the Support Vector Classifier with polynomial kernel, degree 3....

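The norm remark in these excerpts follows directly from the kernel definitions themselves (a sketch; d is the polynomial degree and s the Gaussian width):

    % Polynomial kernel: the self-similarity grows with the object's norm,
    % so objects with large norms dominate the description.
    K_{\mathrm{poly}}(x, y) = (x \cdot y)^d, \qquad
    K_{\mathrm{poly}}(x, x) = \|x\|^{2d} .

    % Gaussian kernel: the self-similarity is constant, independent of norms.
    K_{\mathrm{gauss}}(x, y) = \exp\left(-\|x - y\|^2 / s^2\right), \qquad
    K_{\mathrm{gauss}}(x, x) = 1 .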

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Book ChapterDOI
TL;DR: The chapter discusses two important directions of research to improve learning algorithms: the dynamic node generation, which is used by the cascade correlation algorithm; and designing learning algorithms where the choice of parameters is not an issue.
Abstract: Publisher Summary: This chapter provides an account of different neural network architectures for pattern recognition. A neural network consists of several simple processing elements called neurons. Each neuron is connected to some other neurons and possibly to the input nodes. Neural networks provide a simple computing paradigm to perform complex recognition tasks in real time. The chapter categorizes neural networks into three types: single-layer networks, multilayer feedforward networks, and feedback networks. It discusses the gradient descent and the relaxation method as the two underlying mathematical themes for deriving learning algorithms. A lot of research activity is centered on learning algorithms because of their fundamental importance in neural networks. The chapter discusses two important directions of research to improve learning algorithms: the dynamic node generation, which is used by the cascade correlation algorithm; and designing learning algorithms where the choice of parameters is not an issue. It closes with a discussion of performance and implementation issues.

13,033 citations


"Support Vector Data Description" refers background or methods in this paper

  • ...Neural networks, for instance, can be trained to estimate posterior probabilities (Richard & Lippmann, 1991; Bishop, 1995; Ripley, 1996) and tend to give high confidence outputs for objects which are remote from the training set....

  • ...The third method is a Mixture of Gaussians, optimized using EM (Bishop, 1995)....

  • ...By applying Leave-One-Out estimation (Vapnik, 1998; Bishop, 1995), it can be shown that the number of support vectors is an indication of the expected error made on the target set....

  • ...In classification or regression problems a more advanced Bayesian approach can be used for detecting outliers (Bishop, 1995; MacKay, 1992; Roberts & Penny, 1996)....

  • ...Keywords: outlier detection, novelty detection, one-class classification, support vector classifier, support vector data description...

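The leave-one-out estimate mentioned in the excerpts rests on a standard observation (a sketch): removing an object that is not a support vector leaves the description unchanged, so only support vectors can contribute leave-one-out errors. With #SV support vectors among N training objects:

    % Leave-one-out bound on the expected error on the target class:
    \mathbb{E}\left[P(\text{error})\right] \;\le\; \frac{\#\mathrm{SV}}{N} .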