Support-Vector Networks

doi:10.1023/A:1022627411411

Home
/
Papers
/
Support-Vector Networks

Journal Article•DOI•

Support-Vector Networks

Corinna Cortes¹, Vladimir Vapnik¹•Institutions (1)

Bell Labs¹

15 Sep 1995-Machine Learning (Kluwer Academic Publishers)-Vol. 20, Iss: 3, pp 273-297

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support- vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

read less

Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensures high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Book•

Kernel Methods for Pattern Analysis

[...]

John Shawe-Taylor¹, Nello Cristianini²•Institutions (2)

University of Southampton¹, University of Bristol²

01 Jan 2004

TL;DR: This book provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.

...read moreread less

Abstract: Kernel methods provide a powerful and unified framework for pattern discovery, motivating algorithms that can act on general types of data (e.g. strings, vectors or text) and look for general types of relations (e.g. rankings, classifications, regressions, clusters). The application areas range from neural networks and pattern recognition to machine learning and data mining. This book, developed from lectures and tutorials, fulfils two major roles: firstly it provides practitioners with a large toolkit of algorithms, kernels and solutions ready to use for standard pattern discovery problems in fields such as bioinformatics, text analysis, image analysis. Secondly it provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.

...read moreread less

6,050 citations

Cites methods from "Support-Vector Networks"

...The use of slack variables for noise tolerance, tracing back to [13] and further to [129], was introduced to the SVM algorithm in 1995 in the paper of Cortes and Vapnik [27]....
[...]

Book•

Pattern recognition and neural networks

[...]

Brian D. Ripley¹, N. L. Hjort•Institutions (1)

University of Oxford¹

01 Jan 1996

TL;DR: Professor Ripley brings together two crucial ideas in pattern recognition; statistical methods and machine learning via neural networks in this self-contained account.

...read moreread less

Abstract: From the Publisher: Pattern recognition has long been studied in relation to many different (and mainly unrelated) applications, such as remote sensing, computer vision, space research, and medical imaging. In this book Professor Ripley brings together two crucial ideas in pattern recognition; statistical methods and machine learning via neural networks. Unifying principles are brought to the fore, and the author gives an overview of the state of the subject. Many examples are included to illustrate real problems in pattern recognition and how to overcome them.This is a self-contained account, ideal both as an introduction for non-specialists readers, and also as a handbook for the more expert reader.

...read moreread less

5,632 citations

Book Chapter•DOI•

Large-Scale Machine Learning with Stochastic Gradient Descent

[...]

Léon Bottou

01 Jan 2010

TL;DR: A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems.

...read moreread less

Abstract: During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods is limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems. In particular, second order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.

...read moreread less

5,561 citations

Journal Article•DOI•

An overview of statistical learning theory

[...]

Vladimir Vapnik¹•Institutions (1)

AT&T Labs¹

01 Sep 1999-IEEE Transactions on Neural Networks

TL;DR: How the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms are demonstrated and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems are demonstrated.

...read moreread less

Abstract: Statistical learning theory was introduced in the late 1960's. Until the 1990's it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990's new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.

...read moreread less

5,370 citations

Active Learning Literature Survey

[...]

Burr Settles

01 Jan 2009

TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.

...read moreread less

Abstract: The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for successful active learning, a summary of problem setting variants and practical issues, and a discussion of related topics in machine learning research are also presented.

...read moreread less

5,227 citations

…
1
2
3
4
5
6
7
…
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Learning representations by back-propagating errors

[...]

David E. Rumelhart¹, Geoffrey E. Hinton², Ronald J. Williams¹•Institutions (2)

University of California, San Diego¹, Carnegie Mellon University²

01 Jan 1988-Nature

TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.

...read moreread less

Abstract: We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure1.

...read moreread less

23,814 citations

Book Chapter•DOI•

Learning internal representations by error propagation

[...]

David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams

01 Jan 1988

TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.

...read moreread less

Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

...read moreread less

17,604 citations

Journal Article•DOI•

The use of multiple measurements in taxonomic problems

[...]

R. A. Fisher

01 Sep 1936-Annals of Human Genetics

14,009 citations

"Support-Vector Networks" refers background in this paper

...More than 60 years ago R.A. Fisher (Fisher, 1936) suggested the first algorithm for pattern recognition....
[...]

Proceedings Article•DOI•

A training algorithm for optimal margin classifiers

[...]

Bernhard E. Boser¹, Isabelle Guyon², Vladimir Vapnik²•Institutions (2)

University of California, Berkeley¹, Bell Labs²

01 Jul 1992

TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions.

...read moreread less

Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.

...read moreread less

11,211 citations

Book•

Methods of Mathematical Physics

[...]

Richard Courant, David Hilbert

01 Jan 1947

TL;DR: In this paper, the authors present an algebraic extension of LINEAR TRANSFORMATIONS and QUADRATIC FORMS, and apply it to EIGEN-VARIATIONS.

...read moreread less

Abstract: Partial table of contents: THE ALGEBRA OF LINEAR TRANSFORMATIONS AND QUADRATIC FORMS. Transformation to Principal Axes of Quadratic and Hermitian Forms. Minimum-Maximum Property of Eigenvalues. SERIES EXPANSION OF ARBITRARY FUNCTIONS. Orthogonal Systems of Functions. Measure of Independence and Dimension Number. Fourier Series. Legendre Polynomials. LINEAR INTEGRAL EQUATIONS. The Expansion Theorem and Its Applications. Neumann Series and the Reciprocal Kernel. The Fredholm Formulas. THE CALCULUS OF VARIATIONS. Direct Solutions. The Euler Equations. VIBRATION AND EIGENVALUE PROBLEMS. Systems of a Finite Number of Degrees of Freedom. The Vibrating String. The Vibrating Membrane. Green's Function (Influence Function) and Reduction of Differential Equations to Integral Equations. APPLICATION OF THE CALCULUS OF VARIATIONS TO EIGENVALUE PROBLEMS. Completeness and Expansion Theorems. Nodes of Eigenfunctions. SPECIAL FUNCTIONS DEFINED BY EIGENVALUE PROBLEMS. Bessel Functions. Asymptotic Expansions. Additional Bibliography. Index.

...read moreread less

7,426 citations