scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Support-Vector Networks

15 Sep 1995-Machine Learning (Kluwer Academic Publishers)-Vol. 20, Iss: 3, pp 273-297
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support- vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensures high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
01 Apr 2012
TL;DR: ELM provides a unified learning platform with a widespread type of feature mappings and can be applied in regression and multiclass classification applications directly and in theory, ELM can approximate any target continuous function and classify any disjoint regions.
Abstract: Due to the simplicity of their implementations, least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and a unified learning framework of LS-SVM, PSVM, and other regularization algorithms referred to extreme learning machine (ELM) can be built. ELM works for the “generalized” single-hidden-layer feedforward networks (SLFNs), but the hidden layer (or called feature mapping) in ELM need not be tuned. Such SLFNs include but are not limited to SVM, polynomial network, and the conventional feedforward neural networks. This paper shows the following: 1) ELM provides a unified learning platform with a widespread type of feature mappings and can be applied in regression and multiclass classification applications directly; 2) from the optimization method point of view, ELM has milder optimization constraints compared to LS-SVM and PSVM; 3) in theory, compared to ELM, LS-SVM and PSVM achieve suboptimal solutions and require higher computational complexity; and 4) in theory, ELM can approximate any target continuous function and classify any disjoint regions. As verified by the simulation results, ELM tends to have better scalability and achieve similar (for regression and binary class cases) or much better (for multiclass cases) generalization performance at much faster learning speed (up to thousands times) than traditional SVM and LS-SVM.

4,835 citations


Cites background or methods from "Support-Vector Networks"

  • ...Index Terms—Extreme learning machine (ELM), feature mapping, kernel, least square support vector machine (LS-SVM), proximal support vector machine (PSVM), regularization network....

    [...]

  • ...This section briefs the conventional SVM [1] and its variants, namely, LS-SVM [2] and PSVM [3]....

    [...]

Book
13 May 2011
TL;DR: The amount of data in the authors' world has been exploding, and analyzing large data sets will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to research by MGI and McKinsey.
Abstract: The amount of data in our world has been exploding, and analyzing large data sets—so-called big data— will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to research by MGI and McKinsey's Business Technology Office. Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.

4,700 citations

Book
12 Aug 2008
TL;DR: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications and provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature.
Abstract: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications. The authors present the basic ideas of SVMs together with the latest developments and current research questions in a unified style. They identify three reasons for the success of SVMs: their ability to learn well with only a very small number of free parameters, their robustness against several types of model violations and outliers, and their computational efficiency compared to several other methods. Since their appearance in the early nineties, support vector machines and related kernel-based methods have been successfully applied in diverse fields of application such as bioinformatics, fraud detection, construction of insurance tariffs, direct marketing, and data and text mining. As a consequence, SVMs now play an important role in statistical machine learning and are used not only by statisticians, mathematicians, and computer scientists, but also by engineers and data analysts. The book provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature. The book can thus serve as both a basis for graduate courses and an introduction for statisticians, mathematicians, and computer scientists. It further provides a valuable reference for researchers working in the field. The book covers all important topics concerning support vector machines such as: loss functions and their role in the learning process; reproducing kernel Hilbert spaces and their properties; a thorough statistical analysis that uses both traditional uniform bounds and more advanced localized techniques based on Rademacher averages and Talagrand's inequality; a detailed treatment of classification and regression; a detailed robustness analysis; and a description of some of the most recent implementation techniques. To make the book self-contained, an extensive appendix is added which provides the reader with the necessary background from statistics, probability theory, functional analysis, convex analysis, and topology.

4,664 citations


Cites background or methods from "Support-Vector Networks"

  • ...The dot product can be replaced by a generalised dot product K(x, y) with any function which satisfies Mercer’s theorem (see [5] for more details on Mercer’s theorem)....

    [...]

  • ...To deal with the non separable case an error tolerance can be introduced as described by Cortes and Vapnik, [5]....

    [...]

  • ...The work of Boser et al, [8], was extended in 1995 by Cortes and Vapnik, [5], to include the handling of non-separable classes....

    [...]

  • ...Vapnik and Cortes, [5], show that to find the optimal set of weights, Given, Λo = (α1, α2 ....

    [...]

  • ...Vapnik and Cortes, [5], show that to find the optimal set of weights, Given, ΛTo = (α1, α2 . . . , αl) (16) it is necessary to solve the following quadratic problem: W (Λ) = ΛT 1− 1 2 ΛT DΛ (17) Subject to the following constraints Λ ≥ 0, (18) ΛT Y = 0, (19) where 1T = (1, 2, . . . , n) is an n-dimensional unit vector, Y T = (y1, y2, . . . , yn) is the n-dimensional vector of labels, and D is a symmetrical n ∗nmatrix of dot product of the training vectors, multiplied by their labels....

    [...]

Proceedings ArticleDOI
23 Jul 2002
TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.
Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.

4,453 citations


Cites background from "Support-Vector Networks"

  • ...However, just like in classification SVMs [7], it is possible to approximate the solution by introducing (non-negative) slack variables ξi,j,k and minimizing the upper bound ∑ ξi,j,k. Adding SVM regularization for margin maximization to the objective leads to the following optimization problem, which is similar to the ordinal regression approach in [12]....

    [...]

  • ...However, just like in classification SVMs [7], it is possible to approximate the solution by introducing (non-negative) slack variables ξi,j,k and minimizing the upper bound ∑ ξi,j,k....

    [...]

Book
01 Jan 2015
TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.
Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.

3,857 citations

References
More filters
Journal ArticleDOI
01 Jan 1988-Nature
TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
Abstract: We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure1.

23,814 citations

Book ChapterDOI
01 Jan 1988
TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

17,604 citations

Journal ArticleDOI

14,009 citations


"Support-Vector Networks" refers background in this paper

  • ...More than 60 years ago R.A. Fisher (Fisher, 1936) suggested the first algorithm for pattern recognition....

    [...]

Proceedings ArticleDOI
01 Jul 1992
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.

11,211 citations

Book
01 Jan 1947
TL;DR: In this paper, the authors present an algebraic extension of LINEAR TRANSFORMATIONS and QUADRATIC FORMS, and apply it to EIGEN-VARIATIONS.
Abstract: Partial table of contents: THE ALGEBRA OF LINEAR TRANSFORMATIONS AND QUADRATIC FORMS. Transformation to Principal Axes of Quadratic and Hermitian Forms. Minimum-Maximum Property of Eigenvalues. SERIES EXPANSION OF ARBITRARY FUNCTIONS. Orthogonal Systems of Functions. Measure of Independence and Dimension Number. Fourier Series. Legendre Polynomials. LINEAR INTEGRAL EQUATIONS. The Expansion Theorem and Its Applications. Neumann Series and the Reciprocal Kernel. The Fredholm Formulas. THE CALCULUS OF VARIATIONS. Direct Solutions. The Euler Equations. VIBRATION AND EIGENVALUE PROBLEMS. Systems of a Finite Number of Degrees of Freedom. The Vibrating String. The Vibrating Membrane. Green's Function (Influence Function) and Reduction of Differential Equations to Integral Equations. APPLICATION OF THE CALCULUS OF VARIATIONS TO EIGENVALUE PROBLEMS. Completeness and Expansion Theorems. Nodes of Eigenfunctions. SPECIAL FUNCTIONS DEFINED BY EIGENVALUE PROBLEMS. Bessel Functions. Asymptotic Expansions. Additional Bibliography. Index.

7,426 citations