
Showing papers on "Support vector machine published in 1999"


Journal ArticleDOI
TL;DR: A least squares version of support vector machine (SVM) classifiers whose solution follows from a set of linear equations instead of the quadratic programming required for classical SVMs.
Abstract: In this letter we discuss a least squares version of support vector machine (SVM) classifiers. Due to the equality-type constraints in the formulation, the solution follows from solving a set of linear equations instead of quadratic programming as for classical SVMs. The approach is illustrated on a two-spiral benchmark classification problem.

8,811 citations
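The core of the least squares formulation above is that the classical SVM's inequality constraints become equalities, so training reduces to solving one linear system in the dual variables. Below is a minimal NumPy sketch of that system for a binary LS-SVM classifier with an RBF kernel; the function names and the regularisation parameter gamma are illustrative choices, not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=1.0, sigma=1.0):
    """Solve the LS-SVM dual: one linear system instead of a QP. y must be in {-1, +1}."""
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_new, sigma=1.0):
    """Decision rule: sign of the kernel expansion plus bias."""
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)
```

The linear system is the KKT system of the equality-constrained problem, which is why no QP solver is needed.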


Proceedings ArticleDOI
08 Feb 1999
TL;DR: An edited collection covering the theory of support vector learning, implementations such as sequential minimal optimization and large-scale training, applications including time series prediction and the reconstruction of a chaotic system, and extensions of the algorithm.
Abstract: Introduction to support vector learning roadmap. Part 1 Theory: three remarks on the support vector method of function estimation, Vladimir Vapnik; generalization performance of support vector machines and other pattern classifiers, Peter Bartlett and John Shawe-Taylor; Bayesian voting schemes and large margin classifiers, Nello Cristianini and John Shawe-Taylor; support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, Grace Wahba; geometry and invariance in kernel based methods, Christopher J.C. Burges; on the annealed VC entropy for margin classifiers - a statistical mechanics study, Manfred Opper; entropy numbers, operators and support vector kernels, Robert C. Williamson et al. Part 2 Implementations: solving the quadratic programming problem arising in support vector classification, Linda Kaufman; making large-scale support vector machine learning practical, Thorsten Joachims; fast training of support vector machines using sequential minimal optimization, John C. Platt. Part 3 Applications: support vector machines for dynamic reconstruction of a chaotic system, Davide Mattera and Simon Haykin; using support vector machines for time series prediction, Klaus-Robert Muller et al; pairwise classification and support vector machines, Ulrich Kressel. Part 4 Extensions of the algorithm: reducing the run-time complexity in support vector machines, Edgar E. Osuna and Federico Girosi; support vector regression with ANOVA decomposition kernels, Mark O. Stitson et al; support vector density estimation, Jason Weston et al; combining support vector and mathematical programming methods for classification, Bernhard Scholkopf et al.

5,506 citations


01 Jan 1999
TL;DR: SMO breaks the large quadratic programming (QP) problem that arises in SVM training into a series of smallest possible QP problems, which avoids using a time-consuming numerical QP optimization as an inner loop; SMO is therefore fastest for linear SVMs and sparse data sets.

5,350 citations


Book
John Platt
08 Feb 1999
TL;DR: In this article, the authors propose a new algorithm for training Support Vector Machines (SVMs) called SMO (Sequential Minimal Optimization), which breaks the large QP problem arising in SVM training into a series of smallest possible QP problems.
Abstract: This chapter describes a new algorithm for training Support Vector Machines: Sequential Minimal Optimization, or SMO. Training a Support Vector Machine (SVM) requires the solution of a very large quadratic programming (QP) optimization problem. SMO breaks this large QP problem into a series of smallest possible QP problems. These small QP problems are solved analytically, which avoids using a time-consuming numerical QP optimization as an inner loop. The amount of memory required for SMO is linear in the training set size, which allows SMO to handle very large training sets. Because large matrix computation is avoided, SMO scales somewhere between linear and quadratic in the training set size for various test problems, while a standard projected conjugate gradient (PCG) chunking algorithm scales somewhere between linear and cubic in the training set size. SMO's computation time is dominated by SVM evaluation, hence SMO is fastest for linear SVMs and sparse data sets. For the MNIST database, SMO is as fast as PCG chunking, while for the UCI Adult database and linear SVMs, SMO can be more than 1000 times faster than the PCG chunking algorithm.

5,019 citations
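For illustration, here is a heavily simplified SMO-style solver in NumPy for a linear SVM. It follows the basic analytic two-variable update the chapter describes, but it replaces Platt's working-set heuristics with a random choice of the second multiplier, so it is a sketch of the idea rather than the published algorithm.

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10):
    """Toy SMO for a linear SVM; y must be in {-1, +1}. Returns (w, b)."""
    n = X.shape[0]
    alpha, b = np.zeros(n), 0.0
    K = X @ X.T                      # linear kernel, precomputed for clarity
    rng = np.random.default_rng(0)
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]          # prediction error on point i
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.integers(n - 1)
                j = j + 1 if j >= i else j                  # random second multiplier, j != i
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                if y[i] != y[j]:
                    L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                if L == H:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if eta >= 0:
                    continue
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] - y[j] * (alpha[j] - aj_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] - y[j] * (alpha[j] - aj_old) * K[j, j]
                b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X
    return w, b
```

Each iteration optimizes exactly two multipliers in closed form, which is the "smallest possible QP" the abstract refers to.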



Posted ContentDOI
TL;DR: SVM light, as discussed by the authors, is an implementation of an SVM learner that addresses the problem of large-scale SVM training with many training examples, making large-scale SVM learning more practical.
Abstract: Training a support vector machine (SVM) leads to a quadratic optimization problem with bound constraints and one linear equality constraint. Despite the fact that this type of problem is well understood, there are many issues to be considered in designing an SVM learner. In particular, for large learning tasks with many training examples, off-the-shelf optimization techniques for general quadratic programs quickly become intractable in their memory and time requirements. SVM light is an implementation of an SVM learner which addresses the problem of large tasks. This chapter presents algorithmic and computational results developed for SVM light V2.0, which make large-scale SVM training more practical. The results give guidelines for the application of SVMs to large domains.

4,145 citations
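SVMlight itself is a stand-alone command-line learner, so the snippet below is not its API; it only illustrates the kind of large, sparse task the chapter targets, using scikit-learn's LinearSVC on a sparse bag-of-words matrix as a stand-in for a specialised large-scale SVM trainer.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Sparse, high-dimensional text data: the regime where general-purpose QP
# solvers run out of memory and specialised SVM training becomes necessary.
data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X = TfidfVectorizer().fit_transform(data.data)   # scipy sparse matrix
clf = LinearSVC(C=1.0).fit(X, data.target)
print(X.shape, clf.score(X, data.target))
```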


Proceedings Article
27 Jun 1999
TL;DR: An analysis of why Transductive Support Vector Machines are well suited for text classification is presented, and an algorithm for training TSVMs, handling 10,000 examples and more, is proposed.
Abstract: This paper introduces Transductive Support Vector Machines (TSVMs) for text classification. While regular Support Vector Machines (SVMs) try to induce a general decision function for a learning task, Transductive Support Vector Machines take into account a particular test set and try to minimize misclassifications of just those particular examples. The paper presents an analysis of why TSVMs are well suited for text classification. These theoretical findings are supported by experiments on three test collections. The experiments show substantial improvements over inductive methods, especially for small training sets, cutting the number of labeled training examples down to a twentieth on some tasks. This work also proposes an algorithm for training TSVMs efficiently, handling 10,000 examples and more.

3,091 citations
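A compact way to state the difference from an inductive SVM is the transductive training problem itself: the labels of the unlabeled test examples become optimization variables. The sketch below follows the standard soft-margin TSVM formulation; the notation is mine, not copied from the paper.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b,\, y_1^*,\ldots,y_u^*,\, \boldsymbol{\xi},\, \boldsymbol{\xi}^*} \quad
  & \tfrac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{\ell} \xi_i + C^{*} \sum_{j=1}^{u} \xi_j^{*} \\
\text{subject to} \quad
  & y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, && i = 1,\ldots,\ell, \\
  & y_j^{*}\,(\mathbf{w}\cdot\mathbf{x}_j^{*} + b) \ge 1 - \xi_j^{*}, && j = 1,\ldots,u, \\
  & \xi_i \ge 0, \quad \xi_j^{*} \ge 0, \quad y_j^{*} \in \{-1, +1\}.
\end{aligned}
```

Here the $\mathbf{x}_j^{*}$ are the particular test documents, so the margin is maximized over both labeled and test examples simultaneously.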


Proceedings ArticleDOI
23 Aug 1999
TL;DR: In this article, a non-linear classification technique based on Fisher's discriminant is proposed; the main ingredient is the kernel trick, which allows the efficient computation of the Fisher discriminant in feature space.
Abstract: A non-linear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick, which allows the efficient computation of the Fisher discriminant in feature space. The linear classification in feature space corresponds to a (powerful) non-linear decision function in input space. Large scale simulations demonstrate the competitiveness of our approach.

2,896 citations
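The kernel trick mentioned here lets the Fisher direction be expanded as a weighted sum of the training points in feature space, so only the kernel matrix is needed. A small NumPy sketch under that assumption, with a simple ridge term added for numerical stability (the regularisation and function names are my own, not the authors' code):

```python
import numpy as np

def kfd_fit(K, y, reg=1e-3):
    """Kernel Fisher Discriminant coefficients.

    K   : (n, n) kernel matrix over the training set
    y   : labels in {0, 1}
    reg : small ridge term for a well-conditioned solve
    """
    n = K.shape[0]
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    m0 = K[:, idx0].mean(axis=1)          # class-mean kernel vectors
    m1 = K[:, idx1].mean(axis=1)
    N = np.zeros((n, n))                  # within-class scatter in kernel form
    for idx in (idx0, idx1):
        Kc = K[:, idx]
        l = len(idx)
        N += Kc @ (np.eye(l) - np.ones((l, l)) / l) @ Kc.T
    alpha = np.linalg.solve(N + reg * np.eye(n), m1 - m0)
    return alpha

# Project new points with K_new @ alpha, where K_new[i, j] = k(x_new_i, x_train_j);
# a threshold on this 1-D projection gives the non-linear decision function.
```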


Proceedings Article
29 Nov 1999
TL;DR: The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data and is regularized by controlling the length of the weight vector in an associated feature space.
Abstract: Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified ν between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. We provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.

1,851 citations
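scikit-learn's OneClassSVM implements essentially this ν-parameterised support estimator, so it is a convenient way to see the idea in a few lines; the data below are synthetic and only for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                       # unlabelled sample from P
X_test = np.vstack([rng.normal(size=(10, 2)),             # typical points
                    rng.uniform(-6, 6, size=(10, 2))])    # likely outliers

# nu plays the role of the a priori specified fraction: it upper-bounds the
# fraction of training points allowed to fall outside the estimated region S.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X_train)
print(clf.predict(X_test))   # +1 inside the estimated support, -1 outside
```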


Journal ArticleDOI
TL;DR: This paper shows the use of a data domain description method, inspired by the support vector machine of Vapnik and called the support vector domain description (SVDD), which can be used for novelty or outlier detection and is compared with other outlier detection methods on real data.

1,581 citations


Journal ArticleDOI
TL;DR: The use of support vector machines (SVMs) in classifying e-mail as spam or nonspam is studied by comparing them to three other classification algorithms (Ripper, Rocchio, and boosting decision trees); SVMs performed best when using binary features.
Abstract: We study the use of support vector machines (SVM) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features was constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVM performed best when using binary features. For both data sets, boosting trees and SVM had acceptable test performance in terms of accuracy and speed. However, SVM had significantly less training time.
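The "binary features" finding translates directly into modern tooling: a bag-of-words representation with presence/absence indicators fed to a linear SVM. The snippet is only a schematic reconstruction of that setup (the messages and labels are toy placeholders), not the study's actual pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

messages = ["win a free prize now", "meeting moved to 3pm",
            "free offer click here", "lunch tomorrow?"]
labels = [1, 0, 1, 0]   # 1 = spam, 0 = nonspam (placeholders)

# binary=True gives word-presence features, the variant reported to work best for SVMs
spam_filter = make_pipeline(CountVectorizer(binary=True), LinearSVC(C=1.0))
spam_filter.fit(messages, labels)
print(spam_filter.predict(["free prize meeting"]))
```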

Journal ArticleDOI
TL;DR: It is observed that a simple remapping of the input $x_i \rightarrow x_i^a$ improves the performance of linear SVMs to such an extent that it makes them, for this problem, a valid alternative to RBF kernels.
Abstract: Traditional classification approaches generalize poorly on image classification tasks, because of the high dimensionality of the feature space. This paper shows that support vector machines (SVM) can generalize well on difficult image classification problems where the only features are high dimensional histograms. Heavy-tailed RBF kernels of the form $K(\mathbf{x}, \mathbf{y}) = e^{-\rho \sum_i |x_i^a - y_i^a|^b}$ with $a \le 1$ and $b \le 2$ are evaluated on the classification of images extracted from the Corel stock photo collection and shown to far outperform traditional polynomial or Gaussian radial basis function (RBF) kernels. Moreover, we observed that a simple remapping of the input $x_i \rightarrow x_i^a$ improves the performance of linear SVMs to such an extent that it makes them, for this problem, a valid alternative to RBF kernels.
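Since SVC in scikit-learn accepts a callable kernel, a heavy-tailed RBF of the form above can be plugged in directly. The sketch below assumes nonnegative histogram features (so the power $x_i^a$ is well defined); the data and parameter values are arbitrary illustrations.

```python
import numpy as np
from sklearn.svm import SVC

def heavy_tailed_rbf(a=0.5, b=1.0, rho=1.0):
    """Returns a callable computing K(x, y) = exp(-rho * sum_i |x_i^a - y_i^a|^b)."""
    def kernel(X, Y):
        Xa, Ya = np.asarray(X) ** a, np.asarray(Y) ** a
        D = np.abs(Xa[:, None, :] - Ya[None, :, :]) ** b
        return np.exp(-rho * D.sum(axis=-1))
    return kernel

# Toy histogram-like data: nonnegative, moderately high-dimensional rows.
rng = np.random.default_rng(0)
X = rng.random((40, 64))
y = (X[:, :32].sum(axis=1) > X[:, 32:].sum(axis=1)).astype(int)

clf = SVC(kernel=heavy_tailed_rbf(a=0.5, b=1.0, rho=1.0)).fit(X, y)
print(clf.score(X, y))
```

Setting b = 2 and a = 1 recovers the ordinary Gaussian RBF kernel, which makes the "heavy-tailed" family a strict generalisation.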

Book
08 Feb 1999
TL;DR: This chapter presents algorithmic and computational results developed for SVM light V2.0, which make large-scale SVM training more practical and give guidelines for the application of SVMs to large domains.
Abstract: Training a support vector machine (SVM) leads to a quadratic optimization problem with bound constraints and one linear equality constraint. Despite the fact that this type of problem is well understood, there are many issues to be considered in designing an SVM learner. In particular, for large learning tasks with many training examples, off-the-shelf optimization techniques for general quadratic programs quickly become intractable in their memory and time requirements. SVM light is an implementation of an SVM learner which addresses the problem of large tasks. This chapter presents algorithmic and computational results developed for SVM light V2.0, which make large-scale SVM training more practical. The results give guidelines for the application of SVMs to large domains.


Book
Shigeo Abe
26 Oct 1999
TL;DR: This book presents architectures for multiclass classification and function approximation problems, as well as evaluation criteria for classifiers and regressors, and discusses kernel methods for improving the generalization ability of neural networks and fuzzy systems.
Abstract: A guide on the use of SVMs in pattern classification, including a rigorous performance comparison of classifiers and regressors. The book presents architectures for multiclass classification and function approximation problems, as well as evaluation criteria for classifiers and regressors. Features: Clarifies the characteristics of two-class SVMs; Discusses kernel methods for improving the generalization ability of neural networks and fuzzy systems; Contains ample illustrations and examples; Includes performance evaluation using publicly available data sets; Examines Mahalanobis kernels, empirical feature space, and the effect of model selection by cross-validation; Covers sparse SVMs, learning using privileged information, semi-supervised learning, multiple classifier systems, and multiple kernel learning; Explores incremental training based batch training and active-set training methods, and decomposition techniques for linear programming SVMs; Discusses variable selection for support vector regressors.

Proceedings Article
Michael E. Tipping
29 Nov 1999
TL;DR: The Relevance Vector Machine is introduced, a Bayesian treatment of a generalised linear model of identical functional form to the SVM, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.
Abstract: The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a trade-off parameter and the need to utilise 'Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.

Journal ArticleDOI
TL;DR: Simulation results for both artificial and real data show remarkable improvement in generalization error, supporting the idea of modifying a kernel function by a conformal mapping to enlarge the spatial resolution around the separating boundary surface, such that the separability between classes is increased.
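The "conformal mapping" in this summary refers to rescaling an existing kernel by a positive factor concentrated near the current decision boundary. In the usual notation (mine, and hedged, since only the TL;DR is shown here) the modified kernel is

```latex
\tilde{K}(\mathbf{x}, \mathbf{x}') \;=\; c(\mathbf{x})\, c(\mathbf{x}')\, K(\mathbf{x}, \mathbf{x}'),
\qquad
c(\mathbf{x}) \;=\; \sum_{i \in \mathrm{SV}} h_i\, e^{-\|\mathbf{x} - \mathbf{x}_i\|^2 / (2\tau^2)},
```

so that the induced metric, and hence the spatial resolution, is magnified around the support vectors found in a first training pass; the SVM is then retrained with the modified kernel.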

Proceedings Article
01 Jan 1999
TL;DR: A formulation of the SVM is proposed that enables a multi-class pattern recognition problem to be solved in a single optimisation and a similar generalization of linear programming machines is proposed.
Abstract: The solution of binary classification problems using support vector machines (SVMs) is well developed, but multi-class problems with more than two classes have typically been solved by combining independently produced binary classifiers. We propose a formulation of the SVM that enables a multi-class pattern recognition problem to be solved in a single optimisation. We also propose a similar generalization of linear programming machines. We report experiments using benchmark datasets in which these two methods achieve a reduction in the number of support vectors and kernel calculations needed. 1. k-Class Pattern Recognition The k-class pattern recognition problem is to construct a decision function given $\ell$ iid (independent and identically distributed) samples (points) of an unknown function, typically with noise: $(x_1, y_1), \ldots, (x_\ell, y_\ell)$, where $x_i$, $i = 1, \ldots, \ell$, is a vector of length $d$ and $y_i \in \{1, \ldots, k\}$ represents the class of the sample. A natural loss function is the number of mistakes made. 2. Solving k-Class Problems with Binary SVMs For the binary pattern recognition problem (the case $k = 2$), the support vector approach has been well developed [3, 5]. The classical approach to solving k-class pattern recognition problems is to consider the problem as a collection of binary classification problems. In the one-versus-rest method one constructs $k$ classifiers, one for each class. The $n$th classifier constructs a hyperplane between class $n$ and the $k - 1$ other classes. A particular point is assigned to the class for which the distance from the margin, in the positive direction (i.e. in the direction in which class "one" lies rather than class "rest"), is maximal. This method has been used widely in ...
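For reference, the single-optimisation idea can be written as one constrained problem over all $k$ class hyperplanes at once. The formulation below is the standard Weston-Watkins style statement and is given as a sketch consistent with the abstract, not quoted from the paper:

```latex
\begin{aligned}
\min_{\{\mathbf{w}_m,\, b_m\},\, \boldsymbol{\xi}} \quad
  & \tfrac{1}{2}\sum_{m=1}^{k}\|\mathbf{w}_m\|^2 + C \sum_{i=1}^{\ell}\ \sum_{m \ne y_i} \xi_i^{m} \\
\text{subject to} \quad
  & \mathbf{w}_{y_i}\cdot\mathbf{x}_i + b_{y_i} \;\ge\; \mathbf{w}_m\cdot\mathbf{x}_i + b_m + 2 - \xi_i^{m}, \\
  & \xi_i^{m} \ge 0, \qquad i = 1,\ldots,\ell, \quad m \in \{1,\ldots,k\}\setminus\{y_i\}.
\end{aligned}
```

A new point is assigned to $\arg\max_m\, (\mathbf{w}_m\cdot\mathbf{x} + b_m)$, so all $k$ decision functions are trained jointly rather than as independent one-versus-rest machines.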

01 Jan 1999
TL;DR: Two schemes for adjusting the sensitivity and specificity of Support Vector Machines and the description of their performance using receiver operating characteristic (ROC) curves are discussed, and their use on real-life medical diagnostic tasks is illustrated.
Abstract: For many applications it is important to accurately distinguish false negative results from false positives. This is particularly important for medical diagnosis where the correct balance between sensitivity and specificity plays an important role in evaluating the performance of a classifier. In this paper we discuss two schemes for adjusting the sensitivity and specificity of Support Vector Machines and the description of their performance using receiver operating characteristic (ROC) curves. We then illustrate their use on real-life medical diagnostic tasks.
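A common way to realise such adjustment schemes in practice is either to reweight the two error types during training or to shift the decision threshold afterwards and trace out an ROC curve. A hedged scikit-learn sketch of both ideas (synthetic data, arbitrary weights; not necessarily the paper's exact schemes):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)

# Scheme 1: penalise false negatives more by upweighting the positive class.
clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: 5.0}).fit(X, y)

# Scheme 2: keep the trained machine fixed and sweep the decision threshold,
# which traces the ROC curve (sensitivity vs. 1 - specificity).
scores = clf.decision_function(X)
fpr, tpr, thresholds = roc_curve(y, scores)
print(tpr[:5], 1 - fpr[:5])   # sensitivity and specificity at the first thresholds
```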



Proceedings ArticleDOI
01 Jan 1999
TL;DR: Experimental results indicate that the presented algorithm outperforms more naive approaches to ordinal regression such as support vector classification and support vector regression in the case of more than two ranks.
Abstract: We investigate the problem of predicting variables of ordinal scale. This task is referred to as ordinal regression and is complementary to the standard machine learning tasks of classification and metric regression. In contrast to statistical models we present a distribution independent formulation of the problem together with uniform bounds of the risk functional. The approach presented is based on a mapping from objects to scalar utility values. Similar to support vector methods we derive a new learning algorithm for the task of ordinal regression based on large margin rank boundaries. We give experimental results for an information retrieval task: learning the order of documents with respect to an initial query. Experimental results indicate that the presented algorithm outperforms more naive approaches to ordinal regression such as support vector classification and support vector regression in the case of more than two ranks.
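The large-margin rank-boundary idea can be approximated by learning a linear utility function on difference vectors: every pair of items with different ranks becomes a binary example whose label is the sign of the rank difference. A small sketch under that reading (not the authors' exact algorithm, which also estimates the rank thresholds):

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_differences(X, ranks):
    """Turn ranked items into binary examples on difference vectors."""
    Xd, yd = [], []
    for i in range(len(ranks)):
        for j in range(len(ranks)):
            if ranks[i] != ranks[j]:
                Xd.append(X[i] - X[j])
                yd.append(1 if ranks[i] > ranks[j] else -1)
    return np.array(Xd), np.array(yd)

# Toy data with four ordinal ranks derived from a hidden utility direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
ranks = np.digitize(X @ np.array([1.0, -0.5, 0.2, 0.0, 0.3]), bins=[-1.0, 0.0, 1.0])

Xd, yd = pairwise_differences(X, ranks)
utility = LinearSVC(fit_intercept=False, C=1.0).fit(Xd, yd)
# utility.coef_ is the learned utility direction w; the scores w.x order the items.
print((X @ utility.coef_.ravel())[:5])
```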

Journal ArticleDOI
TL;DR: Successive overrelaxation for symmetric linear complementarity problems and quadratic programs is used to train a support vector machine (SVM) for discriminating between the elements of two massive datasets, each with millions of points.
Abstract: Successive overrelaxation (SOR) for symmetric linear complementarity problems and quadratic programs is used to train a support vector machine (SVM) for discriminating between the elements of two massive datasets, each with millions of points. Because SOR handles one point at a time, similar to Platt's sequential minimal optimization (SMO) algorithm (1999) which handles two constraints at a time and Joachims' SVMlight (1998) which handles a small number of points at a time, SOR can process very large datasets that need not reside in memory. The algorithm converges linearly to a solution. Encouraging numerical results are presented on datasets with up to 10 000 000 points. Such massive discrimination problems cannot be processed by conventional linear or quadratic programming methods, and to our knowledge have not been solved by other methods. On smaller problems, SOR was faster than SVMlight and comparable or faster than SMO.

Journal ArticleDOI
TL;DR: This work investigates how two-class discrimination methods can be extended to the multiclass case, and shows how the linear programming (LP) and quadratic programming (QP) approaches based on Vapnik's Support Vector Machine (SVM) can be combined to yield two new approaches to theMulticlass problem.
Abstract: We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and quadratic programming (QP) approaches based on Vapnik's Support Vector Machine (SVM) can be combined to yield two new approaches to the multiclass problem. In LP multiclass discrimination, a single linear program is used to construct a piecewise-linear classification function. In our proposed multiclass SVM method, a single quadratic program is used to construct a piecewise-nonlinear classification function. Each piece of this function can take the form of a polynomial, a radial basis function, or even a neural network. For the k > 2-class problems, the SVM method as originally proposed required the construction of a two-class SVM to separate each class from the remaining classes. Similarly, k two-class linear programs can be used for the multiclass problem. We performed an empirical study of the original LP method, the proposed k LP method, the proposed single QP method and the original k QP methods. We discuss the advantages and disadvantages of each approach.

Journal ArticleDOI
17 Oct 1999
TL;DR: This work provides a novel algorithmic analysis via a model of robust concept learning (closely related to “margin classifiers”), and shows that a relatively small number of examples are sufficient to learn rich concept classes.
Abstract: We study the phenomenon of cognitive learning from an algorithmic standpoint. How does the brain effectively learn concepts from a small number of examples despite the fact that each example contains a huge amount of information? We provide a novel analysis for a model of robust concept learning (closely related to "margin classifiers"), and show that a relatively small number of examples are sufficient to learn rich concept classes (including threshold functions, Boolean formulae and polynomial surfaces). As a result, we obtain simple intuitive proofs for the generalization bounds of Support Vector Machines. In addition, the new algorithms have several advantages: they are faster, conceptually simpler, and highly resistant to noise. For example, a robust half-space can be PAC-learned in linear time using only a constant number of training examples, regardless of the number of attributes. A general (algorithmic) consequence of the model, that "more robust concepts are easier to learn", is supported by a multitude of psychological studies.

Proceedings Article
29 Nov 1999
TL;DR: New functionals for parameter (model) selection of Support Vector Machines are introduced based on the concepts of the span of support vectors and rescaling of the feature space and it is shown that using these functionals one can both predict the best choice of parameters of the model and the relative quality of performance for any value of parameter.
Abstract: New functionals for parameter (model) selection of Support Vector Machines are introduced based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that using these functionals, one can both predict the best choice of parameters of the model and the relative quality of performance for any value of parameter.

Proceedings ArticleDOI
29 Jan 1999
TL;DR: The Support Vector Machine (SVM) as discussed by the authors is a new way to design classification algorithms which learn from examples (supervised learning) and generalize when applied to new data.
Abstract: The Support Vector Machine provides a new way to design classification algorithms which learn from examples (supervised learning) and generalize when applied to new data. We demonstrate its success on a difficult classification problem from hyperspectral remote sensing, where we obtain 96% and 87% correct classification for a 4-class problem and a 16-class problem, respectively. These results are somewhat better than other recent results on the same data. A key feature of this classifier is its ability to use high-dimensional data without the usual recourse to a feature selection step to reduce the dimensionality of the data. For this application, this is important, as hyperspectral data consists of several hundred contiguous spectral channels for each exemplar. We provide an introduction to this new approach, and demonstrate its application to classification of an agriculture scene.

Journal ArticleDOI
TL;DR: This work proposes to evaluate different binary classification schemes (support vector machine, multilayer perceptron, C4.5 decision tree, Fisher's linear discriminant, Bayesian classifier) to carry out the fusion of experts for taking a final decision on identity authentication.
Abstract: Biometric person identity authentication is gaining more and more attention. The authentication task performed by an expert is a binary classification problem: reject or accept the identity claim. Combining experts, each based on a different modality (speech, face, fingerprint, etc.), increases the performance and robustness of identity authentication systems. In this context, a key issue is the fusion of the different experts for taking a final decision (i.e., accept or reject the identity claim). We propose to evaluate different binary classification schemes (support vector machine, multilayer perceptron, C4.5 decision tree, Fisher's linear discriminant, Bayesian classifier) to carry out the fusion. The experimental results show that support vector machines and the Bayesian classifier achieve almost the same performance, and both outperform the other evaluated classifiers.

Proceedings Article
29 Nov 1999
TL;DR: Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others.
Abstract: A novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a pre-defined or incrementally learned feature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions are used as a training set to capture the variations of human faces. Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others. Furthermore, learning and evaluation using the SNoW-based method are significantly more efficient than with other methods.