
Showing papers on "Sequential minimal optimization" published in 2006


Journal ArticleDOI
TL;DR: This paper presents and compares the four R implementations of support vector machines, which are among the most popular and efficient classification and regression methods currently available.
Abstract: Being among the most popular and efficient classification and regression methods currently available, implementations of support vector machines exist in almost every popular programming language. Currently four R packages contain SVM-related software. The purpose of this paper is to present and compare these implementations.

576 citations


Journal ArticleDOI
TL;DR: The main results include a simple asymptotic convergence proof, a general explanation of the shrinking and caching techniques, and the linear convergence of the methods.
Abstract: Decomposition methods are currently one of the major methods for training support vector machines. They vary mainly according to different working set selections. Existing implementations and analysis usually consider some specific selection rules. This paper studies sequential minimal optimization type decomposition methods under a general and flexible way of choosing the two-element working set. The main results include: 1) a simple asymptotic convergence proof, 2) a general explanation of the shrinking and caching techniques, and 3) the linear convergence of the methods. Extensions to some support vector machine variants are also discussed.

302 citations
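To make the two-element working set concrete: each outer iteration of an SMO-type method solves the dual analytically over one pair of variables. Below is a minimal NumPy sketch of the classic Platt-style pair update (the function name, error-cache convention, and surrounding bookkeeping are illustrative assumptions, not taken from this paper):

```python
import numpy as np

def smo_pair_update(alpha, i, j, K, y, E, C):
    """One analytic SMO step on the working pair {i, j}.

    alpha : current dual variables
    K     : kernel matrix
    y     : labels in {-1, +1}
    E     : cached errors E[k] = f(x_k) - y[k] under the current model
    C     : box constraint
    Sketch only; production solvers add caching, shrinking and tolerances.
    """
    # Feasible segment for alpha[j] that keeps sum(alpha * y) constant.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]   # curvature along the pair
    if L == H or eta <= 0:
        return alpha                           # degenerate pair: skip it
    a_j = np.clip(alpha[j] + y[j] * (E[i] - E[j]) / eta, L, H)
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)  # preserve the equality constraint
    alpha = alpha.copy()
    alpha[i], alpha[j] = a_i, a_j
    return alpha
```

The convergence results above concern precisely how such pairs (i, j) are selected across iterations.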


Journal ArticleDOI
TL;DR: The parallel SMO is developed using the message passing interface (MPI) and shows great speedup on the Adult data set and the MNIST (Modified National Institute of Standards and Technology) data set when many processors are used.
Abstract: Sequential minimal optimization (SMO) is one popular algorithm for training support vector machines (SVM), but it still requires a large amount of computation time for solving large-size problems. This paper proposes a parallel implementation of SMO for training SVM. The parallel SMO is developed using the message passing interface (MPI). Specifically, the parallel SMO first partitions the entire training data set into smaller subsets and then simultaneously runs multiple CPU processors, each dealing with one of the partitioned data sets. Experiments show that there is great speedup on the Adult data set and the MNIST (Modified National Institute of Standards and Technology) data set when many processors are used. There are also satisfactory results on the Web data set.

170 citations
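The paper's implementation is built on MPI; purely as an illustration of the partition-then-train idea (not the authors' cross-processor SMO synchronization scheme), here is a toy version using Python's multiprocessing and scikit-learn, with all names and the final combination heuristic being assumptions:

```python
import numpy as np
from multiprocessing import Pool
from sklearn.svm import SVC

def train_partition(args):
    """Train an SVM on one data partition; return its support-vector indices."""
    X_part, y_part = args
    return SVC(kernel="rbf", C=1.0).fit(X_part, y_part).support_

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4000, 20))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=4000))

    n_workers = 4
    parts = [(X[k::n_workers], y[k::n_workers]) for k in range(n_workers)]
    with Pool(n_workers) as pool:
        sv_lists = pool.map(train_partition, parts)   # partitions run in parallel

    # One simple combination heuristic: pool the partitions' support vectors
    # and train a final SVM on them (illustrative, not the paper's scheme).
    X_sv = np.vstack([p[0][idx] for p, idx in zip(parts, sv_lists)])
    y_sv = np.concatenate([p[1][idx] for p, idx in zip(parts, sv_lists)])
    final = SVC(kernel="rbf", C=1.0).fit(X_sv, y_sv)
```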


Journal ArticleDOI
TL;DR: A novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties is presented.
Abstract: The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties. To determine the best machine learning algorithm, 26 classifiers in the WEKA software package were compared using a benchmarking dataset of 79 enzymes with 254 catalytic residues in a 10-fold cross-validation analysis. Each residue of the dataset was represented by a set of 24 residue properties previously shown to be of functional relevance, as well as a label {+1/-1} to indicate catalytic/non-catalytic residue. The best-performing algorithm was the Sequential Minimal Optimization (SMO) algorithm, which is a Support Vector Machine (SVM). The Wrapper Subset Selection algorithm further selected seven of the 24 attributes as an optimal subset of residue properties, with sequence conservation, catalytic propensities of amino acids, and relative position on protein surface being the most important features. The SMO algorithm with 7 selected attributes correctly predicted 228 of the 254 catalytic residues, with an overall predictive accuracy of more than 86%. Missing only 10.2% of the catalytic residues, the method captures the fundamental features of catalytic residues and can be used as a "catalytic residue filter" to facilitate experimental identification of catalytic residues for proteins with known structure but unknown function.

120 citations
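The pipeline described (an SMO classifier, 10-fold cross-validation, wrapper subset selection over 24 residue properties) can be approximated outside WEKA. The sketch below uses scikit-learn stand-ins on synthetic data, so shapes, scores, and parameter choices are purely illustrative:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(254, 24))       # 24 residue properties per residue
y = rng.choice([-1, 1], size=254)    # +1 catalytic / -1 non-catalytic (synthetic)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Greedy wrapper-style selection of 7 of the 24 attributes by CV score,
# analogous in spirit to WEKA's wrapper subset selection.
selector = SequentialFeatureSelector(svm, n_features_to_select=7, cv=10)
X_sel = selector.fit_transform(X, y)

scores = cross_val_score(svm, X_sel, y, cv=10)
print("10-fold CV accuracy on the 7 selected features:", scores.mean())
```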


Journal ArticleDOI
TL;DR: A mechanism for selecting adequate training parameters makes the classification procedure fast and effective in the detection and classification of rolling-element bearing faults.

90 citations


Proceedings ArticleDOI
25 Jun 2006
TL;DR: This article proposes a method for combining multiple kernels in a nonstationary fashion using a large-margin latent-variable generative model within the maximum entropy discrimination (MED) framework, and shows that the support vector machine is a special case of this model.
Abstract: The power and popularity of kernel methods stem in part from their ability to handle diverse forms of structured inputs, including vectors, graphs and strings. Recently, several methods have been proposed for combining kernels from heterogeneous data sources. However, all of these methods produce stationary combinations; i.e., the relative weights of the various kernels do not vary among input examples. This article proposes a method for combining multiple kernels in a nonstationary fashion. The approach uses a large-margin latent-variable generative model within the maximum entropy discrimination (MED) framework. Latent parameter estimation is rendered tractable by variational bounds and an iterative optimization procedure. The classifier we use is a log-ratio of Gaussian mixtures, in which each component is implicitly mapped via a Mercer kernel function. We show that the support vector machine is a special case of this model. In this approach, discriminative parameter estimation is feasible via a fast sequential minimal optimization algorithm. Empirical results are presented on synthetic data, several benchmarks, and on a protein function annotation task.

89 citations


Journal ArticleDOI
TL;DR: A different reduced training set is selected to re-train the LS-SVM, and a new procedure is proposed to obtain sparseness; the results indicate that it is more effective.

51 citations


Proceedings Article
04 Dec 2006
TL;DR: A modified version of SVM is presented that allows the user to set a budget parameter B and focuses on minimizing the loss attained by the B worst-classified examples while ignoring the remaining examples.
Abstract: The standard Support Vector Machine formulation does not provide its user with the ability to explicitly control the number of support vectors used to define the generated classifier. We present a modified version of SVM that allows the user to set a budget parameter B and focuses on minimizing the loss attained by the B worst-classified examples while ignoring the remaining examples. This idea can be used to derive sparse versions of both L1-SVM and L2-SVM. Technically, we obtain these new SVM variants by replacing the 1-norm in the standard SVM formulation with various interpolation-norms. We also adapt the SMO optimization algorithm to our setting and report on some preliminary experimental results.

49 citations
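To make the "loss attained by the B worst-classified examples" idea concrete, here is a toy subgradient sketch for a linear classifier in NumPy. This is not the paper's interpolation-norm formulation or its SMO adaptation; the objective, step size, and names are assumptions for illustration:

```python
import numpy as np

def top_b_hinge_train(X, y, B, lam=0.1, lr=0.01, epochs=200):
    """Minimize lam/2 ||w||^2 plus the mean hinge loss of the B worst examples."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        losses = np.maximum(0.0, 1.0 - y * (X @ w))   # hinge loss per example
        worst = np.argsort(losses)[-B:]               # B worst-classified examples
        active = worst[losses[worst] > 0]             # where the hinge is active
        grad = lam * w
        if len(active):
            grad -= (y[active, None] * X[active]).sum(axis=0) / B
        w -= lr * grad                                # all other examples are ignored
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = np.sign(X[:, 0] + 0.2 * rng.normal(size=500))
w = top_b_hinge_train(X, y, B=50)
```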


Journal ArticleDOI
TL;DR: By computer simulations for two-class and multiclass benchmark data sets, it is shown that the proposed incremental training method can delete data considerably without deteriorating the generalization ability.

48 citations


Journal ArticleDOI
TL;DR: A parallel version of sequential minimal optimization (SMO) is developed in this paper for fast training of support vector machines (SVM); it shows great speedup on the Adult, MNIST, and IDEVAL data sets when many processors are used.

29 citations


01 Jan 2006
TL;DR: This article describes how to efficiently parallelize SVM training in order to cut down execution times, and shows that on most problems linear or even superlinear speedups can be attained.
Abstract: The Support Vector Machine (SVM) is a supervised algorithm for the solution of classification and regression problems. SVMs have gained widespread use in recent years because of successful applications like character recognition and the profound theoretical underpinnings concerning generalization performance. Yet, one of the remaining drawbacks of the SVM algorithm is its high computational demands during the training and testing phases. This article describes how to efficiently parallelize SVM training in order to cut down execution times. The parallelization technique employed is based on a decomposition approach, where the inner quadratic program (QP) is solved using Sequential Minimal Optimization (SMO). Thus all types of SVM formulations can be solved in parallel, including C-SVC and ν-SVC for classification as well as ε-SVR and ν-SVR for regression. Practical results show that on most problems linear or even superlinear speedups can be attained.

01 Jan 2006
TL;DR: Computational results show that the iterative learning method can simplify SVM effectively and can be implemented easily, and this method will help improve the classification speed of SVM.
Abstract: Support vector machines (SVM) are well known to give good results on a wide variety of pattern recognition problems, but for large-scale problems the number of support vectors is usually large, which results in substantially slower classification. Existing studies have proposed to speed up SVM classification by decreasing the number of support vectors. In this paper it is found that SVMs trained with the most important training points have fewer support vectors and accuracy equivalent to SVMs trained on the full training set. An iterative procedure is proposed to train the simplified SVM with the most important training points, and careful preprocessing of outliers is also used to speed up the iterative learning. Computational results indicate that, compared with SVMs trained on the full training set, the proposed method obtains simplified SVMs with far fewer support vectors and equivalent classification accuracy, which supports the proposed method as an effective way to obtain a simplified SVM for large problems. Keywords: Simplified SVM, Support Vector Machine, Iterative Learning
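A rough sketch of the retain-the-important-points loop, using support vectors as the proxy for "most important" points (the paper's actual selection rule and outlier preprocessing are not reproduced here):

```python
import numpy as np
from sklearn.svm import SVC

def iterative_simplify(X, y, rounds=3, **svc_kwargs):
    """Repeatedly retrain on the current model's support vectors.

    Each round shrinks the training set while aiming to keep the
    decision boundary roughly unchanged; a heuristic illustration only.
    """
    Xc, yc = X, y
    clf = SVC(**svc_kwargs).fit(Xc, yc)
    for _ in range(rounds):
        Xc, yc = Xc[clf.support_], yc[clf.support_]
        clf = SVC(**svc_kwargs).fit(Xc, yc)
    return clf

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=2000))
clf = iterative_simplify(X, y, kernel="rbf", C=1.0)
print("support vectors in simplified model:", len(clf.support_))
```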

Proceedings ArticleDOI
01 Aug 2006
TL;DR: This paper introduces a support vector machine in which the training examples are fuzzy inputs, and gives a solution procedure for the support vector machine with fuzzy training data.
Abstract: Support vector machines (SVMs) have been very successful in pattern recognition and function estimation problems, but in the support vector machine for classification the training examples are non-fuzzy inputs and the output is y = ±1. In this paper, we introduce a support vector machine in which the training examples are fuzzy inputs, and give a solution procedure for the support vector machine with fuzzy training data.

Proceedings ArticleDOI
26 Sep 2006
TL;DR: Evolutionary support vector machines (ESVMs) as discussed by the authors are a novel technique that assimilates the learning engine of the state-of-the-art SVM, but evolves the coefficients of the decision function by means of evolutionary algorithms.
Abstract: Evolutionary support vector machines (ESVMs) are a novel technique that assimilates the learning engine of the state-of-the-art support vector machines (SVMs) but evolves the coefficients of the decision function by means of evolutionary algorithms (EAs). The new method has accomplished the purpose for which it was initially developed: to be a simpler alternative to the canonical SVM approach for solving the optimization component of training. ESVMs, like SVMs, are natural tools for primary application to classification. However, since the latter have been further extended to also handle regression, it is the scope of this paper to present the corresponding evolutionary paradigm. In particular, we consider the hybridization with the classical ε-support vector regression (ε-SVR) introduced by Vapnik and the subsequent evolution of the coefficients of the regression hyperplane. ε-evolutionary support vector regression (ε-ESVR) is validated on the Boston housing benchmark problem, and the obtained results demonstrate the promise of ESVMs for regression as well.

01 Jan 2006
TL;DR: An extensive empirical evaluation of two popular semi-supervised classification algorithms: Transductive Support Vector Machines (TSVM) and Tri-Training.
Abstract: In this paper we present and analyze the methodological approach we have used for addressing the ECML-PKDD Discovery Challenge 2006. The Challenge was concerned with the identification of individual users' spam emails based on a centrally collected training set. The task descriptions of the discovery challenge indicated that we should deviate from the classical supervised classification paradigm and attempt to utilize semi-supervised and transductive approaches. The format of the training data (bag-of-words providing only word IDs) did not allow either for the use of Natural Language Processing (NLP) approaches, or for the use of standard spam-recognition strategies. The submitted model, which achieved 5th place on Task A of the challenge, was derived by Tri-Training, a recent development in semi-supervised algorithms research. Given a standard classifier, Tri-Training initially uses bagging to produce three diverse training sets and corresponding classifiers, which are used for classifying the unlabeled data and incorporating them into the training set in a theoretically sound way. The classifier we have used within Tri-Training was Support Vector Machines (SVM), and more precisely the Sequential Minimal Optimization (SMO) implementation of WEKA. Moreover, we have used feature normalization and logistic regression models to produce continuous outputs. Apart from a detailed description and a discussion of the submitted model, this paper contains an extensive empirical evaluation of two popular semi-supervised classification algorithms: Transductive Support Vector Machines (TSVM) and Tri-Training.
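A bare-bones rendering of the Tri-Training loop sketched above: bag three classifiers, then let each absorb the unlabeled points on which the other two agree. The error-rate conditions that make real Tri-Training theoretically sound are omitted, and all names are illustrative:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils import resample

def tri_train(X_lab, y_lab, X_unlab, rounds=3, seed=0):
    # Three diverse classifiers from bootstrap samples of the labeled data.
    clfs, sets = [], []
    for k in range(3):
        Xb, yb = resample(X_lab, y_lab, random_state=seed + k)
        clfs.append(SVC(kernel="linear").fit(Xb, yb))
        sets.append((Xb, yb))
    for _ in range(rounds):
        for k in range(3):
            a, b = [m for m in range(3) if m != k]
            pa = clfs[a].predict(X_unlab)
            pb = clfs[b].predict(X_unlab)
            agree = pa == pb                       # the other two classifiers agree
            if agree.any():
                Xk = np.vstack([sets[k][0], X_unlab[agree]])
                yk = np.concatenate([sets[k][1], pa[agree]])
                sets[k] = (Xk, yk)
                clfs[k] = SVC(kernel="linear").fit(Xk, yk)
    return clfs

def tri_predict(clfs, X):
    votes = np.stack([c.predict(X) for c in clfs])
    return np.sign(votes.sum(axis=0))              # majority vote for labels in {-1, +1}
```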

Proceedings ArticleDOI
Patrick Haffner
25 Jun 2006
TL;DR: A new method based on transposition is proposed to speed up this computation on sparse data: instead of dot-products over sparse feature vectors, the computation incrementally merges lists of training examples and minimizes access to the data.
Abstract: Kernel-based learning algorithms, such as Support Vector Machines (SVMs) or Perceptron, often rely on sequential optimization where a few examples are added at each iteration. Updating the kernel matrix usually requires matrix-vector multiplications. We propose a new method based on transposition to speedup this computation on sparse data. Instead of dot-products over sparse feature vectors, our computation incrementally merges lists of training examples and minimizes access to the data. Caching and shrinking are also optimized for sparsity. On very large natural language tasks (tagging, translation, text classification) with sparse feature representations, a 20 to 80-fold speedup over LIBSVM is observed using the same SMO algorithm. Theory and experiments explain what type of sparsity structure is needed for this approach to work, and why its adaptation to Maxent sequential optimization is inefficient.
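The transposition idea reduces to a few lines: store the training set column-wise (one inverted list per feature) so that computing a kernel row for a new sparse example only touches the training examples that share one of its nonzero features. A hypothetical linear-kernel sketch:

```python
from collections import defaultdict

def build_inverted_index(examples):
    """examples: list of sparse rows as dicts {feature_id: value}."""
    index = defaultdict(list)               # feature_id -> [(example_id, value)]
    for i, x in enumerate(examples):
        for f, v in x.items():
            index[f].append((i, v))
    return index

def kernel_row(index, x, n_examples):
    """Linear-kernel row K[i] = <x_i, x> computed by merging inverted lists.

    Only examples sharing a nonzero feature with x are visited,
    which is where the speedup on sparse data comes from.
    """
    row = [0.0] * n_examples
    for f, v in x.items():
        for i, vi in index.get(f, ()):
            row[i] += vi * v
    return row

train = [{0: 1.0, 5: 2.0}, {5: 1.0}, {3: 4.0}]
idx = build_inverted_index(train)
print(kernel_row(idx, {5: 3.0, 3: 1.0}, len(train)))   # -> [6.0, 3.0, 4.0]
```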

Book ChapterDOI
03 Oct 2006
TL;DR: A novel sequential minimal optimization algorithm for support vector regression in which convex optimization problems with l variables are solved instead of standard quadratic programming problems with 2l variables where l is the number of training samples.
Abstract: A novel sequential minimal optimization (SMO) algorithm for support vector regression is proposed. This algorithm is based on Flake and Lawrence's SMO, in which convex optimization problems with l variables are solved instead of standard quadratic programming problems with 2l variables, where l is the number of training samples; however, the strategy for working set selection is quite different. Experimental results show that the proposed algorithm is much faster than Flake and Lawrence's SMO and comparable to the fastest conventional SMO.
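For context, the l-variable problem referred to here arises from rewriting the standard SVR dual. Substituting β_i = α_i − α_i* (with α_i α_i* = 0 at optimality, so α_i + α_i* = |β_i|) collapses the usual 2l-variable QP to l variables; the notation below is assumed, not taken from the paper:

```latex
% Standard SVR dual: 2l variables (alpha, alpha^*)
\max_{\alpha,\alpha^*}\;
  -\tfrac{1}{2}\sum_{i,j}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)K_{ij}
  -\varepsilon\sum_i(\alpha_i+\alpha_i^*)
  +\sum_i y_i(\alpha_i-\alpha_i^*)
\quad\text{s.t.}\;\; \sum_i(\alpha_i-\alpha_i^*)=0,\;\; 0\le\alpha_i,\alpha_i^*\le C.

% With beta_i = alpha_i - alpha_i^*, this becomes an l-variable problem:
\max_{\beta}\;
  -\tfrac{1}{2}\,\beta^{\top}K\beta - \varepsilon\lVert\beta\rVert_1 + y^{\top}\beta
\quad\text{s.t.}\;\; \sum_i\beta_i = 0,\;\; -C\le\beta_i\le C.
```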

Proceedings ArticleDOI
18 Dec 2006
TL;DR: Two new support vector approaches for ordinal regression find the concentric spheres with minimum volume that contain most of the training samples and guarantee that the radii of the spheres are properly ordered at the optimal solution.
Abstract: We present two new support vector approaches for ordinal regression. These approaches find the concentric spheres with minimum volume that contain most of the training samples. Both approaches guarantee that the radii of the spheres are properly ordered at the optimal solution. The size of the optimization problem is linear in the number of training samples. The popular SMO algorithm is adapted to solve the resulting optimization problem. Numerical experiments on some real-world data sets verify the usefulness of our approaches for data mining.

Proceedings ArticleDOI
01 Jan 2006
TL;DR: This work proposes an approximate SVM, where a small number of representatives are extracted from the original training data set and used for training, and proposes two efficient implementations of the proposed algorithm, where approximations to kernel k-means are used.
Abstract: We propose to speed up the training process of support vector machines (SVM) by resorting to an approximate SVM, where a small number of representatives are extracted from the original training data set and used for training. Theoretical studies show that, in order for the approximate SVM to be similar to the exact SVM given by the original training data set, kernel k-means should be used to extract the representatives. As practical variations, we also propose two efficient implementations of the proposed algorithm, where approximations to kernel k-means are used. The proposed algorithms are compared against the standard training algorithm over real data sets.
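A sketch of the representative-extraction pipeline, with plain k-means in input space as a crude stand-in for the kernel k-means the theory calls for (data, cluster counts, and kernel choice are all illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
y = np.sign(X[:, 0] + 0.2 * rng.normal(size=5000))

# Extract m representatives per class via clustering.
m = 100
reps, labels = [], []
for cls in (-1, 1):
    km = KMeans(n_clusters=m, n_init=5, random_state=0).fit(X[y == cls])
    reps.append(km.cluster_centers_)
    labels.append(np.full(m, cls))

X_rep = np.vstack(reps)
y_rep = np.concatenate(labels)
approx_svm = SVC(kernel="rbf").fit(X_rep, y_rep)   # train on 200 points, not 5000
```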

Proceedings ArticleDOI
20 Jun 2006
TL;DR: A parallel multi-class SVM based on sequential minimal optimization (SMO) is proposed in this paper, which combines SMO, parallel technology, DTSVM, and clustering; experiments show that the speeds of training and classifying are improved remarkably.
Abstract: The Support Vector Machine (SVM) was originally developed for binary classification problems. In order to solve practical multi-class problems, various approaches such as one-against-rest (1-a-r), one-against-one (1-a-1), and decision-tree-based SVMs have been presented. The disadvantages of the existing methods of SVM multi-class classification are analyzed and compared in this paper; for example, 1-a-r is difficult to train and the classifying speed of 1-a-1 is slow. To solve these problems, a parallel multi-class SVM based on Sequential Minimal Optimization (SMO) is proposed in this paper. This method combines SMO, parallel technology, decision-tree SVM (DTSVM), and clustering. Experiments have been made on the University of California, Irvine (UCI) database, in which five benchmark datasets were selected for testing. The experiments compare 1-a-r, 1-a-1, and this method on training and testing time. The results show that the speeds of training and classifying are improved remarkably.

Proceedings ArticleDOI
01 Jul 2006
TL;DR: Simulation results demonstrate that the LS-SVM method is better than SVM in accuracy, static-state performance, and computational cost.
Abstract: This paper first provides a short introduction to the least squares support vector machine (LS-SVM), then provides sequential minimal optimization (SMO) based pruning algorithms for LS-SVM. After a brief discussion of inverse-model identification, an LS-SVM based direct-model identification method is developed using the LS-SVM's excellent function approximation ability. The most important and difficult step in inverse control methods is the modeling of the inverse nonlinear dynamic system. Both SVM and LS-SVM can solve this problem. Simulation results demonstrate that the LS-SVM method is better than SVM in accuracy, static-state performance, and computational cost.
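For readers unfamiliar with LS-SVM: training reduces to a single linear system rather than a QP, which is what makes pruning and SMO-style solvers for it attractive. A minimal NumPy sketch of the standard classifier system (Suykens' formulation; the RBF kernel and parameter names are assumptions):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma_k=0.5):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma_k * d2)

def lssvm_train(X, y, gam=10.0):
    """Solve the LS-SVM classifier KKT system:
        [ 0      y^T         ] [ b     ]   [ 0 ]
        [ y   Omega + I/gam  ] [ alpha ] = [ 1 ]
    with Omega_ij = y_i y_j K(x_i, x_j)."""
    n = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gam
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                       # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_new):
    return np.sign(rbf_kernel(X_new, X_train) @ (alpha * y_train) + b)
```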

Proceedings ArticleDOI
01 Dec 2006
TL;DR: It is shown that the Evolutionary Support Vector Machine has good generalization properties when compared with Support Vector Machines using standard (polynomial and radial basis) kernel functions.
Abstract: A machine learning algorithm using evolutionary algorithms and Support Vector Machines is presented. The kernel function of the support vector machine is evolved using the recently introduced Gene Expression Programming algorithm. This technique trains a support vector machine with the kernel function most suitable for the training data set rather than pre-specifying the kernel function. The fitness of the kernel is measured by calculating cross-validation accuracy. The SVM trained with the fittest kernel is then used to classify previously unseen data. The algorithm is elucidated using preliminary case studies for classification of cancer data and a bank transaction data set. It is shown that the Evolutionary Support Vector Machine has good generalization properties when compared with Support Vector Machines using standard (polynomial and radial basis) kernel functions.

Journal ArticleDOI
TL;DR: Another learning algorithm, particle swarm optimization, is introduced for training SVMs, and it is found that this method works well on UCI datasets.

Proceedings ArticleDOI
01 Jan 2006
TL;DR: Based on the classification equivalence between the previous training set and the newly added training set, a new algorithm for SVM incremental learning is proposed in which useless samples are discarded and the useful information in training samples is accumulated.
Abstract: Based on an analysis of the relationship between the Karush-Kuhn-Tucker (KKT) conditions of support vector machines and the distribution of the training samples, the possible changes of the support vector set after new samples are added to the training set are analyzed, and generalized Karush-Kuhn-Tucker conditions are defined. Based on the classification equivalence between the previous training set and the newly added training set, a new algorithm for SVM incremental learning is proposed. With the presented algorithm, useless samples are discarded and the useful information in training samples is accumulated. Experimental results on standard datasets indicate the effectiveness of the proposed algorithm.
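The KKT test that drives this style of incremental learning is simple to state: a new sample with implicit α = 0 already satisfies the current model's KKT conditions iff y·f(x) ≥ 1, and such a sample cannot change the solution. A hedged scikit-learn sketch (retraining on old support vectors plus violators is a common heuristic here, not the paper's exact generalized-KKT procedure):

```python
import numpy as np
from sklearn.svm import SVC

def incremental_update(clf, X_sv, y_sv, X_new, y_new, **svc_kwargs):
    """Retrain only when some new samples violate the current KKT conditions."""
    margins = y_new * clf.decision_function(X_new)
    violators = margins < 1.0            # inside the margin or misclassified
    if not violators.any():
        return clf                       # all new samples satisfy KKT: keep model
    X_train = np.vstack([X_sv, X_new[violators]])
    y_train = np.concatenate([y_sv, y_new[violators]])
    return SVC(**svc_kwargs).fit(X_train, y_train)
```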

Proceedings ArticleDOI
30 Oct 2006
TL;DR: This paper considers an SMO algorithm, which deals with the same optimization problem as Flake and Lawrence's SMO, and gives a rigorous proof that it always stops within a finite number of iterations.
Abstract: A sequential minimal optimization (SMO) algorithm for support vector regression (SVR) has recently been proposed by Flake and Lawrence. However, the convergence of their algorithm has not been proved so far. In this paper, we consider an SMO algorithm, which deals with the same optimization problem as Flake and Lawrence's SMO, and give a rigorous proof that it always stops within a finite number of iterations.

Proceedings ArticleDOI
Hai-Tao He, Nan Li
01 Aug 2006
TL;DR: A new approach based on the structural equivalence of the radial basis function (RBF) network and support vector machines (SVM) is efficient and intelligent; the SMO algorithm is employed to obtain a more optimal structure and initial parameters for the RBF network.
Abstract: In the traditional method of flatness pattern recognition, a neural network with a changing topological configuration, slow convergence and local minima were observed. Moreover, the process of choosing the initial parameters and structure of the neural network from prior experience has proved time-consuming and complex. In this paper, a new approach is proposed based on the structural equivalence of the radial basis function (RBF) network and Support Vector Machines (SVM). The SMO algorithm is employed to obtain a more optimal structure and initial parameters for the RBF network, and then the BP algorithm is used to fine-tune the RBF network. The new approach, which inherits the advantages of SVM such as fast learning and global optimization, is efficient and intelligent.

Proceedings ArticleDOI
30 Oct 2006
TL;DR: A new formulation for SVM is proposed that makes possible to include the hyperparameter C in the definition of the kernel parameters, equivalent to choosing the best values of kernel parameters.
Abstract: Model selection for support vector machines concerns the tuning of SVM hyperparameters, such as C, which controls the amount of overlap, and the kernel parameters. Several criteria developed for tuning the SVM hyperparameters may not be differentiable w.r.t. C; consequently, gradient-based optimization methods are not applicable. In this paper, we propose a new formulation for SVM that makes it possible to include the hyperparameter C in the definition of the kernel parameters. Tuning the hyperparameters for SVM is then equivalent to choosing the best values of the kernel parameters. We tested this new formulation for model selection using the criterion of empirical error, a technique based on generalization-error minimization through a validation set. Experiments on different benchmarks show promising results confirming our approach.
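One well-known special case conveys the flavor of folding C into the kernel (whether it matches the paper's exact construction is not claimed here): for the 2-norm soft margin, the dual with parameter C is exactly a hard-margin dual with a modified kernel, so C behaves like any other kernel parameter:

```latex
% 2-norm soft-margin dual ...
\max_{\alpha}\;\sum_i\alpha_i
 -\tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j
  \Big(K(x_i,x_j)+\tfrac{1}{C}\,\delta_{ij}\Big)
\quad\text{s.t.}\;\;\alpha_i\ge 0,\;\;\sum_i\alpha_i y_i=0.
% ... i.e. the hard-margin dual with the modified kernel
% \tilde{K}(x_i,x_j) = K(x_i,x_j) + \delta_{ij}/C .
```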

Journal ArticleDOI
TL;DR: DAGSVMlight is proposed to select the working set, which is identical to the working set selected by the SVMlight approach; it may be an especially useful tool for large-scale multiclass classification problems and may lead to more widespread use of SVMs in the engineering community due to its good performance.

01 Jan 2006
TL;DR: A novel QP solver based on sequential minimal optimization (SMO) with a new strategy for selecting the variables to be optimized; it converges in a finite number of iterations to a solution that differs from the optimal one by at most a prescribed constant.
Abstract: This report proposes a novel optimization algorithm for learning support vector machine (SVM) classifiers with structured output spaces, introduced recently by Tsochantaridis et al. Learning a structural SVM classifier leads to a special instance of quadratic programming (QP) optimization with a huge number of constraints. The number of constraints is proportional to the cardinality of the output space, which makes the QP task intractable by classical optimization methods. We propose a novel QP solver based on sequential minimal optimization (SMO). Unlike the original SMO, we propose a novel strategy for selecting the variables to be optimized, aiming at the variables that yield the maximal improvement of the objective. We prove that the algorithm converges in a finite number of iterations to a solution that differs from the optimal one by at most a prescribed constant. Experiments show that the proposed algorithm is very competitive with the cutting plane algorithm of Tsochantaridis et al. The proposed algorithm can be easily implemented and does not require any external QP solver, in contrast to the cutting plane algorithm. We demonstrate the capability of the algorithm on the problem of learning a Hidden Markov Network for color image segmentation and learning a structural classifier for car license plate recognition.

Book ChapterDOI
10 Sep 2006
TL;DR: The nonlinear regression ability of Support Vector Machines has been demonstrated by forming an SVM model of a microwave transistor and comparing it with its neural model.
Abstract: Support Vector Machines (SVM) are a system for efficiently training linear learning machines in kernel-induced feature spaces, while respecting the insights provided by generalization theory and exploiting optimization theory. In this work, Support Vector Machines are employed for nonlinear regression. The nonlinear regression ability of Support Vector Machines has been demonstrated by forming an SVM model of a microwave transistor, which is compared with its neural model.