Journal ArticleDOI

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
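A minimal quick-start sketch using the Python bindings shipped with LIBSVM (installable from PyPI, e.g. as libsvm-official); heart_scale is the example data file distributed with the LIBSVM sources, and the parameter values here are illustrative, not tuned:

```python
# Quick-start with the LIBSVM Python bindings: train C-SVC with an RBF
# kernel on the bundled heart_scale example file, then evaluate.
from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

# Read labels y and sparse features x from a file in LIBSVM format.
y, x = svm_read_problem('heart_scale')

# -s 0: C-SVC, -t 2: RBF kernel, -c 1: regularization parameter C,
# -g 0.1: kernel width gamma. Values are illustrative assumptions.
model = svm_train(y[:200], x[:200], '-s 0 -t 2 -c 1 -g 0.1')

# Predict on the held-out tail; svm_predict also reports the accuracy.
labels, accuracy, values = svm_predict(y[200:], x[200:], model)
```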


Citations
Proceedings ArticleDOI
09 Jun 2008
TL;DR: The first comprehensive study of a general mining method that aims to find the most significant patterns directly; graph classifiers built on the mined patterns outperform an up-to-date graph kernel method in both efficiency and accuracy, demonstrating the high promise of such patterns.
Abstract: With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which can defeat all frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study of a general mining method aimed at finding the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), is developed to exploit the correlation between structural similarity and significance similarity, so that the most significant pattern can be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency-descending mining, are proposed to support leap search in the graph pattern space. Our new mining method reveals that the widely adopted branch-and-bound search in the data mining literature is not the best, thus sketching a new picture of scalable graph pattern discovery. Empirical results show that LEAP achieves orders-of-magnitude speedups in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.

331 citations

Book ChapterDOI
06 Sep 2014
TL;DR: The problem is formulated as one of modeling positive training data at the decision boundary, where statistical extreme value theory can be invoked, and a new algorithm called P_I-SVM is introduced for estimating the unnormalized posterior probability of class inclusion.
Abstract: The perceived success of recent visual recognition approaches has largely been derived from their performance on classification tasks, where all possible classes are known at training time. But what about open set problems, where unknown classes appear at test time? Intuitively, if we could accurately model just the positive data for any known class without overfitting, we could reject the large set of unknown classes even under an assumption of incomplete class knowledge. In this paper, we formulate the problem as one of modeling positive training data at the decision boundary, where we can invoke statistical extreme value theory. A new algorithm called the P_I-SVM is introduced for estimating the unnormalized posterior probability of class inclusion.
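The following is a minimal sketch, not the authors' implementation, of the extreme-value idea the abstract describes: fit a Weibull model to the positive-class decision scores nearest the boundary and read an unnormalized inclusion probability off its CDF. The toy data, the use of scikit-learn's SVC (which wraps LIBSVM), and the tail size are all assumptions for illustration:

```python
# Sketch of EVT-based calibration of SVM scores: model the tail of the
# positive-class decision scores with a Weibull fit and use its CDF as
# an unnormalized probability of class inclusion.
import numpy as np
from scipy.stats import weibull_min
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 0.5, size=(100, 2))   # toy "known class" samples
X_neg = rng.normal(-1.0, 0.5, size=(100, 2))  # toy negative samples
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 100 + [0] * 100)

clf = SVC(kernel='rbf', gamma='scale').fit(X, y)

# Take the positive-class scores closest to the decision boundary
# (the distribution tail) and fit a Weibull model to them.
tail = np.sort(clf.decision_function(X_pos))[:20]
shape, loc, scale = weibull_min.fit(tail)

def p_inclusion(x_new):
    """Unnormalized posterior-like estimate that x_new is in the class."""
    s = clf.decision_function(np.atleast_2d(x_new))
    return weibull_min.cdf(s, shape, loc=loc, scale=scale)
```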

331 citations


Cites methods from "LIBSVM: A library for support vecto..."

  • ...With respect to approaches that are suitable for multi-class open set recognition, we consider standard multi-class SVM variants including the 1-vs-Rest Multi-class RBF SVM (LIBSVM ErrorCode implementation [28]), Pairwise Multi-class RBF SVM (LIBSVM implementation [9]), 1-vs-Rest Multi-class RBF SVM with Platt Probability Estimation and a threshold (LIBSVM ErrorCode implementation [28]), and Pairwise Multi-class RBF SVM with Platt Probability Estimation and a threshold (LIBSVM implementation [9])....


  • ...RBF SVM, 1-vs-Rest binary linear SVM, and 1-vs-Rest binary RBF SVM with Platt Probability Estimation [37] and a threshold (all using LIBSVM implementations [9])....

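As a concrete illustration of the thresholded variants listed in these excerpts, a sketch using scikit-learn (whose SVC wraps LIBSVM and exposes Platt probability estimates via probability=True); the threshold value and helper names are assumptions, not from the paper:

```python
# 1-vs-Rest RBF SVM with Platt probability estimates and a rejection
# threshold: samples whose best class probability falls below the
# threshold are labeled unknown (-1), in the open-set spirit above.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def fit_ovr_rbf(X_train, y_train):
    base = SVC(kernel='rbf', gamma='scale', probability=True)
    return OneVsRestClassifier(base).fit(X_train, y_train)

def predict_open_set(clf, X_test, threshold=0.5):
    proba = clf.predict_proba(X_test)
    labels = clf.classes_[proba.argmax(axis=1)]
    return np.where(proba.max(axis=1) >= threshold, labels, -1)
```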

Proceedings Article
03 Jun 2012
TL;DR: Evidence is presented that social media, with appropriate natural language processing techniques, can be a valuable and abundant data source for the study of bullying in both worlds.
Abstract: We introduce the social study of bullying to the NLP community. Bullying, in both physical and cyber worlds (the latter known as cyberbullying), has been recognized as a serious national health issue among adolescents. However, previous social studies of bullying are handicapped by data scarcity, while the few computational studies narrowly restrict themselves to cyberbullying which accounts for only a small fraction of all bullying episodes. Our main contribution is to present evidence that social media, with appropriate natural language processing techniques, can be a valuable and abundant data source for the study of bullying in both worlds. We identify several key problems in using such data sources and formulate them as NLP tasks, including text classification, role labeling, sentiment analysis, and topic modeling. Since this is an introductory paper, we present baseline results on these tasks using off-the-shelf NLP solutions, and encourage the NLP community to contribute better models in the future.
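A minimal sketch of the kind of off-the-shelf text-classification baseline the abstract mentions, using TF-IDF features and a linear SVM in scikit-learn; the example texts and labels are invented for illustration:

```python
# Off-the-shelf text-classification baseline: TF-IDF features feeding a
# linear SVM. Training data here is invented and far too small; a real
# study would use labeled social media posts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["he keeps picking on me at school", "great game last night"]
labels = [1, 0]  # 1 = bullying-related, 0 = not (hypothetical labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["they tease me every day"]))
```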

329 citations

Journal ArticleDOI
TL;DR: A sequence-based predictor called iACP is reported, developed by the approach of optimizing the g-gap dipeptide components, that remarkably outperformed the existing predictors for the same purpose in both overall accuracy and stability.
Abstract: Cancer remains a major killer worldwide. Traditional methods of cancer treatment are expensive and have deleterious side effects on normal cells. Fortunately, the discovery of anticancer peptides (ACPs) has paved a new way for cancer treatment. With the explosive growth of peptide sequences generated in the post-genomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying ACPs, so as to speed up their application in treating cancer. Here we report a sequence-based predictor called iACP, developed by optimizing the g-gap dipeptide components. It was demonstrated by rigorous cross-validations that the new predictor remarkably outperformed the existing predictors for the same purpose in both overall accuracy and stability. For the convenience of most experimental scientists, a publicly accessible web server for iACP has been established at http://lin.uestc.edu.cn/server/iACP, by which users can easily obtain their desired results.
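A hedged sketch of the g-gap dipeptide feature named in the abstract: for a gap g, count how often each ordered amino-acid pair occurs separated by exactly g residues, yielding a 400-dimensional frequency vector. The exact normalization used by iACP may differ; the peptide string below is a placeholder:

```python
# g-gap dipeptide composition: frequency of each ordered amino-acid
# pair (a, b) appearing as sequence[i] == a and sequence[i + g + 1] == b.
import numpy as np

AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def g_gap_dipeptide(sequence, g):
    """Return the 400-dimensional g-gap dipeptide frequency vector."""
    counts = np.zeros(len(AMINO_ACIDS) ** 2)
    n_pairs = len(sequence) - g - 1
    for i in range(max(n_pairs, 0)):
        a, b = sequence[i], sequence[i + g + 1]
        if a in INDEX and b in INDEX:
            counts[INDEX[a] * len(AMINO_ACIDS) + INDEX[b]] += 1
    return counts / max(n_pairs, 1)

features = g_gap_dipeptide('FAKKLAKKLKKLAKKLAK', g=2)  # placeholder peptide
```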

329 citations

Proceedings ArticleDOI
11 Aug 2013
TL;DR: This work applies two modifications to make one-class SVMs more suitable for unsupervised anomaly detection, robust one-class SVMs and eta one-class SVMs, with the key idea that outliers should contribute less to the decision boundary than normal instances.
Abstract: Support Vector Machines (SVMs) have been one of the most successful machine learning techniques of the past decade. For anomaly detection, a semi-supervised variant, the one-class SVM, also exists. Here, only normal data is required for training before anomalies can be detected. In theory, the one-class SVM could also be used in an unsupervised anomaly detection setup, where no prior training is conducted. Unfortunately, it turns out that a one-class SVM is sensitive to outliers in the data. In this work, we apply two modifications in order to make one-class SVMs more suitable for unsupervised anomaly detection: robust one-class SVMs and eta one-class SVMs. The key idea of both modifications is that outliers should contribute less to the decision boundary than normal instances. Experiments performed on datasets from the UCI machine learning repository show that our modifications are very promising: compared with other standard unsupervised anomaly detection algorithms, the enhanced one-class SVMs are superior on two out of four datasets. In particular, the proposed eta one-class SVM has shown the most promising results.
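For reference, the stock one-class SVM baseline that the paper modifies, via scikit-learn's OneClassSVM (a wrapper around LIBSVM); the robust and eta variants described above are the authors' modifications and are not part of stock LIBSVM:

```python
# Stock one-class SVM on toy data with a few injected outliers.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(200, 2))  # mostly "normal" points
X[:5] += 6                           # a few injected outliers

# nu upper-bounds the fraction of training points treated as outliers.
oc_svm = OneClassSVM(kernel='rbf', gamma=0.1, nu=0.05).fit(X)
pred = oc_svm.predict(X)             # +1 = normal, -1 = anomaly
print((pred == -1).sum(), "points flagged as anomalies")
```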

328 citations


Cites methods from "LIBSVM: A library for support vecto..."

  • ...Also, LIBSVM [14] has been extended; the modifications presented here are published in the proceedings of the ACM SIGKDD Conference [6]....


  • ...A popular SVM library is LIBSVM [14]....


References
Journal ArticleDOI
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

37,861 citations


"LIBSVM: A library for support vecto..." refers background in this paper

  • ...{1,-1}, C-SVC [Boser et al. 1992; Cortes and Vapnik 1995] solves the following primal optimization problem:

    $$\min_{w,b,\xi}\ \frac{1}{2}\,w^{T}w + C\sum_{i=1}^{l}\xi_i \qquad (1)$$

    subject to $y_i(w^{T}\phi(x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1,\dots,l$, where $\phi(x_i)$ maps $x_i$ into a...


01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.

26,531 citations


"LIBSVM: A library for support vecto..." refers background in this paper

  • ...Under given parameters $C > 0$ and $\epsilon > 0$, the standard form of support vector regression [Vapnik 1998] is

    $$\min_{w,b,\xi,\xi^{*}}\ \frac{1}{2}\,w^{T}w + C\sum_{i=1}^{l}\xi_i + C\sum_{i=1}^{l}\xi_i^{*}$$

    subject to $w^{T}\phi(x_i) + b - z_i \le \epsilon + \xi_i$, $z_i - w^{T}\phi(x_i) - b \le \epsilon + \xi_i^{*}$, $\xi_i, \xi_i^{*} \ge 0$, $i = 1,\dots,l$....


  • ...It can be clearly seen that C-SVC and one-class SVM are already in the form of problem (11)....


  • ..., l, in two classes, and a vector $y \in R^{l}$ such that $y_i \in \{1,-1\}$, C-SVC (Cortes and Vapnik, 1995; Vapnik, 1998) solves the following primal problem:...


  • ...Then, according to the SVM formulation, svm_train_one calls a corresponding subroutine such as solve_c_svc for C-SVC and solve_nu_svc for ν-SVC....


  • ...Note that b of C-SVC and ε-SVR plays the same role as −ρ in one-class SVM, so we define....

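The ε-SVR formulation quoted in the first excerpt above is what LIBSVM's epsilon-SVR solver implements; a minimal sketch via scikit-learn's SVR wrapper, with invented toy data:

```python
# Epsilon-SVR: C weights the slack penalties, epsilon sets the width of
# the insensitive tube around the regression targets z_i.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
z = np.sin(X).ravel() + rng.normal(0, 0.1, size=100)  # noisy targets

reg = SVR(kernel='rbf', C=1.0, epsilon=0.1).fit(X, z)
z_hat = reg.predict([[0.5]])
```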

Proceedings ArticleDOI
01 Jul 1992
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.

11,211 citations


"LIBSVM: A library for support vecto..." refers background in this paper

  • ...It can be clearly seen that C-SVC and one-class SVM are already in the form of problem (11)....


  • ...Then, according to the SVM formulation, svm_train_one calls a corresponding subroutine such as solve_c_svc for C-SVC and solve_nu_svc for ν-SVC....


  • ...Note that b of C-SVC and ε-SVR plays the same role as −ρ in one-class SVM, so we define....


  • ...In Section 2, we describe SVM formulations supported in LIBSVM: C-Support Vector Classification (C-SVC), ....


  • ...{1,-1}, C-SVC [Boser et al. 1992; Cortes and Vapnik 1995] solves the following primal optimization problem:

    $$\min_{w,b,\xi}\ \frac{1}{2}\,w^{T}w + C\sum_{i=1}^{l}\xi_i \qquad (1)$$

    subject to $y_i(w^{T}\phi(x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1,\dots,l$, where $\phi(x_i)$ maps $x_i$ into a higher-dimensional space and $C > 0$ is the regularization parameter....

    [Footnote in the original: LIBSVM Tools, http://www.csie.ntu.edu.tw/~cjlin/libsvmtools]

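For completeness, the dual problem that LIBSVM actually solves for the C-SVC primal quoted above, with $e$ the vector of all ones, $Q_{ij} = y_i y_j K(x_i, x_j)$, and $K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j)$:

$$
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\alpha^{T} Q\,\alpha \;-\; e^{T}\alpha \\
\text{subject to}\quad & y^{T}\alpha = 0, \quad 0 \le \alpha_i \le C,\ \ i = 1,\dots,l.
\end{aligned}
$$

The resulting decision function is $\operatorname{sgn}\bigl(\sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b\bigr)$.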

01 Jan 2008
TL;DR: A simple procedure is proposed, which usually gives reasonable results and is suitable for beginners who are not familiar with SVM.
Abstract: Support vector machine (SVM) is a popular technique for classification. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps. In this guide, we propose a simple procedure, which usually gives reasonable results.
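The procedure the guide recommends, in sketch form: scale the features, use the RBF kernel, and pick C and γ by cross-validation over exponential grids (the grid bounds below follow the guide's suggestion; the scikit-learn wiring is an assumption):

```python
# The guide's procedure: scale features, use the RBF kernel, and select
# C and gamma by cross-validated grid search over exponential grids.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
grid = {
    'svc__C': [2.0**k for k in range(-5, 16, 2)],      # 2^-5 ... 2^15
    'svc__gamma': [2.0**k for k in range(-15, 4, 2)],  # 2^-15 ... 2^3
}
search = GridSearchCV(pipeline, grid, cv=5)
# search.fit(X, y); then search.best_params_ gives the selected C, gamma.
```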

7,069 citations


"LIBSVM: A library for support vecto..." refers methods in this paper

  • ...A Simple Example of Running LIBSVM While detailed instructions of using LIBSVM are available in the README file of the package and the practical guide by Hsu et al. [2003], here we give a simple example....


  • ...For instructions of using LIBSVM, see the README file included in the package, the LIBSVM FAQ,3 and the practical guide by Hsu et al. [2003]. LIBSVM supports the following learning tasks....


Journal ArticleDOI
TL;DR: Decomposition implementations for two "all-together" multiclass SVM methods are given, and it is shown that for large problems, methods that consider all data at once generally need fewer support vectors.
Abstract: Support vector machines (SVMs) were originally designed for binary classification. How to effectively extend them to multiclass classification is still an ongoing research issue. Several methods have been proposed where typically we construct a multiclass classifier by combining several binary classifiers. Some authors have also proposed methods that consider all classes at once. As it is computationally more expensive to solve multiclass problems, comparisons of these methods using large-scale problems have not been seriously conducted. Especially for methods solving multiclass SVM in one step, a much larger optimization problem is required, so up to now experiments have been limited to small data sets. In this paper we give decomposition implementations for two such "all-together" methods. We then compare their performance with three methods based on binary classifications: "one-against-all," "one-against-one," and directed acyclic graph SVM (DAGSVM). Our experiments indicate that the "one-against-one" and DAG methods are more suitable for practical use than the other methods. Results also show that for large problems, methods that consider all data at once in general need fewer support vectors.
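The two decomposition strategies the comparison focuses on can be contrasted directly in scikit-learn, whose SVC uses LIBSVM's pairwise ("one-against-one") scheme internally; the one-against-all counterpart is a wrapper:

```python
# "One-against-one" trains k(k-1)/2 pairwise classifiers (LIBSVM's own
# multiclass scheme); "one-against-all" trains k binary classifiers.
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

ovo = SVC(kernel='rbf', decision_function_shape='ovo')  # one-against-one
ovr = OneVsRestClassifier(SVC(kernel='rbf'))            # one-against-all
# Both expose fit(X, y) / predict(X) for a k-class problem.
```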

6,562 citations