
Showing papers on "Support vector machine" published in 2005


Journal ArticleDOI
TL;DR: In this article, the minimal-redundancy-maximal-relevance (mRMR) criterion is derived as an equivalent first-order incremental form of the maximal statistical dependency criterion based on mutual information, which is difficult to implement directly, and is combined with more sophisticated feature selectors in a two-stage algorithm.
Abstract: Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
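
For illustration, a minimal sketch of the first-order incremental mRMR selection described above, assuming scikit-learn's mutual information estimators stand in for the paper's discrete-MI computation (the mrmr helper and its parameters are illustrative, not the authors' code):

    # Sketch: first-order incremental mRMR (relevance minus redundancy).
    # scikit-learn's MI estimators replace the paper's discrete-MI computation.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

    def mrmr(X, y, n_select):
        relevance = mutual_info_classif(X, y)        # I(x_i; y) per feature
        selected = [int(np.argmax(relevance))]       # seed with the most relevant feature
        candidates = set(range(X.shape[1])) - set(selected)
        while len(selected) < n_select:
            def score(j):
                # redundancy: mean MI between candidate j and the chosen set
                red = np.mean([mutual_info_regression(X[:, [k]], X[:, j])[0]
                               for k in selected])
                return relevance[j] - red            # the difference form of mRMR
            best = max(candidates, key=score)
            selected.append(best)
            candidates.remove(best)
        return selected

    if __name__ == "__main__":
        from sklearn.datasets import make_classification
        X, y = make_classification(n_samples=200, n_features=20, random_state=0)
        print(mrmr(X, y, n_select=5))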

8,078 citations


Proceedings Article
05 Dec 2005
TL;DR: In this article, a Mahalanobis distance metric for k-NN classification is trained with the goal that the k nearest neighbors always belong to the same class while examples from different classes are separated by a large margin.
Abstract: We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.
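
For experimentation, the third-party metric-learn package provides an LMNN implementation; a rough usage sketch with default solver settings (an assumed reimplementation, not the authors' original semidefinite program):

    # Sketch: learn a Mahalanobis metric with LMNN, then run kNN in the
    # transformed space. Assumes the third-party metric-learn package.
    from metric_learn import LMNN
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    lmnn = LMNN().fit(X_tr, y_tr)               # large-margin Mahalanobis metric
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(lmnn.transform(X_tr), y_tr)         # kNN under the learned metric
    print("accuracy:", knn.score(lmnn.transform(X_te), y_te))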

4,433 citations



Journal ArticleDOI
Mahesh Pal1
TL;DR: Results suggest that the random forest classifier performs as well as SVMs in terms of classification accuracy and training time, and that it requires fewer user-defined parameters than SVMs, with those parameters also being easier to define.
Abstract: Growing an ensemble of decision trees and allowing them to vote for the most popular class produced a significant increase in classification accuracy for land cover classification. The objective of this study is to present results obtained with the random forest classifier and to compare its performance with the support vector machines (SVMs) in terms of classification accuracy, training time and user-defined parameters. Landsat Enhanced Thematic Mapper Plus (ETM+) data of an area in the UK with seven different land covers were used. Results from this study suggest that the random forest classifier performs as well as SVMs in terms of classification accuracy and training time. This study also concludes that random forest classifiers require fewer user-defined parameters than SVMs, and that those parameters are easier to define.
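
The comparison can be reproduced in spirit on any labeled dataset; a hedged sketch with scikit-learn (the digits dataset stands in for the Landsat ETM+ scene, and the hyperparameters are illustrative):

    # Sketch: compare random forest and SVM on accuracy and training time.
    import time
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for name, clf in [("random forest", RandomForestClassifier(n_estimators=100)),
                      ("SVM (RBF)", SVC(C=10, gamma="scale"))]:
        t0 = time.time()
        clf.fit(X_tr, y_tr)
        print(f"{name}: accuracy={clf.score(X_te, y_te):.3f}, "
              f"train time={time.time() - t0:.2f}s")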

2,255 citations


Journal ArticleDOI
TL;DR: Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes.
Abstract: Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their ...

2,005 citations


Journal Article
TL;DR: A new technique for working set selection in SMO-type decomposition methods is presented that uses second order information to achieve fast convergence; theoretical properties such as linear convergence are established.
Abstract: Working set selection is an important step in decomposition methods for training support vector machines (SVMs). This paper develops a new technique for working set selection in SMO-type decomposition methods. It uses second order information to achieve fast convergence. Theoretical properties such as linear convergence are established. Experiments demonstrate that the proposed method is faster than existing selection methods using first order information.
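
As a rough illustration of a second-order selection rule of this kind, a simplified sketch that ignores the box constraints restricting the candidate index sets in the published method (so this is not the full algorithm):

    # Sketch: second-order working set selection for the SVM dual
    # min 0.5 a'Qa - e'a, with gradient grad = Q a - e and labels y in {-1, +1}.
    # Simplification: the bound constraints (I_up / I_low index sets) are ignored.
    import numpy as np

    def select_working_set(K, y, grad, tau=1e-12):
        i = int(np.argmax(-y * grad))                # most violating first index
        b = y * grad - y[i] * grad[i]                # b_t = -y_i grad_i + y_t grad_t
        a = K[i, i] + np.diag(K) - 2.0 * K[i]        # curvature along the (i, t) direction
        a = np.maximum(a, tau)                       # safeguard for non-PSD kernels
        gain = np.where(b > 0, b ** 2 / a, -np.inf)  # second-order objective decrease
        gain[i] = -np.inf
        return i, int(np.argmax(gain))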

1,461 citations


Journal ArticleDOI
TL;DR: This paper assesses the performance of regularized radial basis function neural networks (Reg-RBFNN), standard support vector machines (SVMs), kernel Fisher discriminant (KFD) analysis, and regularized AdaBoost (Reg-AB) in the context of hyperspectral image classification.
Abstract: This paper presents the framework of kernel-based methods in the context of hyperspectral image classification, illustrating from a general viewpoint the main characteristics of different kernel-based approaches and analyzing their properties in the hyperspectral domain. In particular, we assess performance of regularized radial basis function neural networks (Reg-RBFNN), standard support vector machines (SVMs), kernel Fisher discriminant (KFD) analysis, and regularized AdaBoost (Reg-AB). The novelty of this work consists in: 1) introducing Reg-RBFNN and Reg-AB for hyperspectral image classification; 2) comparing kernel-based methods by taking into account the peculiarities of hyperspectral images; and 3) clarifying their theoretical relationships. To these purposes, we focus on the accuracy of methods when working in noisy environments, high input dimension, and limited training sets. In addition, some other important issues are discussed, such as the sparsity of the solutions, the computational burden, and the capability of the methods to provide outputs that can be directly interpreted as probabilities.

1,428 citations


Proceedings ArticleDOI
27 Dec 2005
TL;DR: It is shown that existing SVM software can be used to solve the SVM/LDA formulation, and empirical comparisons of the proposed algorithm with SVM and LDA using both synthetic and real-world benchmark data are presented.
Abstract: This paper describes a new large margin classifier, named SVM/LDA. This classifier can be viewed as an extension of support vector machine (SVM) by incorporating some global information about the data. The SVM/LDA classifier can be also seen as a generalization of linear discriminant analysis (LDA) by incorporating the idea of (local) margin maximization into standard LDA formulation. We show that existing SVM software can be used to solve the SVM/LDA formulation. We also present empirical comparisons of the proposed algorithm with SVM and LDA using both synthetic and real world benchmark data.

1,030 citations


Journal ArticleDOI
TL;DR: This paper shows that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry and obtains provably approximately optimal solutions with the idea of core sets; the proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m.
Abstract: Standard SVM training has O(m^3) time and O(m^2) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel methods by exploiting such "approximateness" in this paper. We first show that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm, we obtain provably approximately optimal solutions with the idea of core sets. Our proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is as accurate as existing SVM implementations, but is much faster and can handle much larger data sets than existing scale-up methods. For example, CVM with the Gaussian kernel produces superior results on the KDDCUP-99 intrusion detection data, which has about five million training patterns, in only 1.4 seconds on a 3.2GHz Pentium 4 PC.
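
The geometric primitive behind the CVM is the (1+eps)-approximate MEB with core sets; a standalone sketch of the classic Badoiu-Clarkson iteration in input space (the CVM itself works in kernel-induced feature space):

    # Sketch: (1+eps)-approximate minimum enclosing ball via the
    # Badoiu-Clarkson core-set iteration: O(1/eps^2) passes, a count
    # that is independent of the number of points.
    import numpy as np

    def approx_meb(points, eps=0.05):
        c = points[0].astype(float).copy()
        for i in range(1, int(np.ceil(1.0 / eps ** 2)) + 1):
            # the furthest point from the current center joins the core set
            far = points[np.argmax(np.linalg.norm(points - c, axis=1))]
            c += (far - c) / (i + 1)       # step the center toward it
        r = np.linalg.norm(points - c, axis=1).max()
        return c, r

    rng = np.random.default_rng(0)
    center, radius = approx_meb(rng.normal(size=(100_000, 5)))
    print("approximate radius:", radius)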

1,017 citations


Journal ArticleDOI
TL;DR: This paper investigates the predictability of financial movement direction with SVM by forecasting the weekly movement direction of NIKKEI 225 index and proposes a combining model by integrating SVM with the other classification methods.

984 citations


Journal ArticleDOI
TL;DR: This paper introduces a true multiclass formulation based on multinomial logistic regression and derives fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces.
Abstract: Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
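
The authors' bound-optimization solver is not in standard libraries; as a hedged stand-in for the same sparsity-promoting idea, scikit-learn's L1-penalized multinomial logistic regression can be used (a Laplacian prior corresponds to L1 regularization; the saga solver replaces the paper's algorithm):

    # Sketch: sparse multinomial logistic regression via an L1 penalty.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)
    X = StandardScaler().fit_transform(X)        # saga converges faster on scaled data

    clf = LogisticRegression(penalty="l1", solver="saga", C=0.05, max_iter=5000)
    clf.fit(X, y)
    print(f"fraction of exactly-zero weights: {np.mean(clf.coef_ == 0):.2%}")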

Proceedings ArticleDOI
07 Aug 2005
TL;DR: An algorithm is given with which such multivariate SVMs can be trained in polynomial time for large classes of potentially non-linear performance measures, in particular ROCArea and all measures that can be computed from the contingency table.
Abstract: This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially non-linear performance measures, in particular ROCArea and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method.

Journal ArticleDOI
TL;DR: Results show that the SVM achieves a higher level of classification accuracy than either the ML or the ANN classifier, and that the SVM can be used with small training datasets and high-dimensional data.
Abstract: Support vector machines (SVM) represent a promising development in machine learning research that is not widely used within the remote sensing community. This paper reports the results of two experiments in which multi‐class SVMs are compared with maximum likelihood (ML) and artificial neural network (ANN) methods in terms of classification accuracy. The two land cover classification experiments use multispectral (Landsat‐7 ETM+) and hyperspectral (DAIS) data, respectively, for test areas in eastern England and central Spain. Our results show that the SVM achieves a higher level of classification accuracy than either the ML or the ANN classifier, and that the SVM can be used with small training datasets and high‐dimensional data.

Journal ArticleDOI
TL;DR: A software system GEMS (Gene Expression Model Selector) that automates high-quality model construction and enforces sound optimization and performance estimation procedures is developed, the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets.
Abstract: Motivation: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications we focus on multicategory diagnosis. To equip the system with the optimum combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs using 11 datasets spanning 74 diagnostic categories and 41 cancer types and 12 normal tissue types. Results: Multicategory support vector machines (MC-SVMs) are the most effective classifiers in performing accurate cancer diagnosis from gene expression data. The MC-SVM techniques by Crammer and Singer, Weston and Watkins and one-versus-rest were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation and probabilistic neural networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and other non-SVM learning algorithms. Ensemble classifiers do not generally improve performance of the best non-ensemble models. These results guided the construction of a software system GEMS (Gene Expression Model Selector) that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets. Availability: The software system GEMS is available for download from http://www.gems-system.org for non-commercial use. Contact: alexander.statnikov@vanderbilt.edu

Journal ArticleDOI
TL;DR: This paper applies support vector machines (SVMs) to the bankruptcy prediction problem in an attempt to suggest a new model with better explanatory power and stability, and shows that SVM outperforms the other methods.
Abstract: Bankruptcy prediction has drawn a lot of research interest in previous literature, and recent studies have shown that machine learning techniques achieved better performance than traditional statistical ones. This paper applies support vector machines (SVMs) to the bankruptcy prediction problem in an attempt to suggest a new model with better explanatory power and stability. To serve this purpose, we use a grid-search technique with 5-fold cross-validation to find the optimal parameter values of the SVM kernel function. In addition, to evaluate the prediction accuracy of SVM, we compare its performance with those of multiple discriminant analysis (MDA), logistic regression analysis (Logit), and three-layer fully connected back-propagation neural networks (BPNs). The experiment results show that SVM outperforms the other methods.
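
The tuning procedure described above is straightforward to reproduce; a sketch with scikit-learn, using synthetic data in place of the paper's corporate bankruptcy records (grid values are illustrative):

    # Sketch: grid search with 5-fold cross-validation over the RBF-SVM
    # hyperparameters, mirroring the tuning procedure described above.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]},
        cv=5,                      # 5-fold cross-validation, as in the paper
        scoring="accuracy",
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)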

Journal ArticleDOI
TL;DR: The results demonstrate that the accuracy and generalization performance of SVM is better than that of BPN as the training set size gets smaller, and the several superior points of the SVM algorithm compared with BPN are investigated.
Abstract: This study investigates the efficacy of applying support vector machines (SVM) to bankruptcy prediction problem. Although it is a well-known fact that the back-propagation neural network (BPN) performs well in pattern recognition tasks, the method has some limitations in that it is an art to find an appropriate model structure and optimal solution. Furthermore, loading as many of the training set as possible into the network is needed to search the weights of the network. On the other hand, since SVM captures geometric characteristics of feature space without deriving weights of networks from the training data, it is capable of extracting the optimal solution with the small training set size. In this study, we show that the proposed classifier of SVM approach outperforms BPN to the problem of corporate bankruptcy prediction. The results demonstrate that the accuracy and generalization performance of SVM is better than that of BPN as the training set size gets smaller. We also examine the effect of the variability in performance with respect to various values of parameters in SVM. In addition, we investigate and summarize the several superior points of the SVM algorithm compared with BPN.

Journal ArticleDOI
TL;DR: It is demonstrated that SVM outperforms FLD in classification performance as well as in robustness of the spatial maps obtained (i.e., the SVM discrimination maps had greater overlap with the general linear model (GLM) analysis compared to the FLD).

01 Jan 2005
TL;DR: A series of novel semi-supervised learning approaches is presented, arising from a graph representation in which labeled and unlabeled instances are vertices and edges encode the similarity between instances.
Abstract: In traditional machine learning approaches to classification, one uses only a labeled set to train the classifier. Labeled instances, however, are often difficult, expensive, or time consuming to obtain, as they require the efforts of experienced human annotators. Meanwhile, unlabeled data may be relatively easy to collect, but there have been few ways to use them. Semi-supervised learning addresses this problem by using a large amount of unlabeled data, together with the labeled data, to build better classifiers. Because semi-supervised learning requires less human effort and gives higher accuracy, it is of great interest both in theory and in practice. We present a series of novel semi-supervised learning approaches arising from a graph representation, where labeled and unlabeled instances are represented as vertices, and edges encode the similarity between instances. They address the following questions: How to use unlabeled data? (label propagation); What is the probabilistic interpretation? (Gaussian fields and harmonic functions); What if we can choose labeled data? (active learning); How to construct good graphs? (hyperparameter learning); How to work with kernel machines like SVM? (graph kernels); How to handle complex data like sequences? (kernel conditional random fields); How to handle scalability and induction? (harmonic mixtures). An extensive literature review is included at the end.
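
scikit-learn ships a label-propagation implementation in this family; a hedged sketch (unlabeled points are marked -1, and the k-nearest-neighbor graph here is one of several possible similarity constructions):

    # Sketch: graph-based semi-supervised learning with LabelPropagation.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.semi_supervised import LabelPropagation

    X, y = load_digits(return_X_y=True)
    rng = np.random.default_rng(0)
    y_obs = y.copy()
    hidden = rng.random(len(y)) < 0.9            # hide 90% of the labels
    y_obs[hidden] = -1                           # -1 marks unlabeled instances

    model = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y_obs)
    acc = (model.transduction_[hidden] == y[hidden]).mean()
    print(f"accuracy on the hidden labels: {acc:.3f}")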

Journal Article
TL;DR: This contribution presents an online SVM algorithm, LASVM, and shows that active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
Abstract: Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
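
LASVM itself is not packaged in mainstream Python libraries; as a loose stand-in for single-pass online training with the hinge loss, a sketch using scikit-learn's SGDClassifier (plain stochastic gradient descent, without LASVM's PROCESS/REPROCESS bookkeeping of support-vector candidates):

    # Sketch: one pass of online hinge-loss training as a LASVM stand-in.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=10_000, random_state=0)
    clf = SGDClassifier(loss="hinge", alpha=1e-4)
    classes = np.unique(y)
    for start in range(0, len(X), 256):            # a single pass, in mini-batches
        batch = slice(start, start + 256)
        clf.partial_fit(X[batch], y[batch], classes=classes)
    print("training accuracy after one pass:", clf.score(X, y))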

Journal ArticleDOI
TL;DR: In this article, support vector machines (SVMs) were used to forecast building energy consumption in the tropical region, and the performance of SVM with respect to the two parameters C and ε was explored using a stepwise search method based on the radial basis function (RBF) kernel.
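
A sketch of the kind of stepwise search over C and ε that the study describes, using scikit-learn's SVR with an RBF kernel; synthetic regression data stands in for the building energy records, and the grid values are illustrative:

    # Sketch: search over the SVR parameters C and epsilon with an RBF kernel.
    from sklearn.datasets import make_regression
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
    search = GridSearchCV(
        SVR(kernel="rbf", gamma="scale"),
        param_grid={"C": [1, 10, 100, 1000], "epsilon": [0.01, 0.1, 1.0]},
        cv=5,
        scoring="neg_mean_squared_error",
    )
    search.fit(X, y)
    print(search.best_params_)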

Proceedings ArticleDOI
20 Jun 2005
TL;DR: The system operates in real time, achieves 93% correct generalization to novel subjects for a 7-way forced choice on the Cohn-Kanade expression dataset, and classifies 17 facial action units with a mean accuracy of 94.8%.
Abstract: We present a systematic comparison of machine learning methods applied to the problem of fully automatic recognition of facial expressions. We report results on a series of experiments comparing recognition engines, including AdaBoost, support vector machines, and linear discriminant analysis. We also explored feature selection techniques, including the use of AdaBoost for feature selection prior to classification by SVM or LDA. Best results were obtained by selecting a subset of Gabor filters using AdaBoost followed by classification with support vector machines. The system operates in real time, and obtained 93% correct generalization to novel subjects for a 7-way forced choice on the Cohn-Kanade expression dataset. The outputs of the classifiers change smoothly as a function of time and thus can be used to measure facial expression dynamics. We applied the system to the fully automated recognition of facial actions (FACS). The present system classifies 17 action units, whether they occur singly or in combination with other actions, with a mean accuracy of 94.8%. We present preliminary results for applying this system to spontaneous facial expressions.
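
The winning pipeline, AdaBoost for feature selection followed by an SVM, can be approximated with generic tools; a hedged sketch in which decision-stump importances stand in for selection over Gabor filter outputs and the digits dataset is a placeholder:

    # Sketch: AdaBoost-driven feature selection followed by SVM classification.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    pipe = make_pipeline(
        SelectFromModel(AdaBoostClassifier(n_estimators=100),
                        threshold=-np.inf, max_features=24),   # keep the top 24 features
        SVC(kernel="rbf", gamma="scale"),
    )
    print("5-fold CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())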

Book ChapterDOI
13 Jun 2005
TL;DR: Empirical evidence is given to show that the one-versus-all method using the winner-takes-all strategy and the one-versus-one method implemented by max-wins voting are inferior to another one-versus-one method: one that uses Platt's posterior probabilities together with the pairwise coupling idea of Hastie and Tibshirani.
Abstract: Multiclass SVMs are usually implemented by combining several two-class SVMs. The one-versus-all method using the winner-takes-all strategy and the one-versus-one method implemented by max-wins voting are popularly used for this purpose. In this paper we give empirical evidence to show that these methods are inferior to another one-versus-one method: one that uses Platt's posterior probabilities together with the pairwise coupling idea of Hastie and Tibshirani. The evidence is particularly strong when the training dataset is sparse.
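
scikit-learn's SVC with probability=True follows this recipe internally (Platt-scaled pairwise outputs combined by pairwise coupling, via the underlying LIBSVM), while plain predict() uses max-wins voting; a sketch contrasting the two rules:

    # Sketch: one-versus-one SVC, max-wins voting versus coupled probabilities.
    # The two decision rules can disagree on some test points.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)
    votes = clf.predict(X_te)                              # max-wins voting
    coupled = clf.classes_[np.argmax(clf.predict_proba(X_te), axis=1)]
    print("points where voting and coupling disagree:", int((votes != coupled).sum()))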

01 Jan 2005
TL;DR: This paper discusses non-PSD kernels through the viewpoint of separability, and shows that the sigmoid kernel matrix is conditionally positive definite (CPD) for certain parameters, so that the sigmoid kernel is a valid kernel there.
Abstract: The sigmoid kernel was quite popular for support vector machines due to its origin from neural networks. Although it is known that the kernel matrix may not be positive semi-definite (PSD), other properties are not fully studied. In this paper, we discuss such non-PSD kernels through the viewpoint of separability. Results help to validate the possible use of non-PSD kernels. One example shows that the sigmoid kernel matrix is conditionally positive definite (CPD) for certain parameters, and the sigmoid kernel is thus a valid kernel there. However, we also explain that the sigmoid kernel is not better than the RBF kernel in general. Experiments are given to illustrate our analysis. Finally, we discuss how to solve the non-convex dual problems by SMO-type decomposition methods. Suitable modifications for any symmetric non-PSD kernel matrices are proposed with convergence proofs.
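
The PSD/CPD distinction is easy to probe numerically: a matrix K is CPD when v'Kv >= 0 for every v with sum(v) = 0, which can be checked by projecting K onto the subspace orthogonal to the all-ones vector. A sketch (the parameter values are just one setting in the a > 0, r < 0 regime the paper analyzes):

    # Sketch: test a sigmoid kernel matrix K = tanh(a <x_i, x_j> + r)
    # for PSD (raw eigenvalues) and CPD (eigenvalues after projection).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    a, r = 0.01, -1.0                        # illustrative a > 0, r < 0 setting
    K = np.tanh(a * (X @ X.T) + r)

    n = len(K)
    P = np.eye(n) - np.ones((n, n)) / n      # projector orthogonal to the ones vector
    print("min raw eigenvalue (PSD test):      ", np.linalg.eigvalsh(K).min())
    print("min projected eigenvalue (CPD test):", np.linalg.eigvalsh(P @ K @ P).min())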

Journal ArticleDOI
TL;DR: Techniques and algorithms developed in the framework of Statistical Learning Theory are applied to the problem of determining the location of a wireless device by measuring the signal strength values from a set of access points (location fingerprinting), with the advantage of a low algorithmic complexity in the normal operating phase.

Journal ArticleDOI
TL;DR: This letter provides a study of learning in a Hilbert space of vector-valued functions, derives the form of the minimal norm interpolant to a finite set of data, and applies it to study some regularization functionals that are important in learning theory.
Abstract: In this letter, we provide a study of learning in a Hilbert space of vector-valued functions. We motivate the need for extending learning theory of scalar-valued functions by practical considerations and establish some basic results for learning vector-valued functions that should prove useful in applications. Specifically, we allow an output space Y to be a Hilbert space, and we consider a reproducing kernel Hilbert space of functions whose values lie in Y. In this setting, we derive the form of the minimal norm interpolant to a finite set of data and apply it to study some regularization functionals that are important in learning theory. We consider specific examples of such functionals corresponding to multiple-output regularization networks and support vector machines, for both regression and classification. Finally, we provide classes of operator-valued kernels of the dot product and translation-invariant type.
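
As a concrete special case, a separable operator-valued kernel K(x, x') = k(x, x')B, with k a scalar kernel and B a PSD matrix coupling the outputs, makes the minimal norm interpolant a Kronecker-structured linear solve; a hedged numerical sketch (gamma, B, and the ridge term are illustrative):

    # Sketch: minimal-norm interpolant f(x) = sum_j k(x, x_j) B c_j for the
    # separable operator-valued kernel K(x, x') = k(x, x') * B.
    import numpy as np

    def rbf(A, C, gamma=1.0):
        d2 = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def fit_interpolant(X, Y, B, gamma=1.0, ridge=1e-8):
        n, p = Y.shape
        # Gram operator of the vector-valued kernel is the Kronecker product K (x) B
        G = np.kron(rbf(X, X, gamma), B) + ridge * np.eye(n * p)
        coef = np.linalg.solve(G, Y.reshape(-1))      # stacked coefficients c_j
        def predict(Xnew):
            return (np.kron(rbf(Xnew, X, gamma), B) @ coef).reshape(-1, p)
        return predict

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))
    Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 1])])
    B = np.array([[1.0, 0.3], [0.3, 1.0]])            # output-coupling matrix
    f = fit_interpolant(X, Y, B)
    print("max interpolation error:", np.abs(f(X) - Y).max())   # ~0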

Journal ArticleDOI
TL;DR: In a case study from the Ecuadorian Andes, logistic regression with stepwise backward variable selection yields lowest error rates and demonstrates the best generalization capabilities.
Abstract: . The predictive power of logistic regression, support vector machines and bootstrap-aggregated classification trees (bagging, double-bagging) is compared using misclassification error rates on independent test data sets. Based on a resampling approach that takes into account spatial autocorrelation, error rates for predicting "present" and "future" landslides are estimated within and outside the training area. In a case study from the Ecuadorian Andes, logistic regression with stepwise backward variable selection yields lowest error rates and demonstrates the best generalization capabilities. The evaluation outside the training area reveals that tree-based methods tend to overfit the data.

Book
01 Jan 2005
TL;DR: This book focuses on three main data mining tasks: data dimensionality reduction, classification, and rule extraction and is targeted at researchers in both academia and industry, while graduate students and developers of data mining systems will also profit from the detailed algorithmic descriptions.
Abstract: Finding information hidden in data is as theoretically difficult as it is practically important. With the objective of discovering unknown patterns from data, the methodologies of data mining were derived from statistics, machine learning, and artificial intelligence, and are being used successfully in application areas such as bioinformatics, banking, retail, and many others. Wang and Fu present in detail the state of the art on how to utilize fuzzy neural networks, multilayer perceptron neural networks, radial basis function neural networks, genetic algorithms, and support vector machines in such applications. They focus on three main data mining tasks: data dimensionality reduction, classification, and rule extraction. The book is targeted at researchers in both academia and industry, while graduate students and developers of data mining systems will also profit from the detailed algorithmic descriptions.

Journal ArticleDOI
TL;DR: In this article, a two-step detection/tracking method is proposed to deal with the nonrigid nature of human appearance on the road, where the detection phase is performed by a support vector machine (SVM) with size-normalized pedestrian candidates and the tracking phase is a combination of Kalman filter prediction and mean shift tracking.
Abstract: This paper presents a method for pedestrian detection and tracking using a single night-vision video camera installed on the vehicle. To deal with the nonrigid nature of human appearance on the road, a two-step detection/tracking method is proposed. The detection phase is performed by a support vector machine (SVM) with size-normalized pedestrian candidates and the tracking phase is a combination of Kalman filter prediction and mean shift tracking. The detection phase is further strengthened by information obtained by a road-detection module that provides key information for pedestrian validation. Experimental comparisons (e.g., grayscale SVM recognition versus binary SVM recognition and entire-body detection versus upper-body detection) have been carried out to illustrate the feasibility of our approach.
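
The prediction half of the tracker is a standard constant-velocity Kalman filter on the pedestrian's image position; a minimal sketch of that component alone (the mean shift correction and SVM validation are omitted, and all noise matrices are illustrative):

    # Sketch: constant-velocity Kalman filter for predicting image position.
    # State is (x, y, vx, vy); the mean shift stage would supply measurements z.
    import numpy as np

    dt = 1.0                                           # one frame
    F = np.array([[1, 0, dt, 0],                       # state transition
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    H = np.array([[1, 0, 0, 0],                        # only position is observed
                  [0, 1, 0, 0]], dtype=float)
    Q = 0.01 * np.eye(4)                               # process noise
    R = 4.0 * np.eye(2)                                # measurement noise (pixels^2)

    x, P = np.zeros(4), np.eye(4)                      # initial state and covariance

    def kalman_step(x, P, z):
        x, P = F @ x, F @ P @ F.T + Q                  # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
        x = x + K @ (z - H @ x)                        # correct with measurement
        P = (np.eye(4) - K @ H) @ P
        return x, P

    for z in [np.array([10.0, 5.0]), np.array([12.1, 5.9]), np.array([14.0, 7.1])]:
        x, P = kalman_step(x, P, z)
    print("predicted next position:", (F @ x)[:2])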

Journal ArticleDOI
TL;DR: This paper compares SVM to canonical variates analysis (CVA) by examining the relative sensitivity of each method to ten combinations of preprocessing choices consisting of spatial smoothing, temporal detrending, and motion correction, and proposes four methods for extracting activation maps from SVM models.

Journal ArticleDOI
TL;DR: The results show that the proposed feature selection method selects better gene subsets than the original SVM-RFE and improves the classification accuracy, and average test error from multiple partitions of training and test sets can be recommended as a reference of performance quality.
Abstract: This paper proposes a new feature selection method that uses a backward elimination procedure similar to that implemented in support vector machine recursive feature elimination (SVM-RFE). Unlike the SVM-RFE method, at each step, the proposed approach computes the feature ranking score from a statistical analysis of weight vectors of multiple linear SVMs trained on subsamples of the original training data. We tested the proposed method on four gene expression datasets for cancer classification. The results show that the proposed feature selection method selects better gene subsets than the original SVM-RFE and improves the classification accuracy. A Gene Ontology-based similarity assessment indicates that the selected subsets are functionally diverse, further validating our gene selection method. This investigation also suggests that, for gene expression-based cancer classification, average test error from multiple partitions of training and test sets can be recommended as a reference of performance quality.