
Showing papers on "Support vector machine published in 2008"


Journal Article
TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
Abstract: LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library calls for users and developers. Comprehensive documents are available for both beginners and advanced users. Experiments demonstrate that LIBLINEAR is very efficient on large sparse data sets.
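As a hedged illustration of the same functionality from Python: scikit-learn's LinearSVC is implemented on top of LIBLINEAR, so a minimal sketch looks like the following (the 20-newsgroups data is only an example of a large sparse dataset, not tied to the paper):

```python
# Minimal sketch: a large-scale linear SVM via scikit-learn's LinearSVC,
# which wraps the LIBLINEAR library internally. The dataset is illustrative;
# any large sparse matrix works.
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = fetch_20newsgroups_vectorized(return_X_y=True)   # large sparse matrix
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LinearSVC(C=1.0)   # linear SVM; LIBLINEAR handles sparse input efficiently
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```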

7,848 citations


01 Jan 2008
TL;DR: A simple procedure is proposed, which usually gives reasonable results and is suitable for beginners who are not familiar with SVM.
Abstract: Support vector machine (SVM) is a popular technique for classification. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps. In this guide, we propose a simple procedure, which usually gives reasonable results.
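The guide's recipe (scale the features, start with an RBF kernel, and pick C and gamma by cross-validated search over exponential grids) can be sketched as follows; X_train and y_train are placeholders for the user's data:

```python
# A sketch of the guide's recommended procedure: scale features, use an RBF
# kernel, and grid-search C and gamma with cross-validation. The exponential
# grids follow the ranges suggested in the guide.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C":     [2.0**k for k in range(-5, 16, 2)],   # C = 2^-5 .. 2^15
    "svc__gamma": [2.0**k for k in range(-15, 4, 2)],   # gamma = 2^-15 .. 2^3
}
search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X_train, y_train)   # X_train, y_train: the user's data
# print(search.best_params_)
```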

7,069 citations


Book
12 Aug 2008
TL;DR: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications and provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature.
Abstract: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications. The authors present the basic ideas of SVMs together with the latest developments and current research questions in a unified style. They identify three reasons for the success of SVMs: their ability to learn well with only a very small number of free parameters, their robustness against several types of model violations and outliers, and their computational efficiency compared to several other methods. Since their appearance in the early nineties, support vector machines and related kernel-based methods have been successfully applied in diverse fields of application such as bioinformatics, fraud detection, construction of insurance tariffs, direct marketing, and data and text mining. As a consequence, SVMs now play an important role in statistical machine learning and are used not only by statisticians, mathematicians, and computer scientists, but also by engineers and data analysts. The book provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature. The book can thus serve as both a basis for graduate courses and an introduction for statisticians, mathematicians, and computer scientists. It further provides a valuable reference for researchers working in the field. The book covers all important topics concerning support vector machines such as: loss functions and their role in the learning process; reproducing kernel Hilbert spaces and their properties; a thorough statistical analysis that uses both traditional uniform bounds and more advanced localized techniques based on Rademacher averages and Talagrand's inequality; a detailed treatment of classification and regression; a detailed robustness analysis; and a description of some of the most recent implementation techniques. To make the book self-contained, an extensive appendix is added which provides the reader with the necessary background from statistics, probability theory, functional analysis, convex analysis, and topology.

4,664 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A discriminatively trained, multiscale, deformable part model for object detection, which achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge and outperforms the best results in the 2007 challenge in ten out of twenty categories.
Abstract: This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge. It also outperforms the best results in the 2007 challenge in ten out of twenty categories. The system relies heavily on deformable parts. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL challenge. Our system also relies heavily on new methods for discriminative training. We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM. A latent SVM, like a hidden CRF, leads to a non-convex training problem. However, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. We believe that our training methods will eventually make possible the effective use of more latent information such as hierarchical (grammar) models and models involving latent three dimensional pose.
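The semi-convex structure can be illustrated with a toy alternation (this is not the authors' deformable-part-model code): fixing each positive example's latent choice makes training an ordinary convex linear SVM, and the latent step re-selects the highest-scoring choice. A hedged sketch, where pos_candidates is a hypothetical list of candidate feature vectors per positive example:

```python
# Toy sketch of latent-SVM-style alternation under simplified assumptions:
# each positive example carries several candidate feature vectors (its latent
# choices); we alternate convex SVM training with latent re-selection.
import numpy as np
from sklearn.svm import LinearSVC

def train_latent_svm(pos_candidates, X_neg, n_iters=5, C=1.0):
    """pos_candidates: list of (k_i, d) arrays, one per positive example;
    X_neg: (m, d) array of negative feature vectors."""
    X_pos = np.stack([c[0] for c in pos_candidates])  # arbitrary initial choice
    y = np.hstack([np.ones(len(X_pos)), -np.ones(len(X_neg))])
    for _ in range(n_iters):
        # convex step: with latent choices fixed, train a linear SVM
        clf = LinearSVC(C=C).fit(np.vstack([X_pos, X_neg]), y)
        w = clf.coef_.ravel()
        # latent step: re-select the best-scoring candidate per positive
        X_pos = np.stack([c[np.argmax(c @ w)] for c in pos_candidates])
    return clf
```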

2,893 citations


Proceedings ArticleDOI
16 Dec 2008
TL;DR: Results show that learning the optimum kernel combination of multiple features vastly improves the performance, from 55.1% for the best single feature to 72.8% for the combination of all features.
Abstract: We investigate to what extent combinations of features can improve classification performance on a large dataset of similar classes. To this end we introduce a 103 class flower dataset. We compute four different features for the flowers, each describing different aspects, namely the local shape/texture, the shape of the boundary, the overall spatial distribution of petals, and the colour. We combine the features using a multiple kernel framework with an SVM classifier. The weights for each class are learnt using the method of Varma and Ray, which has achieved state-of-the-art performance on other large datasets, such as Caltech 101/256. Our dataset has a similar challenge in the number of classes, but with the added difficulty of large between-class similarity and small within-class similarity. Results show that learning the optimum kernel combination of multiple features vastly improves the performance, from 55.1% for the best single feature to 72.8% for the combination of all features.
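A heavily simplified sketch of the combination step: sum precomputed Gram matrices with per-kernel weights and train an SVM on the result. The paper learns the weights per class with the method of Varma and Ray; the uniform weights below are fixed placeholders, and the per-feature kernel matrices are assumed given:

```python
# Simplified sketch of multiple-kernel combination: a weighted sum of
# precomputed kernel matrices fed to a precomputed-kernel SVM.
from sklearn.svm import SVC

def combine_kernels(kernels, weights):
    # kernels: list of (n, n) Gram matrices, one per feature type
    return sum(b * K for b, K in zip(weights, kernels))

# K_shape, K_boundary, K_petal, K_colour = ...   # one Gram matrix per feature
# K = combine_kernels([K_shape, K_boundary, K_petal, K_colour],
#                     weights=[0.25, 0.25, 0.25, 0.25])  # placeholder weights
# clf = SVC(kernel="precomputed").fit(K, y)      # y: class labels
```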

2,619 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: It is argued that two practices commonly used in image classification methods have led to the inferior performance of NN-based image classifiers: quantization of local image descriptors (used to generate "bags-of-words" codebooks) and computation of 'image-to-image' distance instead of 'image-to-class' distance.
Abstract: State-of-the-art image classification methods require an intensive learning/training stage (using SVM, Boosting, etc.). In contrast, non-parametric nearest-neighbor (NN) based image classifiers require no training time and have other favorable properties. However, the large performance gap between these two families of approaches rendered NN-based image classifiers useless. We claim that the effectiveness of non-parametric NN-based image classification has been considerably undervalued. We argue that two practices commonly used in image classification methods have led to the inferior performance of NN-based image classifiers: (i) quantization of local image descriptors (used to generate "bags-of-words" codebooks); (ii) computation of 'image-to-image' distance, instead of 'image-to-class' distance. We propose a trivial NN-based classifier, NBNN (Naive-Bayes Nearest-Neighbor), which employs NN-distances in the space of the local image descriptors (and not in the space of images). NBNN computes direct 'image-to-class' distances without descriptor quantization. We further show that under the Naive-Bayes assumption, the theoretically optimal image classifier can be accurately approximated by NBNN. Although NBNN is extremely simple, efficient, and requires no learning/training phase, its performance ranks among the top leading learning-based image classifiers. Empirical comparisons are shown on several challenging databases (Caltech-101, Caltech-256, and Graz-01).
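The NBNN decision rule itself is short enough to sketch: for every local descriptor of the test image, find its nearest neighbor within each class's pooled training descriptors and pick the class with the smallest summed squared distance. A hedged sketch, with descriptor extraction assumed given:

```python
# Sketch of the NBNN decision rule: image-to-class distance is the sum of
# squared nearest-neighbor distances of the test image's local descriptors
# to each class's descriptor pool.
import numpy as np
from scipy.spatial import cKDTree

def nbnn_classify(test_descriptors, class_pools):
    # class_pools: dict mapping class label -> (n_c, d) training descriptors
    totals = {}
    for c, pool in class_pools.items():
        dists, _ = cKDTree(pool).query(test_descriptors)  # NN distance each
        totals[c] = np.sum(dists ** 2)                    # image-to-class
    return min(totals, key=totals.get)
```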

1,228 citations


Journal ArticleDOI
TL;DR: An approach is proposed which is based on using several principal components from the hyperspectral data and building morphological profiles, which can be used all together in one extended morphological profile for classification of urban structures.
Abstract: A method is proposed for the classification of urban hyperspectral data with high spatial resolution. The approach is an extension of previous approaches and uses both the spatial and spectral information for classification. One previous approach is based on using several principal components (PCs) from the hyperspectral data and building several morphological profiles (MPs). These profiles can be used all together in one extended MP. A shortcoming of that approach is that it was primarily designed for classification of urban structures and it does not fully utilize the spectral information in the data. Similarly, the commonly used pixelwise classification of hyperspectral data is solely based on the spectral content and lacks information on the structure of the features in the image. The proposed method overcomes these problems and is based on the fusion of the morphological information and the original hyperspectral data, i.e., the two vectors of attributes are concatenated into one feature vector. After a reduction of the dimensionality, the final classification is achieved by using a support vector machine classifier. The proposed approach is tested in experiments on ROSIS data from urban areas. Significant improvements are achieved in terms of accuracies when compared to results obtained for approaches based on the use of MPs based on PCs only and conventional spectral classification. For instance, with one data set, the overall accuracy is increased from 79% to 83% without any feature reduction and to 87% with feature reduction. The proposed approach also shows excellent results with a limited training set.
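A hedged sketch of the fusion step described above: concatenate the morphological-profile features with the original spectral vector, reduce dimensionality, and classify with an SVM. Feature extraction and the number of retained components are illustrative assumptions, not values from the paper:

```python
# Sketch of the fusion idea: concatenate morphological-profile features with
# the original spectral vector, reduce dimensionality, classify with an SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fuse_and_classify(X_spectral, X_morph, y, n_components=30):
    X = np.hstack([X_spectral, X_morph])   # one concatenated feature vector
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=n_components),  # feature reduction
                          SVC(kernel="rbf"))
    return model.fit(X, y)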

1,092 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: It is shown that one can build histogram intersection kernel SVMs (IKSVMs) with runtime complexity of the classifier logarithmic in the number of support vectors as opposed to linear for the standard approach.
Abstract: Straightforward classification using kernelized SVMs requires evaluating the kernel for a test vector and each of the support vectors. For a class of kernels we show that one can do this much more efficiently. In particular we show that one can build histogram intersection kernel SVMs (IKSVMs) with runtime complexity of the classifier logarithmic in the number of support vectors as opposed to linear for the standard approach. We further show that by precomputing auxiliary tables we can construct an approximate classifier with constant runtime and space requirements, independent of the number of support vectors, with negligible loss in classification accuracy on various tasks. This approximation also applies to 1 - chi2 and other kernels of similar form. We also introduce novel features based on multi-level histograms of oriented edge energy and present experiments on various detection datasets. On the INRIA pedestrian dataset an approximate IKSVM classifier based on these features has the current best performance, with a miss rate 13% lower at 10^-6 false positives per window than the linear SVM detector of Dalal & Triggs. On the DaimlerChrysler pedestrian dataset IKSVM gives comparable accuracy to the best results (based on quadratic SVM), while being 15× faster. In these experiments our approximate IKSVM is up to 2000× faster than a standard implementation and requires 200× less memory. Finally we show that a 50× speedup is possible using approximate IKSVM based on spatial pyramid features on the Caltech 101 dataset with negligible loss of accuracy.
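The logarithmic-time evaluation rests on a per-dimension decomposition of the intersection-kernel decision function: sum_j a_j·min(x_d, s_{j,d}) (with a_j the signed dual coefficients) splits into a prefix sum over support-vector values below x_d plus x_d times the remaining coefficient mass, so sorting each dimension once allows a binary search per query. A hedged sketch of that exact (not the table-approximated) variant:

```python
# Sketch of logarithmic-time IKSVM evaluation: per dimension, sort the
# support-vector values once, keep prefix sums, and evaluate each query
# coordinate with a binary search instead of a pass over all support vectors.
import numpy as np

def precompute(sv, alpha):
    # sv: (m, d) support vectors; alpha: (m,) signed coefficients alpha_j*y_j
    order = np.argsort(sv, axis=0)                # sort each dimension
    svs = np.take_along_axis(sv, order, axis=0)
    a = alpha[order]                              # coefficients, re-ordered
    csum = np.cumsum(a * svs, axis=0)             # prefix sums of a_j * s_{j,d}
    asum = np.cumsum(a, axis=0)                   # prefix sums of a_j
    return svs, csum, asum, alpha.sum()

def decision(x, svs, csum, asum, a_total, b=0.0):
    f = b
    for dim in range(svs.shape[1]):
        r = np.searchsorted(svs[:, dim], x[dim], side="right")
        below = csum[r - 1, dim] if r > 0 else 0.0       # sum over s <= x_d
        above = a_total - (asum[r - 1, dim] if r > 0 else 0.0)
        f += below + x[dim] * above                      # min(x_d, s) split
    return f
```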

1,074 citations


Book
01 Jan 2008
TL;DR: Novel computational approaches for deep learning of behaviors as opposed to just static patterns will be presented, based on structured nonnegative matrix factorizations of matrices that encode observation frequencies of behaviors.
Abstract: Future Directions -- Semi-supervised Multiple Classifier Systems: Background and Research Directions -- Boosting -- Boosting GMM and Its Two Applications -- Boosting Soft-Margin SVM with Feature Selection for Pedestrian Detection -- Observations on Boosting Feature Selection -- Boosting Multiple Classifiers Constructed by Hybrid Discriminant Analysis -- Combination Methods -- Decoding Rules for Error Correcting Output Code Ensembles -- A Probability Model for Combining Ranks -- EER of Fixed and Trainable Fusion Classifiers: A Theoretical Study with Application to Biometric Authentication Tasks -- Mixture of Gaussian Processes for Combining Multiple Modalities -- Dynamic Classifier Integration Method -- Recursive ECOC for Microarray Data Classification -- Using Dempster-Shafer Theory in MCF Systems to Reject Samples -- Multiple Classifier Fusion Performance in Networked Stochastic Vector Quantisers -- On Deriving the Second-Stage Training Set for Trainable Combiners -- Using Independence Assumption to Improve Multimodal Biometric Fusion -- Design Methods -- Half-Against-Half Multi-class Support Vector Machines -- Combining Feature Subsets in Feature Selection -- ACE: Adaptive Classifiers-Ensemble System for Concept-Drifting Environments -- Using Decision Tree Models and Diversity Measures in the Selection of Ensemble Classification Models -- Ensembles of Classifiers from Spatially Disjoint Data -- Optimising Two-Stage Recognition Systems -- Design of Multiple Classifier Systems for Time Series Data -- Ensemble Learning with Biased Classifiers: The Triskel Algorithm -- Cluster-Based Cumulative Ensembles -- Ensemble of SVMs for Incremental Learning -- Performance Analysis -- Design of a New Classifier Simulator -- Evaluation of Diversity Measures for Binary Classifier Ensembles -- Which Is the Best Multiclass SVM Method? An Empirical Study -- Over-Fitting in Ensembles of Neural Network Classifiers Within ECOC Frameworks -- Between Two Extremes: Examining Decompositions of the Ensemble Objective Function -- Data Partitioning Evaluation Measures for Classifier Ensembles -- Dynamics of Variance Reduction in Bagging and Other Techniques Based on Randomisation -- Ensemble Confidence Estimates Posterior Probability -- Applications -- Using Domain Knowledge in the Random Subspace Method: Application to the Classification of Biomedical Spectra -- An Abnormal ECG Beat Detection Approach for Long-Term Monitoring of Heart Patients Based on Hybrid Kernel Machine Ensemble -- Speaker Verification Using Adapted User-Dependent Multilevel Fusion -- Multi-modal Person Recognition for Vehicular Applications -- Using an Ensemble of Classifiers to Audit a Production Classifier -- Analysis and Modelling of Diversity Contribution to Ensemble-Based Texture Recognition Performance -- Combining Audio-Based and Video-Based Shot Classification Systems for News Videos Segmentation -- Designing Multiple Classifier Systems for Face Recognition -- Exploiting Class Hierarchies for Knowledge Transfer in Hyperspectral Data.

1,073 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: A novel dual coordinate descent method for linear SVM with L1- and L2-loss functions that reaches an ε-accurate solution in O(log(1/ε)) iterations is presented.
Abstract: In many applications, data appear with a huge number of instances as well as features. Linear Support Vector Machines (SVM) is one of the most popular tools to deal with such large-scale sparse data. This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. The proposed method is simple and reaches an ε-accurate solution in O(log(1/ε)) iterations. Experiments indicate that our method is much faster than state-of-the-art solvers such as Pegasos, TRON, SVMperf, and a recent primal coordinate descent implementation.
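The core update is a closed-form projected step on one dual variable at a time while maintaining w = Σ_i α_i y_i x_i. A hedged sketch of the L1-loss variant, omitting the paper's shrinking refinements and using a plain random permutation over instances:

```python
# Sketch of dual coordinate descent for the L1-loss linear SVM: update one
# dual variable alpha_i at a time, projecting onto [0, C], while keeping
# w = sum_i alpha_i * y_i * x_i up to date.
import numpy as np

def dcd_l1_svm(X, y, C=1.0, n_epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha, w = np.zeros(n), np.zeros(d)
    Qii = np.einsum("ij,ij->i", X, X)          # diagonal of Q: x_i . x_i
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            if Qii[i] == 0.0:
                continue
            G = y[i] * (X[i] @ w) - 1.0        # partial gradient of the dual
            new = min(max(alpha[i] - G / Qii[i], 0.0), C)  # project to [0, C]
            w += (new - alpha[i]) * y[i] * X[i]
            alpha[i] = new
    return w
```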

1,014 citations


Proceedings ArticleDOI
24 Aug 2008
TL;DR: This paper shows that models trained using the new methods perform better than the current state-of-the-art biased SVM method for learning from positive and unlabeled examples, and applies them to solve a real-world problem: identifying protein records that should be included in an incomplete specialized molecular biology database.
Abstract: The input to an algorithm that learns a binary classifier normally consists of two sets of examples, where one set consists of positive examples of the concept to be learned, and the other set consists of negative examples. However, it is often the case that the available training data are an incomplete set of positive examples, and a set of unlabeled examples, some of which are positive and some of which are negative. The problem solved in this paper is how to learn a standard binary classifier given a nontraditional training set of this nature. Under the assumption that the labeled examples are selected randomly from the positive examples, we show that a classifier trained on positive and unlabeled examples predicts probabilities that differ by only a constant factor from the true conditional probabilities of being positive. We show how to use this result in two different ways to learn a classifier from a nontraditional training set. We then apply these two new methods to solve a real-world problem: identifying protein records that should be included in an incomplete specialized molecular biology database. Our experiments in this domain show that models trained using the new methods perform better than the current state-of-the-art biased SVM method for learning from positive and unlabeled examples.
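The paper's central result suggests a very small implementation: train a probabilistic classifier to separate labeled from unlabeled examples, estimate the constant c = p(s=1|y=1) on held-out labeled positives, and divide. A hedged sketch using logistic regression as the base classifier (the paper itself uses SVMs with probability calibration):

```python
# Sketch of positive-unlabeled learning via the constant-factor result:
# g(x) = p(s=1|x) differs from p(y=1|x) by the constant c = p(s=1|y=1),
# estimated as the mean g(x) over held-out labeled positives.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def fit_pu(X, s):
    # s: 1 for labeled positives, 0 for unlabeled examples
    X_tr, X_hold, s_tr, s_hold = train_test_split(
        X, s, test_size=0.2, random_state=0, stratify=s)
    g = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)
    c = g.predict_proba(X_hold[s_hold == 1])[:, 1].mean()  # estimate of c
    return g, c

def predict_pos_proba(g, c, X):
    # corrected probability of being truly positive
    return np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)
```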

Journal ArticleDOI
TL;DR: The RF methodology is attractive for use in classification problems when the goals of the study are to produce an accurate classifier and to provide insight regarding the discriminative ability of individual predictor variables.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the classification accuracy rates of the developed approach surpass those of grid search and many other approaches, and that the developed PSO+SVM approach achieves results similar to GA+SVM; therefore, the PSO+SVM approach is valuable for parameter determination and feature selection in an SVM.
Abstract: Support vector machine (SVM) is a popular pattern classification method with many diverse applications. Kernel parameter setting in the SVM training procedure, along with the feature selection, significantly influences the classification accuracy. This study simultaneously determines the parameter values while discovering a subset of features, without reducing SVM classification accuracy. A particle swarm optimization (PSO) based approach for parameter determination and feature selection of the SVM, termed PSO+SVM, is developed. Several public datasets are employed to calculate the classification accuracy rate in order to evaluate the developed PSO+SVM approach. The developed approach was compared with grid search, which is a conventional method of searching parameter values, and other approaches. Experimental results demonstrate that the classification accuracy rates of the developed approach surpass those of grid search and many other approaches, and that the developed PSO+SVM approach has a similar result to GA+SVM. Therefore, the PSO+SVM approach is valuable for parameter determination and feature selection in an SVM.
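A hedged, simplified sketch of the idea: particles search (log2 C, log2 gamma) space with cross-validated accuracy as fitness. The paper additionally encodes a binary feature-subset mask in each particle, which is omitted here, and the inertia/acceleration constants are common defaults rather than the paper's values:

```python
# Simplified sketch of PSO-based SVM model selection over (log2 C, log2 gamma).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pso_svm(X, y, n_particles=10, n_iters=20, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    lo, hi = np.array([-5.0, -15.0]), np.array([15.0, 3.0])  # search box
    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)

    def fitness(p):  # cross-validated accuracy at C = 2^p0, gamma = 2^p1
        return cross_val_score(SVC(C=2.0**p[0], gamma=2.0**p[1]),
                               X, y, cv=5).mean()

    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmax()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([fitness(p) for p in pos])
        better = f > pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[pbest_f.argmax()].copy()
    return 2.0**gbest[0], 2.0**gbest[1]   # best (C, gamma) found
```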

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages and converges to a globally optimal solution typically in sublinear time is proposed.
Abstract: Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, because the classifier function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages. It converges to a globally optimal solution typically in sublinear time. We show how our method is applicable to different object detection and retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the chi2-distance. We demonstrate state-of-the-art performance of the resulting systems on the UIUC Cars dataset, the PASCAL VOC 2006 dataset and in the PASCAL VOC 2007 competition.
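The scheme can be sketched for the simplest case the paper covers, a linear bag-of-features score: each pixel carries a weight (for instance the SVM weight of the visual word of a feature at that location), a window's score is the sum of weights inside it, and a set of windows is bounded by summing positive weights over the largest window in the set and negative weights over the smallest. A hedged sketch using integral images and a priority queue:

```python
# Sketch of efficient subwindow search (ESS) for a per-pixel weight map:
# branch-and-bound over sets of rectangles, each set given by intervals for
# its top, bottom, left, and right coordinates.
import heapq
import numpy as np

def _integral(img):
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def _box_sum(I, t, b, l, r):               # inclusive box rows t..b, cols l..r
    return I[b + 1, r + 1] - I[t, r + 1] - I[b + 1, l] + I[t, l]

def ess(weights):
    H, W = weights.shape
    I_pos = _integral(np.maximum(weights, 0))
    I_neg = _integral(np.minimum(weights, 0))

    def bound(s):                           # s = ((t1,t2),(b1,b2),(l1,l2),(r1,r2))
        (t1, t2), (b1, b2), (l1, l2), (r1, r2) = s
        if t1 > b2 or l1 > r2:              # set contains no valid window
            return -np.inf
        hi = _box_sum(I_pos, t1, b2, l1, r2)            # positives: largest box
        if t2 <= b1 and l2 <= r1:
            hi += _box_sum(I_neg, t2, b1, l2, r1)       # negatives: smallest box
        return hi

    start = ((0, H - 1), (0, H - 1), (0, W - 1), (0, W - 1))
    heap = [(-bound(start), start)]
    while heap:
        neg_b, s = heapq.heappop(heap)
        if all(a == b for a, b in s):       # a single window remains: optimal
            (t, _), (b, _), (l, _), (r, _) = s
            return (t, b, l, r), -neg_b
        i = max(range(4), key=lambda k: s[k][1] - s[k][0])  # widest interval
        lo, hi = s[i]
        mid = (lo + hi) // 2
        for half in ((lo, mid), (mid + 1, hi)):            # split and push
            child = s[:i] + (half,) + s[i + 1:]
            heapq.heappush(heap, (-bound(child), child))
```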

Journal ArticleDOI
TL;DR: This work introduces a novel vocabulary using dense color SIFT descriptors and investigates the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM).
Abstract: We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature, here applied to a bag of visual words representation for each image, and subsequently training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
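The pipeline reduces to two stages: fit a latent topic model on bag-of-visual-words histograms, then train a discriminative classifier on each image's topic vector. scikit-learn ships no pLSA, so the sketch below substitutes LatentDirichletAllocation as a closely related stand-in; the topic count is an arbitrary placeholder:

```python
# Sketch of the two-stage pipeline: latent topic reduction of bag-of-words
# histograms, then a discriminative classifier on the topic distributions.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC

def topic_classifier(bow_counts, labels, n_topics=25):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    Z = lda.fit_transform(bow_counts)        # per-image topic distribution
    clf = SVC(kernel="rbf").fit(Z, labels)   # multiway classifier on topics
    return lda, clf
```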

Journal ArticleDOI
TL;DR: This work presents a method to adjust SVM parameters before classification, and examines overlapped segmentation and majority voting as two techniques to improve controller performance.
Abstract: This paper proposes and evaluates the application of support vector machine (SVM) to classify upper limb motions using myoelectric signals. It explores the optimum configuration of SVM-based myoelectric control, by suggesting an advantageous data segmentation technique, feature set, model selection approach for SVM, and postprocessing methods. This work presents a method to adjust SVM parameters before classification, and examines overlapped segmentation and majority voting as two techniques to improve controller performance. An SVM, as the core of classification in myoelectric control, is compared with two commonly used classifiers: linear discriminant analysis (LDA) and multilayer perceptron (MLP) neural networks. It demonstrates exceptional accuracy, robust performance, and low computational load. The entropy of the output of the classifier is also examined as an online index to evaluate the correctness of classification; this can be used in online training for long-term myoelectric control operations.

Journal ArticleDOI
TL;DR: A novel graph-based semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood, and can propagate the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness.
Abstract: In many practical data mining applications such as text classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms have attracted considerable interest from the data mining and machine learning fields. In recent years, graph-based semi-supervised learning has become one of the most active research areas in the semi-supervised learning community. In this paper, a novel graph-based semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. Our algorithm, named linear neighborhood propagation (LNP), can propagate the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness. A theoretical analysis of the properties of LNP is presented in this paper. Furthermore, we also derive an easy way to extend LNP to out-of-sample data. Promising experimental results are presented for synthetic data, digit, and text classification tasks.
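A hedged sketch of the propagation scheme: solve a small regularized least-squares problem per point to obtain reconstruction weights from its k neighbors (here without the nonnegativity constraint the paper imposes), then iterate a label-spreading update over the resulting graph:

```python
# Sketch of linear-neighborhood label propagation: local reconstruction
# weights (LLE-style, unconstrained here), then iterative label spreading.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lnp(X, y, labeled_mask, k=10, alpha=0.99, n_iters=100, reg=1e-3):
    n = X.shape[0]
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    W = np.zeros((n, n))
    for i in range(n):
        nb = idx[i, 1:]                       # k neighbors, excluding self
        Z = X[nb] - X[i]                      # local coordinates
        G = Z @ Z.T + reg * np.eye(k)         # regularized local Gram matrix
        w = np.linalg.solve(G, np.ones(k))
        W[i, nb] = w / w.sum()                # reconstruction weights sum to 1
    Y = np.zeros((n, y.max() + 1))            # labels assumed 0..K-1 integers
    Y[labeled_mask, y[labeled_mask]] = 1.0    # one-hot seed labels
    F = Y.copy()
    for _ in range(n_iters):
        F = alpha * (W @ F) + (1 - alpha) * Y # smooth propagation step
    return F.argmax(axis=1)
```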

Book
28 Aug 2008
TL;DR: Techniques covered range from traditional multivariate methods, such as multiple regression, principal components, canonical variates, linear discriminant analysis, factor analysis, clustering, multidimensional scaling, and correspondence analysis, to the newer methods of density estimation, projection pursuit, neural networks, and classification and regression trees.
Abstract: Remarkable advances in computation and data storage and the ready availability of huge data sets have been the keys to the growth of the new disciplines of data mining and machine learning, while the enormous success of the Human Genome Project has opened up the field of bioinformatics. These exciting developments, which led to the introduction of many innovative statistical tools for high-dimensional data analysis, are described here in detail. The author takes a broad perspective; for the first time in a book on multivariate analysis, nonlinear methods are discussed in detail as well as linear methods. Techniques covered range from traditional multivariate methods, such as multiple regression, principal components, canonical variates, linear discriminant analysis, factor analysis, clustering, multidimensional scaling, and correspondence analysis, to the newer methods of density estimation, projection pursuit, neural networks, multivariate reduced-rank regression, nonlinear manifold learning, bagging, boosting, random forests, independent component analysis, support vector machines, and classification and regression trees. Another unique feature of this book is the discussion of database management systems. This book is appropriate for advanced undergraduate students, graduate students, and researchers in statistics, computer science, artificial intelligence, psychology, cognitive sciences, business, medicine, bioinformatics, and engineering. Familiarity with multivariable calculus, linear algebra, and probability and statistics is required. The book presents a carefully-integrated mixture of theory and applications, and of classical and modern multivariate statistical techniques, including Bayesian methods. There are over 60 interesting data sets used as examples in the book, over 200 exercises, and many color illustrations and photographs.

Journal ArticleDOI
TL;DR: Support vector machines are widely used in computational biology due to their high accuracy, their ability to deal with high-dimensional and large datasets, and their flexibility in modeling diverse sources of data.
Abstract: The increasing wealth of biological data coming from a large variety of platforms and the continued development of new high-throughput methods for probing biological systems require increasingly more sophisticated computational approaches. Putting all these data in simple-to-use databases is a first step; but realizing the full potential of the data requires algorithms that automatically extract regularities from the data, which can then lead to biological insight. Many of the problems in computational biology are in the form of prediction: starting from prediction of a gene's structure, prediction of its function, interactions, and role in disease. Support vector machines (SVMs) and related kernel methods are extremely good at solving such problems [1]–[3]. SVMs are widely used in computational biology due to their high accuracy, their ability to deal with high-dimensional and large datasets, and their flexibility in modeling diverse sources of data [2], [4]–[6]. The simplest form of a prediction problem is binary classification: trying to discriminate between objects that belong to one of two categories—positive (+1) or negative (−1). SVMs use two key concepts to solve this problem: large margin separation and kernel functions. The idea of large margin separation can be motivated by classification of points in two dimensions (see Figure 1). A simple way to classify the points is to draw a straight line and call points lying on one side positive and on the other side negative. If the two sets are well separated, one would intuitively draw the separating line such that it is as far as possible away from the points in both sets (see Figures 2 and 3). This intuitive choice captures the idea of large margin separation, which is mathematically formulated in the section Classification with Large Margin. [Figure 1 caption: A linear classifier separating two classes of points (squares and circles) in two dimensions. The decision boundary divides the space into two sets depending on the sign of f(x) = 〈w,x〉+b; the grayscale level represents the value of the discriminant function f(x), dark for low values and a light shade for high values.]
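A minimal sketch of that discriminant function, recovering w and b from a linear SVM trained on toy two-dimensional data (the blob data is purely illustrative):

```python
# Minimal sketch of the linear decision function f(x) = <w, x> + b on toy
# 2-D data: the sign of f reproduces the classifier's decisions.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_.ravel(), clf.intercept_[0]
f = X @ w + b                                 # discriminant values
print(np.array_equal((f > 0).astype(int), clf.predict(X)))   # True
```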

Journal ArticleDOI
TL;DR: BCPred, a novel method for predicting linear B‐cell epitopes using the subsequence kernel, is proposed and it is shown that the predictive performance of BCPred outperforms 11 SVM‐based classifiers developed and evaluated in the authors' experiments as well as the implementation of AAP (AUC = 0.7).
Abstract: The identification and characterization of B-cell epitopes play an important role in vaccine design, immunodiagnostic tests, and antibody production. Therefore, computational tools for reliably predicting linear B-cell epitopes are highly desirable. We evaluated Support Vector Machine (SVM) classifiers trained utilizing five different kernel methods using fivefold cross-validation on a homology-reduced data set of 701 linear B-cell epitopes, extracted from the Bcipep database, and 701 non-epitopes, randomly extracted from SwissProt sequences. Based on the results of our computational experiments, we propose BCPred, a novel method for predicting linear B-cell epitopes using the subsequence kernel. We show that the predictive performance of BCPred (AUC = 0.758) outperforms 11 SVM-based classifiers developed and evaluated in our experiments as well as our implementation of AAP (AUC = 0.7), a recently proposed method for predicting linear B-cell epitopes using amino acid pair antigenicity. Furthermore, we compared BCPred with AAP and ABCPred, a method that uses recurrent neural networks, using two data sets of unique B-cell epitopes that had been previously used to evaluate ABCPred. Analysis of the data sets used and the results of this comparison show that conclusions about the relative performance of different B-cell epitope prediction methods drawn on the basis of experiments using data sets of unique B-cell epitopes are likely to yield overly optimistic estimates of performance of evaluated methods. This argues for the use of carefully homology-reduced data sets in comparing B-cell epitope prediction methods to avoid misleading conclusions about how different methods compare to each other. Our homology-reduced data set and implementations of BCPred as well as the AAP method are publicly available through our web-based server, BCPREDS, at: http://ailab.cs.iastate.edu/bcpreds/.

Journal ArticleDOI
TL;DR: Both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines both in the settings when no gene selection is performed and when several popular gene selection methods are used.
Abstract: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain. In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underlines the importance of sound research design in benchmarking and comparison of bioinformatics algorithms. We found that both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines both in the settings when no gene selection is performed and when several popular gene selection methods are used.

Proceedings ArticleDOI
05 Jul 2008
TL;DR: To our surprise, the method that performs consistently well across all dimensions is random forests, followed by neural nets, boosted trees, and SVMs; the relative performance of the learning algorithms changes as dimensionality increases.
Abstract: In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. We evaluate performance on three metrics: accuracy, AUC, and squared loss and study the effect of increasing dimensionality on the performance of the learning algorithms. Our findings are consistent with previous studies for problems of relatively low dimension, but suggest that as dimensionality increases the relative performance of the learning algorithms changes. To our surprise, the method that performs consistently well across all dimensions is random forests, followed by neural nets, boosted trees, and SVMs.

Journal ArticleDOI
TL;DR: Recursive Feature Elimination is evaluated in terms of sensitivity of discriminative maps (Receiver Operating Characteristic analysis) and generalization performance, and compared to previously used univariate voxel selection strategies based on activation and discrimination measures.

Journal ArticleDOI
01 Sep 2008
TL;DR: Experimental results showed that the proposed PSO-SVM model can correctly select the discriminating input features and also achieve high classification accuracy.
Abstract: This study proposed a novel PSO-SVM model that hybridized the particle swarm optimization (PSO) and support vector machines (SVM) to improve the classification accuracy with a small and appropriate feature subset. This optimization mechanism combined the discrete PSO with the continuous-valued PSO to simultaneously optimize the input feature subset selection and the SVM kernel parameter setting. The hybrid PSO-SVM data mining system was implemented via a distributed architecture using web service technology to reduce the computational time. In a heterogeneous computing environment, the PSO optimization was performed on the application server and the SVM model was trained on the client (agent) computer. The experimental results showed that the proposed approach can correctly select the discriminating input features and also achieve high classification accuracy.

Journal ArticleDOI
TL;DR: Evaluating the capability of SVM in predicting defect-prone software modules and comparing its prediction performance against eight statistical and machine learning models on four NASA datasets indicates that its prediction performance is generally better than, or at least competitive with, that of the compared models.

Journal ArticleDOI
01 Sep 2008
TL;DR: A thorough experimental study shows the superiority of the generalization capability of the support vector machine (SVM) approach in the automatic classification of electrocardiogram (ECG) beats and suggests that further substantial improvements in classification accuracy can be achieved by the proposed PSO-SVM classification system.
Abstract: The aim of this paper is twofold. First, we present a thorough experimental study to show the superiority of the generalization capability of the support vector machine (SVM) approach in the automatic classification of electrocardiogram (ECG) beats. Second, we propose a novel classification system based on particle swarm optimization (PSO) to improve the generalization performance of the SVM classifier. For this purpose, we have optimized the SVM classifier design by searching for the best value of the parameters that tune its discriminant function, and upstream by looking for the best subset of features that feed the classifier. The experiments were conducted on ECG data from the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia database to classify five kinds of abnormal waveforms and normal beats. In particular, they were organized so as to test the sensitivity of the SVM classifier, and that of two reference classifiers used for comparison, i.e., the k-nearest neighbor (kNN) classifier and the radial basis function (RBF) neural network classifier, with respect to the curse of dimensionality and the number of available training beats. The obtained results clearly confirm the superiority of the SVM approach as compared to traditional classifiers, and suggest that further substantial improvements in terms of classification accuracy can be achieved by the proposed PSO-SVM classification system. On average, over three experiments making use of a different total number of training beats (250, 500, and 750, respectively), the PSO-SVM yielded an overall accuracy of 89.72% on 40,438 test beats selected from 20 patient records, against 85.98%, 83.70%, and 82.34% for the SVM, the kNN, and the RBF classifiers, respectively.

Proceedings ArticleDOI
20 Jul 2008
TL;DR: This study re-examines the assumption that most frequent terms in the pseudo-feedback documents are useful for the retrieval and proposes to integrate a term classification process to predict the usefulness of expansion terms.
Abstract: Pseudo-relevance feedback assumes that most frequent terms in the pseudo-feedback documents are useful for the retrieval. In this study, we re-examine this assumption and show that it does not hold in reality - many expansion terms identified in traditional approaches are indeed unrelated to the query and harmful to the retrieval. We also show that good expansion terms cannot be distinguished from bad ones merely on their distributions in the feedback documents and in the whole collection. We then propose to integrate a term classification process to predict the usefulness of expansion terms. Multiple additional features can be integrated in this process. Our experiments on three TREC collections show that retrieval effectiveness can be much improved when term classification is used. In addition, we also demonstrate that good terms should be identified directly according to their possible impact on the retrieval effectiveness, i.e. using supervised learning, instead of unsupervised learning.

Journal ArticleDOI
TL;DR: An overview of the SVM, covering both one-class and two-class SVM methods, is first presented, followed by its use in landslide susceptibility mapping, where it is concluded that two-class SVM possesses better prediction efficiency than logistic regression and one-class SVM.

Journal ArticleDOI
TL;DR: The performance and behavior of various S3VM algorithms are studied together, under a common experimental setting, to review key ideas in the literature on Semi-Supervised Support Vector Machines.
Abstract: Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a non-convex optimization problem. A suite of algorithms has recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VM algorithms are studied together, under a common experimental setting.

Journal ArticleDOI
TL;DR: The findings suggest that while recurrent neural networks and support vector machines show the best performance, their forecasting accuracy was not statistically significantly better than that of the regression model.